In this paper we applied multilabel classification algorithms to the EUR-Lex database of legal documents of the European Union.
On this document collection, we studied three different multilabel classification problems, the largest being the categorization
into the EUROVOC concept hierarchy with almost 4,000 classes. We evaluated three algorithms: (i) the binary relevance approach,
which independently trains one classifier per label; (ii) the multiclass multilabel perceptron algorithm, which respects dependencies
between the base classifiers; and (iii) the multilabel pairwise perceptron algorithm, which trains one classifier for each
pair of labels. All algorithms use the simple but efficient perceptron algorithm as the underlying classifier, which
makes them well suited to large-scale multilabel classification problems. The main challenge we had to face was that the
almost 8,000,000 perceptrons that had to be trained in the pairwise setting could no longer be stored in memory. We solved
this problem by resorting to the dual representation of the perceptron, which makes the pairwise approach feasible for problems
of this size. The results on the EUR-Lex database confirm the good predictive performance of the pairwise approach and demonstrate
its feasibility for large-scale tasks.
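
To make the memory argument concrete, the following is a minimal sketch and not the implementation used in this paper. It assumes the standard dual form of the perceptron, in which the weight vector is never stored explicitly but is represented by one coefficient per training example. The names n_pairwise, train_dual, and score_dual are hypothetical and chosen for illustration only. With n labels, the pairwise setting needs n(n-1)/2 base classifiers, which for a round figure of 4,000 labels gives 7,998,000.

```python
import numpy as np


def n_pairwise(n_labels: int) -> int:
    """Number of base classifiers when one is trained per pair of labels."""
    return n_labels * (n_labels - 1) // 2


def train_dual(X: np.ndarray, y: np.ndarray, epochs: int = 10) -> np.ndarray:
    """Train a perceptron in dual form (illustrative sketch).

    X holds one training example per row, y holds labels in {-1, +1}.
    The result alpha has one entry per training example; the implicit
    primal weight vector is w = sum_i alpha[i] * X[i].
    """
    n = X.shape[0]
    gram = X @ X.T              # inner products between training examples
    alpha = np.zeros(n)
    for _ in range(epochs):
        mistakes = 0
        for i in range(n):
            # alpha @ gram[i] equals <w, X[i]> without ever forming w
            if y[i] * (alpha @ gram[i]) <= 0:
                alpha[i] += y[i]    # same update as w += y[i] * X[i]
                mistakes += 1
        if mistakes == 0:
            break
    return alpha


def score_dual(alpha: np.ndarray, X_train: np.ndarray, x: np.ndarray) -> float:
    """Score a new example x using only alpha and inner products with X_train."""
    return float(alpha @ (X_train @ x))
```

In this form, the inner products between a test document and the training documents do not depend on the label pair, so under the assumptions above they could be computed once and reused by every pairwise perceptron; each perceptron then needs to keep only its coefficient vector.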