The human adaptive immune response relies on the recognition of short peptides through proteins of the major histocompatibility
complex (MHC). MHC class II molecules are responsible for the recognition of antigens external to a cell. Understanding their
specificity is an important step in the design of peptide-based vaccines. The high degree of polymorphism in MHC class II
makes the prediction of peptides that bind (and then usually cause an immune response) a challenging task. Typically, these
predictions rely on machine learning methods, which require a sufficient number of data points. Because such data are scarce,
reliable prediction models are currently available for only about 7% of all known alleles.
We show how to transform the problem of MHC class II binding peptide prediction into a well-studied machine learning problem
called multiple instance learning. For alleles with sufficient data, we show how to build a well-performing predictor using
standard kernels for multiple instance learning. Furthermore, we introduce a new method for training a classifier for an allele
without requiring binding peptide data for that target allele. Instead, we use binding peptide data from other alleles
and similarities between the structures of the MHC class II alleles to guide the learning process. This makes it possible, for
the first time, to construct predictors for about two-thirds of all known MHC class II alleles. On 14 test alleles, these
predictors achieve an average area under the ROC curve of 0.71.
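
As an illustration of the multiple instance learning formulation, the following minimal Python sketch treats each peptide as a bag of its contiguous 9-residue windows (candidate binding cores) and compares two bags with a normalized set kernel. The window length of 9, the position-wise match kernel, and the names peptide_to_bag, match_kernel, set_kernel, and normalized_set_kernel are assumptions made for this sketch and are not taken from the text above. It is not the implementation evaluated here.

```python
from math import sqrt


def peptide_to_bag(peptide, core_len=9):
    """Split a peptide into a bag of all contiguous windows of length core_len.

    Assumption: each window is a candidate binding core; which one binds is unknown.
    """
    if len(peptide) < core_len:
        raise ValueError("peptide is shorter than the assumed core length")
    return [peptide[i:i + core_len] for i in range(len(peptide) - core_len + 1)]


def match_kernel(a, b):
    """Toy instance kernel (assumption): count positions where two cores agree."""
    return sum(x == y for x, y in zip(a, b))


def set_kernel(bag_a, bag_b):
    """Set kernel: sum of the instance kernel over all cross-bag instance pairs."""
    return sum(match_kernel(a, b) for a in bag_a for b in bag_b)


def normalized_set_kernel(bag_a, bag_b):
    """Set kernel scaled so that every non-empty bag has self-similarity 1."""
    norm = sqrt(set_kernel(bag_a, bag_a) * set_kernel(bag_b, bag_b))
    return set_kernel(bag_a, bag_b) / norm


if __name__ == "__main__":
    # Arbitrary example sequences, used only to exercise the functions.
    bag_1 = peptide_to_bag("PKYVKQNTLKLAT")
    bag_2 = peptide_to_bag("GELIGILNAAKVPAD")
    print(len(bag_1), len(bag_2))
    print(normalized_set_kernel(bag_1, bag_2))
```

A Gram matrix filled with such bag-level kernel values could, in principle, be passed to any kernel classifier that accepts precomputed kernels. A substitution-matrix-based instance kernel would be a natural replacement for the toy match kernel.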