Abstract
In multi-instance learning, each object is represented by a bag composed of multiple instances, rather than by a single instance as in traditional learning. Previous work in this area has addressed only multi-instance prediction problems, where each bag is associated with a binary (classification) or real-valued (regression) label. However,
unsupervised multi-instance learning, where bags carry no labels, has not been studied. This paper addresses the problem of unsupervised multi-instance learning and proposes a multi-instance clustering algorithm named Bamic. Briefly, by regarding bags as atomic data items and using a distance metric defined over bags to measure the distance between them, Bamic adapts the popular k-Medoids algorithm to partition the unlabeled training bags into k disjoint groups of bags. Furthermore, based on the clustering results, a novel multi-instance prediction algorithm named
Bartmip is developed. First, each bag is re-represented by a k-dimensional feature vector, where the value of the i-th feature is the distance between the bag and the medoid of the i-th group. With bags thus transformed into feature vectors, common supervised learners can then be used to learn from the transformed feature vectors, each associated with the original bag’s label. Extensive experiments show that Bamic can effectively discover the underlying structure of the data set and that Bartmip works quite well on various kinds of multi-instance prediction problems.
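
The two steps described above can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not state: the bag-level distance is taken to be the maximal Hausdorff distance (the abstract says only "some form of distance metric"), medoids are initialized at random, and the function names `hausdorff`, `bag_distance_matrix`, `k_medoids` and `to_features` are hypothetical. It is not the authors' implementation, and the final supervised learner is left out.

```python
import numpy as np

def hausdorff(a, b):
    """Maximal Hausdorff distance between two bags of shape (n_instances, d).
    Assumed choice of bag-level metric, used for illustration only."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return max(d.min(axis=1).max(), d.min(axis=0).max())

def bag_distance_matrix(bags, dist=hausdorff):
    """Pairwise distances between bags, treating each bag as one data item."""
    n = len(bags)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = dist(bags[i], bags[j])
    return D

def k_medoids(D, k, n_iter=100, seed=0):
    """Plain k-Medoids on a precomputed distance matrix (random initialization)."""
    rng = np.random.default_rng(seed)
    medoids = rng.choice(len(D), size=k, replace=False)
    for _ in range(n_iter):
        labels = D[:, medoids].argmin(axis=1)  # assign each bag to its nearest medoid
        new = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:
                # new medoid: the member with the smallest total distance to its group
                within = D[np.ix_(members, members)].sum(axis=1)
                new[c] = members[within.argmin()]
        if np.array_equal(new, medoids):
            break
        medoids = new
    labels = D[:, medoids].argmin(axis=1)
    return medoids, labels

def to_features(bags, medoid_bags, dist=hausdorff):
    """Re-represent each bag as a k-dimensional vector of distances to the medoids."""
    return np.array([[dist(b, m) for m in medoid_bags] for b in bags])

# Toy data: four bags near the origin and four bags near (5, 5).
rng = np.random.default_rng(1)
bags = [rng.normal(0.0, 0.1, size=(3, 2)) for _ in range(4)] + \
       [rng.normal(5.0, 0.1, size=(4, 2)) for _ in range(4)]

D = bag_distance_matrix(bags)
medoids, labels = k_medoids(D, k=2)
X = to_features(bags, [bags[m] for m in medoids])  # one row per bag, one column per group
```

Each row of `X` can then be paired with the label of the bag it came from and passed to any standard supervised learner. Working from a precomputed distance matrix means the clustering step never needs to look inside a bag, so a different bag-level metric can be swapped in without changing anything else.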