Most structured data in real-life applications are stored in relational databases containing multiple semantically linked
relations. Unlike clustering in a single table, when clustering objects in relational databases there are usually a large
number of features conveying very different semantic information, and using all features indiscriminately is unlikely to generate
meaningful results. Because the user knows her goal of clustering, we propose a new approach called CrossClus,
which performs multi-relational clustering under the user’s guidance. Unlike semi-supervised clustering, which requires the user
to provide a training set, we minimize the user’s effort by using a very simple form of user guidance. The user is only required
to select one or a small set of features that are pertinent to the clustering goal, and CrossClus
searches for other pertinent features in multiple relations. Each feature is evaluated by whether it clusters objects in
a similar way to the user-specified features. We design efficient and accurate approaches for both feature selection and
object clustering. Our comprehensive experiments demonstrate the effectiveness and scalability of CrossClus.
Keywords: Relational data mining; Clustering
Responsible editor: Eamonn Keogh.
The work was supported in part by the U.S. National Science Foundation NSF IIS-03-13678 and NSF BDI-05-15813, and an IBM Faculty
Award. Any opinions, findings, and conclusions or recommendations expressed in this paper are those of the authors and do
not necessarily reflect the views of the funding agencies.