In this paper, we present a three-tier clustering method for data objects that are described by a number of feature dimensions.
With this approach, the similarity between objects is first computed along each feature dimension. The inter-object similarity is
then computed from these per-dimension similarities using a Bayesian multi-causal model. Objects are finally clustered based
on the computed similarity. An online citation entry clustering system was built using the approach. It accepts user queries
in the form of author names. Such queries are sent to citation/bibliography search engines. The returned entries are clustered
based on feature dimensions such as authors, title, and place of publication. After clustering, entries from different authors
with similar names form different clusters, which are presented to the user. Preliminary experimental results indicate the
effectiveness of the proposed clustering approach. The architecture of the three-tier clustering framework, the feature representation
of a citation entry, a belief network model for inter-object similarity computation, and a special cluster evaluation technique
are discussed in detail.
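
To make the three tiers concrete, the following is a minimal sketch under stated assumptions, not the system described in this paper. It assumes token-set Jaccard similarity for each feature dimension, a noisy-OR combination as one simple instance of a multi-causal model, and threshold-based single-link clustering. The function names, weights, and threshold are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Tier 1 helper: token-set Jaccard similarity of two strings, in [0, 1]."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def feature_similarities(x, y, features):
    """Tier 1: similarity of two entries along each feature dimension."""
    return {f: jaccard(x.get(f, ""), y.get(f, "")) for f in features}


def noisy_or(sims, weights):
    """Tier 2 (assumed model): noisy-OR over per-feature similarities.

    Each feature independently signals a match with probability
    weights[f] * sims[f]; the result is the probability that at
    least one feature signals a match.
    """
    p_no_match = 1.0
    for f, s in sims.items():
        p_no_match *= 1.0 - weights[f] * s
    return 1.0 - p_no_match


def cluster(entries, features, weights, threshold):
    """Tier 3: single-link clustering, merging pairs at or above threshold."""
    parent = list(range(len(entries)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(entries)), 2):
        sims = feature_similarities(entries[i], entries[j], features)
        if noisy_or(sims, weights) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(entries)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

In this sketch, a call such as `cluster(entries, ["authors", "title", "venue"], weights, 0.6)` takes a list of dictionaries keyed by feature name and returns lists of entry indices, one list per cluster. In practice the per-feature weights would need to be estimated from data rather than set by hand.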
This work is partially supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region,
China (AOE97/98.EG05) and a grant from the National 973 project of China (No. G1998030414).