Abstract

In this paper, we present a three-tier clustering method for data objects that are described by a number of feature dimensions. In this approach, the similarity between objects along each feature dimension is computed first. The inter-object similarity is then computed from the feature-dimension similarities using a Bayesian multi-causal model. Finally, the objects are clustered based on the computed similarity. An online citation entry clustering system was built using the approach. It accepts user queries in the form of author names and sends them to citation/bibliography search engines. The returned entries are clustered based on feature dimensions such as authors, title, and place of publication. After clustering, entries by different authors with similar names form different clusters, which are presented to the user. Preliminary experimental results indicate the effectiveness of the proposed clustering approach. The architecture of the three-tier clustering framework, the feature representation of a citation entry, a belief network model for inter-object similarity computation, and a special cluster evaluation technique are discussed in detail.
This work is partially supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (AOE97/98.EG05) and a grant from the National 973 project of China (No. G1998030414).
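
The three tiers described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the token-overlap (Jaccard) similarity, the noisy-OR combination standing in for the Bayesian multi-causal model, the single-link thresholding, and all names, weights, and the threshold below are assumptions chosen only to make the pipeline concrete.

```python
from itertools import combinations

# Hypothetical causal strengths per feature dimension (not taken from the paper).
WEIGHTS = {"authors": 0.6, "title": 0.5, "venue": 0.4}


def jaccard(a: str, b: str) -> float:
    """Tier 1: token-overlap similarity along one feature dimension."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def noisy_or(sims: dict) -> float:
    """Tier 2: combine per-dimension similarities with a noisy-OR model,
    used here as an assumed stand-in for the Bayesian multi-causal model."""
    p_not_same = 1.0
    for dim, s in sims.items():
        p_not_same *= 1.0 - WEIGHTS[dim] * s
    return 1.0 - p_not_same


def entry_similarity(x: dict, y: dict) -> float:
    """Inter-object similarity computed from all feature dimensions."""
    return noisy_or({d: jaccard(x[d], y[d]) for d in WEIGHTS})


def cluster(entries: list, threshold: float = 0.5) -> list:
    """Tier 3: single-link clustering; pairs at or above the threshold
    are merged with a union-find structure."""
    parent = list(range(len(entries)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(entries)), 2):
        if entry_similarity(entries[i], entries[j]) >= threshold:
            parent[find(j)] = find(i)

    groups = {}
    for i in range(len(entries)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    # Made-up citation entries, for illustration only.
    entries = [
        {"authors": "J Smith A Lee", "title": "clustering citation entries", "venue": "ICDE"},
        {"authors": "J Smith A Lee", "title": "clustering of citation entries", "venue": "ICDE"},
        {"authors": "J Smith B Wong", "title": "protein folding dynamics", "venue": "Nature"},
    ]
    print(cluster(entries))  # [[0, 1], [2]]
```

In this made-up example, the first two entries agree on every dimension and are merged, while the third shares only part of an author name and stays in its own cluster, which mirrors the goal of separating different authors with similar names.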
