Lecture Notes in Computer Science, 2002, Volume 2453/2002, 487-566, DOI: 10.1007/3-540-46146-9_89

On Combining Link and Contents Information for Web Page Clustering

Yitong Wang and Masaru Kitsuregawa

Abstract

Clustering is currently one of the most crucial techniques for dealing with (e.g. locating resources, interpreting information) the massive amount of heterogeneous information on the web, which is beyond a human being’s capacity to digest. In this paper, we discuss the shortcomings of previous approaches and present a unifying clustering algorithm that clusters web search results for a specific query topic by combining link and contents information. In particular, we investigate how to combine link and contents analysis in the clustering process to improve the quality and interpretability of web search results. The proposed approach automatically clusters the web search results into high-quality, semantically meaningful groups in a concise, easy-to-interpret hierarchy with tagging terms. Preliminary experiments and evaluations are conducted, and the experimental results show that the proposed approach is effective and promising.

Keywords: co-citation, coupling, anchor window, snippet
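
To make the idea of combining link and contents information concrete, the following is a minimal sketch of one possible pairwise similarity between two search results. It is an illustration under stated assumptions, not the algorithm proposed in the paper: the page representation (dicts with `in_links`, `out_links`, `terms`), the use of cosine similarity, the simple averaging of co-citation and coupling, and the weight `w_link` are all hypothetical choices made here.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def combined_similarity(p: dict, q: dict, w_link: float = 0.5) -> float:
    """Blend link-based and content-based similarity of two pages.

    Assumed (hypothetical) page format:
      in_links  - URLs that point to the page (shared in-links ~ co-citation)
      out_links - URLs the page points to (shared out-links ~ coupling)
      terms     - words taken from the page's snippet or anchor window
    """
    cocitation = cosine(Counter(p["in_links"]), Counter(q["in_links"]))
    coupling = cosine(Counter(p["out_links"]), Counter(q["out_links"]))
    content = cosine(Counter(p["terms"]), Counter(q["terms"]))
    link = (cocitation + coupling) / 2  # assumed equal weighting of the two link signals
    return w_link * link + (1 - w_link) * content
```

A score of this kind could then be fed to any standard clustering procedure; how the paper itself weights or combines these signals is not stated in the abstract.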
