Abstract

Automatic document categorization plays a key role in the development of future interfaces for Web-based search. Clustering algorithms are considered a technology capable of mastering this “ad-hoc” categorization task. This paper presents the results of a comprehensive analysis of clustering algorithms in connection with document categorization. The contributions relate to exemplar-based, hierarchical, and density-based clustering algorithms. In particular, we contrast ideal and real clustering settings and present runtime results that are based on efficient implementations of the investigated algorithms.

Keywords: Document Categorization, Clustering, Clustering Quality Measures, Information Retrieval
