Browsing and Analysis of Web Elements

An Incremental Document Clustering Algorithm Based on a Hierarchical Agglomerative Approach

Kil Hong Joo and SooJung Lee

(1)  Dept. of Computer Education, Gyeongin National University of Education, Gyodae Street, 45, Gyeyang-gu, Inchon, 407-753, Korea
Abstract
Document clustering classifies a data set of documents into groups of closely related documents, so that the resulting clusters can be used for browsing and searching the documents on a specific topic. In most such applications, new documents are incrementally added to the data set, and the number of words can vary widely from document to document. This paper proposes an incremental document clustering method for such an incrementally growing data set. The normalized inverse document frequency of a word in the data set is introduced to cope with the variation in the number of words per document. Furthermore, instead of the single similarity measure used in an average link method for document clustering, two similarity measures are used: a cluster cohesion rate and a cluster participation rate. In addition, a category tree for the set of identified clusters is introduced to assist the incremental clustering of newly added documents. The performance of the proposed method is analyzed through a series of experiments to identify its various characteristics.
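
The abstract does not define the normalized inverse document frequency, the cohesion and participation rates, or the category tree, so none of them can be reproduced here. As a rough point of comparison only, the following Python sketch shows the kind of baseline the abstract contrasts against: incremental assignment driven by a single average link similarity. Everything in it is an assumption made for illustration: the textbook IDF formula log(N / df), unit-length scaling of the vectors, cosine similarity, the function names, and the threshold value of 0.2.

```python
import math
from collections import Counter

def idf(docs):
    """Textbook inverse document frequency, log(N / df), per word.
    Assumption: this is NOT the paper's normalized IDF, which the abstract does not define."""
    n = len(docs)
    df = Counter(word for doc in docs for word in set(doc))
    return {w: math.log(n / c) for w, c in df.items()}

def vectorize(doc, idf_weights):
    """TF * IDF vector scaled to unit length, so long documents do not dominate."""
    tf = Counter(doc)
    vec = {w: tf[w] * idf_weights.get(w, 0.0) for w in tf}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {w: v / norm for w, v in vec.items()} if norm else vec

def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors (a dot product)."""
    return sum(v * b.get(w, 0.0) for w, v in a.items())

def average_link(vec, cluster):
    """Mean similarity between one document and every member of a cluster."""
    return sum(cosine(vec, member) for member in cluster) / len(cluster)

def add_document(vec, clusters, threshold=0.2):
    """Put vec into the most similar cluster, or open a new cluster if no
    cluster reaches the (hypothetical) threshold. Returns the cluster index."""
    best, best_sim = None, threshold
    for i, cluster in enumerate(clusters):
        sim = average_link(vec, cluster)
        if sim >= best_sim:
            best, best_sim = i, sim
    if best is None:
        clusters.append([vec])
        return len(clusters) - 1
    clusters[best].append(vec)
    return best
```

For simplicity the sketch computes IDF weights once over a fixed set of documents. In a truly incremental setting the weights change as documents arrive, which is one of the issues the proposed method is meant to handle.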

Kil Hong Joo
Email: khjoo@ginue.ac.kr

SooJung Lee
Email: sjlee@ginue.ac.kr