With the rapid development of on-line information services, information technologies for on-line information processing have
been receiving much attention recently. Clustering plays an important role in various on-line applications, such as the extraction
of useful information from news feed services and the selection of relevant documents from incoming scientific articles
in digital libraries. In on-line environments, users are generally more interested in newer documents than in older ones and have
no interest in obsolete documents.
Based on this observation, we propose an on-line document clustering method, F2ICM (Forgetting-Factor-based Incremental Clustering Method), that incorporates the notion of a forgetting factor into the calculation of document similarities. The idea is that every document gradually loses its weight (or memory) as time passes,
according to this factor. Since F2ICM generates clusters using a document similarity measure based on the forgetting factor, newer documents have a greater effect
on the resulting cluster structure than older ones. In this paper, we present the fundamental idea of the F2ICM method and describe its details, such as the similarity measure and the clustering algorithm. We also present an efficient
method for incrementally maintaining the statistics of F2ICM, which is indispensable for on-line dynamic environments.
Keywords: clustering - on-line information processing - incremental algorithms - forgetting factors
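
As an illustration of the forgetting-factor idea, the following minimal sketch assumes that document weights decay exponentially with age, a form the text above does not specify. The names forgetting_weight, DecayedTotal, and half_life are hypothetical and chosen only for this example. The sketch also shows why such a decay suits incremental maintenance: a running total of weights can be aged with a single multiplication instead of revisiting every stored document.

```python
def forgetting_weight(age, half_life):
    """Weight of a document that is `age` time units old.

    Assumption: exponential decay, with the forgetting factor chosen so
    that the weight halves every `half_life` time units.
    """
    lam = 0.5 ** (1.0 / half_life)  # forgetting factor, 0 < lam < 1
    return lam ** age


class DecayedTotal:
    """Running sum of document weights, maintained incrementally."""

    def __init__(self, half_life):
        self.lam = 0.5 ** (1.0 / half_life)
        self.total = 0.0
        self.now = 0.0

    def advance(self, new_time):
        """Move the clock forward; all stored weights shrink by one common factor."""
        if new_time < self.now:
            raise ValueError("time must not move backwards")
        self.total *= self.lam ** (new_time - self.now)
        self.now = new_time

    def add_document(self):
        """A newly arrived document enters with full weight 1."""
        self.total += 1.0
```

Under these assumptions, a document added at time 0 contributes half its original weight once the clock has advanced by one half-life, so adding a second document at that point gives a total of about 1.5.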