Clustering a document collection is a common approach to automatically deriving its underlying document categories. The categorization
performance of a document clustering algorithm can be captured by the F-Measure, which quantifies how closely the clustering reproduces a human-defined categorization.
However, a poor F-Measure value says nothing about why a clustering algorithm performs poorly. Among the possible explanations,
the most interesting question is the following: Are the implicit assumptions of the clustering algorithm admissible with respect
to a document categorization task?
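As an illustration of how such a comparison against a human-defined categorization might be computed, the following is a minimal sketch. It assumes the class-weighted variant of the F-Measure that is commonly used for clustering evaluation (for each human-defined class, take the best harmonic mean of precision and recall over all clusters, then weight by class size); the abstract itself does not fix a definition, and the function name `clustering_f_measure` and its arguments are hypothetical.

```python
from collections import Counter


def clustering_f_measure(classes, clusters):
    """Class-weighted F-Measure of a clustering against reference classes.

    Assumption: this is one common clustering variant of the F-Measure,
    not necessarily the exact definition used in the paper.
    classes[i]  -- human-defined category of document i
    clusters[i] -- cluster label assigned to document i
    """
    if len(classes) != len(clusters) or len(classes) == 0:
        raise ValueError("classes and clusters must be non-empty and equally long")

    n = len(classes)
    class_sizes = Counter(classes)
    cluster_sizes = Counter(clusters)
    overlap = Counter(zip(classes, clusters))  # documents shared by (class, cluster)

    total = 0.0
    for c, c_size in class_sizes.items():
        best = 0.0
        for k, k_size in cluster_sizes.items():
            shared = overlap.get((c, k), 0)
            if shared == 0:
                continue
            precision = shared / k_size  # share of cluster k that belongs to class c
            recall = shared / c_size     # share of class c that was placed in cluster k
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (c_size / n) * best     # weight each class by its relative size
    return total


if __name__ == "__main__":
    reference = ["a", "a", "b", "b"]
    print(clustering_f_measure(reference, [0, 0, 1, 1]))  # clustering matches the classes
    print(clustering_f_measure(reference, [0, 0, 0, 0]))  # everything in one cluster
```

A value of 1 means the clustering matches the reference categorization exactly; lower values indicate a mismatch but, as noted above, do not reveal its cause.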
Though the use of clustering algorithms for document categorization is widely accepted, no foundation or rationale has been
stated for this admissibility question. This paper addresses that gap. It presents considerations and a measure
to quantify the sensitivity of a clustering process with regard to geometric distortions of the data space. Along with the
method of multidimensional scaling, this measure provides an instrument for assessing a clustering algorithm's adequacy.
Keywords: Document Categorization, Clustering, F-Measure, Multidimensional Scaling, Information Visualization