Volume 8, Number 2, 331-349, DOI: 10.1007/s10791-005-5666-8

Data Driven Similarity Measures for k-Means Like Clustering Algorithms

Jacob Kogan, Marc Teboulle and Charles Nicholas


Abstract

We present an optimization approach that generates k-means-like clustering algorithms. The batch k-means and the incremental k-means are two well-known versions of the classical k-means clustering algorithm (Duda et al. 2000). To benefit from the speed of the batch version and the accuracy of the incremental version, we combine the two in a "ping-pong" fashion. We use a distance-like function that combines the squared Euclidean distance with relative entropy. In the extreme cases our algorithm recovers the classical k-means clustering algorithm and generalizes the Divisive Information Theoretic clustering algorithm recently reported independently by Berkhin and Becher (2002) and Dhillon et al. (2002). Results of numerical experiments that demonstrate the viability of our approach are reported.
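The abstract names two ingredients: a distance-like function mixing squared Euclidean distance with relative entropy, and a "ping-pong" alternation between batch and incremental k-means. The following is a minimal Python/NumPy sketch of both, not the authors' implementation. It assumes the weighted form d(x, c) = (ν/2)‖x − c‖² + μ Σ_j [x_j log(x_j/c_j) − x_j + c_j], with hypothetical weights `nu` and `mu` and nonnegative data. It also assumes centroids are arithmetic means. Both terms are Bregman divergences in x with the centroid as the second argument, so the mean minimizes a cluster's total divergence. The incremental phase here is a simple first-improvement single-point move. The paper's exact function, weights, and move rule may differ, and all function names are illustrative.

```python
import numpy as np


def divergence(x, c, nu=1.0, mu=1.0):
    """Assumed distance-like function d(x, c): (nu/2)*||x - c||^2 plus mu times the
    generalized relative entropy sum_j [x_j*log(x_j/c_j) - x_j + c_j].
    Assumes nonnegative x and c; coordinates with x_j == 0 contribute c_j (0*log 0 := 0)."""
    value = 0.5 * nu * np.sum((x - c) ** 2)
    if mu > 0:  # skip the entropy term entirely for pure squared-Euclidean k-means
        with np.errstate(divide="ignore", invalid="ignore"):
            xlogx = np.where(x > 0, x * np.log(x / c), 0.0)
        value += mu * np.sum(xlogx - x + c)
    return float(value)


def cluster_cost(points, nu, mu):
    """Sum of divergences from each point to the cluster mean (0 for an empty cluster)."""
    if len(points) == 0:
        return 0.0
    centroid = points.mean(axis=0)
    return sum(divergence(p, centroid, nu, mu) for p in points)


def batch_step(X, labels, k, nu, mu):
    """One batch k-means pass: compute all means, then reassign every point at once.
    A point changes cluster only on a strict improvement, which avoids tie cycling."""
    centroids = [X[labels == j].mean(axis=0) if np.any(labels == j) else None
                 for j in range(k)]
    new_labels = labels.copy()
    for i, x in enumerate(X):
        dists = [np.inf if c is None else divergence(x, c, nu, mu) for c in centroids]
        best = labels[i]
        for j in range(k):
            if dists[j] < dists[best]:
                best = j
        new_labels[i] = best
    return new_labels


def incremental_step(X, labels, k, nu, mu, tol=1e-12):
    """Search for a single-point move that strictly lowers the total objective and
    apply the first one found; never empties a cluster."""
    costs = [cluster_cost(X[labels == j], nu, mu) for j in range(k)]
    for i in range(len(X)):
        a = labels[i]
        if np.sum(labels == a) == 1:
            continue
        for b in range(k):
            if b == a:
                continue
            trial = labels.copy()
            trial[i] = b
            delta = (cluster_cost(X[trial == a], nu, mu)
                     + cluster_cost(X[trial == b], nu, mu) - costs[a] - costs[b])
            if delta < -tol:
                return trial, True
    return labels, False


def ping_pong(X, labels, k, nu=1.0, mu=1.0, max_rounds=100):
    """Alternate: run batch passes until the partition is stable, then try one
    incremental move; stop when no improving move exists (or after max_rounds)."""
    labels = np.asarray(labels).copy()
    for _ in range(max_rounds):
        for _ in range(max_rounds):
            new_labels = batch_step(X, labels, k, nu, mu)
            if np.array_equal(new_labels, labels):
                break
            labels = new_labels
        labels, moved = incremental_step(X, labels, k, nu, mu)
        if not moved:
            break
    return labels


X = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
print(ping_pong(X, np.array([0, 1, 0, 1]), k=2))
```

With `mu=0` the batch phase reduces to classical squared-Euclidean k-means; with `nu=0` only the entropy term remains. In the toy run, both initial centroids coincide at (0.5, 0.5), so batch passes alone cannot change the partition [0, 1, 0, 1]. A single incremental move breaks that stall, after which batch passes group the first two points together and the last two together.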

Keywords: clustering algorithms, optimization, entropy

This research was supported in part by the US Department of Defense, the United States–Israel Binational Science Foundation (BSF), and Northrop Grumman Mission Systems (NG/MS).
