Lecture Notes in Computer Science, 2008, Volume 5179/2008, 755-763, DOI: 10.1007/978-3-540-85567-5_94

A Fuzzy Extension of Some Classical Concordance Measures and an Efficient Algorithm for Their Computation

Michele Ceccarelli and Antonio Maratea

Abstract

Many indexes have been proposed in the literature for comparing two crisp data partitions, such as those resulting from two different classification attempts, two different clustering solutions, or the comparison of a predicted versus a true labeling. Most implementations of these indexes have a computational cost of O(N^2) (where N is the number of data points), which may limit their use on very large datasets or their integration into computationally intensive validation strategies. Furthermore, their extension to fuzzy partitions is not obvious. In this paper we analyze efficient algorithms to compute many classical indexes (most notably the Jaccard coefficient and the Rand index) in O(d^2 + N) (where d is the number of different classes/clusters) and propose a straightforward procedure to extend their computation to fuzzy partitions. The fuzzy extension is based on a pseudo-count concept and provides a natural framework for including memberships in the computation of binary similarity indexes, not limited to the ones reviewed here. Results on simulated data using the Jaccard coefficient highlight a higher consistency of its proposed fuzzy extension with respect to its crisp counterpart.
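The O(d^2 + N) cost quoted for the crisp case can be illustrated with the standard contingency-table way of counting pairs. The sketch below is not taken from the paper and may differ from the authors' algorithm; the function names (`pair_counts`, `rand_index`, `jaccard_index`) are hypothetical, and the fuzzy pseudo-count extension is deliberately not attempted because the abstract does not define it. One pass over the N points builds the table, and the pair counts then follow from at most d^2 cell counts.

```python
from collections import Counter


def _comb2(k):
    """Number of unordered pairs that can be drawn from k items."""
    return k * (k - 1) // 2


def pair_counts(labels_a, labels_b):
    """Return (n11, n10, n01, n00) for two crisp labelings of the same points.

    n11: pairs placed together in both partitions
    n10: pairs together in A only
    n01: pairs together in B only
    n00: pairs separated in both partitions
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    n = len(labels_a)
    # One O(N) pass builds the contingency table and its marginals.
    table = Counter(zip(labels_a, labels_b))
    rows = Counter(labels_a)
    cols = Counter(labels_b)
    # The sums below touch at most d^2 cells, never the N^2 point pairs.
    n11 = sum(_comb2(c) for c in table.values())
    same_a = sum(_comb2(c) for c in rows.values())
    same_b = sum(_comb2(c) for c in cols.values())
    n10 = same_a - n11
    n01 = same_b - n11
    n00 = _comb2(n) - n11 - n10 - n01
    return n11, n10, n01, n00


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which the two partitions agree."""
    n11, n10, n01, n00 = pair_counts(labels_a, labels_b)
    total = n11 + n10 + n01 + n00
    # Convention assumed here: fewer than two points counts as full agreement.
    return (n11 + n00) / total if total else 1.0


def jaccard_index(labels_a, labels_b):
    """Pairs together in both partitions over pairs together in at least one."""
    n11, n10, n01, _ = pair_counts(labels_a, labels_b)
    denom = n11 + n10 + n01
    # Convention assumed here: no co-clustered pairs at all counts as 1.0.
    return n11 / denom if denom else 1.0
```

For example, `rand_index([0, 0, 1, 1, 2, 2], [0, 0, 1, 2, 2, 2])` counts 2 pairs together in both labelings and 10 pairs separated in both, out of 15 pairs in total. The values returned for degenerate inputs are a convention chosen for this sketch, not something stated in the abstract.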

Keywords: Cluster Stability, Concordance Measure, Validity Index, Efficient Algorithm
