Lecture Notes in Computer Science, 2001, Volume 2101/2001, 63-66, DOI: 10.1007/3-540-48229-6_9

Improving Identification of Difficult Small Classes by Balancing Class Distribution

Jorma Laurikkala

Abstract

We studied three data reduction methods that improve identification of difficult small classes by balancing an imbalanced class distribution. The new method, the neighborhood cleaning rule (NCL), outperformed simple random selection and one-sided selection in experiments with ten data sets. All reduction methods improved identification of small classes (20–30%), but the differences between the methods were insignificant. However, significant differences in accuracies, true-positive rates and true-negative rates obtained with the 3-nearest neighbor method and C4.5 from the reduced data favored NCL. The results suggest that NCL is a useful method for improving the modeling of difficult small classes and for building classifiers that identify these classes from real-world data.
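As a rough illustration of the ideas named in the abstract, the following minimal sketch shows the simplest kind of data reduction (random selection from the larger classes) together with the true-positive and true-negative rates used as evaluation measures. It is not NCL or one-sided selection, and it is not taken from the paper: the function names `random_undersample` and `tpr_tnr`, the fixed seed, and the choice to reduce the remaining data to the same size as the small class are all assumptions made for the example.

```python
import random


def random_undersample(labels, small_class, seed=0):
    """Return sorted indices to keep: every example of small_class plus an
    equally sized random sample of all other examples.

    Reducing to a 1:1 ratio is an assumption of this sketch, not a setting
    taken from the paper.
    """
    rng = random.Random(seed)
    small = [i for i, y in enumerate(labels) if y == small_class]
    rest = [i for i, y in enumerate(labels) if y != small_class]
    kept_rest = rng.sample(rest, min(len(small), len(rest)))
    return sorted(small + kept_rest)


def tpr_tnr(y_true, y_pred, positive):
    """Return (true-positive rate, true-negative rate), treating `positive`
    as the class of interest and every other label as negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    tnr = tn / (tn + fp) if (tn + fp) else float("nan")
    return tpr, tnr


if __name__ == "__main__":
    # Hypothetical toy labels: 8 examples of a large class, 2 of a small one.
    labels = ["large"] * 8 + ["small"] * 2
    kept = random_undersample(labels, "small")
    print([labels[i] for i in kept])
    print(tpr_tnr(["small", "small", "large", "large"],
                  ["small", "large", "large", "large"], "small"))
```

A classifier would then be trained on the kept examples only and evaluated on untouched test data, so that the two rates reflect the original class distribution.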
