In microarray data analysis, clustering is a fundamental task for separating genes into biologically functional groups or for classifying
tissues and phenotypes. With recent gene expression microarray technologies, the expression levels of thousands
of genes (features) can be measured simultaneously in a single experiment. The large number of genes, many of them noisy, makes
cluster analysis highly complex. This challenge has raised the demand for feature selection, an effective dimensionality
reduction technique that removes noisy features. In this paper, we propose a novel filter method for feature selection. The
proposed method, called ClosestFS, is based on a distance measure. For each feature, the distance is evaluated by computing
the feature's impact on the histogram of the whole data set. Our experimental results show that the clustering quality (evaluated
by several widely used measures) of the K-means algorithm with ClosestFS as a pre-processing step is significantly better than
that of plain K-means.
Keywords: Feature Selection, Clustering, Distance Function, Microarray Data
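
To make the idea of a histogram-based filter concrete, the following is a minimal sketch and not the ClosestFS definition itself, which the abstract does not give. It assumes that a feature's "impact" is the L1 distance between the normalized histogram of all values and the histogram obtained after removing that feature's column, and that the features with the smallest impact are kept. The function names, the number of bins, and the selection rule are all illustrative assumptions.

```python
# Illustrative sketch only: the scoring and selection rules are assumptions,
# not the definition of ClosestFS.

def histogram(values, bins, lo, hi):
    """Normalized equal-width histogram of `values` over [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins or 1.0  # avoid zero width when all values are equal
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # clamp the maximum into the last bin
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts]


def feature_impact(data, bins=10):
    """Score each feature by how much removing it changes the global histogram.

    `data` is a list of samples (rows), each a list of feature values, with at
    least two features. The score is the L1 distance between the histogram of
    all values and the histogram of the values left after dropping one column.
    """
    n_features = len(data[0])
    all_values = [v for row in data for v in row]
    lo, hi = min(all_values), max(all_values)
    full = histogram(all_values, bins, lo, hi)
    scores = []
    for j in range(n_features):
        rest = [v for row in data for k, v in enumerate(row) if k != j]
        reduced = histogram(rest, bins, lo, hi)
        scores.append(sum(abs(a - b) for a, b in zip(full, reduced)))
    return scores


def select_features(data, keep, bins=10):
    """Indices of the `keep` features with the smallest impact (assumed rule)."""
    scores = feature_impact(data, bins)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j])
    return sorted(ranked[:keep])
```

In a pipeline of the kind described above, the columns returned by `select_features` would be extracted from the data and passed to K-means in place of the full feature set.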