To satisfy the ever-growing need for effective screening and diagnostic tests, medical practitioners have turned their attention
to high-resolution, high-throughput methods. One approach is to use mass-spectrometry-based methods for disease diagnosis.
Effective diagnosis is achieved by classifying mass spectra as belonging to healthy or diseased individuals. Unfortunately,
high-resolution mass spectrometry data contain a large amount of noisy, redundant and irrelevant information, making
accurate classification difficult. To overcome these obstacles, feature extraction methods are used to select or create small
sets of relevant features. This paper compares existing feature selection methods to a novel wrapper-based feature selection
and centroid-based classification method. A key contribution is the exposition of different feature extraction techniques,
which encompass dimensionality reduction and feature selection methods. Experiments on two cancer data sets indicate
that feature selection algorithms tend both to reduce data dimensionality and to increase classification accuracy, whereas the
dimensionality reduction techniques sacrifice performance as a result of lowering the number of features. To evaluate
the dimensionality reduction and feature selection techniques, we use a simple classifier, which keeps the approach tractable.
Compared with previous research, the proposed algorithm is highly competitive in terms of (i) classification accuracy, (ii)
size of feature sets and (iii) usage of computational resources during both the training and classification phases.
Keywords: feature extraction, classification, mining bio-medical data, mass spectrometry, dimensionality reduction
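The abstract pairs a wrapper-based feature selector with a centroid-based classifier. A wrapper scores each candidate feature subset by the accuracy of the classifier itself. The sketch below illustrates that general idea only and is not the authors' algorithm. The abstract does not specify the search strategy, distance measure or evaluation protocol, so this sketch assumes greedy forward selection, a Euclidean nearest-centroid classifier and leave-one-out accuracy. All function names (`centroids`, `predict`, `loo_accuracy`, `forward_select`) and the toy data are hypothetical.

```python
from math import dist  # Euclidean distance between two equal-length sequences (Python 3.8+)


def centroids(X, y):
    """Return the mean feature vector (centroid) of each class label."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}


def predict(cents, row):
    """Assign a sample to the class whose centroid is nearest (Euclidean distance assumed)."""
    return min(cents, key=lambda c: dist(cents[c], row))


def project(X, feats):
    """Keep only the columns listed in feats."""
    return [[row[j] for j in feats] for row in X]


def loo_accuracy(X, y, feats):
    """Leave-one-out accuracy of the nearest-centroid classifier on a feature subset."""
    Xs = project(X, feats)
    correct = 0
    for i in range(len(Xs)):
        train_X = Xs[:i] + Xs[i + 1:]
        train_y = y[:i] + y[i + 1:]
        correct += predict(centroids(train_X, train_y), Xs[i]) == y[i]
    return correct / len(Xs)


def forward_select(X, y, max_feats):
    """Greedy forward wrapper: repeatedly add the feature that most improves
    leave-one-out accuracy; stop when no feature helps or max_feats is reached."""
    selected, best = [], 0.0
    remaining = set(range(len(X[0])))
    while remaining and len(selected) < max_feats:
        # Score every candidate; iterating in sorted order makes ties pick the lowest index.
        scores = {j: loo_accuracy(X, y, selected + [j]) for j in sorted(remaining)}
        j = max(scores, key=scores.get)
        if scores[j] <= best:
            break
        selected.append(j)
        remaining.remove(j)
        best = scores[j]
    return selected, best


if __name__ == "__main__":
    # Made-up toy data: feature 0 separates the classes, feature 1 is noise.
    X = [[0.1, 5.0], [0.2, 1.0], [0.0, 3.0], [1.0, 4.0], [0.9, 2.0], [1.1, 0.0]]
    y = [0, 0, 0, 1, 1, 1]
    print(forward_select(X, y, max_feats=2))  # → ([0], 1.0)
```

On the toy data the wrapper keeps only the informative feature. Adding the noisy feature cannot raise leave-one-out accuracy above 1.0, so the search stops after one feature. This mirrors the abstract's point that feature selection can both shrink the feature set and preserve or improve accuracy. A real mass spectrometry data set would have thousands of features, and this exhaustive per-step scoring would then be expensive.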