In many domains, data objects are described in terms of a large number of features. The pipelined data mining approach
introduced in [1], using two clustering algorithms in combination with rough sets and extended with genetic programming, is
investigated with the purpose of discovering important subsets of attributes in high-dimensional data. The classification
ability of these subsets is described in terms of both collections of rules and analytic functions obtained by genetic programming (gene expression
programming). The Leader algorithm and several k-means algorithms are used to simplify the attribute sets of the information
systems later presented to rough sets algorithms. Visual data mining techniques, including virtual reality, were used for inspecting
results. The data mining process is set up using high-throughput distributed computing techniques. This approach was applied
to breast cancer microarray data and led to subsets of genes with high discrimination power with respect to the decision
classes.
Keywords: clustering, rough sets, reducts, rules, cross-validation, gene expression programming, virtual reality, grid computing, breast cancer, microarray data
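
As an illustration of how a Leader-type clustering step can simplify an attribute set, the following minimal sketch treats each attribute (column) as an object and keeps only the cluster leaders as representatives. It is not the implementation used in [1]: the Euclidean distance, the threshold value, the toy data and the function name `leader_cluster` are all assumptions made for this example.

```python
import math

def leader_cluster(vectors, threshold):
    """Single-pass Leader clustering (illustrative sketch).

    Returns the indices of the vectors chosen as leaders. A vector becomes
    a new leader only if it is farther than `threshold` from every leader
    found so far. Euclidean distance is an assumption of this example.
    """
    leaders = []
    for i, v in enumerate(vectors):
        if all(math.dist(v, vectors[j]) > threshold for j in leaders):
            leaders.append(i)
    return leaders

# Hypothetical toy data: rows are samples, columns are attributes (e.g. genes).
data = [
    [1.0, 1.1, 5.0, 9.0],
    [2.0, 2.1, 4.0, 8.9],
    [3.0, 2.9, 6.0, 9.1],
]

# Cluster the attributes, so transpose the table into column vectors.
columns = [list(col) for col in zip(*data)]
leaders = leader_cluster(columns, threshold=0.5)

# Reduced information system: keep only the leader attributes.
reduced = [[row[j] for j in leaders] for row in data]
```

In this toy table the second attribute lies within the threshold of the first, so it is absorbed into the first attribute's cluster and dropped from the reduced table. The result of a Leader pass depends on the order in which objects are presented and on the chosen threshold.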