Recent years have seen explosive growth in the availability of many kinds of data. This growth has created an unprecedented
opportunity to develop automated, data-driven techniques for extracting useful knowledge. Data mining, an important step in
this process of knowledge discovery, consists of methods that discover interesting, non-trivial, and useful patterns hidden
in the data [SAD+93, CHY96]. The field of data mining builds upon ideas from diverse fields such as machine learning,
pattern recognition, statistics, database systems, and data visualization. However, techniques developed in these traditional
disciplines are often unsuitable because of the unique characteristics of today’s data sets, such as their enormous size, high dimensionality,
and heterogeneity. There is a need to develop effective parallel algorithms for various data mining techniques, but
designing such algorithms is challenging. This paper focuses on the parallel formulations of
two important data mining algorithms: discovery of association rules and induction of decision trees for classification.
We also briefly discuss an application of data mining to the analysis of large data sets collected by Earth-observing satellites,
which must be processed to better understand global-scale changes in biosphere processes and patterns.

This work was supported by NSF CCR-9972519, by NASA grant # NCC 2 1231, by Army Research Office contract DA/DAAG55-98-1-0441,
by DOE grant LLNL/DOE B347714, and by Army High Performance Computing Research Center cooperative agreement number DAAD19-01-2-0014.
Access to computing facilities was provided by AHPCRC and the Minnesota Supercomputer Institute. Related papers are available
via WWW at URL: http://www.cs.umn.edu/~Rkumar.