A common problem in many fields is knowledge discovery in heterogeneous, high-dimensional data. In text mining, for example, an
analyst often wishes to identify meaningful, implicit, and previously unknown information in an unstructured corpus. The lack
of metadata and the complexity of the document space make this task difficult. We describe Iterative Denoising, a methodology
for knowledge discovery in large heterogeneous datasets that allows a user to visualize and to discover potentially meaningful
relationships and structures. In addition, we demonstrate the features of this methodology in the analysis of a heterogeneous
Science News corpus.
Keywords: Knowledge discovery · Text mining · Classification · Clustering