Lecture Notes in Computer Science, 2002, Volume 2557/2002, 143-154, DOI: 10.1007/3-540-36187-1_13

MML Clustering of Continuous-Valued Data Using Gaussian and t Distributions

Yudi Agusta and David L. Dowe

Abstract

Clustering, also known as mixture modelling or intrinsic classification, is the problem of identifying and modelling components (or clusters, or classes) in a body of data. We consider here the application of the Minimum Message Length (MML) principle to the problem of clustering with Gaussian and t distributions. Earlier work on MML clustering dealt with the multinomial and Gaussian distributions (Wallace and Boulton, 1968) and, in addition, the von Mises circular and Poisson distributions (Wallace and Dowe, 1994, 2000). Our current work extends this by generalising from the Gaussian distribution to the more general t distribution. Point estimation of the t distribution is performed using the MML approximation proposed by Wallace and Freeman (1987). We also compare the MML estimates of the t distribution with those of the Maximum Likelihood (ML) method in terms of their Kullback-Leibler (KL) distances. Within each component, our application also performs model selection on whether a particular group of data is best modelled as a Gaussian or a t distribution. The proposed modelling method is then applied to several artificially generated datasets, and the results are compared with those obtained from MML clustering of Gaussian distributions alone. Our modelling method compares quite well to an alternative clustering program (EMMIX), which uses various modelling criteria such as the Akaike Information Criterion (AIC) and Schwarz’s Bayesian Information Criterion (BIC).
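As an illustration of the KL-distance comparison mentioned in the abstract, the following is a minimal sketch of how the KL distance between a "true" t distribution and an estimated one could be approximated numerically. It is not taken from the paper: the function names (`t_logpdf`, `gaussian_logpdf`, `kl_distance`), the trapezoidal integration, the integration range and the example parameter values are all assumptions made for illustration only.

```python
import math

def t_logpdf(x, mu, sigma, nu):
    """Log-density of a location-scale Student t distribution (hypothetical helper)."""
    z = (x - mu) / sigma
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi) - math.log(sigma)
            - (nu + 1) / 2 * math.log1p(z * z / nu))

def gaussian_logpdf(x, mu, sigma):
    """Log-density of a Gaussian distribution (hypothetical helper)."""
    z = (x - mu) / sigma
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * z * z

def kl_distance(logp, logq, lo=-50.0, hi=50.0, steps=20000):
    """Approximate KL(p || q) = integral of p(x) * (log p(x) - log q(x)) dx.

    Uses the trapezoidal rule on [lo, hi]; the finite range is an assumption
    and truncates the tails, so it suits densities with most mass near zero.
    """
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        lp = logp(x)
        term = math.exp(lp) * (lp - logq(x))
        # End points get half weight in the trapezoidal rule.
        total += term if 0 < i < steps else 0.5 * term
    return total * h

# Example: a "true" t distribution versus a slightly perturbed estimate
# (parameter values are made up for illustration).
true_t = lambda x: t_logpdf(x, 0.0, 1.0, 5.0)
est_t = lambda x: t_logpdf(x, 0.1, 1.2, 5.0)
kl_value = kl_distance(true_t, est_t)
```

In a comparison of this kind, the estimate with the smaller KL distance to the generating distribution would be regarded as the closer fit; the estimates themselves (ML or MML) would have to come from a separate fitting step not shown here.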

Keywords  Clustering - Machine Learning - Knowledge Discovery - Data Mining - Unsupervised Learning - Minimum Message Length - MML - Mixture Modelling - Classification - Intrinsic Classification - Numerical Taxonomy - Information Theory - Statistical Inference
