View Related Documents

Abstract

We propose an algorithm to build decision trees when the observed data are probability distributions. This is of interest when one deals with massive database or with probabilistic models. We illustrate our method with a dataset describing districts of Great Britain. Our decision tree yields rules which explain the unemployment rate.
The decision tree in our case is built by replacing the test X > α, which is used to split the nodes in the usual case of real numbers, by the test P(X > α) < β, where α and β are determined through an algorithm based on probabilistic split evaluation criteria.

Fulltext Preview

Image of the first page of the fulltext document