View Related Documents

Abstract

This paper addresses the problem of learning a statistical distribution of data in a relational database. Data we want to focus on are represented with trees which are a quite natural way to represent structured information. These trees are used afterwards to infer a stochastic tree automaton, using a well-known grammatical inference algorithm. We propose two extensions of this algorithm: use of sorts and generalization of the infered automaton according to a local criterion. We show on some experiments that our approach scales with large databases and both improves the predictive power of the learned model and the convergence of the learning algorithm.

Keywords  Stochastic tree automata - multi-relational data mining - generalization - sorts

Fulltext Preview

Image of the first page of the fulltext document