Foreign keys form one of the most fundamental constraints for relational databases. Since they are not always defined in existing
databases, the discovery of foreign keys turns out to be an important and challenging task. The underlying problem is known
to be the inclusion dependency (IND) inference problem. In this paper, data-mining algorithms are devised for IND inference
in a given database. We propose a two-step approach. In the first step, unary INDs are discovered thanks to a new preprocessing
stage which leads to a new algorithm and to an efficient implementation. In the second step, n-ary IND inference is achieved.
This step fits in the framework of levelwise algorithms used in many data-mining algorithms. Since real-world databases can
suffer from some data inconsistencies, approximate INDs, i.e. INDs which almost hold, are considered. We show how they can
be safely integrated into our unary and n-ary discovery algorithms. An implementation of these algorithms has been achieved
and tested against both synthetic and real-life databases. Up to our knowledge, no other algorithm does exist to solve this
data-mining problem.
Keywords Inclusion dependency discovery - Relational databases