Lecture Notes in Computer Science, 2003, Volume 2583/2003, 301-316, DOI: 10.1007/3-540-36468-4_20

Experimental Investigation of Pruning Methods for Relational Pattern Discovery

Irene Weber

View Related Documents

Abstract

Finding all interesting patterns in a database is a data mining task that typically requires a complete search through the hypothesis space. Several ILP systems address this task, e.g., [Deh98],[Wro97],[FL01]. Safe pruning techniques that reduce the size of the hypothesis space without the risk of missing interesting patterns are very important for this task. This paper is concerned with the effectiveness of pruning techniques in this setting. The addressed pruning techniques are (1) optimum estimates, (2) a pruning technique based on subset tests that is derived from the Apriori search algorithm, (3) pruning based on taxonomies, and (4) to consider only most general patterns as interesting. Methods (1) to (3) are safe pruning techniques that find all interesting patterns; method (4) reduces the number of accepted patterns. The effect of these pruning methods is investigated by experiments within a range of different specific task settings and two databases.
Experimental results indicate that optimum estimates and Apriori-style pruning are effective and reliable pruning techniques that produce little additional cost. The effect of taxonomies for pruning is smaller, and it varies over different task settings. In the experiments, the restriction to most general patterns considerably reduces the search costs as well as the set of accepted patterns.

Fulltext Preview

Image of the first page of the fulltext document