The output of boolean association rule mining algorithms is often too large for manual examination. For dense datasets, it
is often impractical to even generate all frequent itemsets. The closed itemset approach handles this information overload
by pruning “uninteresting” rules following the observation that most rules can be derived from other rules. In this paper,
we propose a new framework, the generalized closed (or g-closed) itemset framework. By allowing a small tolerance in the accuracy of itemset supports, we show that the number
of such redundant rules is far greater than previously estimated. Our scheme can be integrated into both levelwise algorithms
(Apriori) and two-pass algorithms (ARMOR). We evaluate its performance by measuring the reduction in output size as well as
in response time. Our experiments show that incorporating g-closed itemsets provides significant performance improvements
on a variety of databases.
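
To illustrate the idea, the following minimal Python sketch contrasts exact closed-itemset pruning with a tolerance-based variant on a toy database. It is not the g-closed definition or algorithm of this paper: the function names, the toy transactions, and the `max_gap` parameter (a tolerance expressed as an absolute number of transactions) are assumptions made purely for illustration, and the brute-force enumeration merely stands in for a real miner such as Apriori or ARMOR.

```python
from itertools import combinations


def support_counts(transactions, min_count):
    """Brute-force miner: map each frequent itemset to its absolute support count."""
    items = sorted({item for t in transactions for item in t})
    counts = {}
    for k in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            count = sum(1 for t in transactions if itemset <= t)
            if count >= min_count:
                counts[itemset] = count
                found = True
        if not found:
            # No frequent k-itemset exists, so no larger itemset can be frequent.
            break
    return counts


def prune_redundant(counts, max_gap=0):
    """Drop an itemset if some frequent proper superset has support within max_gap.

    max_gap=0 keeps exactly the closed frequent itemsets. max_gap>0 is a
    hypothetical tolerance-based relaxation, not the paper's definition.
    """
    kept = {}
    for x, cx in counts.items():
        redundant = any(x < y and cx - cy <= max_gap for y, cy in counts.items())
        if not redundant:
            kept[x] = cx
    return kept


# Toy database (assumed for illustration only).
transactions = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
    {"a"},
]

counts = support_counts(transactions, min_count=2)
closed = prune_redundant(counts, max_gap=0)
relaxed = prune_redundant(counts, max_gap=1)

print(len(counts), len(closed), len(relaxed))  # prints "7 3 1"
```

On this toy input, the 7 frequent itemsets reduce to 3 closed itemsets, and to a single itemset once a gap of one transaction is tolerated. This naive version lets the approximation error accumulate along chains of supersets, which a practical scheme would need to bound.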
A poster of this paper appeared in Proc. of IEEE Intl. Conf. on Data Engineering (ICDE), March 2003, Bangalore, India.