In this paper, we explore extending association analysis to non-traditional types of patterns and non-binary data by generalizing
the notion of confidence. We begin by describing a general framework that measures the strength of the connection between
two association patterns by the extent to which the strength of one association pattern provides information about the strength
of another. Although this framework can serve as the basis for designing or analyzing measures of association, this paper
focuses on using it to extend the traditional concept of confidence to error-tolerant itemsets
(ETIs) and continuous data. To that end, we provide two examples. First, we (1) describe an approach to defining confidence
for ETIs that preserves the interpretation of confidence as an estimate of a conditional probability, and (2) show how association
rules based on ETIs can have better coverage (at an equivalent confidence level) than rules based on traditional itemsets.
Next, we derive a confidence measure for continuous data that agrees with the standard confidence measure when applied to
binary transaction data. Further analysis of this result exposes some of the important issues involved in constructing a confidence
measure for continuous data.
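For readers less familiar with the baseline being generalized, the following is a minimal sketch of the standard confidence measure on binary transaction data, where the confidence of a rule X -> Y is the fraction of transactions containing X that also contain Y, i.e. an estimate of P(Y | X). The function names and the toy transactions are illustrative assumptions, not taken from the paper, and the sketch covers neither the ETI nor the continuous-data extension.

```python
from typing import FrozenSet, List


def support_count(transactions: List[FrozenSet[str]], itemset: FrozenSet[str]) -> int:
    """Count the transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def confidence(
    transactions: List[FrozenSet[str]],
    antecedent: FrozenSet[str],
    consequent: FrozenSet[str],
) -> float:
    """Standard confidence of the rule antecedent -> consequent.

    Computed as support(antecedent and consequent) / support(antecedent),
    which estimates the conditional probability P(consequent | antecedent).
    """
    denom = support_count(transactions, antecedent)
    if denom == 0:
        raise ValueError("antecedent does not occur in any transaction")
    return support_count(transactions, antecedent | consequent) / denom


# Hypothetical toy data: each transaction is a set of items (binary presence).
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

# Three transactions contain bread; two of those also contain milk.
rule_conf = confidence(transactions, frozenset({"bread"}), frozenset({"milk"}))
print(rule_conf)
```

Any generalized measure of the kind the abstract describes would be expected to reduce to this ratio when the data are binary and the itemsets are exact.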
Keywords Confidence - Support - Association rules - Error-tolerant itemsets - Data mining
Michael Steinbach earned the B.S. degree in mathematics, the M.S. degree in statistics, and the M.S. and Ph.D. degrees in computer science,
all from the University of Minnesota. He also has held a variety of software engineering, analysis, and design positions in
industry at Silicon Biology, Racotek, and NCR. Steinbach is currently a research associate in the Department of Computer Science
and Engineering at the University of Minnesota, Twin Cities. He is a co-author of the textbook Introduction to Data Mining and has published numerous technical papers in peer-reviewed journals and conference proceedings. His research interests
include data mining, statistics, and bioinformatics. He is a member of the IEEE and the ACM.
Vipin Kumar is currently William Norris Professor and Head of the Computer Science and Engineering Department at the University of Minnesota.
He received the B.E. degree in electronics and communication engineering from the University of Roorkee, India, in 1977, the
M.E. degree in electronics engineering from Philips International Institute, Eindhoven, The Netherlands, in 1979, and the
Ph.D. degree in computer science from the University of Maryland, College Park, in 1982. Kumar’s current research interests
include high-performance computing and data mining. His research has resulted in the development of the isoefficiency
metric for evaluating the scalability of parallel algorithms, as well as highly efficient parallel algorithms and software
for sparse matrix factorization (PSPASES), graph partitioning (METIS, ParMetis, hMetis), and dense hierarchical solvers. He
has authored over 200 research articles, and has coedited or coauthored 9 books including the widely used textbooks Introduction to Parallel Computing and Introduction to Data Mining, both published by Addison Wesley. Kumar has served as chair/co-chair for many conferences/workshops in the area of data
mining and parallel computing, including the IEEE International Conference on Data Mining (2002) and the 15th International Parallel and Distributed Processing Symposium (2001). Currently, Kumar is the Chair of the steering committee of the SIAM International Conference on Data Mining, and a member of the steering committee of the IEEE International Conference on Data Mining. Kumar serves or has served on the editorial boards of Data Mining and Knowledge Discovery, Knowledge and Information Systems, IEEE Computational Intelligence Bulletin, Annual Review of Intelligent Informatics, Parallel Computing, Journal of Parallel and Distributed Computing, IEEE Transactions of Data and Knowledge Engineering (1993–1997), IEEE Concurrency (1997–2000), and IEEE Parallel and Distributed Technology (1995–1997). He is a Fellow of the ACM and IEEE and a member of SIAM.