Abstract

In this paper, we explore extending association analysis to non-traditional types of patterns and non-binary data by generalizing the notion of confidence. We begin by describing a general framework that measures the strength of the connection between two association patterns by the extent to which the strength of one association pattern provides information about the strength of another. Although this framework can serve as the basis for designing or analyzing measures of association, the focus in this paper is to use the framework as the basis for extending the traditional concept of confidence to error-tolerant itemsets (ETIs) and continuous data. To that end, we provide two examples. First, we (1) describe an approach to defining confidence for ETIs that preserves the interpretation of confidence as an estimate of a conditional probability, and (2) show how association rules based on ETIs can have better coverage (at an equivalent confidence level) than rules based on traditional itemsets. Next, we derive a confidence measure for continuous data that agrees with the standard confidence measure when applied to binary transaction data. Further analysis of this result exposes some of the important issues involved in constructing a confidence measure for continuous data.
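For reference, the standard confidence measure that the paper generalizes can be sketched as follows. This is a minimal illustration for traditional itemsets on binary transaction data only. It does not reproduce the paper's ETI or continuous-data measures, and the function names and toy transactions are hypothetical, chosen purely for illustration.

```python
from typing import FrozenSet, Iterable, List


def support(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(
    transactions: List[FrozenSet[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> float:
    """Standard confidence of the rule antecedent -> consequent.

    Computed as support(antecedent and consequent together) divided by
    support(antecedent), i.e. an estimate of P(consequent | antecedent).
    """
    antecedent_support = support(transactions, antecedent)
    if antecedent_support == 0:
        raise ValueError("antecedent never occurs; confidence is undefined")
    joint = frozenset(antecedent) | frozenset(consequent)
    return support(transactions, joint) / antecedent_support


# Toy binary transaction data (hypothetical, for illustration only).
transactions = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b"}),
    frozenset({"a", "c"}),
    frozenset({"b", "c"}),
]

# "a" occurs in 3 of 4 transactions and "a" with "b" in 2 of 4,
# so the confidence of a -> b is 2/3.
conf_ab = confidence(transactions, {"a"}, {"b"})
print(round(conf_ab, 4))
```

Dividing the joint support by the antecedent support is what gives confidence its reading as an estimated conditional probability, the interpretation the abstract says the ETI extension preserves.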

Keywords  Confidence - Support - Association rules - Error-tolerant itemsets - Data mining

Michael Steinbach earned the B.S. degree in mathematics, the M.S. degree in statistics, and the M.S. and Ph.D. degrees in computer science, all from the University of Minnesota. He has also held a variety of software engineering, analysis, and design positions in industry at Silicon Biology, Racotek, and NCR. Steinbach is currently a research associate in the Department of Computer Science and Engineering at the University of Minnesota, Twin Cities. He is a co-author of the textbook Introduction to Data Mining and has published numerous technical papers in peer-reviewed journals and conference proceedings. His research interests include data mining, statistics, and bioinformatics. He is a member of the IEEE and the ACM.
Vipin Kumar is currently William Norris Professor and Head of the Computer Science and Engineering Department at the University of Minnesota. He received the B.E. degree in electronics and communication engineering from the University of Roorkee, India, in 1977, the M.E. degree in electronics engineering from Philips International Institute, Eindhoven, The Netherlands, in 1979, and the Ph.D. degree in computer science from the University of Maryland, College Park, in 1982. Kumar's current research interests include high-performance computing and data mining. His research has resulted in the development of the concept of isoefficiency metric for evaluating the scalability of parallel algorithms, as well as highly efficient parallel algorithms and software for sparse matrix factorization (PSPASES), graph partitioning (METIS, ParMetis, hMetis), and dense hierarchical solvers. He has authored over 200 research articles, and has coedited or coauthored 9 books, including the widely used textbooks Introduction to Parallel Computing and Introduction to Data Mining, both published by Addison Wesley. Kumar has served as chair/co-chair for many conferences/workshops in the area of data mining and parallel computing, including the IEEE International Conference on Data Mining (2002) and the 15th International Parallel and Distributed Processing Symposium (2001). Currently, Kumar is the Chair of the steering committee of the SIAM International Conference on Data Mining, and a member of the steering committee of the IEEE International Conference on Data Mining. Kumar serves or has served on the editorial boards of Data Mining and Knowledge Discovery, Knowledge and Information Systems, IEEE Computational Intelligence Bulletin, Annual Review of Intelligent Informatics, Parallel Computing, Journal of Parallel and Distributed Computing, IEEE Transactions of Data and Knowledge Engineering (1993–1997), IEEE Concurrency (1997–2000), and IEEE Parallel and Distributed Technology (1995–1997). He is a Fellow of the ACM and IEEE and a member of SIAM.
