Class Noise vs. Attribute Noise: A Quantitative Study

Xingquan Zhu (1) and Xindong Wu (1)

(1) Department of Computer Science, University of Vermont, VT, USA

Abstract  Real-world data is never perfect and often suffers from corruptions (noise) that may affect interpretations of the data, models built from the data, and decisions made based on the data. Noise can degrade system performance in terms of classification accuracy, the time needed to build a classifier, and the size of the classifier. Accordingly, most existing learning algorithms integrate various approaches to improve their ability to learn in noisy environments, but noise can still have a serious negative impact. A more reasonable solution might be to employ preprocessing mechanisms that handle noisy instances before a learner is built. Unfortunately, little research has systematically explored the impact of noise, especially from the noise-handling point of view. This has limited the significance of various noise-processing techniques, particularly when dealing with noise introduced in attributes. In this paper, we present a systematic evaluation of the effect of noise in machine learning. Rather than adopting a unified theory of noise to evaluate its impact, we divide noise into two categories, class noise and attribute noise, and analyze their impacts on system performance separately. Because class noise has been widely addressed in existing research, we concentrate on attribute noise. We investigate the relationship between attribute noise and classification accuracy, the impact of noise in different attributes, and possible solutions for handling attribute noise. Our conclusions can guide interested readers in enhancing data quality by designing various noise-handling mechanisms.

Keywords  attribute noise - class noise - machine learning - noise impacts
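
The abstract separates class noise (corrupted class labels) from attribute noise (corrupted attribute values) but does not say how noise is introduced in the experiments. The following is a minimal sketch that assumes nominal attributes and a uniform random-corruption model. This model is a common choice in noise studies, but it is not necessarily the procedure used in the paper. The function names `add_class_noise` and `add_attribute_noise`, the optional `attr_index` parameter (for corrupting a single attribute), and the toy data are all hypothetical illustrations.

```python
import random
from typing import List, Optional, Sequence, Tuple

# Hypothetical representation (assumption): an instance is
# (list of nominal attribute values, class label).
Instance = Tuple[List[str], str]


def add_class_noise(data: Sequence[Instance], labels: Sequence[str],
                    rate: float, rng: random.Random) -> List[Instance]:
    """Class noise: with probability `rate`, replace an instance's label by a
    different label chosen uniformly from `labels` (requires >= 2 labels).
    Attribute values are copied unchanged."""
    noisy = []
    for attrs, label in data:
        if rng.random() < rate:
            label = rng.choice([c for c in labels if c != label])
        noisy.append((list(attrs), label))
    return noisy


def add_attribute_noise(data: Sequence[Instance], domains: Sequence[Sequence[str]],
                        rate: float, rng: random.Random,
                        attr_index: Optional[int] = None) -> List[Instance]:
    """Attribute noise: with probability `rate`, replace an attribute value by a
    different value from that attribute's domain (each domain needs >= 2 values).
    If `attr_index` is given, only that attribute is corrupted. Labels are kept."""
    noisy = []
    for attrs, label in data:
        attrs = list(attrs)  # copy so the clean data is not mutated
        targets = range(len(attrs)) if attr_index is None else [attr_index]
        for j in targets:
            if rng.random() < rate:
                attrs[j] = rng.choice([v for v in domains[j] if v != attrs[j]])
        noisy.append((attrs, label))
    return noisy


if __name__ == "__main__":
    # Toy nominal data set, made up for illustration only.
    data = [(["sunny", "hot"], "no"), (["rain", "mild"], "yes"),
            (["overcast", "cool"], "yes")]
    labels = ["yes", "no"]
    domains = [["sunny", "rain", "overcast"], ["hot", "mild", "cool"]]
    rng = random.Random(42)
    print(add_class_noise(data, labels, 0.4, rng))
    print(add_attribute_noise(data, domains, 0.4, rng, attr_index=0))
```

Keeping the two corruptions in separate functions mirrors the abstract's decision to analyze class noise and attribute noise separately. Passing `attr_index` corrupts one attribute at a time, which is one way to probe how noise in different attributes affects accuracy. A classifier trained on each corrupted copy could then be compared with one trained on the clean data.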

Referenced by
11 newer articles

  1. Zhu, Bing (2010) A robust missing value imputation method for noisy data. Applied Intelligence
  2. Sabzekar, Mostafa (2010) Relaxed constraints support vector machines for noisy data. Neural Computing and Applications
  3. Nettleton, David F. (2010) A study of the effect of different types of noise on the precision of supervised learning techniques. Artificial Intelligence Review
  4. Wu, Xindong (2008). IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans 38(4)
  5. Khoshgoftaar, Taghi M. (2009). IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews
  6. Wu, Xindong (2009)
  7. Hashemi, Sattar (2009) Flexible decision tree for data stream classification in the presence of concept change, noise and missing values. Data Mining and Knowledge Discovery
  8. Khoshgoftaar, Taghi M. (2008) Imputation techniques for multivariate missingness in software measurement data. Software Quality Journal
  9. Hulse, Jason D. (2006) The pairwise attribute noise detection algorithm. Knowledge and Information Systems
  10. Zhu, Xingquan (2006) Bridging Local and Global Data Cleansing: Identifying Class Noise in Large, Distributed Data Datasets. Data Mining and Knowledge Discovery