
Distributed Pasting of Small Votes

N. V. Chawla, L. O. Hall, K. W. Bowyer, T. E. Moore Jr. and W. P. Kegelmeyer

(6)  Department of Computer Science and Engineering, University of South Florida, 4202 E. Fowler Avenue, Tampa, Florida 33620, USA
(7)  Department of Computer Science and Engineering, University of Notre Dame, 384 Fitzpatrick Hall, Notre Dame, IN 46556, USA
(8)  Biosystems Research Department, Sandia National Labs, P.O. Box 969, MS 9951, Livermore, CA 94551-0969, USA
Abstract
Bagging and boosting are two popular ensemble methods that achieve better accuracy than a single classifier. These techniques have limitations on massive datasets, as the size of the dataset can be a bottleneck. Voting many classifiers built on small subsets of data (“pasting small votes”) is a promising approach for learning from massive datasets. Pasting small votes can utilize the power of boosting and bagging, and potentially scale up to massive datasets. We propose a framework for building hundreds or thousands of such classifiers on small subsets of data in a distributed environment. Experiments show this approach is fast, accurate, and scalable to massive datasets.
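
The voting idea described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' framework: the base learner is a hypothetical one-feature threshold classifier (train_stump), the small subsets are drawn by uniform random sampling (only one possible way to choose them), the data is synthetic, and everything runs sequentially on one machine. The names train_stump, paste_small_votes, vote, and bite_size are invented for this example.

```python
import random
from collections import Counter


def train_stump(samples):
    """Fit a one-feature threshold classifier (hypothetical base learner).

    samples is a list of (x, label) pairs with x a float and label in {0, 1}.
    """
    best = None
    for threshold, _ in samples:
        for polarity in (0, 1):
            # polarity is the label predicted when x >= threshold
            errors = sum(
                1
                for x, y in samples
                if (polarity if x >= threshold else 1 - polarity) != y
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, polarity)
    _, threshold, polarity = best
    return lambda x: polarity if x >= threshold else 1 - polarity


def paste_small_votes(data, n_classifiers, bite_size, seed=0):
    """Train n_classifiers stumps, each on its own small random subset."""
    rng = random.Random(seed)
    return [
        train_stump(rng.sample(data, bite_size)) for _ in range(n_classifiers)
    ]


def vote(ensemble, x):
    """Return the label predicted by the majority of the ensemble."""
    counts = Counter(clf(x) for clf in ensemble)
    return counts.most_common(1)[0][0]


# Synthetic data (an assumption for illustration): label is 1 when x >= 0.5.
data_rng = random.Random(42)
data = [(x, int(x >= 0.5)) for x in (data_rng.random() for _ in range(10_000))]

# Each classifier sees only 50 of the 10,000 examples.
ensemble = paste_small_votes(data, n_classifiers=25, bite_size=50)
accuracy = sum(vote(ensemble, x) == y for x, y in data) / len(data)
print(vote(ensemble, 0.9), vote(ensemble, 0.1), accuracy)
```

In this sketch each classifier is trained independently of the others, so the loop in paste_small_votes could in principle be split across several machines and the resulting classifiers gathered for voting. How the paper's framework actually distributes the work is not stated in the abstract.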

Contact Information

N. V. Chawla
Email: chawla@csee.usf.edu

L. O. Hall
Email: hall@csee.usf.edu

K. W. Bowyer
Email: kwb@cse.nd.edu

T. E. Moore Jr.
Email: tmoore4@csee.usf.edu

W. P. Kegelmeyer
Email: wpk@ca.sandia.gov