Short Papers

Naive Bayes for Text Classification with Unbalanced Classes

Eibe Frank and Remco R. Bouckaert (1, 2)

(1)  Computer Science Department, University of Waikato, New Zealand
(2)  Xtal Mountain Information Technology, Auckland, New Zealand
Abstract
Multinomial naive Bayes (MNB) is a popular method for document classification due to its computational efficiency and relatively good predictive performance. It has recently been established that predictive performance can be improved further by appropriate data transformations [1,2]. In this paper we present another transformation that is designed to combat a potential problem with the application of MNB to unbalanced datasets. We propose an appropriate correction by adjusting attribute priors. This correction can be implemented as another data normalization step, and we show that it can significantly improve the area under the ROC curve. We also show that the modified version of MNB is very closely related to the simple centroid-based classifier and compare the two methods empirically.
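The abstract does not spell out the transformation itself, so the following is only a rough sketch of one plausible reading: a standard multinomial naive Bayes learner with Laplace smoothing, plus an optional step that rescales each class's summed word-count vector to a common total before smoothing, so that a class with few documents is not dominated by the smoothing term in a different way than a large class. The function names, the `class_total` parameter, and the choice of smoothing constant `alpha` are all assumptions made for illustration and are not taken from the paper.

```python
import math


def train_mnb(docs, labels, vocab_size, class_total=None, alpha=1.0):
    """Train multinomial naive Bayes on bag-of-words count vectors.

    docs        : list of count vectors, each a list of length vocab_size
    labels      : list of class labels, one per document
    class_total : hypothetical normalization knob; if set, each class's
                  summed count vector is rescaled to this total before
                  smoothing (an assumed reading of the abstract)
    alpha       : Laplace smoothing constant (assumed, not from the paper)
    """
    classes = sorted(set(labels))
    counts = {c: [0.0] * vocab_size for c in classes}
    n_docs = {c: 0 for c in classes}

    # Aggregate word counts and document counts per class.
    for vec, y in zip(docs, labels):
        n_docs[y] += 1
        for j, v in enumerate(vec):
            counts[y][j] += v

    model = {}
    for c in classes:
        total = sum(counts[c])
        if class_total is not None and total > 0:
            # Rescale so every class contributes the same total count.
            scale = class_total / total
            counts[c] = [v * scale for v in counts[c]]
            total = class_total
        log_prior = math.log(n_docs[c] / len(docs))
        denom = total + alpha * vocab_size
        log_lik = [math.log((v + alpha) / denom) for v in counts[c]]
        model[c] = (log_prior, log_lik)
    return model


def predict(model, vec):
    """Return the class with the highest log posterior score for vec."""
    def score(c):
        log_prior, log_lik = model[c]
        return log_prior + sum(v * w for v, w in zip(vec, log_lik))
    return max(model, key=score)
```

Calling `train_mnb(docs, labels, vocab_size)` gives the unmodified learner, while passing `class_total=1.0` applies the assumed per-class normalization; comparing the two on an unbalanced dataset is one way to explore the effect the abstract describes.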

Eibe Frank
Email: eibe@cs.waikato.ac.nz

Remco R. Bouckaert
Email: remco@cs.waikato.ac.nz