Hierarchical Classification of Documents with Error Control

Chun-hung Cheng, Jian Tang, Ada Wai-chee Fu and Irwin King

(4)  Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong
(5)  Department of Computer Science, Memorial University of Newfoundland, St. John’s, NF A1B 3X5, Canada
Abstract
Classification is a function that assigns a new object to one of a set of predefined classes. Document classification is characterized by the large number of attributes involved in the objects (documents). The traditional method of building a single classifier to do all the classification work incurs a high overhead. Hierarchical classification is more efficient: instead of a single classifier, we use a set of classifiers distributed over a class taxonomy, one for each internal node. However, once a misclassification occurs at a high-level class, the document may end up in a class that is far from the correct one. An existing approach to coping with this problem requires terms also to be arranged hierarchically. In this paper, instead of overhauling the classifier itself, we propose mechanisms to detect misclassification and take appropriate actions. We then discuss an alternative that masks the misclassification based on a well-known software fault tolerance technique. Our experiments show that our algorithms represent a good trade-off between speed and accuracy in most applications.

Keywords  Hierarchical document classification - naive Bayesian classifier - error control - class taxonomy - parallel algorithm
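
The top-down routing described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the Node class, the keyword-count scoring (a toy stand-in for the naive Bayesian classifier named in the keywords), the example taxonomy and all names in it are assumptions made for illustration. It shows one local classifier per internal node, and how a wrong decision at the root sends a document to a leaf far from the correct class.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Node:
    """A class in the taxonomy; each internal node owns a local classifier."""
    name: str
    children: List["Node"] = field(default_factory=list)
    # Hypothetical toy model: child name -> terms that indicate that child.
    keywords: Dict[str, Set[str]] = field(default_factory=dict)

    def choose_child(self, terms: List[str]) -> "Node":
        # Score each child by the number of document terms in its keyword set.
        def score(child: "Node") -> int:
            return sum(t in self.keywords.get(child.name, set()) for t in terms)
        # On a tie, max() returns the first child with the highest score.
        return max(self.children, key=score)


def classify(root: Node, terms: List[str]) -> List[str]:
    """Route a document from the root to a leaf; return the classes visited."""
    path = [root.name]
    node = root
    while node.children:
        node = node.choose_child(terms)
        path.append(node.name)
    return path


def build_example_taxonomy() -> Node:
    """Build a small two-level taxonomy (assumed, for illustration only)."""
    science = Node(
        "science",
        [Node("physics"), Node("biology")],
        {"physics": {"energy", "quantum"}, "biology": {"cell", "gene"}},
    )
    sports = Node(
        "sports",
        [Node("football"), Node("tennis")],
        {"football": {"goal", "penalty"}, "tennis": {"serve", "racket"}},
    )
    return Node(
        "root",
        [science, sports],
        {"science": {"energy", "cell", "experiment"},
         "sports": {"match", "player", "score"}},
    )


if __name__ == "__main__":
    root = build_example_taxonomy()
    # Routed correctly at both levels.
    print(classify(root, ["quantum", "energy", "experiment"]))
    # -> ['root', 'science', 'physics']
    # A biology document with misleading terms is sent to "sports" at the
    # root, so it can never reach "biology" further down.
    print(classify(root, ["gene", "player", "score"]))
    # -> ['root', 'sports', 'football']
```

Each document is compared only with the classifiers along one root-to-leaf path rather than against every class, which is the source of the speed advantage. The second call shows the cost: once the root decides wrongly, no classifier lower in the taxonomy can recover the correct class unless the error is detected or masked.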


Chun-hung Cheng
Email: chcheng@cse.cuhk.edu.hk

Jian Tang
Email: jian@cs.mun.ca

Ada Wai-chee Fu
Email: adafu@cse.cuhk.edu.hk

Irwin King
Email: king@cse.cuhk.edu.hk