Hierarchical Classification of Documents with Error Control
Chun-hung Cheng (4), Jian Tang (5), Ada Wai-chee Fu (4) and Irwin King (4)
(4) Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong
(5) Department of Computer Science, Memorial University of Newfoundland, St. John's, NF A1B 3X5, Canada
Abstract
Classification is a function that matches a new object with one of a set of predefined classes. Document classification is characterized
by the large number of attributes involved in the objects (documents). The traditional method of building a single classifier
to do all the classification work would incur a high overhead. Hierarchical classification is a more efficient method: instead
of a single classifier, we use a set of classifiers distributed over a class taxonomy, one for each internal node. However, a misclassification at a high-level class may lead to a final class that
is far from the correct one. An existing approach to coping with this problem requires the terms to be arranged hierarchically as well.
In this paper, instead of overhauling the classifier itself, we propose mechanisms to detect misclassification and take appropriate
actions. We then discuss an alternative that masks the misclassification based on a well-known software fault tolerance technique.
Our experiments show that our algorithms represent a good trade-off between speed and accuracy in most applications.
Keywords: Hierarchical document classification, naive Bayesian classifier, error control, class taxonomy, parallel algorithm
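The top-down scheme described in the abstract, with one classifier per internal node of the class taxonomy, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the toy taxonomy, the four training documents, the small `NaiveBayes` class, and the `threshold` parameter. The threshold check in particular is only a simple stand-in for detecting a doubtful decision; it is not the detection or masking mechanism proposed in the paper, which the abstract does not specify.

```python
import math
from collections import Counter, defaultdict

# Hypothetical taxonomy: internal node -> children. Leaves have no entry.
TAXONOMY = {
    "root": ["science", "sports"],
    "science": ["physics", "biology"],
    "sports": ["soccer", "tennis"],
}


class NaiveBayes:
    """Multinomial naive Bayes over word lists, with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)
        self.doc_counts = Counter()
        self.vocab = set()

    def fit(self, docs, labels):
        for words, label in zip(docs, labels):
            self.doc_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def posteriors(self, words):
        """Return normalized class posteriors; unseen words are ignored."""
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label, n_docs in self.doc_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)
            for w in words:
                if w in self.vocab:
                    count = self.word_counts[label][w] + 1
                    score += math.log(count / (total_words + len(self.vocab)))
            log_scores[label] = score
        top = max(log_scores.values())
        exp = {k: math.exp(v - top) for k, v in log_scores.items()}
        norm = sum(exp.values())
        return {k: v / norm for k, v in exp.items()}


def leaves_under(node):
    """Set of leaf classes in the subtree rooted at node."""
    children = TAXONOMY.get(node)
    if not children:
        return {node}
    return set().union(*(leaves_under(c) for c in children))


def train(docs, leaf_labels):
    """Train one classifier per internal node; it chooses among that node's children."""
    classifiers = {}
    for node, children in TAXONOMY.items():
        child_of = {leaf: c for c in children for leaf in leaves_under(c)}
        pairs = [(d, child_of[l]) for d, l in zip(docs, leaf_labels) if l in child_of]
        classifiers[node] = NaiveBayes().fit([d for d, _ in pairs], [c for _, c in pairs])
    return classifiers


def classify(classifiers, words, threshold=0.0):
    """Descend from the root; stop and flag the document if a decision is too uncertain.

    The threshold test is an illustrative assumption, not the paper's mechanism.
    Returns (path, accepted).
    """
    node, path = "root", ["root"]
    while node in classifiers:
        post = classifiers[node].posteriors(words)
        best = max(post, key=post.get)
        if post[best] < threshold:
            return path, False
        node = best
        path.append(node)
    return path, True


docs = [
    ["quantum", "particle", "energy"],
    ["cell", "gene", "protein"],
    ["goal", "striker", "pitch"],
    ["serve", "racket", "court"],
]
labels = ["physics", "biology", "soccer", "tennis"]
classifiers = train(docs, labels)
path, accepted = classify(classifiers, ["gene", "protein"])
```

Each classifier only distinguishes among the children of its own node, so it sees fewer classes than a single flat classifier would. The sketch also shows the weakness the abstract points out: a wrong choice at `root` sends the document into the wrong subtree, and no lower classifier can recover from it.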