Lecture Notes in Computer Science, 2000, Volume 1846/2000, 215-226, DOI: 10.1007/3-540-45151-X_20

Hierarchically Classifying Chinese Web Documents without Dictionary Support and Segmentation Procedure

Shuigeng Zhou, Ye Fan, Jiangtao Hu, Fang Yu and Yunfa Hu

Abstract

This paper reports a system that hierarchically classifies Chinese Web documents without dictionary support or a segmentation procedure. In our classifier, Web documents are represented by N-grams (N ≤ 4), which are easy to extract. A boosting machine learning approach is applied to classify Chinese Web documents that share a topic hierarchy. The open, modular system architecture makes our classifier extensible. Experimental results show that our system can classify Chinese Web documents effectively and efficiently.
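The key representational idea in the abstract is that character N-grams with N ≤ 4 can be read directly off the raw text, so no dictionary or word segmenter is needed. The sketch below illustrates that idea in Python and is not the authors' implementation. The function name `char_ngrams`, the `max_n` parameter, and the choice to split text into runs of CJK Unified Ideographs (U+4E00 to U+9FFF) so that no N-gram spans punctuation, digits or Latin text are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumption (not from the paper): only CJK Unified Ideographs (U+4E00..U+9FFF)
# are kept; any other character (punctuation, digits, Latin letters, whitespace)
# ends a run, so no N-gram crosses such a boundary.
_CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def char_ngrams(text: str, max_n: int = 4) -> Counter:
    """Count every character N-gram with 1 <= N <= max_n in `text`.

    Hypothetical helper: no dictionary lookup or word segmentation is
    performed; each run of CJK characters is scanned with a sliding
    window of every length from 1 to max_n.
    """
    if max_n < 1:
        raise ValueError("max_n must be at least 1")
    counts: Counter = Counter()
    for run in _CJK_RUN.findall(text):
        for n in range(1, max_n + 1):
            # A run of length L contains L - n + 1 windows of length n
            # (none when n > L, because the range is then empty).
            for i in range(len(run) - n + 1):
                counts[run[i:i + n]] += 1
    return counts


if __name__ == "__main__":
    for gram, freq in sorted(char_ngrams("中文网页分类").items()):
        print(gram, freq)
```

For example, `char_ngrams("中文网页")` yields 10 distinct N-grams (4 unigrams, 3 bigrams, 2 trigrams and 1 four-gram), each counted once. How the paper selects and weights these features before passing them to its boosting learner is not described in the abstract, so these raw counts are only a starting point.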
This work is supported by the 973 High-Tech Projects Foundation of China and partially supported by a grant (No. 69933010) from NSFC.
