Lecture Notes in Computer Science, 2004, Volume 3129/2004, 499-508, DOI: 10.1007/978-3-540-27772-9_50

PLD: A Distillation Algorithm for Misclassified Documents

Ding-Yi Chen and Xue Li

Abstract

We observed that in interactive text classification, users tend to point out only the misclassified documents, not the correctly classified ones. A user is also unlikely to be diligent enough to identify every misclassified document. A classifier is therefore expected to deal with misclassified documents of which only a small proportion may have been identified. We propose the Prediction-Learning-Distillation (PLD) framework for distilling the misclassified documents. Whenever a user points out an error, PLD learns from the mistake and identifies the same mistake among all other classified documents. PLD then enforces this learning in future classifications. Our experimental results demonstrate that the proposed algorithm can learn from user-identified misclassified documents and then distill the rest successfully.

Keywords  Document Classification - SVM - Winnow Algorithm
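
The predict, learn, distill cycle described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not give its details, so the code below assumes a plain positive Winnow classifier over binary word features, and the names WinnowSketch and pld_round, the threshold, the promotion factor, and the toy documents are all hypothetical. It only shows the general idea of learning from one user-flagged error and then re-checking the remaining classified documents for the same mistake.

```python
from collections import defaultdict


class WinnowSketch:
    """Positive Winnow over binary word features (assumed setup, not the paper's)."""

    def __init__(self, threshold=4.0, alpha=2.0):
        self.w = defaultdict(lambda: 1.0)  # every word starts with weight 1
        self.threshold = threshold         # hypothetical decision threshold
        self.alpha = alpha                 # hypothetical promotion/demotion factor

    def predict(self, words):
        # A document is positive when the summed weights of its distinct words
        # reach the threshold.
        return sum(self.w[t] for t in set(words)) >= self.threshold

    def learn(self, words, label):
        # Mistake-driven multiplicative update: do nothing if already correct.
        if self.predict(words) == label:
            return
        factor = self.alpha if label else 1.0 / self.alpha
        for t in set(words):
            self.w[t] *= factor


def pld_round(clf, docs, labels, doc_id, correct_label, max_updates=10):
    """Learn from one user-flagged document, then distill the other documents."""
    labels[doc_id] = correct_label
    # Learning: update until the flagged document is classified correctly.
    for _ in range(max_updates):
        if clf.predict(docs[doc_id]) == correct_label:
            break
        clf.learn(docs[doc_id], correct_label)
    # Distillation: re-predict every other document and relabel those whose
    # prediction changed as a result of the correction.
    distilled = []
    for other_id, words in docs.items():
        if other_id == doc_id:
            continue
        new_label = clf.predict(words)
        if new_label != labels[other_id]:
            labels[other_id] = new_label
            distilled.append(other_id)
    return distilled


# Toy data (hypothetical): True means "spam".
docs = {
    "d1": ["cheap", "pills", "offer"],
    "d2": ["cheap", "pills", "now"],
    "d3": ["meeting", "agenda", "notes"],
}
clf = WinnowSketch()
labels = {doc_id: clf.predict(words) for doc_id, words in docs.items()}  # prediction
distilled = pld_round(clf, docs, labels, "d1", True)  # user flags d1 only
print(distilled)  # ['d2']
```

In this toy run the user corrects only d1, yet d2 is relabelled as well because it shares the promoted words, while d3 is left untouched. A fuller treatment would also need to keep the learned correction in force for later predictions, which the abstract mentions but this sketch covers only through the persistent weights.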
