Lecture Notes in Computer Science, 2003, Volume 2838/2003, 120-131, DOI: 10.1007/978-3-540-39804-2_13

Using Belief Networks and Fisher Kernels for Structured Document Classification

Ludovic Denoyer and Patrick Gallinari

Abstract

We consider the classification of structured (e.g. XML) textual documents. We first propose a generative model based on Belief Networks which allows us to simultaneously take into account structure and content information. We then show how this model can be extended into a more efficient classifier using the Fisher kernel method. In both cases, model parameters are learned from a labelled training set of representative documents. We present experiments on two collections of structured documents: WebKB, which has become a reference corpus for HTML page classification, and the new INEX corpus, which has been developed for the evaluation of XML information retrieval systems.

Keywords: textual document classification - structured document - XML corpus - Belief Networks - Fisher Kernel
