We consider the classification of structured (e.g. XML) textual documents. We first propose a generative model based on Belief
Networks that allows us to take structure and content information into account simultaneously. We then show how this model
can be extended into a more efficient classifier using the Fisher kernel method. In both cases, model parameters are learned
from a labelled training set of representative documents. We present experiments on two collections of structured documents:
WebKB, which has become a reference corpus for HTML page classification, and the new INEX corpus, which was developed for
the evaluation of XML information retrieval systems.
Keywords: textual document classification - structured document - XML corpus - Belief Networks - Fisher kernel