A Proposal for Annotation, Semantic Similarity and Classification of Textual Documents

Emmanuel Nauer and Amedeo Napoli

Abstract

In this paper, we present an approach for classifying documents based on the notion of semantic similarity and on an effective representation of document content. The content of a document is annotated, and the resulting annotation is represented by a labeled tree whose nodes and edges are labeled by concepts from a domain ontology. A reasoning process may be carried out on these annotation trees, allowing documents to be compared with one another for classification or information retrieval purposes. The paper concludes with an algorithm for classifying documents with respect to semantic similarity, followed by a discussion.

Keywords: content-based classification of documents, domain ontology, document annotation, semantic similarity
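The abstract does not specify the similarity measure or the classification algorithm, so the following is only a minimal Python sketch of the general idea under stated assumptions. The ontology and its concepts are hypothetical. Concept similarity uses the Wu-Palmer measure, a standard measure swapped in here and not necessarily the one the authors use. Tree similarity recursively matches children whose edge labels are identical, which simplifies the abstract's statement that edges are also labeled by ontology concepts. Classification is plain nearest-neighbour assignment.

```python
from dataclasses import dataclass, field

# Hypothetical toy ontology: each concept maps to its parent (None marks the root).
ONTOLOGY = {
    "Thing": None,
    "Chemistry": "Thing",
    "Organic": "Chemistry",
    "Inorganic": "Chemistry",
    "Biology": "Thing",
    "Genetics": "Biology",
    "Method": "Thing",
    "Synthesis": "Method",
    "Sequencing": "Method",
}


def ancestors(concept):
    """Return the path from a concept up to the root, concept first."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = ONTOLOGY[concept]
    return path


def depth(concept):
    """Depth in the hierarchy; the root has depth 1."""
    return len(ancestors(concept))


def concept_sim(a, b):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    anc_b = set(ancestors(b))
    # Lowest common subsumer: first ancestor of a that is also an ancestor of b.
    lcs = next((c for c in ancestors(a) if c in anc_b), None)
    if lcs is None:
        return 0.0
    return 2 * depth(lcs) / (depth(a) + depth(b))


@dataclass
class Node:
    """Annotation tree node labeled by a concept; children are (edge_label, Node) pairs."""
    concept: str
    children: list = field(default_factory=list)


def _directed(t1, t2):
    """Average, over t1's children, of the best match among t2's children with the same edge label."""
    if not t1.children:
        return 0.0  # unmatched structure on the other side contributes nothing
    scores = []
    for rel, sub1 in t1.children:
        matches = [tree_sim(sub1, sub2) for r, sub2 in t2.children if r == rel]
        scores.append(max(matches, default=0.0))
    return sum(scores) / len(scores)


def tree_sim(t1, t2):
    """Symmetric tree similarity: mean of root similarity and child-matching similarity."""
    root = concept_sim(t1.concept, t2.concept)
    if not t1.children and not t2.children:
        return root
    kids = (_directed(t1, t2) + _directed(t2, t1)) / 2
    return (root + kids) / 2


def classify(doc, classes):
    """Nearest-neighbour assignment: pick the class containing the most similar tree."""
    best_label, best_score = None, -1.0
    for label, members in classes.items():
        score = max(tree_sim(doc, m) for m in members)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


# Usage with hypothetical annotation trees and class labels.
doc = Node("Organic", [("usesMethod", Node("Synthesis"))])
classes = {
    "chemistry": [Node("Inorganic", [("usesMethod", Node("Synthesis"))])],
    "biology": [Node("Genetics", [("usesMethod", Node("Sequencing"))])],
}
label, score = classify(doc, classes)
print(label, round(score, 3))  # chemistry 0.833
```

Here "Organic" and "Inorganic" share the subsumer "Chemistry", and both trees use the same method, so the document lands in the chemistry class. Averaging the two directed child scores keeps the measure symmetric, and a richer version could compare edge labels with `concept_sim` instead of requiring equality.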
