Lecture Notes in Computer Science, 2005, Volume 3248/2005, 32-41, DOI: 10.1007/978-3-540-30211-7_4

Combining Labeled and Unlabeled Data for Learning Cross-Document Structural Relationships

Zhu Zhang and Dragomir Radev

Abstract

Multi-document discourse analysis has emerged with the potential to improve various NLP applications. Based on the newly proposed Cross-document Structure Theory (CST), this paper describes an empirical study that classifies CST relationships between sentence pairs extracted from topically related documents, exploiting both labeled and unlabeled data. We investigate a binary classifier that determines whether a structural relationship exists, and a full classifier that uses the complete taxonomy of relationships. We show that in both cases exploiting unlabeled data improves the performance of the learned classifiers.
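The abstract does not say which semi-supervised algorithm the study used. Purely as a generic illustration of how unlabeled sentence pairs can feed back into training, and not as the authors' method, here is a minimal self-training sketch. It assumes each sentence pair has already been turned into a numeric feature vector (the two-feature vectors below are hypothetical, e.g. word-overlap scores). It uses a nearest-centroid classifier as a stand-in learner. Label 1 means "a CST relationship exists" and 0 means "no relationship". The same loop works unchanged with more labels, as in the full-taxonomy setting.

```python
from typing import Dict, List, Tuple

Vector = Tuple[float, ...]
Centroids = Dict[int, List[float]]


def centroids(data: List[Tuple[Vector, int]]) -> Centroids:
    """Compute the mean feature vector for each class label."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, y in data:
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        for i, v in enumerate(x):
            sums[y][i] += v
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def sq_dist(a, b) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def predict(cents: Centroids, x: Vector) -> Tuple[int, float]:
    """Return (label, margin); margin is the distance gap to the runner-up class."""
    ranked = sorted(cents, key=lambda y: sq_dist(cents[y], x))
    best = ranked[0]
    if len(ranked) == 1:
        return best, float("inf")
    margin = sq_dist(cents[ranked[1]], x) - sq_dist(cents[best], x)
    return best, margin


def self_train(labeled: List[Tuple[Vector, int]], unlabeled: List[Vector],
               per_round: int = 1, rounds: int = 10) -> Centroids:
    """Self-training: repeatedly pseudo-label the most confident unlabeled
    pairs, add them to the training set, and retrain the classifier."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        cents = centroids(labeled)
        # Score every unlabeled pair and sort by confidence (largest margin first).
        scored = [(predict(cents, x), x) for x in pool]
        scored.sort(key=lambda item: item[0][1], reverse=True)
        for (label, _), x in scored[:per_round]:
            labeled.append((x, label))
        pool = [x for _, x in scored[per_round:]]
    return centroids(labeled)


# Hypothetical toy data: two seed labels plus three unlabeled sentence pairs.
labeled = [((0.9, 0.8), 1), ((0.1, 0.05), 0)]
unlabeled = [(0.85, 0.7), (0.2, 0.1), (0.6, 0.5)]
model = self_train(labeled, unlabeled, per_round=1, rounds=3)
print(predict(model, (0.7, 0.6))[0])  # label for a high-overlap pair
```

Adding only the single most confident pair per round is a cautious choice: early pseudo-labels that are wrong get folded into later training rounds, so a small `per_round` limits how quickly such errors compound.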
