Lecture Notes in Computer Science, 2002, Volume 2417/2002, 472-481, DOI: 10.1007/3-540-45683-X_51

Wrapper Generation by Using XML-Based Domain Knowledge for Intelligent Information Extraction

Jaeyoung Yang, Jungsun Kim, Kyoung-Goo Doh and Joongmin Choi

Abstract

This paper discusses issues in Web information extraction, focusing on automatic extraction methods that exploit wrapper induction. In particular, we point out the limitations of traditional heuristic-based wrapper generation systems and, as a solution, emphasize the importance of domain knowledge in the wrapper generation process.
We demonstrate the effectiveness of domain knowledge by presenting our scheme for knowledge-based wrapper generation for semi-structured and labeled documents. Our agent-oriented information extraction system, XTROS, represents both the domain knowledge and the wrappers as XML documents to increase modularity, flexibility, and interoperability. XTROS shows good performance on several Web sites in the real estate domain, and we expect it to adapt easily to other domains by plugging in the appropriate XML-based domain knowledge.
