Lecture Notes in Computer Science, 2001, Volume 2118/2001, 145-159, DOI: 10.1007/3-540-47714-4_14

Identification of Syntactically Similar DTD Elements for Schema Matching

Hong Su, Sriram Padmanabhan and Ming-Ling Lo

Abstract

An XML Document Type Definition (DTD) specifies the structure of XML documents. XML applications such as data translation, schema integration, and wrapper generation require DTD schema matching as a core procedure. Schema matching usually relies on a human arbiter; we aim at an automated system that gives the arbiter a starting point for designing the matching that best meets the requirements of the given application. We present an approach that identifies syntactically similar DTD elements as potential matching components. We first describe the DTD element graph, a data model for DTD elements. We then define the distance between two DTD element graphs and introduce the concepts of syntactically equivalent and syntactically similar graphs. Next, we describe an algorithm that detects both syntactically equivalent and syntactically similar DTD elements. We have implemented the matching detection algorithm along with several heuristics that improve performance. Our experimental results show that the algorithm achieves reasonable precision in recognizing correct matches.
