Information contained in XML documents cannot be interpreted properly without an appropriate DTD. However, XML documents collected
from the web are not always accompanied by the corresponding DTD, which makes extracting information from such sources
difficult. In this study, we reverse-construct a DTD from XML sources whose DTD is unknown, and use it to extract information from
XML inputs. The DTD construction module scans the input XML files in a single pass, whereas most other implementations
use a two-pass approach. The modules also provide clean Java programming interfaces, so that they can be integrated
seamlessly with other web applications.
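As an illustration of what single-pass DTD construction can look like, the following is a minimal sketch and not the module described here. It assumes a SAX-based scan and a deliberately loose content model: every element is declared as a repeatable choice of the children observed, and every attribute as optional CDATA. The class name DtdSketch and its methods infer and toDtd are hypothetical names introduced only for this example.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: infers a loose DTD while reading the input exactly once.
public class DtdSketch extends DefaultHandler {
    // Facts recorded per element name during the single scan.
    private static final class Info {
        final Set<String> children = new LinkedHashSet<>();
        final Set<String> attributes = new LinkedHashSet<>();
        boolean hasText;
    }

    private final Map<String, Info> elements = new LinkedHashMap<>();
    private final Deque<Info> open = new ArrayDeque<>(); // currently open elements

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        Info info = elements.computeIfAbsent(qName, k -> new Info());
        for (int i = 0; i < atts.getLength(); i++) {
            info.attributes.add(atts.getQName(i));
        }
        if (!open.isEmpty()) {
            open.peek().children.add(qName); // record this element under its parent
        }
        open.push(info);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Whitespace-only text is ignored so indentation does not imply #PCDATA.
        if (!open.isEmpty() && !new String(ch, start, length).trim().isEmpty()) {
            open.peek().hasText = true;
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        open.pop();
    }

    // Assumption: a loose model (choice of observed children, repeated) is acceptable.
    String toDtd() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Info> e : elements.entrySet()) {
            Info info = e.getValue();
            String model;
            if (info.children.isEmpty()) {
                model = info.hasText ? "(#PCDATA)" : "EMPTY";
            } else {
                String names = String.join("|", info.children);
                model = info.hasText ? "(#PCDATA|" + names + ")*" : "(" + names + ")*";
            }
            sb.append("<!ELEMENT ").append(e.getKey()).append(' ').append(model).append(">\n");
            for (String a : info.attributes) {
                sb.append("<!ATTLIST ").append(e.getKey()).append(' ')
                  .append(a).append(" CDATA #IMPLIED>\n");
            }
        }
        return sb.toString();
    }

    public static String infer(String xml) throws Exception {
        DtdSketch handler = new DtdSketch();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.toDtd();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(infer("<book id=\"1\"><title>XML</title><author>Kim</author></book>"));
    }
}
```

Because all declarations are derived from state accumulated in the SAX callbacks, the input never has to be re-read. A tighter content model (ordered sequences, occurrence indicators such as ? and +) would need more bookkeeping per element, which this sketch leaves out.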
This work is supported in part by the Ministry of Information & Communication of Korea.