Data instance integration, especially on the Web, involves analyzing and matching data from two or more sources, including
XML sources. XML sources in particular introduce new challenges to the integration process because of their dynamic and irregular
structure. In this context, one of the hardest steps is determining which XML instances are similar. This paper presents a
group of algorithms that prepare XML instances for comparison. We analyze the benefit of these algorithms over existing XML
comparison approaches.
This work is partially supported by the DIGITEX Project of CNPq Foundation. CTInfo Process Nr.: 550.845/2005-4.