Record extraction from data-rich, unstructured, multiple-record Web documents works well [9], but only if the text for each
record can be located and isolated. Although some multiple-record Web documents present records as contiguous, delineated
chunks of text (which can thus be located and isolated [10]), many do not. When some values of textual records are factored
out, are split unnaturally across boundaries, are joined unnaturally within boundaries, or are linked by off-page connectors,
or when desired records are interspersed with records that are not of interest, it is difficult to automatically cull records
and piece values together to form clean, delineated chunks of text that each represent a single record of interest. In this
paper we address this problem and propose an algorithm to find and rearrange (if necessary) records in an HTML document. The
essential idea is to attempt to maximize a record-recognition heuristic with respect to a given application ontology. Tests
we conducted for two widely differing applications show that this technique properly locates and reconfigures records.