The quality of a search engine usually depends on the content of the returned documents rather than on the text used to express
this content. Ideally, then, search techniques should be directed toward the semantic dependencies underlying documents
rather than toward the texts themselves. The most visible examples in this direction are Latent Semantic Analysis (LSA) and the
Hyperspace Analogue to Language (HAL). If these techniques are really based on semantic dependencies, as their proponents contend,
then they should be applicable across languages.
To investigate this contention, we used electronic versions of two kinds of material together with their translations: a novel and
a popular treatise about cosmology. We used the analogy of fingerprinting as employed in forensics to establish whether individuals
are related. Genetic fingerprinting uses enzymes to split the DNA and then compares the resulting band patterns. Likewise,
in our research we used queries to split a document into fragments. If a search technique really isolates the fragments that are
semantically related to the query, then a document and its translation should show similar band patterns.
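
The comparison of band patterns can be illustrated with a minimal sketch. It assumes that a document and its translation
have already been divided into aligned segments, and it substitutes plain term matching for LSA or HAL, which are not
implemented here. The function names, the Jaccard measure, and the toy sentences are illustrative assumptions only, not
the procedure or the material used in the evaluation.

```python
def band_pattern(segments, query_terms):
    """Mark each segment with 1 if it contains any query term, else 0.

    Assumption: simple term matching stands in for a semantic
    technique such as LSA or HAL.
    """
    terms = {t.lower() for t in query_terms}
    return [1 if terms & set(seg.lower().split()) else 0 for seg in segments]


def pattern_similarity(pattern_a, pattern_b):
    """Jaccard overlap of the marked positions in two band patterns.

    Assumption: Jaccard overlap is one possible similarity measure;
    the source does not specify which measure is used.
    """
    if len(pattern_a) != len(pattern_b):
        raise ValueError("patterns must cover the same number of segments")
    hits_a = {i for i, v in enumerate(pattern_a) if v}
    hits_b = {i for i, v in enumerate(pattern_b) if v}
    union = hits_a | hits_b
    if not union:
        return 1.0  # two empty patterns are treated as identical
    return len(hits_a & hits_b) / len(union)


# Hypothetical aligned segments of a text and its Dutch translation.
original = ["the star collapsed", "she walked home", "light from the star faded"]
translation = ["de ster stortte in", "zij liep naar huis", "licht van de ster vervaagde"]

pattern_en = band_pattern(original, ["star"])
pattern_nl = band_pattern(translation, ["ster"])
similarity = pattern_similarity(pattern_en, pattern_nl)
```

In this toy case the query marks the same segments in both versions, so the two patterns coincide. A technique that
responds only to surface text would not be expected to produce matching patterns once the query and the document are translated.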
In this paper we (1) present the fingerprinting technique, (2) introduce the material used, and (3) report the results of an evaluation
of two semantic indexing techniques.