Lecture Notes in Computer Science, 1998, Volume 1513/1998, 58, DOI: 10.1007/3-540-49653-X_36

Comparing the Effect of Syntactic vs. Statistical Phrase Indexing Strategies for Dutch

Wessel Kraaij and Renée Pohlmann

Abstract

In this paper we describe the results of experiments contrasting syntactic phrase indexing with statistical phrase indexing for Dutch texts. Our results showed that we at least need a compound splitting algorithm for good quality retrieval for Dutch texts. If we then add either syntactic or statistical phrases, performance generally improves, but this effect is never statistically significant. If we compare syntactic vs. statistical phrase indexing, syntactic phrases are slightly superior to statistical phrases, particularly at high precision. At higher recall levels syntactic and statistical phrases are equally effective. However, since a compound splitting algorithm requires a dictionary and knowledge about constraints on compound formation, a purely non-linguistic indexing strategy, with or without phrases, does not seem to be very effective for Dutch.
