Abstract

In this paper we describe a stemming algorithm for the Galician language that simultaneously supports the four current orthographic regulations for Galician. The algorithm has already been implemented, and we have begun using it in order to refine it. However, this stemming algorithm cannot be applied to documents that predate the first Galician orthographic regulation, which appeared in 1977. To stem documents written before that date we have therefore adopted an exhaustive approach, which consists of defining a large collection of wordsets that allow systematic word comparisons. We also describe a tool for building the wordsets needed in this approach.
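
As a rough illustration of the exhaustive approach, the sketch below maps every spelling variant in a wordset to one representative form, so that stemming a pre-1977 text reduces to a lookup. The function names (`build_index`, `stem`) and the sample wordsets are hypothetical and chosen only for illustration; they are not taken from the algorithm or the tool described in the paper.

```python
def build_index(wordsets):
    """Map each word form to the representative of its wordset."""
    index = {}
    for representative, variants in wordsets.items():
        # The representative itself must also resolve to its own wordset.
        index[representative.lower()] = representative
        for variant in variants:
            index[variant.lower()] = representative
    return index


def stem(word, index):
    """Return the wordset representative, or the word unchanged if unknown."""
    return index.get(word.lower(), word)


# Illustrative wordsets only (assumed examples, not data from the paper).
WORDSETS = {
    "xente": {"gente", "jente", "xentes"},
    "libro": {"libros", "libriño"},
}

INDEX = build_index(WORDSETS)
```

With this index, `stem("Gente", INDEX)` and `stem("xentes", INDEX)` both resolve to the same representative, while a word that belongs to no wordset is returned unchanged.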

Keywords  Stemming - Digital Libraries - Text Retrieval

This work was partially supported by CICYT (TEL99-0335-C04-02) and the Vicerrectorado de Innovation Tecnoloxica (University of A Coruña).
