
Chapter 9 - Text Mining and Applications (TEMA 2005)

STEMBR: A Stemming Algorithm for the Brazilian Portuguese Language

Reinaldo Viana Alvares, Ana Cristina Bicharra Garcia and Inhaúma Ferraz

(1) UFF – Universidade Federal Fluminense, Instituto de Computação, Rua Passo da Pátria, 156 Bloco E - 3º Andar, São Domingos, Niterói, RJ 24210-240
Abstract
Stemming algorithms have traditionally been used in information retrieval systems because they generate a more concise word representation. However, the efficiency of these algorithms varies with the language to which they are applied. This paper presents STEMBR, a stemmer for Brazilian Portuguese in which suffix treatment is based on a statistical study of last-letter frequencies for words found in Brazilian web pages. The proposed stemmer is compared with another algorithm developed specifically for Portuguese. The results show the efficiency of our stemmer.
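
The kind of last-letter statistic mentioned in the abstract can be illustrated with a minimal sketch. This is not the STEMBR implementation: the function name `last_letter_frequencies` and the sample word list are hypothetical, chosen only to show how the relative frequency of final letters might be tallied over a vocabulary.

```python
from collections import Counter


def last_letter_frequencies(words):
    """Return the relative frequency of each final letter in `words`.

    Hypothetical helper for illustration; not taken from the paper.
    Empty strings are skipped, and an empty input yields an empty dict.
    """
    counts = Counter(w[-1].lower() for w in words if w)
    total = sum(counts.values())
    # most_common() orders the letters from most to least frequent
    return {letter: n / total for letter, n in counts.most_common()}


# Illustrative sample only; the paper's study used words from Brazilian web pages.
sample = ["casa", "casas", "menino", "meninos", "falar", "cantar", "nação", "feliz"]
freqs = last_letter_frequencies(sample)
```

A stemmer could consult such a table to decide which word endings deserve dedicated suffix rules, although how STEMBR actually uses its statistics is not described on this page.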

Contact Information

Reinaldo Viana Alvares
Email: ralvares@ic.uff.br

Ana Cristina Bicharra Garcia
Email: bicharra@ic.uff.br

Inhaúma Ferraz
Email: ferraz@ic.uff.br