Tokenising, Stemming and Stopword Removal on Anti-spam Filtering Domain
| Book Series | Lecture Notes in Computer Science |
| Publisher | Springer Berlin / Heidelberg |
| ISSN | 0302-9743 (Print) 1611-3349 (Online) |
| Volume | Volume 4177/2006 |
| Book | Current Topics in Artificial Intelligence |
| DOI | 10.1007/11881216 |
| Copyright | 2006 |
| ISBN | 978-3-540-45914-9 |
| Category | Selected Papers from the 11th Conference of the Spanish Association for Artificial Intelligence (CAEPIA 2005) |
| DOI | 10.1007/11881216_47 |
| Pages | 449-458 |
| Subject Collection | Computer Science |
| SpringerLink Date | Friday, October 13, 2006 |
J. R. Méndez (1), E. L. Iglesias (1), F. Fdez-Riverola (1), F. Díaz (2) and J. M. Corchado (3)
| (1) | Dept. Informática, University of Vigo, Escuela Superior de Ingeniería Informática, Edificio Politécnico, Campus Universitario As Lagoas s/n, 32004, Ourense, Spain |
| (2) | Dept. Informática, University of Valladolid, Escuela Universitaria de Informática, Plaza Santa Eulalia, 9-11, 40005, Segovia, Spain |
| (3) | Dept. Informática y Automática, University of Salamanca, Plaza de la Merced s/n, 37008, Salamanca, Spain |
Abstract
Junk e-mail detection and filtering can be considered a cost-sensitive classification problem. Nevertheless, the preprocessing
methods and noise reduction strategies used to improve computational efficiency in text classification may not be as effective
in e-mail filtering. This is demonstrated here through a comparative study of stopword removal, stemming and
different tokenising schemes. The final goal is to preprocess the training e-mail corpora of several content-based
techniques for spam filtering (machine learning approaches and case-based systems). Sound conclusions are drawn from experiments
covering different scenarios.
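
As a rough illustration of the three preprocessing steps compared in the abstract, the following minimal Python sketch chains a tokeniser, a stopword filter and a stemmer. It is not the authors' implementation: the stopword list, the token patterns, the crude suffix-stripping `naive_stem` (a stand-in for a real stemmer such as Porter's) and all function names are assumptions made for this example only.

```python
import re

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"the", "a", "an", "is", "to", "and", "of", "in", "you", "your"}

# Two hypothetical tokenising schemes: alphanumeric runs only, or runs that
# also keep '$' and '!' so that strings such as "$$$" survive as tokens.
ALNUM_PATTERN = r"[a-z0-9]+"
SYMBOL_PATTERN = r"[a-z0-9$!]+"


def tokenise(text, pattern=ALNUM_PATTERN):
    """Lowercase the text and return every match of the token pattern."""
    return re.findall(pattern, text.lower())


def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Drop tokens that appear in the stopword set."""
    return [t for t in tokens if t not in stopwords]


def naive_stem(token):
    """Strip one common English suffix, keeping at least three characters.

    This is an assumption-level stand-in, far cruder than a real stemmer.
    """
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text, use_stopwords=True, use_stemming=True,
               pattern=ALNUM_PATTERN):
    """Tokenise, then optionally remove stopwords and stem each token."""
    tokens = tokenise(text, pattern)
    if use_stopwords:
        tokens = remove_stopwords(tokens)
    if use_stemming:
        tokens = [naive_stem(t) for t in tokens]
    return tokens


if __name__ == "__main__":
    message = "Claim your FREE prizes, winning is easy!"
    print(preprocess(message))
    print(tokenise("Win $$$ now!!", SYMBOL_PATTERN))
```

Switching `use_stopwords`, `use_stemming` and `pattern` on and off yields the kind of alternative preprocessing scenarios the study compares. Note that the symbol-preserving pattern keeps tokens such as `$$$` that the alphanumeric scheme discards, which hints at why a preprocessing choice that helps general text classification could remove useful evidence in spam filtering.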