Experiments on the Use of Feature Selection and Negative Evidence in Automated Text Categorization
Luigi Galavotti (6), Fabrizio Sebastiani (7) and Maria Simi (8)
(6) AUTON S.R.L., Via Jacopo Nardi, 2 - 50132 Firenze, Italy
(7) Consiglio Nazionale delle Ricerche, Istituto di Elaborazione dell’Informazione, 56100 Pisa, Italy
(8) Dipartimento di Informatica, Università di Pisa, 56125 Pisa, Italy
Abstract
We tackle two different problems of text categorization (TC), namely feature selection and classifier induction. Feature selection (FS) refers to the activity of selecting, from the set of r distinct features (i.e. words) occurring in the collection, the subset of r′ ≪ r features that are most useful for compactly representing the meaning of the documents. We propose a novel FS technique based on a simplified variant of the χ² statistic. Classifier induction refers instead to the problem of automatically building a text classifier by learning from a set of documents pre-classified under the categories of interest. We propose a novel variant of the well-known k-NN method, based on the exploitation of negative evidence. We report the results of systematic experimentation with these two methods on the standard Reuters-21578 benchmark.
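As an illustration of χ²-based feature selection, the sketch below scores each term against each category with the textbook χ² statistic over a 2×2 term/category contingency table, then keeps the r′ highest-scoring terms. It is a sketch under stated assumptions, not the authors' technique: the simplified χ² variant the paper proposes is not spelled out in this abstract, so the standard formula is used instead. The function names (`chi_square`, `select_features`), the set-of-words document representation and the max-over-categories aggregation are all hypothetical choices.

```python
def chi_square(n11, n10, n01, n00):
    """Textbook chi-square for a 2x2 term/category contingency table.

    n11: documents containing the term and belonging to the category
    n10: documents containing the term, not in the category
    n01: documents without the term, in the category
    n00: documents without the term, not in the category
    """
    n = n11 + n10 + n01 + n00
    num = n * (n11 * n00 - n10 * n01) ** 2
    den = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    return num / den if den else 0.0


def select_features(docs, labels, categories, r_prime):
    """Keep the r_prime terms whose best (max over categories) chi-square is highest.

    docs: list of sets of terms; labels: list of sets of category names.
    The max-over-categories aggregation is an assumption of this sketch.
    """
    vocab = set().union(*docs)
    n = len(docs)
    scores = {}
    for t in vocab:
        best = 0.0
        for c in categories:
            n11 = sum(1 for d, cats in zip(docs, labels) if t in d and c in cats)
            n10 = sum(1 for d, cats in zip(docs, labels) if t in d and c not in cats)
            n01 = sum(1 for d, cats in zip(docs, labels) if t not in d and c in cats)
            n00 = n - n11 - n10 - n01
            best = max(best, chi_square(n11, n10, n01, n00))
        scores[t] = best
    # Sort by descending score; ties are broken alphabetically for determinism.
    ranked = sorted(vocab, key=lambda t: (-scores[t], t))
    return ranked[:r_prime]


# Hypothetical toy collection: four documents, two categories.
docs = [{"wheat", "grain"}, {"wheat", "price"}, {"oil", "price"}, {"oil", "crude"}]
labels = [{"grain"}, {"grain"}, {"crude"}, {"crude"}]
print(select_features(docs, labels, ["grain", "crude"], 2))  # → ['oil', 'wheat']
```

On this toy data "wheat" and "oil" each occur in exactly the documents of one category and get the top score, while "price", which occurs once in each category, scores zero and is ranked last.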
We here make the assumption that a document d_j can belong to zero, one or many of the categories in C; this assumption holds for the Reuters-21578 benchmark we use in our experiments. All the techniques we discuss here can be straightforwardly adapted to the other case, in which each document belongs to exactly one category.
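Because a document may belong to zero, one or many categories, a k-NN classifier can score and threshold each category independently. The sketch below assumes one plausible reading of "negative evidence": among the k nearest training documents, those labelled with a category add their similarity to that category's score, while those not labelled with it subtract their similarity (`negative=False` gives the plain, positive-evidence-only score). The exact scoring rule, the cosine similarity over term-weight dictionaries, the threshold value and all function names are hypothetical, not details taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse term -> weight dictionaries."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_scores(doc, train_docs, train_labels, categories, k, negative=True):
    """Per-category scores from the k most similar training documents.

    Hypothetical negative-evidence rule: a neighbour labelled with c adds its
    similarity; if negative=True, a neighbour not labelled with c subtracts it.
    """
    neighbours = sorted(
        ((cosine(doc, d), i) for i, d in enumerate(train_docs)), reverse=True
    )[:k]
    scores = {}
    for c in categories:
        s = 0.0
        for sim, i in neighbours:
            if c in train_labels[i]:
                s += sim
            elif negative:
                s -= sim
        scores[c] = s
    return scores


def classify(doc, train_docs, train_labels, categories, k, threshold=0.0, negative=True):
    """Independent yes/no decision per category: the result may hold 0, 1 or many labels."""
    scores = knn_scores(doc, train_docs, train_labels, categories, k, negative)
    return {c for c, s in scores.items() if s > threshold}


# Hypothetical toy training set with unit term weights.
train_docs = [
    {"wheat": 1.0, "grain": 1.0},
    {"wheat": 1.0, "price": 1.0},
    {"oil": 1.0, "price": 1.0},
    {"oil": 1.0, "crude": 1.0},
]
train_labels = [{"grain"}, {"grain"}, {"crude"}, {"crude"}]
categories = ["grain", "crude"]

print(classify({"wheat": 1.0}, train_docs, train_labels, categories, k=3))  # → {'grain'}
print(classify({"price": 1.0}, train_docs, train_labels, categories, k=2))  # → set()
print(sorted(classify({"price": 1.0}, train_docs, train_labels, categories,
                      k=2, negative=False)))  # → ['crude', 'grain']
```

For the query {"price": 1.0} with k=2, the two equally similar neighbours belong to different categories: under the assumed negative-evidence rule the contributions cancel and no category is assigned, whereas the positive-only score assigns both.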