Experiments on the Use of Feature Selection and Negative Evidence in Automated Text Categorization

Luigi Galavotti, Fabrizio Sebastiani and Maria Simi

(6)  AUTON S.R.L., Via Jacopo Nardi, 2 - 50132 Firenze, Italy
(7)  Consiglio Nazionale delle Ricerche, Istituto di Elaborazione dell’Informazione, 56100 Pisa, Italy
(8)  Dipartimento di Informatica, Università di Pisa, 56125 Pisa, Italy
Abstract
We tackle two different problems of text categorization (TC), namely feature selection and classifier induction. Feature selection (FS) refers to the activity of selecting, from the set of r distinct features (i.e. words) occurring in the collection, the subset of r′ ≪ r features that are most useful for compactly representing the meaning of the documents. We propose a novel FS technique, based on a simplified variant of the χ² statistic. Classifier induction refers instead to the problem of automatically building a text classifier by learning from a set of documents pre-classified under the categories of interest. We propose a novel variant, based on the exploitation of negative evidence, of the well-known k-NN method. We report the results of systematic experiments with these two methods performed on the standard Reuters-21578 benchmark.
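As a rough illustration of χ²-based feature selection, the sketch below scores each word against one category with the standard (textbook) χ² statistic for a 2×2 contingency table and keeps the r′ best-scoring words. It is not the simplified variant proposed in the paper, whose definition is not given in this abstract. The function names `chi_square` and `select_features`, the set-of-words document representation, and the toy data are all assumptions made for the example.

```python
from collections import Counter


def chi_square(n11, n10, n01, n00):
    """Standard chi-square statistic for a 2x2 term/category table.

    n11: documents containing the term and belonging to the category
    n10: documents containing the term, not in the category
    n01: documents without the term, in the category
    n00: documents without the term, not in the category
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        return 0.0  # degenerate table: the term or category is constant
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def select_features(docs, labels, category, r_prime):
    """Return the r_prime words with the highest chi-square score.

    docs:   list of sets of words (hypothetical representation)
    labels: list of sets of category names, aligned with docs
    """
    n_docs = len(docs)
    in_cat = [category in cats for cats in labels]
    n_cat = sum(in_cat)
    df = Counter()      # document frequency of each word
    df_cat = Counter()  # document frequency within the category
    for words, positive in zip(docs, in_cat):
        for w in words:
            df[w] += 1
            if positive:
                df_cat[w] += 1
    scores = {}
    for w in df:
        n11 = df_cat[w]
        n10 = df[w] - n11
        n01 = n_cat - n11
        n00 = n_docs - n11 - n10 - n01
        scores[w] = chi_square(n11, n10, n01, n00)
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:r_prime]


docs = [{"wheat", "price"}, {"wheat", "farm"},
        {"oil", "price"}, {"oil", "barrel"}]
labels = [{"grain"}, {"grain"}, {"crude"}, {"crude"}]
top = select_features(docs, labels, "grain", 2)
```

On this toy collection, "wheat" and "oil" tie for the highest score with respect to "grain": the standard statistic squares the correlation term, so a word that signals absence from the category scores as high as one that signals membership.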
We here make the assumption that a document d_j can belong to zero, one or many of the categories in C; this assumption holds in the Reuters-21578 benchmark we use for our experiments. All the techniques we discuss here can be straightforwardly adapted to the other case, in which each document belongs to exactly one category.
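The following sketch shows one plausible reading of "negative evidence" in a k-NN classifier under the multi-label assumption above; it is an illustration, not the authors' formulation, which the abstract does not spell out. Each category is decided independently: a neighbour that belongs to the category adds its similarity to the score, and, when negative evidence is enabled, a neighbour that does not belong subtracts it. Cosine similarity over word counts, the zero threshold, and the names `cosine`, `knn_scores` and `classify` are assumptions of this example.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_scores(query, training, categories, k, use_negative=True):
    """Score every category from the k training documents nearest to query.

    training: list of (list_of_words, set_of_categories) pairs
    """
    q = Counter(query)
    neighbours = sorted(
        ((cosine(q, Counter(words)), cats) for words, cats in training),
        key=lambda pair: -pair[0],
    )[:k]
    miss = -1 if use_negative else 0  # vote of a neighbour outside the category
    return {
        c: sum(sim * (1 if c in cats else miss) for sim, cats in neighbours)
        for c in categories
    }


def classify(query, training, categories, k, threshold=0.0, use_negative=True):
    """Assign every category whose score exceeds the threshold.

    Categories are decided independently, so the result may contain
    zero, one or many categories.
    """
    scores = knn_scores(query, training, categories, k, use_negative)
    return {c for c, s in scores.items() if s > threshold}


training = [
    (["wheat", "farm"], {"grain"}),
    (["wheat", "price"], {"grain", "trade"}),
    (["oil", "barrel"], {"crude"}),
]
categories = ["grain", "trade", "crude"]
query = ["wheat", "farm", "price"]

with_negative = classify(query, training, categories, k=3)
without_negative = classify(query, training, categories, k=3,
                            use_negative=False)
```

In this toy run the two equally similar neighbours disagree about "trade", so the negative vote cancels the positive one and only "grain" is assigned; with `use_negative=False` the single positive vote is enough to assign "trade" as well.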

Contact Information Luigi Galavotti
Email: galavott@tin.it

Contact Information Fabrizio Sebastiani
Email: fabrizio@iei.pi.cnr.it

Contact Information Maria Simi
Email: simi@di.unipi.it