This paper introduces a new criterion for term selection, which is based on the notion of Uncertainty. Term selection according
to this criterion is performed by eliminating noisy terms on a class-by-class basis, rather than by selecting the most
significant ones. Uncertainty-based term selection (UC) is compared with several other criteria, namely Information Gain (IG),
simplified χ² (SX), Term Frequency (TF) and Document Frequency (DF), in a Text Categorization setting. Experiments on data sets with different
properties (Reuters-21578, patent abstracts and patent applications) and with two different algorithms (Winnow and Rocchio)
show that UC-based term selection is not the most aggressive term selection criterion, but that its effect is quite stable
across data sets and algorithms. This makes it a good candidate for a general “install-and-forget” term selection mechanism.
We also describe and evaluate a hybrid term selection technique that first applies UC to eliminate noisy terms and then uses
another criterion to select the best of the remaining terms.