The objective of this study is to shed more light on how the performance of feature-based word sense disambiguation (WSD)
classifiers depends on the specific features chosen to represent a word context. In this paper we show that the features commonly
used to discriminate among different senses of a word (words, keywords, POS tags) are too sparse to enable the acquisition
of truly predictive rules or probabilistic models. Experimental analysis demonstrates (with some surprising results) that the
acquired rules are mostly tied to surface phenomena occurring in the learning set data, and do not generalize across hyponyms
of a word nor across language domains. This experiment, as conceived, has no practical application in WSD, but clearly shows
the positive influence of a more semantically oriented approach to WSD. Our conclusion is that feature-based WSD is at a dead end,
as also confirmed by the recent results of Senseval 2001.