Information retrieval addresses the problem of finding those documents whose content matches a user’s request from among a
large collection of documents. Currently, the most successful general-purpose retrieval methods are statistical methods that
treat text as little more than a bag of words. However, attempts to improve retrieval performance through more sophisticated
linguistic processing have been largely unsuccessful. Indeed, unless done carefully, such processing can degrade retrieval
effectiveness.
Several factors contribute to the difficulty of improving on a good statistical baseline, including: the forgiving nature but
broad coverage of the typical retrieval task; the lack of good weighting schemes for compound index terms; and the implicit
linguistic processing inherent in the statistical methods. Natural language processing techniques may be more important for
related tasks such as question answering or document summarization.