Much recent research activity has focused on automatically extracting linguistic information from on-line corpora. There
is no question that great progress has been made in applying machine learning to computational linguistics. We believe that, now that
the field has matured, it is time to look inward and carefully examine the basic tenets of the corpus-based learning paradigm.
The goal of this paper is to raise a number of issues that challenge the paradigm, in the hope of stimulating introspection and
discussion that will make the field even stronger.