Multiword expressions are a key problem for the development of large-scale, linguistically sound natural language processing
technology. This paper surveys the problem and some currently available analytic techniques. The various kinds of multiword
expressions should be analyzed in distinct ways, including listing as “words with spaces”, hierarchically organized lexicons,
restricted combinatoric rules, lexical selection, “idiomatic constructions”, and simple statistical affinity. An adequate comprehensive
analysis of multiword expressions must employ both symbolic and statistical techniques.
The research reported here was conducted in part under the auspices of the LinGO project, an international collaboration centered
around the LKB system and related resources (see http://lingo.stanford.edu). This research was supported in part by the Research Collaboration between NTT Communication Science Laboratories, Nippon
Telegraph and Telephone Corporation and CSLI, Stanford University. We would like to thank Emily Bender and Tom Wasow for their
contributions to our thinking. However, we alone are responsible for any errors that remain.