Lecture Notes in Computer Science, 2005, Volume 3513/2005, 351-363, DOI: 10.1007/11428817_12

Web-Assisted Detection and Correction of Joint and Disjoint Malapropos Word Combinations

Igor A. Bolshakov and Sofia N. Galicia-Haro


Abstract

An experiment on Web-assisted detection and correction of malapropisms is reported. Malapropos words semantically destroy the collocations they occur in, usually while retaining syntactic links with other words. A hundred English malapropisms were gathered, each supplied with its correction candidates, i.e. word combinations in which one word is an editing variant of the corresponding word in the malapropism. Google statistics of occurrences and co-occurrences were gathered for each malapropism and correction candidate. The collocation components may be adjacent or separated by other words in a sentence, so statistics were accumulated for the most probable distance between them. The raw Google occurrence statistics are then recalculated into numeric values of a specially defined Semantic Compatibility Index (SCI). Heuristic rules are proposed to signal malapropisms when SCI values are lower than a predetermined threshold and to retain a few highly SCI-ranked correction candidates. Within certain limitations, the experiment gave promising results.
Work done under partial support of the Mexican Government (CONACyT, SNI) and CGEPI-IPN, Mexico. Many thanks to Denis Filatov, Alexander Gelbukh, and Patrick Cassidy for their help with manuscript preparation.
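
The detect-and-correct procedure outlined in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not give the SCI formula, so `compatibility` is a stand-in score (log of the pair count over the geometric mean of the word counts) and not the authors' index; the function names, the threshold value, and all counts are invented; and a small in-memory table replaces live Google statistics. The distance between collocation components is not modelled.

```python
import math
import string


def edit_variants(word, vocabulary):
    """Return vocabulary words exactly one elementary edit away from `word`.

    Elementary edits here are deletion, adjacent transposition,
    replacement, and insertion of a single lowercase letter.
    """
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in letters}
    inserts = {a + c + b for a, b in splits for c in letters}
    candidates = deletes | transposes | replaces | inserts
    return sorted(w for w in candidates if w in vocabulary and w != word)


def compatibility(pair_count, count1, count2):
    """Stand-in compatibility score (NOT the paper's SCI, which the
    abstract does not define)."""
    if pair_count == 0 or count1 == 0 or count2 == 0:
        return float("-inf")
    return math.log(pair_count / math.sqrt(count1 * count2))


def check_pair(w1, w2, word_counts, pair_counts, threshold, keep=3):
    """Flag (w1, w2) if its score is below `threshold`; if flagged,
    return the `keep` best-scoring one-edit correction candidates."""

    def score(a, b):
        return compatibility(
            pair_counts.get((a, b), 0),
            word_counts.get(a, 0),
            word_counts.get(b, 0),
        )

    if score(w1, w2) >= threshold:
        return False, []
    vocabulary = set(word_counts)
    candidates = [(v, w2) for v in edit_variants(w1, vocabulary)]
    candidates += [(w1, v) for v in edit_variants(w2, vocabulary)]
    ranked = sorted(candidates, key=lambda pair: score(*pair), reverse=True)
    return True, ranked[:keep]


# Invented toy counts, standing in for Web occurrence statistics.
WORD_COUNTS = {"travel": 1000, "word": 2000, "world": 3000, "ward": 400}
PAIR_COUNTS = {
    ("travel", "word"): 2,
    ("travel", "world"): 900,
    ("travel", "ward"): 1,
}

if __name__ == "__main__":
    # Hypothetical threshold; a real one would have to be tuned on data.
    print(check_pair("travel", "word", WORD_COUNTS, PAIR_COUNTS, threshold=-3.0))
```

With these toy counts the pair ("travel", "word") scores well below the hypothetical threshold and is flagged, and ("travel", "world") ranks first among the candidates. Keeping the top `keep` candidates rather than a single best one mirrors the abstract's wording about retaining a few highly ranked candidates.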
