The paper demonstrates that the addition of automatically selected word-pairs substantially increases the accuracy of text
classification, which is contrary to most previously reported research. The word-pairs are selected automatically using a technique
based on frequencies of n-grams (sequences of characters), which takes into account both the frequencies of word-pairs and the
context in which they occur.
These improvements are reported for two different classifiers, support vector machines (SVM) and k-nearest neighbours (kNN), and
two different text corpora. For the first corpus, a collection of articles from PC Week magazine, the addition of word-pairs
increases micro-averaged breakeven accuracy by more than 6 percentage points from a baseline accuracy (without pairs) of around
40%. For the second, the standard Reuters benchmark, the SVM classifier augmented with word-pairs outperforms all previously
reported results.