Text categorization refers to the task of assigning pre-defined classes to text documents based on their content.
The k-NN algorithm is one of the top-performing classifiers on text data. However, there has been little research on the use of
different voting methods over text data. Also, when a large amount of training data is available online, the response speed slows
down, since the distance between a test document and every training document has to be computed. On the other hand, min–max-modular
k-NN (M³-k-NN) has been applied to large-scale text categorization. M³-k-NN achieves good performance and has a faster response
speed in a parallel computing environment. In this paper, we investigate
five different voting methods for k-NN and M³-k-NN. The experimental results and analysis show that the Gaussian voting method
achieves the best performance among all voting methods for both k-NN and M³-k-NN. In addition, M³-k-NN achieves better performance
than k-NN with a smaller value of k, and is thus faster than k-NN in a parallel computing environment.
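
To make the idea of a voting method concrete, the following is a minimal sketch of distance-weighted k-NN voting with a Gaussian kernel. The abstract does not state the exact formula used, so the weight exp(-d²/(2σ²)), the Euclidean distance, the default σ, and the function names `gaussian_vote` and `knn_classify` are all assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import defaultdict


def gaussian_vote(neighbors, sigma=1.0):
    """Pick a label from (distance, label) pairs of the k nearest documents.

    Assumed weighting: each neighbor votes with exp(-d^2 / (2 * sigma^2)),
    so closer neighbors count more than distant ones.
    """
    scores = defaultdict(float)
    for dist, label in neighbors:
        scores[label] += math.exp(-(dist ** 2) / (2.0 * sigma ** 2))
    # Return the label with the largest accumulated weight.
    return max(scores, key=scores.get)


def knn_classify(query, training, k, sigma=1.0):
    """Classify `query` against `training`, a list of (vector, label) pairs.

    Euclidean distance is an assumption; text classifiers often use
    cosine similarity on term vectors instead.
    """
    dists = sorted((math.dist(query, vec), label) for vec, label in training)
    return gaussian_vote(dists[:k], sigma)


# One very close neighbor outweighs two distant ones under Gaussian voting,
# whereas a plain majority vote over the same three would pick "politics".
neighbors = [(0.1, "sports"), (2.0, "politics"), (2.1, "politics")]
print(gaussian_vote(neighbors))
```

Under this weighting, the choice of σ controls how quickly a neighbor's influence decays with distance; a very large σ approaches plain majority voting.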
Keywords: Text categorization - k-NN algorithm - Min–max-modular k-NN - Parallel computing
The work of K. Wu and B. L. Lu was supported in part by the National Natural Science Foundation of China under grants
NSFC 60375022 and NSFC 60473040, and by the Microsoft Laboratory for Intelligent Computing and Intelligent Systems of Shanghai
Jiao Tong University.