Research Papers

Automatic Feature Extraction for Question Classification Based on Dissimilarity of Probability Distributions

David Tomás, José L. Vicedo, Empar Bisbal and Lidia Moreno

(1)  Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante, Spain
(2)  Departamento de Sistemas Informáticos y Computación, Universidad Politécnica de Valencia, Spain
Abstract
Question classification is one of the first tasks carried out in a Question Answering system. In this paper we present a multilingual question classification system based on machine learning techniques. We use Support Vector Machines to classify the questions. All the features needed to train and test this method are extracted automatically and without supervision from statistical information, by comparing the Poisson distributions of single words in two plain-text corpora, one of questions and one of documents. Thus, we need nothing but plain text to train the system, which yields a flexible approach that is easy to adapt to new languages and domains. We have tested it on a bilingual corpus of questions in English and Spanish.
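
To make the idea of comparing per-word Poisson distributions across two corpora concrete, here is a minimal sketch. The abstract does not say which dissimilarity measure is used, so this sketch assumes the Kullback-Leibler divergence between two Poisson distributions, KL(Pois(a) || Pois(b)) = a log(a/b) - a + b. The function names (`poisson_rates`, `poisson_kl`, `rank_words`), the whitespace tokenization and the smoothing constant are all hypothetical choices made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def poisson_rates(texts, vocabulary, smoothing=0.5):
    """Estimate a Poisson rate per word: mean occurrences per text.

    The additive smoothing (an assumption of this sketch) keeps every
    rate strictly positive so the divergence below is always defined.
    """
    counts = Counter(word for text in texts for word in text.lower().split())
    n_texts = len(texts)
    return {word: (counts[word] + smoothing) / n_texts for word in vocabulary}


def poisson_kl(rate_p, rate_q):
    """KL divergence between Poisson(rate_p) and Poisson(rate_q)."""
    return rate_p * math.log(rate_p / rate_q) - rate_p + rate_q


def rank_words(questions, documents):
    """Rank words by how differently they behave in questions vs. documents."""
    vocabulary = {
        word for text in questions + documents for word in text.lower().split()
    }
    question_rates = poisson_rates(questions, vocabulary)
    document_rates = poisson_rates(documents, vocabulary)
    scores = {
        word: poisson_kl(question_rates[word], document_rates[word])
        for word in vocabulary
    }
    # Highest divergence first: these words are candidate features.
    return sorted(scores, key=scores.get, reverse=True)
```

Under these assumptions, words that are frequent in questions but rare in ordinary documents (interrogatives such as "who") rise to the top of the ranking, and the highest-ranked words could then serve as features for a Support Vector Machine classifier.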
This work has been developed in the framework of the project CICYT R2D2 (TIC2003-07158-C04).

Contact Information

David Tomás
Email: dtomas@dlsi.ua.es

José L. Vicedo
Email: vicedo@dlsi.ua.es

Empar Bisbal
Email: ebisbal@dsic.upv.es

Lidia Moreno
Email: lmoreno@dsic.upv.es