Lecture Notes in Computer Science, 2008, Volume 5246/2008, 169-176, DOI: 10.1007/978-3-540-87391-4_23

Dealing with Small, Noisy and Imbalanced Data: Machine Learning or Manual Grammars?

Adam Przepiórkowski, Michał Marcińczuk and Łukasz Degórski

Abstract

This paper deals with the task of definition extraction when the training corpus suffers from small size, high noise and heavy class imbalance. A previous approach, based on manually constructed shallow grammars, turns out to be hard to outperform, even with such robust classifiers as SVMs, AdaBoost and simple ensembles of classifiers. However, a linear combination of various such classifiers and the manual grammars significantly improves on the results of the grammars alone.
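The abstract does not say how the combination is weighted or thresholded. The following is only a minimal sketch of what a linear combination of a grammar and classifiers might look like. It assumes that each component casts a binary vote (1 = "this sentence is a definition"), that the votes are summed with fixed weights, and that the sentence is accepted when the sum reaches a threshold. The components, weights and threshold here are hypothetical toy stand-ins, not the authors' grammars, SVMs or AdaBoost models.

```python
from typing import Callable, Sequence

# Hypothetical interface: each component maps a sentence to a vote in {0, 1},
# where 1 means "this sentence is a definition".
Component = Callable[[str], int]


def linear_combination(components: Sequence[Component],
                       weights: Sequence[float],
                       threshold: float) -> Callable[[str], int]:
    """Return a classifier that labels a sentence as a definition when the
    weighted sum of the component votes reaches the threshold."""
    if len(components) != len(weights):
        raise ValueError("need exactly one weight per component")

    def classify(sentence: str) -> int:
        score = sum(w * c(sentence) for c, w in zip(components, weights))
        return 1 if score >= threshold else 0

    return classify


# Toy stand-ins (hypothetical): a "grammar" that fires on a copula pattern
# and two "classifiers" that fire on simple surface cues.
grammar = lambda s: int(" is a " in s)
svm_like = lambda s: int(s.endswith("."))
boost_like = lambda s: int(len(s.split()) > 4)

combined = linear_combination([grammar, svm_like, boost_like],
                              weights=[0.5, 0.25, 0.25], threshold=0.5)
print(combined("A lexeme is a unit of lexical meaning."))  # prints 1
print(combined("See also below"))  # prints 0
```

With a weighted vote like this, a high-precision component such as a manual grammar can receive a large weight while weaker classifiers only tip borderline cases. In practice the weights and threshold would presumably be tuned on held-out data, with a metric suited to imbalanced classes.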