
Learning Word Segmentation Rules for Tag Prediction

Dimitar Kazakov, Suresh Manandhar and Tomaž Erjavec

University of York, Heslington, York, YO10 5DD, UK
Department for Intelligent Systems, Jožef Stefan Institute, Ljubljana, Slovenia
Abstract
In our previous work we introduced a hybrid approach, combining genetic algorithms (GA) and inductive logic programming (ILP), for learning stem-suffix segmentation rules from an unannotated list of words. Evaluating the method was difficult because no word corpora annotated with their morphological segmentation were available. Here the hybrid approach is evaluated indirectly, on the task of tag prediction. Applying the approach to an annotated lexicon of word-tag pairs yields two lexicons, one of stem-tag pairs and one of suffix-tag pairs. These lexicons are then used to predict the tags of unseen words in two ways: (1) using only the stem and suffix produced by the segmentation rules, and (2) using every combination of a stem and a suffix from the lexicons that matches the word. The results show a high correlation between the constituents produced by the segmentation rules and the tags of the words in which they appear, thereby demonstrating the linguistic relevance of the segmentations produced by the hybrid approach.
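The two tag-prediction modes described above can be illustrated with a small Python sketch. It is not the authors' implementation, and several parts are assumptions. The GA&ILP-learned rules are replaced by a hypothetical longest-suffix segmenter over a hand-picked suffix list. The toy word-tag pairs use Penn-style tags invented for the example. The rule for combining stem and suffix tags is also assumed: keep the suffix's tags whose first character, taken as a coarse word class, also occurs among the stem's tags. All function names (`make_suffix_segmenter`, `build_lexicons`, `combine`, `predict_with_rules`, `predict_all_splits`) are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Set, Tuple

Segmenter = Callable[[str], Tuple[str, str]]
Lexicon = Dict[str, Set[str]]


def make_suffix_segmenter(suffixes: Iterable[str]) -> Segmenter:
    """HYPOTHETICAL stand-in for the learned rules: split off the longest
    listed suffix that leaves a non-empty stem (the empty suffix always fits)."""
    ordered = sorted(suffixes, key=len, reverse=True)

    def segment(word: str) -> Tuple[str, str]:
        for suf in ordered:
            if word.endswith(suf) and len(word) > len(suf):
                return word[: len(word) - len(suf)], suf
        return word, ""

    return segment


def build_lexicons(pairs: Iterable[Tuple[str, str]],
                   segment: Segmenter) -> Tuple[Lexicon, Lexicon]:
    """Segment each word of a word-tag lexicon and record the tags that
    every stem and every suffix occurred with."""
    stem_lex: Dict[str, Set[str]] = defaultdict(set)
    suffix_lex: Dict[str, Set[str]] = defaultdict(set)
    for word, tag in pairs:
        stem, suffix = segment(word)
        stem_lex[stem].add(tag)
        suffix_lex[suffix].add(tag)
    return dict(stem_lex), dict(suffix_lex)


def combine(stem: str, suffix: str, stem_lex: Lexicon,
            suffix_lex: Lexicon) -> Set[str]:
    """ASSUMED combination rule: keep the suffix's tags whose coarse class
    (first character of the tag) also appears among the stem's tags."""
    classes = {tag[0] for tag in stem_lex.get(stem, set())}
    return {tag for tag in suffix_lex.get(suffix, set()) if tag[0] in classes}


def predict_with_rules(word: str, segment: Segmenter, stem_lex: Lexicon,
                       suffix_lex: Lexicon) -> Set[str]:
    """Mode (1): use only the stem and suffix produced by the segmenter."""
    stem, suffix = segment(word)
    return combine(stem, suffix, stem_lex, suffix_lex)


def predict_all_splits(word: str, stem_lex: Lexicon,
                       suffix_lex: Lexicon) -> Set[str]:
    """Mode (2): pool predictions over every split of the word whose stem
    and suffix are both present in the lexicons."""
    tags: Set[str] = set()
    for i in range(1, len(word) + 1):  # non-empty stem, possibly empty suffix
        stem, suffix = word[:i], word[i:]
        if stem in stem_lex and suffix in suffix_lex:
            tags |= combine(stem, suffix, stem_lex, suffix_lex)
    return tags


# Toy training lexicon of word-tag pairs (invented for illustration).
TRAIN = [("walking", "VBG"), ("walked", "VBD"), ("talks", "VBZ"),
         ("talked", "VBD"), ("book", "NN"), ("books", "NNS")]

if __name__ == "__main__":
    seg = make_suffix_segmenter(["ing", "ed", "s", ""])
    stems, suffixes = build_lexicons(TRAIN, seg)
    print(predict_with_rules("talking", seg, stems, suffixes))  # {'VBG'}
    print(predict_all_splits("walks", stems, suffixes))         # {'VBZ'}
```

In this sketch, mode (2) does not depend on the segmenter at prediction time. It can therefore still propose a tag when the rules choose an unhelpful split, at the cost of pooling candidates from several segmentations.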

Dimitar Kazakov
Email: kazakov@cs.york.ac.uk
URL: http://www.cs.york.ac.uk/~kazakov/

Suresh Manandhar
Email: suresh@cs.york.ac.uk
URL: http://www.cs.york.ac.uk/~suresh/

Tomaž Erjavec
Email: Tomaz.Erjavec@ijs.si
URL: http://nl.ijs.si/tomaz/