Welcome!
To use the personalized features of this site, please log in or register.
If you have forgotten your username or password, we can help.
My Menu
Saved Items

A Trigram Statistical Language Model Algorithm for Chinese Word Segmentation

Jun Mao1, Gang Cheng1, Yanxiang He1 and Zehuan Xing2

(1)  Computer School, Wuhan University, Wuhan 430072, P. R. China
(2)  Department of Linguistics, Central China Normal University, Wuhan 430079, P. R. China
Abstract
We address the problem of segmenting a Chinese text into words. In this paper, we propose a trigram model algorithm for segmenting a Chinese text. We also discuss why statistical language model is appropriate to be applied to Chinese word segmentation and give an algorithm for segmenting a Chinese text into words. In particular, we solve the problem of searching which often leads to low performance brought by trigram model. Finally, the issue of OOV word identification is discussed and merged to trigram model based method in order to improve the accuracy of segmentation.

Fulltext Preview (Small, Large)
Image of the first page of the fulltext

References secured to subscribers.



Export this chapter
Export this chapter as RIS | Text
 
Remote Address: 38.107.191.112 • Server: mpweb24
HTTP User Agent: CCBot/1.0 (+http://www.commoncrawl.org/bot.html)