
Linguistic Resources and Tools

Automatic Acquisition of Basic Katakana Lexicon from a Given Corpus

Toshiaki Nakazawa, Daisuke Kawahara and Sadao Kurohashi

(1)  University of Tokyo, 7-3-1 Hongo Bunkyo-ku, Tokyo, 113-8656, Japan
Abstract
Katakana, a Japanese phonographic script used mainly for loan words, is a troublemaker in Japanese word segmentation. Since Katakana words are heavily domain-dependent and there are many Katakana neologisms, it is almost impossible to construct and maintain a Katakana word dictionary by hand. This paper proposes an automatic segmentation method for Japanese Katakana compounds, which makes it possible to construct a precise and concise Katakana word dictionary automatically, given only a medium-sized or large Japanese corpus from some domain.

Contact Information

Toshiaki Nakazawa
Email: nakazawa@kc.t.u-tokyo.ac.jp

Daisuke Kawahara
Email: kawahara@kc.t.u-tokyo.ac.jp

Sadao Kurohashi
Email: kuro@kc.t.u-tokyo.ac.jp