A Dialectal Chinese Speech Recognition Framework

Jing Li (1), Thomas Fang Zheng (1), William Byrne (2, 3) and Dan Jurafsky (4)

(1)  Center for Speech Technology, State Key Laboratory of Intelligent Technology and Systems Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, P.R. China
(2)  Machine Intelligence Laboratory, Cambridge University, U.K.
(3)  Center for Language and Speech Processing, The Johns Hopkins University, U.S.A.
(4)  Department of Linguistics, Stanford University, U.S.A.

Received: 20 December 2004  Accepted: 17 June 2005  

Abstract  A framework for dialectal Chinese speech recognition is proposed and studied, in which a relatively small dialectal Chinese speech corpus (dialectal Chinese being Chinese influenced by the speaker's native dialect) and dialect-related knowledge are used to transform a standard Chinese (Putonghua, abbreviated as PTH) speech recognizer into a dialectal Chinese speech recognizer. Two kinds of knowledge sources are explored: expert knowledge and a small dialectal Chinese corpus. These knowledge sources provide information at four levels: the phonetic level, the lexicon level, the language level, and the acoustic decoder level. This paper takes Wu dialectal Chinese (WDC) as an example target language. The goal is to establish a WDC speech recognizer from an existing PTH speech recognizer, based on the Initial-Final structure of the Chinese language and a study of how dialectal Chinese speakers speak Putonghua. The authors propose to use context-independent PTH-IF mappings (where IF means either a Chinese Initial or a Chinese Final), context-independent WDC-IF mappings, and syllable-dependent WDC-IF mappings (obtained from either experts or data), and to combine them with supervised maximum likelihood linear regression (MLLR) acoustic model adaptation. The IF mappings introduce a multi-pronunciation lexicon, whose size might enlarge lexicon confusion and hence degrade performance; to reduce its size, a Multi-Pronunciation Expansion (MPE) method based on the accumulated uni-gram probability (AUP) is proposed. In addition, some commonly used WDC words are selected and added to the lexicon. Compared with the original PTH speech recognizer, the resulting WDC speech recognizer achieves a 10–18% absolute Character Error Rate (CER) reduction when recognizing WDC, with only a 0.62% CER increase when recognizing PTH.
The proposed framework and methods are expected to work not only for Wu dialectal Chinese but also for other dialectal Chinese languages and even other languages.
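
As an illustration only, the lexicon-level idea in the abstract can be sketched in a few lines of Python. The abstract does not spell out how MPE uses the accumulated uni-gram probability, so the sketch assumes one plausible reading: only the most frequent words, those whose uni-gram probabilities accumulate to a chosen threshold, receive extra pronunciations from the IF mappings. The function names, the threshold, the toy lexicon, the probabilities and the three mapping rules are all hypothetical and are not taken from the paper.

```python
from itertools import product

def select_by_aup(unigram, threshold):
    """Pick the most frequent words until their accumulated
    uni-gram probability reaches the threshold (assumed reading of AUP)."""
    selected, total = set(), 0.0
    for word, prob in sorted(unigram.items(), key=lambda kv: kv[1], reverse=True):
        if total >= threshold:
            break
        selected.add(word)
        total += prob
    return selected

def expand(pron, if_mapping):
    """Return every pronunciation obtained by keeping each IF unit
    or replacing it with one of its mapped variants (canonical form first)."""
    options = [[unit] + if_mapping.get(unit, []) for unit in pron]
    return list(product(*options))

def build_lexicon(lexicon, unigram, if_mapping, threshold):
    """Expand pronunciations only for the AUP-selected words."""
    chosen = select_by_aup(unigram, threshold)
    return {
        word: expand(pron, if_mapping) if word in chosen else [pron]
        for word, pron in lexicon.items()
    }

# Hypothetical toy data: word -> (Initial, Final), word -> uni-gram probability.
LEXICON = {"shi": ("sh", "i"), "zhi": ("zh", "i"), "chi": ("ch", "i"), "san": ("s", "an")}
UNIGRAM = {"shi": 0.4, "zhi": 0.3, "chi": 0.2, "san": 0.1}
# Illustrative context-independent IF mappings (canonical IF -> surface variants).
IF_MAPPING = {"sh": ["s"], "zh": ["z"], "ch": ["c"]}

expanded = build_lexicon(LEXICON, UNIGRAM, IF_MAPPING, threshold=0.6)
for word, prons in expanded.items():
    print(word, prons)
```

With the threshold at 0.6, only the two most frequent toy words gain a second pronunciation, so the lexicon grows by two entries instead of three. Raising the threshold trades broader coverage of accented pronunciations against a larger, more confusable lexicon, which is the tension the abstract describes.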

Keywords  dialectal Chinese speech recognition - initial or final (IF) - IF-mapping rule - pronunciation modeling - small quantity of speech data

This paper is based upon a study supported by the US National Science Foundation under Grant No.0121285. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Jing Li is currently a Ph.D. candidate at the Center for Speech Technology, State Key Laboratory of Intelligent Technology and Systems, Department of Computer Science and Technology, Tsinghua University. He received his B.S. degree in computer science and technology from Tsinghua University in 1999. His current research focuses on dialectal Chinese speech recognition, acoustic modeling, and keyword spotting.
Thomas Fang Zheng graduated from the Department of Computer Science & Technology of Tsinghua University and received his B.S., M.S. and Ph.D. degrees from Tsinghua University in 1990, 1992 and 1997, respectively. Dr. Zheng is currently a professor at Tsinghua University. He is Vice Dean of the Research Institute of Information Technology of Tsinghua University, and the Director of the Center for Speech Technology, State Key Laboratory of Intelligent Technology and Systems. Dr. Zheng is now the Council Chair of the Chinese Corpus Consortium, an IEEE member, an ISCA member, a senior member of China Computer Federation, a member of the Artificial Intelligence and Pattern Recognition Technical Commission of China Computer Federation, a member of the editorial committee of the Journal of Chinese Information Processing, and a key member of Oriental-COCOSDA. He was a senior member and a co-leader at the Johns Hopkins University's Summer Workshop of Language and Speech Processing in 2000 and 2004, working on pronunciation modeling and dialectal Chinese recognition, respectively. His main research interests are speech recognition, natural language understanding, and speaker recognition.
William Byrne received the B.S. degree in electrical engineering from Cornell University, Ithaca, NY in 1982, and the Ph.D. degree in electrical engineering from the University of Maryland, College Park, MD in 1993. He has worked at Entropic Research Laboratory, Washington DC, and the National Institutes of Health, Bethesda, MD. He is currently a research associate professor in the Department of Electrical Engineering and the Center for Language and Speech Processing at the Johns Hopkins University, Baltimore, MD, and a university lecturer in the Machine Intelligence Laboratory and a member of the Speech Research Group, Cambridge University, UK. His main research interests are in statistical modeling techniques for speech and language processing, with a recent interest in statistical machine translation.
Dan Jurafsky is an associate professor in the Department of Linguistics, Stanford University, where he arrived in January 2004. He received his B.A. degree in Linguistics in 1983, and his Ph.D. degree in computer science in 1992, both from UC Berkeley. He then worked for 8 years at the University of Colorado at Boulder, where he was an assistant and associate professor in the Department of Linguistics, the Institute of Cognitive Science, the Department of Computer Science, and the Center for Spoken Language Research. He still maintains an adjunct position at the University of Colorado, and continues to work closely with colleagues there. His research focuses on statistical models of human and machine language processing, especially computational linguistics, automatic speech recognition and understanding, computational psycholinguistics, and natural language processing. He received the National Science Foundation CAREER award in 1998 and the MacArthur Fellowship in 2002. His most recent book, with James H. Martin, is the widely used textbook “Speech and Language Processing”.

Jing Li (Corresponding author)
Email: lijing@cst.cs.tsinghua.edu.cn

Thomas Fang Zheng
Email: fzheng@tsinghua.edu.cn

William Byrne
Email: wjb31@hermes.cam.ac.uk

Dan Jurafsky
Email: jurafsky@stanford.edu