Volume 41, Number 3, 255-269, DOI: 10.1007/s11265-005-4150-4

A Speech-Centric Perspective for Human-Computer Interface: A Case Study

Li Deng and Dong Yu

From the issue entitled "Special Issue on Multimedia Signal Processing"

Abstract

Speech technology has been playing a central role in enhancing human-machine interactions, especially for small devices for which a graphical user interface has obvious limitations. The speech-centric perspective for human-computer interface advanced in this paper derives from the view that speech is the only natural and expressive modality to enable people to access information from and to interact with any device. In this paper, we describe some recent work conducted at Microsoft Research, aimed at the development of enabling technologies for speech-centric multimodal human-computer interaction. In particular, we present a case study of a prototype system, called MapPointS, which is a speech-centric multimodal map-query application for North America. This prototype navigation system provides rich functionality that allows users to obtain map-related information through speech, text, and pointing devices. Users can verbally query for state maps, city maps, directions, places, nearby businesses, and other useful information within North America. They can also control the application verbally, for example by changing the map size or panning the map interactively through speech. In the current system, the results of the queries are presented back to users through a graphical user interface. An overview of the MapPointS system and its major components is presented first. This is followed by the software design and engineering principles and considerations adopted in developing the MapPointS system, and by a description of some key robust speech processing technologies underlying general speech-centric human-computer interaction systems.

Keywords  human-computer interaction - speech-centric multimodal interface - robust speech processing - MapPointS - speech-driven mobile navigation system

Li Deng received the B.S. degree from the University of Science and Technology of China in 1982, and the Master's and Ph.D. degrees from the University of Wisconsin-Madison in 1984 and 1986, respectively. He worked on large vocabulary automatic speech recognition in Montreal, Canada, 1986–1989. In 1989, he joined the Department of Electrical and Computer Engineering, University of Waterloo, Ontario, Canada, as Assistant Professor, where he became a tenured Full Professor in 1996. From 1992 to 1993, he conducted sabbatical research at the Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Mass., and from 1997 to 1998, at ATR Interpreting Telecommunications Research Laboratories, Kyoto, Japan. Since 1999, he has been with Microsoft Research, Redmond, WA, as Senior Researcher, and has been an affiliate full professor in Electrical Engineering at the University of Washington, Seattle. His research interests include acoustic-phonetic modeling of speech, speech and speaker recognition, speech synthesis and enhancement, speech production and perception, auditory speech processing, noise robust speech processing, statistical methods and machine learning, nonlinear signal processing, spoken language systems, multimedia signal processing, and multimodal human-computer interaction. In these areas, he has published over 200 technical papers and book chapters, and is the inventor or co-inventor of numerous patents. He co-authored the book “Speech Processing: A Dynamic and Optimization-Oriented Approach” (2003, Marcel Dekker Publishers, New York).
He served on the Education Committee and the Speech Processing Technical Committee of the IEEE Signal Processing Society from 1996 to 2000, and was Associate Editor for IEEE Transactions on Speech and Audio Processing from 2002 to 2005. He currently serves on the Multimedia Signal Processing Technical Committee. He was a Technical Chair of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP04). He is a Fellow of the Acoustical Society of America and a Fellow of the IEEE.
Dong Yu, who joined Microsoft in 1998, holds a BS degree in Electrical Engineering from Zhejiang University, China, an MS degree in Electrical Engineering from the Chinese Academy of Sciences, and an MS degree in Computer Science from Indiana University at Bloomington, USA. He is currently a Ph.D. candidate in Computer Science at the University of Idaho, USA.
Dong Yu's research interests are in the areas of speech recognition and processing, and computer and network security. He has published more than 20 papers in journals and conferences in these areas, and has applied for more than 10 US and international patents.
Mr. Dong Yu has served as a reviewer for many journals and conferences, including the Journal of Computer Security, ICASSP, and InterSpeech.
