Speech technology has been playing a central role in enhancing human-machine interactions, especially for small devices for
which a graphical user interface has obvious limitations. The speech-centric perspective on the human-computer interface advanced
in this paper derives from the view that speech is the only natural and expressive modality to enable people to access information
from and to interact with any device. In this paper, we describe some recent work conducted at Microsoft Research, aimed at
the development of enabling technologies for speech-centric multimodal human-computer interaction. In particular, we present
a case study of a prototype system, called MapPointS, which is a speech-centric multimodal map-query application for North
America. This prototype navigation system provides rich functionalities that allow users to obtain map-related information
through speech, text, and pointing devices. Users can verbally query for state maps, city maps, directions, places, nearby
businesses, and other useful information within North America. They can also control the application by voice, for example by changing
the map size and panning the map interactively. In the current system, the results of the queries are
presented back to users through a graphical user interface. We first present an overview of the MapPointS system and its major
components in detail. This is followed by the software design and engineering principles and considerations adopted in developing
the MapPointS system, and by a description of some key robust speech processing technologies underlying general speech-centric
human-computer interaction systems.
Li Deng received the B.S. degree from the University of Science and Technology of China in 1982, the Master's degree from the University of Wisconsin-Madison
in 1984, and the Ph.D. degree from the University of Wisconsin-Madison in 1986. He worked on large-vocabulary automatic speech recognition
in Montreal, Canada, from 1986 to 1989. In 1989, he joined the Department of Electrical and Computer Engineering, University of Waterloo, Ontario,
Canada, as Assistant Professor, where he became a tenured Full Professor in 1996. From 1992 to 1993, he conducted sabbatical
research at the Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Mass., and from 1997 to 1998, at
ATR Interpreting Telecommunications Research Laboratories, Kyoto, Japan. Since 1999, he has been with Microsoft Research, Redmond,
WA, as Senior Researcher, and with the University of Washington, Seattle, as affiliate full professor in Electrical Engineering. His
research interests include acoustic-phonetic modeling of speech, speech and speaker recognition, speech synthesis and enhancement,
speech production and perception, auditory speech processing, noise robust speech processing, statistical methods and machine
learning, nonlinear signal processing, spoken language systems, multimedia signal processing, and multimodal human-computer
interaction. In these areas, he has published over 200 technical papers and book chapters, and is an inventor or co-inventor
of numerous patents. He co-authored the book “Speech Processing—A Dynamic and Optimization-Oriented Approach” (2003, Marcel
Dekker Publishers, New York).
He served on the Education Committee and the Speech Processing Technical Committee of the IEEE Signal Processing Society from 1996 to 2000,
and was Associate Editor for IEEE Transactions on Speech and Audio Processing from 2002 to 2005. He currently serves on the Multimedia
Signal Processing Technical Committee. He was a Technical Chair of the IEEE International Conference on Acoustics, Speech, and
Signal Processing (ICASSP04). He is a Fellow of the Acoustical Society of America and a Fellow of the IEEE.
Dong Yu, who joined Microsoft in 1998, holds a B.S. degree in Electrical Engineering from Zhejiang University, China, an M.S. degree
in Electrical Engineering from the Chinese Academy of Sciences, and an M.S. degree in Computer Science from Indiana University at
Bloomington, USA. He is currently a Ph.D. candidate in Computer Science at the University of Idaho, USA.
Dong Yu's research interests are in the areas of speech recognition and processing, and computer and network security. He has
published more than 20 papers in journals and conferences in these areas, and has applied for more than 10 US and international
patents.
Mr. Dong Yu has served as a reviewer for many journals and conferences, including the Journal of Computer Security, ICASSP, and
InterSpeech.