Lecture Notes in Computer Science, 2008, Volume 5264/2008, 830-839, DOI: 10.1007/978-3-540-87734-9_94

Using the Tandem Approach for AF Classification in an AVSR System

Tian Gan, Wolfgang Menzel and Jianwei Zhang

Abstract

This paper describes an audio-visual speech recognition (AVSR) system based on articulatory features (AF). It implements a tandem approach in which artificial neural networks (ANN), specifically multi-layer perceptrons (MLP), serve as posterior probability estimators that transform raw input data into the more abstract articulatory features. Such an approach is particularly well suited to situations in which relatively little training data is available, as is typical for AVSR. The paper also presents the MLP feature extraction results, together with an analysis of recognition accuracy and confusions. The AF-based AVSR system has been trained on the audio-visual speech corpus VIDTIMIT, which contains conversational speech based on a medium-sized vocabulary of more than 1200 words.
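
As a rough illustration of the idea of an MLP acting as a posterior probability estimator, the following minimal Python sketch maps one input frame to a posterior distribution over the values of a single articulatory feature and then takes log posteriors as the derived feature vector. It is not the authors' implementation: the layer sizes, the random untrained weights, the tanh hidden layer, and the log post-processing step (common in tandem systems, but not stated in this abstract) are all assumptions made for the example.

```python
import math
import random


def softmax(scores):
    """Turn raw output scores into a probability distribution."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mlp_posteriors(frame, w_hidden, b_hidden, w_out, b_out):
    """One-hidden-layer MLP: input frame -> posteriors over AF values."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, frame)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    scores = [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(w_out, b_out)
    ]
    return softmax(scores)


def tandem_features(posteriors, floor=1e-10):
    """Log posteriors, floored to avoid log(0). Assumed post-processing."""
    return [math.log(max(p, floor)) for p in posteriors]


def random_matrix(rows, cols, rng):
    """Hypothetical helper: small random weights standing in for a trained net."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Hypothetical sizes: 6 input values per frame, 4 hidden units,
# 3 possible values for one articulatory feature.
N_INPUT, N_HIDDEN, N_CLASSES = 6, 4, 3

rng = random.Random(0)
w_hidden = random_matrix(N_HIDDEN, N_INPUT, rng)
b_hidden = [0.0] * N_HIDDEN
w_out = random_matrix(N_CLASSES, N_HIDDEN, rng)
b_out = [0.0] * N_CLASSES

# A made-up input frame standing in for raw acoustic or visual data.
frame = [rng.uniform(-1.0, 1.0) for _ in range(N_INPUT)]

posteriors = mlp_posteriors(frame, w_hidden, b_hidden, w_out, b_out)
features = tandem_features(posteriors)

print("posteriors:", posteriors)
print("tandem features:", features)
```

In a tandem setup of this kind, vectors like `features` would typically replace or augment the raw observations passed to a conventional recognizer. With trained weights, one such network per articulatory feature would be a natural arrangement, though the abstract does not specify the configuration used.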

Keywords: MLP, Articulatory Features, Audio-Visual Speech Recognition
