
Multilevel Integration of Vision and Speech Understanding Using Bayesian Networks

Sven Wachsmuth, Hans Brandt-Pook (5), Gudrun Socher (5, 6), Franz Kummert (5) and Gerhard Sagerer (5)

(5)  Technical Faculty, University of Bielefeld, P.O. Box 100131, 33501 Bielefeld, Germany
(6)  Vidam Communications Inc., 2 N 1st St., San Jose, CA, 95113
Abstract
The interaction of image and speech processing is a crucial property of multimedia systems. Classical systems that draw inferences on purely qualitative high-level descriptions lose a great deal of information when confronted with erroneous, vague, or incomplete data. We propose a new architecture that integrates various levels of processing by using multiple representations of the visually observed scene. These representations are vertically connected by Bayesian networks in order to find the most plausible interpretation of the scene.
The interpretation of a spoken utterance naming an object in the visually observed scene is modeled as another partial representation of the scene. Under this concept, the key problem is identifying the verbally specified object instances in the visually observed scene. To this end, a Bayesian network is generated dynamically from the spoken utterance and the visual scene representation. This network encodes spatial knowledge as well as knowledge extracted from psycholinguistic experiments. First results show the robustness of our approach.
The work of G. Socher has been supported by the German Research Foundation (DFG).
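
The identification step described in the abstract can be illustrated with a minimal sketch. It is not the authors' network: it assumes a much simpler, naive-Bayes-shaped structure in which each scene object is a hypothesis and the uttered type and colour words are noisy observations of that object's attributes. The names `SceneObject`, `word_likelihood`, `identify`, the vocabularies, and the value `p_correct=0.8` are all hypothetical, and spatial relations are left out entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneObject:
    """One object hypothesis from the visual scene (hypothetical structure)."""
    name: str
    obj_type: str
    color: str


# Hypothetical vocabularies; the real system's vocabulary is not given here.
TYPES = ("bolt", "cube", "bar")
COLORS = ("red", "blue", "yellow")


def word_likelihood(word, attribute, vocabulary, p_correct=0.8):
    """P(uttered word | true attribute).

    Assumption: the speaker (or recogniser) names the true attribute with
    probability p_correct and any other word in the vocabulary uniformly.
    An attribute that was not mentioned (word is None) is uninformative.
    """
    if word is None:
        return 1.0
    if word == attribute:
        return p_correct
    return (1.0 - p_correct) / (len(vocabulary) - 1)


def identify(objects, said_type=None, said_color=None):
    """Posterior over scene objects given the utterance, via Bayes' rule.

    The network is built per utterance: one hypothesis per object currently
    in the scene, with a uniform prior over those objects.
    """
    prior = 1.0 / len(objects)
    scores = {
        o.name: prior
        * word_likelihood(said_type, o.obj_type, TYPES)
        * word_likelihood(said_color, o.color, COLORS)
        for o in objects
    }
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}


SCENE = [
    SceneObject("red_bolt", "bolt", "red"),
    SceneObject("blue_bolt", "bolt", "blue"),
    SceneObject("red_cube", "cube", "red"),
]

if __name__ == "__main__":
    # Utterance: "the red bolt"
    for name, p in identify(SCENE, said_type="bolt", said_color="red").items():
        print(f"{name}: {p:.3f}")
```

With these assumed numbers, the utterance "the red bolt" puts about 0.8 of the posterior mass on `red_bolt`, while the two partially matching objects keep a small non-zero probability. That residual mass is what lets a probabilistic model tolerate a misrecognised word or a wrongly classified object instead of failing outright.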

Contact Information Sven Wachsmuth
Email: swachsmu@techfak.uni-bielefeld.de
