This paper addresses the explicit and implicit correlations between the media streams in a composite multimedia document, the so-called navigated hypermedia document in our language learning system, in order to facilitate document retrieval and synchronized presentation. To replay a recorded lecture in a form as close as possible to the original classroom experience, we devised a capturing mechanism that explicitly records all the lecturing media streams and the relations between them, including the instructor's voice, slide changes of the HTML lectures, and various guiding actions (e.g., tele-pointers, pen strokes, document scrolling, keyword highlighting, and text annotations) on HTML-based slides. In addition, for more effective learning, we study three aspects (temporal, spatial, and content relations) of the implicit correlations that are inherently hidden between the media involved. The implicit relations are discovered by three processes: a speech-text alignment process for temporally synchronized speech-text presentation, an automatic scrolling process for spatial synchronization of the viewing window, and a content dependency checking process to ensure consistency of the content processed and the relations involved. The experimental results show that exploring cross-media correlations is helpful for system development in document presentation and retrieval. Users can replay a vivid and learning-effective multimedia lecture and easily access the desired part of the document via cross-media indexing. The results have therefore been applied to the development of online multimedia language learning systems aimed at improving students' English and Chinese language capabilities.
Keywords: Cross-media correlation - Multimedia synchronization - Content access and presentation
Published online: 14 December 2004