Abstract

Digital document archives are increasingly derived from a variety of media sources. At present, such archives are stored and searched independently. The Information Retrieval from Mixed-Media Collections (IRMMC) project is investigating retrieval from combined document collections composed of items originating from differing media forms. An experimental investigation of a "mixed-media" retrieval task is described. The task is based on the existing TREC Spoken Document Retrieval task and combines text, spoken and scanned-image documents. Results show that non-text media perform well within the mixed-media collection. Also, while pseudo-relevance feedback is extremely effective for spoken documents, its behaviour for document image retrieval is more complex.
