Digital document archives are increasingly derived from a variety of media sources. At present, such archives are stored
and searched independently. The Information Retrieval from Mixed-Media Collections (IRMMC) project is investigating retrieval
from combined document collections composed of items originating from differing media forms. An experimental investigation of
a “mixed-media” retrieval task, based on the existing TREC Spoken Document Retrieval task and combining text, spoken, and
scanned-image documents, is described. Results show that non-text media perform well within the mixed-media collection. Also,
while pseudo-relevance feedback is extremely effective for spoken documents, its behaviour for document image retrieval is more complex.