The indexing and retrieval of multimedia items is difficult due to the semantic gap between the user’s perception of the data
and the descriptions we can derive automatically from the data using computer vision, speech recognition, and natural language
processing. In this contribution we examine the nature of the semantic gap in more detail and show examples of methods that
help narrow it. These methods can be automatic, but in general the indexing and retrieval of multimedia items should
be a collaborative process between the system and the user. We show how the user’s interaction with the system can be used to
narrow the semantic gap.
This work is supported by the ICES-KIS MIA project.