In recent years, multimedia researchers have attempted to design content-based image retrieval systems. Despite the
development of these systems, however, the term “content” remains ill-defined, which has made the evaluation of
such systems problematic. This paper proposes a method for the creation of a reference image set in which the similarity of
each image pair is estimated by two independent methods: the subjective evaluation of human observers, and the use
of “visual content words” as basis vectors, so that the multidimensional content of each image can be represented by a
content vector. The similarity measure computed with these content vectors is shown to correlate with the subjective judgment
of human observers, and thus provides both a more objective method for evaluating and expressing image content, and a possible
path to automating the process of content-based indexing in the future.
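
As a rough illustration of the idea, the sketch below represents each image as a content vector over a small vocabulary of visual content words, computes a similarity for an image pair, and correlates computed similarities with observer ratings. Everything specific here is an assumption made for illustration: the vocabulary, the helper names, the choice of cosine similarity, and the choice of Pearson correlation are not taken from the paper, which may define its measures differently.

```python
import math

# Hypothetical vocabulary of "visual content words"; each word is one basis
# dimension of the content vector. The actual vocabulary is not given here.
VOCABULARY = ["sky", "water", "tree", "building", "person"]


def content_vector(word_weights):
    """Map a {word: weight} description of an image onto the vocabulary basis."""
    return [float(word_weights.get(word, 0.0)) for word in VOCABULARY]


def cosine_similarity(u, v):
    """Cosine of the angle between two content vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an image with no content words is similar to nothing
    return dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation between computed similarities and observer ratings."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0.0 or spread_y == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (spread_x * spread_y)
```

In use, one would call `cosine_similarity` on the content vectors of every image pair in the reference set, collect the mean observer rating for the same pairs, and pass the two lists to `pearson`. A rank correlation may be a better fit if observer ratings are ordinal.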