Annotating digital imagery of historical materials for the purpose of computer-based retrieval is a labor-intensive task for
many historians and digital collection managers. We have explored the possibilities of automated annotation and retrieval
of images from collections of art and cultural images. In this paper, we introduce the application of the ALIP (Automatic
Linguistic Indexing of Pictures) system, developed at Penn State, to the problem of machine-assisted annotation of images
of historical materials. The ALIP system learns the expertise of a human annotator on the basis of a small collection of annotated
representative images. The learned knowledge about the domain-specific concepts is stored as a dictionary of statistical models
in a computer-based knowledge base. When an unannotated image is presented to ALIP, the system computes the statistical likelihood
of the image under each of the learned statistical models and annotates the image with the best-matching concept. Experimental
results, obtained using the Emperor image collection of the Chinese Memory Net project, are reported and discussed. The system
has been trained using subsets of images and metadata from the Emperor collection.
Finally, we introduce an integration of wavelet-based annotation and wavelet-based progressive display of very-high-resolution
copyright-protected images.
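
The annotation step described above (score an unannotated image against every learned model, then pick the best concept) can be illustrated with a minimal sketch. This is not the ALIP implementation: as a simplifying assumption, each concept is modeled here by a diagonal Gaussian over toy feature vectors, standing in for the statistical models the system actually learns. The concept names, feature values, and function names are all hypothetical.

```python
import math


def fit_gaussian(feature_vectors):
    """Fit a diagonal Gaussian (per-dimension mean and variance) to training features."""
    n = len(feature_vectors)
    dims = len(feature_vectors[0])
    means = [sum(v[d] for v in feature_vectors) / n for d in range(dims)]
    # Floor the variance so a degenerate training set cannot cause division by zero.
    variances = [
        max(sum((v[d] - means[d]) ** 2 for v in feature_vectors) / n, 1e-6)
        for d in range(dims)
    ]
    return means, variances


def log_likelihood(features, model):
    """Log-likelihood of one feature vector under a diagonal Gaussian model."""
    means, variances = model
    return sum(
        -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
        for x, mu, var in zip(features, means, variances)
    )


def train_dictionary(annotated):
    """Build the concept dictionary: one statistical model per annotated concept."""
    return {concept: fit_gaussian(vectors) for concept, vectors in annotated.items()}


def annotate(features, dictionary):
    """Score the image against every model and return the best concept and all scores."""
    scores = {c: log_likelihood(features, m) for c, m in dictionary.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical toy training set: two concepts, three 2-D feature vectors each.
training = {
    "portrait": [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15]],
    "landscape": [[0.2, 0.9], [0.1, 0.8], [0.15, 0.85]],
}
dictionary = train_dictionary(training)
best_concept, scores = annotate([0.82, 0.18], dictionary)
```

Working with log-likelihoods rather than raw likelihoods keeps the comparison numerically stable when features have many dimensions; the ranking of concepts is unchanged.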
Keywords Content-based image retrieval - Statistical modeling - Hidden Markov models - Image annotation - Machine learning
A preliminary version of this work was presented at the DELOS-NSF Workshop on Multimedia in Digital Libraries, Crete, Greece, June 2003. The work was completed when Kurt Grieb and Ya Zhang were students at The Pennsylvania State University.
James Z. Wang and Jia Li are also affiliated with the Department of Computer Science and Engineering, The Pennsylvania State University.
Yixin Chen is also with the Research Institute for Children, Children's Hospital, New Orleans.