Tutorial

Translating Images to Keywords: Problems, Applications and Progress

Latifur Khan

(1) Department of Computer Science, University of Texas at Dallas, Richardson, Texas 75083-0688, USA
Abstract
Advances in technology generate huge amounts of non-textual information, such as images, so an efficient image annotation and retrieval system is highly desirable. Clustering algorithms make it possible to represent the visual features of images with a finite set of symbols. Building on this, many statistical models have been published that analyze the correspondence between visual features and words and discover hidden semantics. These models improve the annotation and retrieval of large image databases. However, image data usually have a large number of dimensions. Traditional clustering algorithms weight all of these dimensions equally and are confounded by the less relevant ones.
In this tutorial, we will first present the current state of the art and its shortcomings, including some classical models (e.g., the translation model (TM) and the cross-media relevance model). Second, we will present a weighted feature selection algorithm as a solution to the existing problem. For a given cluster, we determine the relevant features based on histogram analysis and assign greater weight to them than to less relevant features. Third, we will exploit spatial correlation to disambiguate visual features; the spatial relationships will be constructed by spatial association rule mining. Fourth, we will present the continuous relevance model and the multiple Bernoulli model, which avoid clustering, along with mechanisms to link visual tokens with keywords based on these models. Fifth, we will present mechanisms to improve the accuracy of the classical model, TM, by exploiting the WordNet knowledge base. Sixth, we will present a framework that models semantic visual concepts in video and images by fusing multiple sources of evidence with the help of an ontology. Seventh, we will show that weighted feature selection outperforms traditional approaches such as TM for automatic image annotation and retrieval. Finally, we will discuss open problems and future directions in the domain of image and video.
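
To make the weighted feature selection idea concrete, the following is a minimal Python sketch, not the tutorial's actual algorithm. It assumes feature values are scaled to [0, 1] and uses a hypothetical relevance score: for each dimension, the fraction of a cluster's points that fall into the fullest histogram bin. The function names `histogram_weights` and `weighted_distance`, the bin count, and the scoring rule are all illustrative assumptions.

```python
from typing import List, Sequence


def histogram_weights(cluster: Sequence[Sequence[float]], bins: int = 10) -> List[float]:
    """Assign one weight per dimension for a single cluster.

    Assumption (not from the source): a dimension is more relevant when the
    cluster's values concentrate in one histogram bin. Values are assumed to
    lie in [0, 1]. The returned weights sum to 1.
    """
    n_dims = len(cluster[0])
    scores = []
    for d in range(n_dims):
        counts = [0] * bins
        for point in cluster:
            # Map the value to a bin; a value of exactly 1.0 goes in the last bin.
            idx = min(int(point[d] * bins), bins - 1)
            counts[idx] += 1
        # Concentration score: share of points in the fullest bin.
        scores.append(max(counts) / len(cluster))
    total = sum(scores)
    return [s / total for s in scores]


def weighted_distance(x: Sequence[float], centroid: Sequence[float],
                      weights: Sequence[float]) -> float:
    """Weighted squared Euclidean distance between a point and a centroid."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, centroid))


# Hypothetical cluster: dimension 0 is tightly concentrated, dimension 1 is spread out.
cluster = [[0.50, 0.05], [0.52, 0.35], [0.51, 0.65], [0.53, 0.95]]
weights = histogram_weights(cluster)
```

In this toy cluster the concentrated dimension receives the larger weight, so it dominates the distance used when assigning points to the cluster, while the spread-out dimension contributes less.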

Contact Information Latifur Khan
Email: lkhan@utdallas.edu