Welcome!
To use the personalized features of this site, please log in or register.
If you have forgotten your username or password, we can help.
My Menu
Saved Items

Comparison of Normalization Techniques for Metasearch

Hayri Sever5 and Mehmet R. Tolun6

(5)  Computer Science Department, University of Massachusetts, 01003 Amherst, MA, USA
(6)  Department of Computer Engineering, Eastern Mediterranean University, Gazimagusa, TRNC, via Mersin 10, Turkey
Abstract
It is well-known fact that the combination of the retrieval outputs of different search systems in response to a query, known as metasearch, improves performance on average, provided that these combined systems (1) have compatible outputs, (2) produce accurate probability of relevance estimates of documents, and (3) be independent of each other. The objective of a normalization technique is to target the first requirement, i.e., document scores of different retrieval outputs are brought into a common scale so that document scores can be comparable across combined retrieval outputs. This has been a recent subject of researches in metasearch and information filtering fields. In this paper, we present a different perspective on multiple evidence combination and investigate various normalization techniques, mostly ad-hoc in nature, with a special focus on the SUM, which shifts minimum scores to zero and then scales their summation to one. This formal approach is equivalent to normalize the distribution of scores of all documents in a retrieval output by dividing them by their sample mean. We have made extensive experiments using ad hoc tracks of third and fifth TREC collections and CLEF’00 database. We argue that (1) the normalization method SUM is consistently better than the other traditionally proposed ones when combining outputs of search systems operating on a single database; (2) the SUM for combination of outputs of search systems operating on mutually exclusive databases is still valuable alternative to the one weighting score distributions of documents by their databases’ size.
This material is based on work supported in general by the Center for Intelligent Information Retrieval. Any opinions, findings and conclusions or recommendations expressed in this material are the author(s) and do not necessarily reflect those of the sponsor(s).

Fulltext Preview (Small, Large)
Image of the first page of the fulltext

References secured to subscribers.



Export this chapter
Export this chapter as RIS | Text
 
Remote Address: 38.107.191.109 • Server: mpweb24
HTTP User Agent: CCBot/1.0 (+http://www.commoncrawl.org/bot.html)