Comparison of Normalization Techniques for Metasearch
Hayri Sever (5) and Mehmet R. Tolun (6)
(5) Computer Science Department, University of Massachusetts, 01003 Amherst, MA, USA
(6) Department of Computer Engineering, Eastern Mediterranean University, Gazimagusa, TRNC, via Mersin 10, Turkey
Abstract
It is a well-known fact that combining the retrieval outputs of different search systems in response to a query, known
as metasearch, improves performance on average, provided that the combined systems (1) have compatible outputs, (2) produce
accurate estimates of the probability of relevance of documents, and (3) are independent of each other. A normalization
technique targets the first requirement: document scores of different retrieval outputs are brought onto a common
scale so that they are comparable across the combined retrieval outputs. This has been a recent subject of research
in the metasearch and information filtering fields. In this paper, we present a different perspective on multiple evidence combination
and investigate various normalization techniques, mostly ad hoc in nature, with a special focus on SUM, which shifts minimum
scores to zero and then scales their sum to one. This formal approach is equivalent to normalizing the distribution of
scores of all documents in a retrieval output by dividing them by their sample mean. We have conducted extensive experiments using
the ad hoc tracks of the third and fifth TREC collections and the CLEF’00 database. We argue that (1) the SUM normalization method is
consistently better than the other traditionally proposed ones when combining outputs of search systems operating on a single
database; and (2) for combining outputs of search systems operating on mutually exclusive databases, SUM is still a valuable
alternative to weighting the score distributions of documents by their databases’ sizes.
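As an illustration only, the SUM normalization described in the abstract (shift the minimum score to zero, then scale the scores so they sum to one) can be sketched as follows. The function names `sum_normalize` and `combine`, the toy scores, the handling of a run whose scores are all equal, and the choice to merge runs by adding normalized scores are assumptions made for this sketch; they are not taken from the paper.

```python
from collections import defaultdict


def sum_normalize(scores):
    """Shift the minimum score to zero, then scale so the scores sum to one."""
    if not scores:
        return {}
    lowest = min(scores.values())
    shifted = {doc: s - lowest for doc, s in scores.items()}
    total = sum(shifted.values())
    if total == 0:
        # Assumption: a run whose scores are all equal carries no ranking
        # evidence, so every document gets a normalized score of zero.
        return {doc: 0.0 for doc in shifted}
    return {doc: s / total for doc, s in shifted.items()}


def combine(runs):
    """Merge runs by adding each document's normalized scores.

    Assumption: summing normalized scores is one simple way to combine
    evidence; the abstract does not say which combination rule is used.
    """
    fused = defaultdict(float)
    for run in runs:
        for doc, s in sum_normalize(run).items():
            fused[doc] += s
    # Highest fused score first.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical retrieval outputs on very different score scales.
run_a = {"d1": 12.0, "d2": 7.0, "d3": 2.0}
run_b = {"d2": 0.9, "d3": 0.5, "d4": 0.1}

ranking = combine([run_a, run_b])
for doc, score in ranking:
    print(doc, round(score, 3))
```

Dividing the shifted scores by their sample mean instead of their sum changes each normalized score only by the factor n, the number of documents in that retrieval output.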
This material is based on work supported in general by the Center for Intelligent Information Retrieval. Any opinions, findings
and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect those of the
sponsor(s).