In this paper, we examine several newly applied methods for combining pre-retrieval query performance predictors in order
to obtain a better prediction of a query’s performance. To compare such techniques adequately,
we critically examine the current evaluation methodology and show that linear correlation coefficients (i) do not provide
an intuitive measure of a method’s quality, (ii) can give a misleading indication of performance, and (iii)
overstate the performance of combined methods. To address this, we extend the current evaluation methodology to include
cross-validation, report a more intuitive and descriptive statistic, and apply statistical testing to determine significant differences.
In a comprehensive empirical study over several TREC collections, we evaluate nineteen pre-retrieval predictors
and three combination methods.
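
As a rough illustration of point (iii), the sketch below fits a linear combination of predictors on synthetic data and compares the correlation measured on the fitting queries with the correlation of cross-validated predictions. Everything in it is an assumption made for illustration: the data are random, ordinary least squares stands in for a combination method, root mean squared error (RMSE) stands in for a more descriptive statistic, and the names `fit_ols` and `cross_validated_predictions` are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson linear correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_ols(rows, y):
    """Least-squares weights (intercept first) via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(xi[i] * xi[j] for xi in X) for j in range(k)]
         + [sum(xi[i] * yi for xi, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(w, rows):
    """Apply fitted weights (intercept first) to each row of predictor scores."""
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], r)) for r in rows]


def cross_validated_predictions(rows, y, folds=5):
    """Predict each query using weights fitted on the other folds only."""
    n = len(y)
    out = [0.0] * n
    for f in range(folds):
        test = [i for i in range(n) if i % folds == f]
        train = [i for i in range(n) if i % folds != f]
        w = fit_ols([rows[i] for i in train], [y[i] for i in train])
        for i, p in zip(test, predict(w, [rows[i] for i in test])):
            out[i] = p
    return out


def rmse(pred, y):
    """Root mean squared error between predictions and true values."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y))


# Synthetic data (assumption): 50 queries, 5 predictors, only the first informative.
rng = random.Random(0)
n_queries, n_predictors = 50, 5
rows = [[rng.gauss(0, 1) for _ in range(n_predictors)] for _ in range(n_queries)]
ap = [0.3 + 0.05 * r[0] + rng.gauss(0, 0.1) for r in rows]

in_sample = predict(fit_ols(rows, ap), rows)      # fitted and scored on the same queries
held_out = cross_validated_predictions(rows, ap)  # scored on unseen queries only

r_single = [pearson([r[j] for r in rows], ap) for j in range(n_predictors)]
r_in_sample = pearson(in_sample, ap)
r_held_out = pearson(held_out, ap)
rmse_in_sample = rmse(in_sample, ap)
rmse_held_out = rmse(held_out, ap)

print("single predictors r:", [round(r, 3) for r in r_single])
print("combined, in-sample r:", round(r_in_sample, 3), "RMSE:", round(rmse_in_sample, 3))
print("combined, held-out  r:", round(r_held_out, 3), "RMSE:", round(rmse_held_out, 3))
```

With a least-squares fit, the in-sample correlation of the combination can never fall below that of the best single predictor, even when the extra predictors are pure noise, so an in-sample comparison favours the combination by construction. Scoring only held-out queries removes that built-in advantage.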