IR research has a strong tradition of laboratory evaluation of systems. Such research is based on test collections, pre-defined
test topics, and standard evaluation metrics. While recent research has emphasized the user viewpoint by proposing user-based
metrics and non-binary relevance assessments, the methods are insufficient for truly user-based evaluation. The common assumption
of a single query per topic and session poorly represents real life. On the other hand, one well-known metric for multiple
queries per session, instance recall, does not capture early (within session) retrieval of (highly) relevant documents. We
propose an extension to the Discounted Cumulated Gain (DCG) metric, the Session-based DCG (sDCG) metric, for evaluation scenarios
involving multiple-query sessions, graded relevance assessments, and open-ended user effort, including decisions to stop searching.
The sDCG metric discounts relevant results from later queries within a session. We illustrate the sDCG metric with data from
an interactive experiment, discuss how the metric might be applied, and present research questions for which the metric
is helpful.
Keywords: Interactive IR, evaluation metrics, cumulated gain
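
To make the idea of discounting later queries concrete, the following is a minimal sketch rather than a definitive implementation. It assumes that each result list is scored with a rank-discounted cumulated gain, and that the score of the q-th query in a session is further divided by 1 + log_bq(q). The abstract does not state these formulas. The function names `dcg` and `sdcg`, the log bases `b` and `bq`, and the example gain values are all illustrative assumptions.

```python
import math


def dcg(gains, b=2.0):
    """Discounted cumulated gain of one ranked result list.

    Assumption: the gain at rank j is divided by 1 + log_b(j),
    so rank 1 is not discounted and later ranks count for less.
    """
    total = 0.0
    for rank, gain in enumerate(gains, start=1):
        total += gain / (1.0 + math.log(rank, b))
    return total


def sdcg(session, b=2.0, bq=4.0):
    """Session-based DCG over a sequence of queries (a sketch).

    `session` is a list of result lists, one per query, in the order
    the queries were issued; each result list holds graded gains.
    Assumption: the DCG of the q-th query is divided by 1 + log_bq(q),
    so results found by later queries contribute less.
    """
    total = 0.0
    for q, gains in enumerate(session, start=1):
        total += dcg(gains, b) / (1.0 + math.log(q, bq))
    return total


# Hypothetical session: a highly relevant document (gain 3) retrieved at
# rank 1 by the first query versus at rank 1 by the second query.
found_early = sdcg([[3, 0, 0], [0, 0, 0]])
found_late = sdcg([[0, 0, 0], [3, 0, 0]])
```

Under these assumptions `found_early` exceeds `found_late`, which is the behaviour the abstract describes: the same relevant document is worth less when the user needs an additional query to reach it. A smaller `bq` penalizes reformulation more heavily, a larger `bq` less so.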