The growth of data mining has brought increasing interest in applying machine learning algorithms to real-world problems,
which raises the question of how to evaluate the performance of those algorithms. The standard procedure repeatedly samples
predictive accuracy until a statistically significant difference emerges between competing algorithms, but it takes no
account of how well calibrated the predictions are. An alternative procedure uses an information reward measure (due to
I.J. Good) that is sensitive to both domain knowledge (predictive accuracy) and calibration. We analyze this measure,
relating it to Kullback-Leibler distance. We then apply it to five well-known machine learning algorithms across a variety
of problems, showing where assessments based on accuracy and on information reward differ. Finally, we examine
experimentally how information reward varies as a function of calibration and accuracy.
Keywords: Evaluation, information reward, predictive accuracy, scoring rules, machine learning, Kullback-Leibler distance
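
The contrast between accuracy and information reward can be illustrated with a minimal sketch. The abstract does not state the reward formula, so the sketch assumes the binary form often attributed to Good: 1 + log2 of the probability assigned to the class that actually occurred. The function names and the toy data are hypothetical and chosen only for illustration. Two predictors with identical accuracy receive different rewards, and on this toy sample the gap equals the Kullback-Leibler distance (in bits) between the observed class frequency and the overconfident prediction.

```python
import math

def accuracy(probs, labels):
    """Fraction of cases where the more probable class matches the label."""
    hits = sum((p >= 0.5) == bool(y) for p, y in zip(probs, labels))
    return hits / len(labels)

def information_reward(probs, labels):
    """Mean of 1 + log2(probability given to the true class).

    ASSUMPTION: binary form of Good's reward; not taken from the abstract.
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p_true = p if y else 1.0 - p
        total += 1.0 + math.log2(p_true)
    return total / len(labels)

def kl_bits(p, q):
    """Kullback-Leibler distance D(p || q) in bits for Bernoulli p and q,
    assuming both lie strictly between 0 and 1."""
    return p * math.log2(p / q) + (1 - p) * math.log2((1 - p) / (1 - q))

# Toy data (hypothetical): the positive class occurs in 3 of 4 cases.
labels = [1, 1, 1, 0]
calibrated = [0.75] * 4      # stated probability matches the observed frequency
overconfident = [0.99] * 4   # same predicted class, exaggerated confidence

acc_cal = accuracy(calibrated, labels)
acc_over = accuracy(overconfident, labels)
ir_cal = information_reward(calibrated, labels)
ir_over = information_reward(overconfident, labels)
gap = ir_cal - ir_over
kl = kl_bits(0.75, 0.99)

print(f"accuracy: calibrated={acc_cal:.2f}, overconfident={acc_over:.2f}")
print(f"information reward: calibrated={ir_cal:.3f}, overconfident={ir_over:.3f}")
print(f"reward gap={gap:.3f} bits, KL distance={kl:.3f} bits")
```

Accuracy cannot separate the two predictors because it looks only at which class is favoured, whereas the logarithmic reward penalises the overconfident predictor heavily on the single case it gets wrong.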