This paper studies the performance of four alternative evaluation methods: two instances of the Exponential Moving Average,
the Elo-rating method, and the Glicko-rating method. These methods are tested in a co-evolutionary setup using the LINT-game, which
is known to be problematic under co-evolutionary conditions. In addition to the evaluation methods, two methods aimed
at preserving diversity are tested. Using the Objective Fitness Correlation as an analytical tool for monitoring the accuracy
of evaluation, it is shown that the actual performance of an evaluation method depends strongly on whether co-evolutionary failure
occurs, and that a multi-modal approach to the LINT-problem is effective in maintaining stable progress over time.