Graph partitioning is an important subproblem in many applications. To solve it efficiently, the multilevel strategy in combination
with a matching algorithm and a local refinement heuristic has proven to be a powerful method, and several libraries exist
providing such an implementation. Due to the large involvement of heuristics, the evaluation of these libraries is usually
based on experiments. In this paper we show that single experiments are usually not sufficient to judge the quality of an
algorithm, since even results obtained for graphs of and identical structure show high variations. This is still true, even
if the applied algorithms do not contain any nondeterminism. We propose a scheme that considers these variations and therefore
makes evaluations and comparisons of different implementations more meaningful. We have applied this technique to evaluate
the improvements of the Helpful-Set 2-partitioning implementation and present the obtained results.
This work was partly supported-by the German Science Foundation (DFG) project SFB-376.