Naive code generation from high-level languages that encourage modularity can give rise to large numbers of simple loops for
array-based programs. Collective loop fusion and array contraction can be used on such codes to improve temporal locality
and performance. The problem is typically formalised using a loop dependence graph (LDG), with solutions represented as fusion partitions. Much previous work has concentrated on algorithms for this abstract formulation. We present a technique called iterative collective loop fusion, which empirically evaluates different transformations, and show that it provides speedups
of up to 1.38 over existing approaches. We also give results showing that applying such techniques to code generated from high-level languages can provide speedups of
up to 2.45 over the original code, and that the resulting code outperforms an equivalent Fortran code.
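As a minimal illustration of the two transformations named above, the following sketch uses two hypothetical loops. It is not the paper's technique or code. The first loop writes a temporary array `tmp` and the second reads it. In an LDG these would be two nodes joined by a dependence edge on `tmp`, and fusing them corresponds to a partition that places both nodes in the same group. Iteration `i` of the second loop reads only `tmp[i]`, which iteration `i` of the first loop writes, so fusion preserves the dependence. Because `tmp` is not used after the fused loop, it can then be contracted to a scalar. Python is used only to show the structure of the rewrite. The locality and memory benefits appear in compiled code, not in this interpreter.

```python
def unfused(a, b):
    """Two separate loops that communicate through a full-size temporary array."""
    n = len(a)
    tmp = [0.0] * n
    for i in range(n):          # loop 1: produces tmp
        tmp[i] = a[i] + b[i]
    out = [0.0] * n
    for i in range(n):          # loop 2: consumes tmp, element i only
        out[i] = tmp[i] * tmp[i]
    return out


def fused_contracted(a, b):
    """The same computation after fusing both loops and contracting tmp to a scalar."""
    n = len(a)
    out = [0.0] * n
    for i in range(n):
        t = a[i] + b[i]         # tmp[i] is live only within this iteration, so a scalar suffices
        out[i] = t * t
    return out


if __name__ == "__main__":
    a = [i * 0.5 for i in range(8)]
    b = [1.0 - i * 0.25 for i in range(8)]
    print(unfused(a, b) == fused_contracted(a, b))  # True
```

The fused version makes one pass over `a` and `b` and allocates no temporary array. This is the kind of locality gain that collective fusion and contraction aim for across many loops at once.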