Empirical optimizers like ATLAS have been very effective in optimizing computational kernels in libraries. They determine the
best choice of parameters, such as tile size and degree of loop unrolling, by executing different versions of the computation.
In contrast, optimizing compilers use a model-driven approach to program transformation. While the model-driven approach of
optimizing compilers is generally orders of magnitude faster than ATLAS-like library generators, its effectiveness can be
limited by the accuracy of the performance models used. In this paper, we describe an approach in which a class of computations
is modeled in terms of constituent operations whose costs are measured empirically, so that the overall execution time can be
predicted from the measured components. The performance model with empirically determined cost components is used to perform data layout optimization in the
context of the Tensor Contraction Engine, a compiler for a high-level domain-specific language for expressing computational
models in quantum chemistry. The effectiveness of the approach is demonstrated through experimental measurements on some representative
computations from quantum chemistry.
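
As a rough illustration of the idea of composing a performance model from empirically measured constituent operations, the following minimal Python sketch predicts the time of each candidate data layout by summing per-operation costs and picks the cheapest. It is not the Tensor Contraction Engine's implementation. The operation names, the candidate plans, and the cost values are all hypothetical placeholders; in practice the cost table would be filled in by timing each operation on the target machine.

```python
# Placeholder costs in seconds, NOT real measurements. A real table would be
# built by timing each constituent operation on the target machine.
MEASURED_COST = {
    "transpose_A": 0.40,
    "transpose_B": 0.25,
    "gemm_nn": 1.00,
    "gemm_tn": 1.30,
    "gemm_nt": 1.10,
    "gemm_tt": 1.60,
}

# Hypothetical layout choices, each expanded into the constituent
# operations it would require.
CANDIDATE_PLANS = {
    "keep_both_layouts": ["gemm_tt"],
    "relayout_A": ["transpose_A", "gemm_nt"],
    "relayout_B": ["transpose_B", "gemm_tn"],
    "relayout_both": ["transpose_A", "transpose_B", "gemm_nn"],
}


def predicted_time(ops, cost_table):
    """Model the overall time as the sum of the measured component costs."""
    return sum(cost_table[op] for op in ops)


def best_plan(plans, cost_table):
    """Return the name of the plan with the smallest predicted time."""
    return min(plans, key=lambda name: predicted_time(plans[name], cost_table))


if __name__ == "__main__":
    for name, ops in CANDIDATE_PLANS.items():
        print(f"{name}: {predicted_time(ops, MEASURED_COST):.2f} s")
    print("selected:", best_plan(CANDIDATE_PLANS, MEASURED_COST))
```

The additive model is the simplest possible choice and is assumed here only for illustration. The point of the sketch is that no candidate needs to be executed end to end: only the constituent operations are timed, and the search over layouts uses the model.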
Supported in part by the National Science Foundation through the Information Technology Research program (CHE-0121676 and
CHE-0121706), by NSF grant CCF-0073800 and by a grant from the Environmental Protection Agency.