Heterogeneous multi-core processors devote the largest share of their transistor budget to customized “accelerator”
cores and use a small number of conventional low-end cores to supply computation to the accelerators. To maximize performance
on heterogeneous multi-core processors, programs need to expose multiple dimensions of parallelism simultaneously. Unfortunately,
programming with multiple dimensions of parallelism remains an ad hoc process that relies heavily on the intuition and skill
of programmers. Formal techniques are needed to optimize multi-dimensional parallel program designs. We present a model of
multi-dimensional parallel computation for steering the parallelization process on heterogeneous multi-core processors. The
model predicts with high accuracy the execution time and scalability of a program using conventional processors and accelerators
simultaneously. More specifically, the model reveals the optimal degrees of multi-dimensional (task-level and data-level) concurrency
that maximize performance across cores. We use the model to derive mappings of two full computational phylogenetics applications
onto a multiprocessor based on the IBM Cell Broadband Engine.