In this paper we provide quantitative information about the performance differences between the OpenMP and MPI versions
of a large-scale application benchmark suite, SPECseis. We have gathered extensive performance data using hardware counters
on a 4-processor Sun Enterprise system. To present this information we use a Speedup Component Model, which shows precisely how various overheads affect program speedup. We have found that, overall, the
performance figures of both program versions match closely. However, our analysis also shows interesting differences in individual
program phases and in overhead categories incurred. Our work gives initial answers to a largely unanswered research question:
what are the sources of inefficiency in OpenMP programs relative to other programming paradigms on large, realistic applications?
Our results indicate that the OpenMP and MPI models are basically performance-equivalent on shared-memory architectures. However,
we also found interesting differences in behavioral details, such as the number of instructions executed, and the incurred
memory latencies and processor stalls.
This work was supported in part by NSF grants #9703180-CCR and #9872516-EIA. This work is not necessarily representative of
the positions or policies of the U.S. Government.