Understanding and optimizing the synchronization operations of parallel programs in distributed shared memory (DSM)
multiprocessors is one of the most important factors leading to significant reductions in execution time.
This paper introduces a new methodology for tuning the performance of parallel programs. We focus on the critical sections used
to assure exclusive access to critical resources and data structures, proposing a specific dynamic characterization of every
critical section in order to a) measure the lock contention, b) measure the degree of data sharing in consecutive executions,
and c) break down the execution time, reflecting the different overheads that can appear. All the required measurements are
taken using a multiprocessor simulator with a detailed timing model of the processor and memory system.
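As an illustration of the three measurements listed above, the following minimal sketch summarizes one critical section from a trace of its executions. The record format (`Execution`), the function name `characterize`, and the exact metric definitions (fraction of executions that had to wait, fraction of cache lines also touched by the preceding execution on a different processor, and total wait versus hold cycles) are assumptions made for this example; they are not the definitions or the simulator output used in the paper.

```python
from dataclasses import dataclass


@dataclass
class Execution:
    """One dynamic execution of a critical section (hypothetical trace record)."""
    cpu: int          # processor that ran the critical section
    request: int      # cycle at which the lock was requested
    acquire: int      # cycle at which the lock was granted
    release: int      # cycle at which the lock was released
    lines: frozenset  # cache lines touched inside the critical section


def characterize(trace):
    """Summarize contention, sharing, and time breakdown for one lock."""
    if not trace:
        raise ValueError("trace must contain at least one execution")
    trace = sorted(trace, key=lambda e: e.acquire)
    # a) Contention: share of executions that waited for the lock.
    contended = sum(1 for e in trace if e.acquire > e.request)
    # c) Time breakdown: cycles spent waiting vs. cycles holding the lock.
    wait = sum(e.acquire - e.request for e in trace)
    hold = sum(e.release - e.acquire for e in trace)
    # b) Sharing: lines touched by an execution that the previous execution,
    #    running on a different processor, also touched.
    shared = touched = 0
    for prev, cur in zip(trace, trace[1:]):
        if prev.cpu != cur.cpu:
            shared += len(prev.lines & cur.lines)
            touched += len(cur.lines)
    return {
        "contention": contended / len(trace),
        "sharing": shared / touched if touched else 0.0,
        "wait_cycles": wait,
        "hold_cycles": hold,
    }


# Made-up trace of four executions of the same critical section.
trace = [
    Execution(cpu=0, request=0, acquire=0, release=10, lines=frozenset({1, 2})),
    Execution(cpu=1, request=5, acquire=10, release=20, lines=frozenset({1, 2})),
    Execution(cpu=2, request=30, acquire=30, release=40, lines=frozenset({2, 3})),
    Execution(cpu=2, request=50, acquire=50, release=60, lines=frozenset({3})),
]
summary = characterize(trace)
print(summary)
```

In this made-up trace only the second execution waits for the lock, so the contention is 0.25, and three of the four lines touched after a change of processor were also touched by the previous holder, giving a sharing degree of 0.75.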
We also propose a static classification of critical sections that takes into account how locks are associated with their protected
data. The dynamic characterization and the static classification are correlated to identify key critical sections and infer
code optimization opportunities (e.g., data layout) that, when applied, can lead to significant reductions in execution time
(up to 33% in the SPLASH-2 scientific benchmark suite). Using the simulator, we can also evaluate whether the performance
of the applied code optimizations is sensitive to common hardware optimizations.
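To make the data layout example concrete, the sketch below counts how many cache lines a critical section touches when a lock and its protected datum are placed far apart versus next to each other. The 64-byte line size, the addresses, and the helper `lines_touched` are hypothetical values chosen for illustration; the abstract does not specify which layout transformations were actually applied.

```python
LINE_SIZE = 64  # assumed cache-line size in bytes


def lines_touched(addresses, line_size=LINE_SIZE):
    """Return the set of cache-line indices covering the given byte addresses."""
    return {addr // line_size for addr in addresses}


# Layout A (hypothetical): lock in a separate lock array, datum elsewhere.
separate = lines_touched([0x1000, 0x2040])

# Layout B (hypothetical): lock and datum adjacent in one line-aligned record.
collocated = lines_touched([0x3000, 0x3008])

print(len(separate), len(collocated))  # prints "2 1"
```

Under these assumptions, collocating the lock with its data halves the number of lines that must migrate between processors on each lock handoff. Whether this pays off in practice depends on the contention and sharing behavior of the particular critical section.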
This work was supported in part by Diputación General de Aragón grant “gaZ: Grupo Consolidado de Investigación”, Spanish Ministry
of Education and Science grants TIN2007-66423, TIN2007-60625, Consolider CSD2007-00050, and the European HiPEAC-2 NoE.