OpenMP is emerging as a viable high-level programming model for shared memory parallel systems. Although it has also been
implemented on ccNUMA architectures, it is hard to obtain high performance on such systems. In this paper, we discuss various
ways in which OpenMP may be used on ccNUMA and NUMA architectures, and describe a programming style that can provide scalable
high performance on such systems. We give an example of its use on the SGI Origin 2000 and on TreadMarks, a software DSM
system from Rice University. These results have encouraged us to work on a programming environment that provides general support
for OpenMP application development and incorporates a system to translate standard loop-level parallel OpenMP code, with additional user
input in the form of directives, into an equivalent OpenMP program relying on our alternative programming style. The equivalent
program does not use constructs external to OpenMP.
Keywords: shared memory parallel programming - OpenMP - ccNUMA architectures - restructuring - data locality - data distribution - software distributed shared memory - programming environments
This work was partially supported by the DOE under the Los Alamos Computer Science Institute and by NSF under grant number
NSF ACI 99-82160. These sources of support are gratefully acknowledged.