As part of the recent focus on increasing the productivity of parallel application developers, Co-array Fortran (CAF) has
emerged as an appealing alternative to the Message Passing Interface (MPI). CAF belongs to the family of global address space
parallel programming languages; such languages provide the abstraction of globally addressable memory accessed using one-sided
communication. At Rice University we are developing cafc, an open-source, multiplatform CAF compiler. Our earlier studies
show that cafc-compiled CAF programs achieve performance similar to that of corresponding MPI codes for the NAS Parallel
Benchmarks. In this paper, we present a study of several CAF implementations of Sweep3D on four modern architectures. We analyze
the impact of using one-sided communication in Sweep3D, identify potential sources of inefficiency, and suggest ways to address
them. Our results show that we achieve performance comparable to that of the MPI version on three cluster-based architectures
and outperform it by up to 10% on the SGI Altix 3000.
This work was supported in part by the Department of Energy under Grant DE-FC03-01ER25504/A000, the Los Alamos Computer Science
Institute (LACSI) through LANL contract number 03891-99-23 as part of the prime contract (W-7405-ENG-36) between the DOE and
the Regents of the University of California, the Texas Advanced Technology Program under Grant 003604-0059-2001, and Compaq Computer
Corporation under a cooperative research agreement. This research was performed in part using the Molecular Science Computing
Facility (MSCF) in the William R. Wiley Environmental Molecular Sciences Laboratory, a national scientific user facility sponsored
by the U.S. Department of Energy’s Office of Biological and Environmental Research and located at the Pacific Northwest National
Laboratory. Pacific Northwest is operated for the Department of Energy by Battelle. The computations were performed in part
on an Itanium cluster purchased with support from the NSF under Grant EIA-0216467, Intel, and Hewlett Packard and on the National
Science Foundation Terascale Computing System at the Pittsburgh Supercomputing Center.
Cristian Coarfa and Yuri Dotsenko contributed equally to this work.