Abstract

When running parallel programs on clusters of individual computers or workstations, network communication is often the performance bottleneck. Since the round-trip time for a network packet is orders of magnitude larger than the time it takes to transfer an equivalent amount of data from memory, methods that reduce network usage can result in significant performance improvements for parallel programs.
This work demonstrates that broadcast performance can be improved by a significant factor by using a portable, reliable multicasting protocol instead of the unicasting that is typically used. Our end product is an MPICH patch that does not require kernel modification and is therefore portable to any UNIX-based system. MPICH is a popular, portable MPI implementation provided by Argonne National Laboratory (ANL). Since absolute reliability is critical for data integrity when broadcasting messages on clusters, our multicasting protocol also addresses reliability issues.
Supported by the Independent Research and Development Fund at the Advanced Research Laboratories at the University of Texas at Austin, USA
Most of this work was done while Dr. Elster was an adjunct faculty member at the Department of Electrical and Computer Engineering at the University of Texas at Austin, USA