Lecture Notes in Computer Science, 2005, Volume 3769/2005, 137-147, DOI: 10.1007/11602569_18

Supporting MPI-2 One Sided Communication on Multi-rail InfiniBand Clusters: Design Challenges and Performance Benefits

Abhinav Vishnu, Gopal Santhanaraman, Wei Huang, Hyun-Wook Jin and Dhabaleswar K. Panda

Abstract

In cluster computing, InfiniBand has emerged as a popular high-performance interconnect, with MPI as the de facto programming model. However, even with InfiniBand, bandwidth can become a bottleneck for clusters executing communication-intensive applications. Multi-rail cluster configurations with MPI-1 are being proposed to alleviate this problem. Recently, MPI-2, with its support for one-sided communication, has been gaining significance. In this paper, we take on the challenge of designing high-performance MPI-2 one-sided communication on multi-rail InfiniBand clusters. We propose a unified MPI-2 design for different configurations of multi-rail networks (multiple ports, multiple HCAs, and combinations of the two). We present various issues associated with one-sided communication, such as multiple synchronization messages, scheduling of RDMA (Read, Write) operations, and ordering relaxation, and discuss their implications for our design. Our performance results show that multi-rail networks can significantly improve MPI-2 one-sided communication performance. Using PCI-Express with two ports, we achieve a peak MPI_Put bidirectional bandwidth of 2620 million bytes/s (MB/s), compared to 1910 MB/s for a single-rail implementation. For PCI-X with two HCAs, we can almost double the throughput and halve the latency for large messages.
This research is supported in part by the Department of Energy’s grant #DE-FC02-01ER25506; the National Science Foundation’s grants #CNS-0403342 and #CCR-0311542; grants from Intel and Mellanox; and equipment donations from Intel, Mellanox, AMD, Apple and Sun Microsystems.
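
The abstract names the scheduling of RDMA operations across rails as a design issue but does not state which policy the paper adopts. As an illustration only, the sketch below shows one plausible policy, even striping of a large write across the available rails, with small messages kept on a single rail. The function name `stripe_message`, the `min_stripe` threshold and its default value are hypothetical and are not taken from the paper.

```python
def stripe_message(total_bytes, num_rails, min_stripe=16384):
    """Split one large RDMA write into (rail, offset, length) chunks.

    Assumed policy (not from the paper): even striping across rails,
    falling back to a single rail for small messages. The default
    threshold of 16384 bytes is an arbitrary illustrative value.
    """
    if num_rails < 1:
        raise ValueError("num_rails must be at least 1")
    if total_bytes <= 0:
        return []
    # Small messages stay on one rail: the per-stripe overhead would
    # likely outweigh any bandwidth gain.
    if num_rails == 1 or total_bytes < 2 * min_stripe:
        return [(0, 0, total_bytes)]
    # Use no more rails than can each carry a minimum-sized stripe.
    rails = min(num_rails, total_bytes // min_stripe)
    base, extra = divmod(total_bytes, rails)
    chunks = []
    offset = 0
    for rail in range(rails):
        # Spread the remainder one byte at a time over the first rails.
        length = base + (1 if rail < extra else 0)
        chunks.append((rail, offset, length))
        offset += length
    return chunks
```

For example, `stripe_message(1048576, 2)` assigns one 524288-byte half of a 1 MiB write to each of two rails, while `stripe_message(1000, 2)` leaves the whole message on rail 0. A real multi-rail design would also have to handle the synchronization and ordering issues the abstract mentions, which this sketch ignores.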
