Lecture Notes in Computer Science, 2004, Volume 3149/2004, 163-172, DOI: 10.1007/978-3-540-27866-5_21

Collective Communication Performance Analysis Within the Communication System

Lars Ailo Bongo, Otto J. Anshus and John Markus Bjørndalen

Abstract

We describe an approach and tools for optimizing the performance of collective operation spanning trees. The allreduce operation is analyzed using performance data collected at a lower level than traditional monitoring systems use. We calculate latencies and wait times to detect load balance problems, find subtrees with similar behavior, break down costs, and compare the performance of two spanning tree configurations. We evaluate the performance of different configurations and mappings of allreduce run on clusters of different sizes and with different numbers of CPUs per host. We achieve a speedup of up to 1.49 for allreduce. Monitoring overhead is low, and the analysis is simplified because many subtrees have similar behavior. However, the calculated values show large variations, and reconfiguration may affect parts of the tree that were not changed.
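
To illustrate what a per-node wait time in a reduction spanning tree can look like, here is a minimal sketch under simplifying assumptions that are not taken from the paper. Each node becomes ready with its local contribution at a known time, it must wait for the partial results of all its children, and every tree edge adds a fixed link latency with no contention or serialization at the parent. A node's wait time is then how long it sits idle between being ready and receiving the last child result. The names `Node`, `reduce_times`, `ready`, and `link_latency` are hypothetical, and the timestamps are invented values in arbitrary time units.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical spanning tree node for the reduce phase of allreduce."""
    name: str
    ready: int  # time the node's local contribution is ready (assumed unit)
    children: list[Node] = field(default_factory=list)


def reduce_times(node: Node, link_latency: int) -> tuple[int, dict[str, int]]:
    """Return (completion time, per-node wait times) for the subtree at node.

    Assumed model: a node completes once it is ready AND every child's
    partial result has arrived (child completion + one link latency).
    Its wait time is the idle gap between being ready and completing.
    """
    waits: dict[str, int] = {}
    completion = node.ready
    for child in node.children:
        child_done, child_waits = reduce_times(child, link_latency)
        waits.update(child_waits)
        # The child's partial result arrives one link latency after it completes.
        completion = max(completion, child_done + link_latency)
    waits[node.name] = completion - node.ready
    return completion, waits


# Usage: a small tree with hypothetical timestamps.
c = Node("c", ready=20)
b = Node("b", ready=5, children=[c])
a = Node("a", ready=12)
root = Node("root", ready=10, children=[a, b])
done, waits = reduce_times(root, link_latency=2)
print(done, waits)
```

In this toy example, node `b` is ready early but idles while it waits for its slow child `c`, and the root in turn waits on `b`. A large wait that propagates up one branch is the kind of load imbalance signal the abstract refers to, although the paper's actual measurements and formulas may differ from this model.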
