Most parallel computations require the exchange of data between processing elements. One of the important basic communication
operations is all-reduce, a variation of the reduction operation. This paper presents an all-reduce communication operation
scheme using the all-to-all broadcast communication pattern. All-to-all broadcast is the operation in which each processor sends
its message to all other processors, and receives messages from all other processors in the system. In this paper, we develop
an efficient all-reduce operation scheme in a star network topology with single-port communication capability. Communication
time is compared against known broadcasting schemes to verify the efficiency of the suggested scheme.
Keywords: all-reduce, all-to-all broadcast, distributed memory parallel computing systems, inter-processor communication, star network
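
The relationship between the two operations named above can be illustrated with a minimal sketch. It is not the scheme developed in this paper: it does not model the star network topology or the single-port constraint, and it only simulates the processors as entries of a Python list. The function names `all_to_all_broadcast` and `all_reduce` are hypothetical and chosen for illustration. The sketch shows that once every processor has received the messages of all processors, each one can apply the reduction operator locally and all of them end up with the same result.

```python
from functools import reduce
import operator


def all_to_all_broadcast(values):
    """Simulate all-to-all broadcast: every processor ends up holding
    the message of every processor (including its own)."""
    n = len(values)
    # inboxes[p] is the list of messages available at processor p afterwards.
    return [[values[src] for src in range(n)] for _ in range(n)]


def all_reduce(values, op=operator.add):
    """Simulate all-reduce built on all-to-all broadcast: each processor
    reduces the messages it has gathered, so all hold the same result."""
    inboxes = all_to_all_broadcast(values)
    return [reduce(op, inbox) for inbox in inboxes]


# Five simulated processors, each starting with one local value.
result = all_reduce([3, 1, 4, 1, 5])
print(result)
```

Passing a different associative operator, for example `max`, changes the reduction without changing the communication pattern.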