The ability to cooperate on common tasks in a distributed setting is key to solving a broad range of computational problems, from distributed search such as SETI to distributed simulation and multi-agent collaboration.
Do-All, an abstraction of such cooperative activity, is the problem of performing N tasks in a distributed system of P failure-prone processors. Many distributed and parallel algorithms have been developed for this problem, and several algorithm simulations have been developed by iterating Do-All algorithms. The efficiency of solutions for Do-All is measured in terms of work complexity, which counts all processing steps taken by all processors. Work is ideally expressed as a function of N, P, and f, the number of processor crashes. However, the known lower bounds and the upper bounds for extant algorithms do not adequately show how work depends on f. We present the first non-trivial lower bounds for Do-All that capture the dependence of work on N, P, and f. For the model of computation where processors are able to make perfect load-balancing decisions locally, we also present matching upper bounds. We define the r-iterative Do-All problem, which abstracts the repeated use of Do-All found in typical algorithm simulations. Our f-sensitive analysis enables us to derive tight bounds for r-iterative Do-All work that are stronger than the r-fold work complexity of a single Do-All. Our approach, which models perfect load-balancing, allows the analysis of specific algorithms to be divided into two parts: (i) the analysis of the cost of tolerating failures while performing work under free load-balancing, and (ii) the analysis of the cost of implementing load-balancing. We demonstrate the utility and generality of this approach by improving the analysis of two known efficient algorithms. We give an improved analysis of an efficient message-passing algorithm. We also derive a tight and complete analysis of the best known Do-All algorithm for the synchronous shared-memory model. Finally, we present a new upper bound on simulations of synchronous shared-memory algorithms on crash-prone processors.
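To make the work measure concrete, the following is a toy Python simulation of synchronous Do-All with an idealized load-balancing oracle and an r-iterative variant. Everything in it is our own illustrative assumption rather than the paper's formal model. This includes the round-robin oracle `assign`, the greedy crash strategy `greedy_adversary` (which is not a worst-case adversary), and the convention that every processor alive at the start of a round is charged one step. It also includes the choice that the r iterations share a single crash budget f, with crashed processors staying crashed. The sketch only shows how work, N, P, and f interact in such a model. It does not reproduce the paper's bounds.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple

Groups = Dict[int, List[int]]  # task id -> processors assigned to it this round
Adversary = Callable[[Groups, int], Set[int]]


def assign(remaining: List[int], live: List[int]) -> Groups:
    """Hypothetical perfect load-balancing oracle (an assumption of this sketch):
    spread live processors over the remaining tasks round-robin, so every task
    gets floor or ceil of |live|/|remaining| processors, and when there are more
    tasks than processors no two processors share a task."""
    groups: Groups = defaultdict(list)
    for i, proc in enumerate(live):
        groups[remaining[i % len(remaining)]].append(proc)
    return groups


def greedy_adversary(groups: Groups, budget: int) -> Set[int]:
    """Hypothetical adversary: crash all processors of the least-covered tasks
    (so those tasks stay undone) while the remaining crash budget allows it."""
    crashed: Set[int] = set()
    for _, procs in sorted(groups.items(), key=lambda kv: len(kv[1])):
        if len(procs) > budget - len(crashed):
            break
        crashed.update(procs)
    return crashed


def do_all(n: int, live: List[int], budget: int,
           adversary: Adversary) -> Tuple[int, int, List[int], int]:
    """Run one Do-All instance of n tasks in synchronous rounds.

    Returns (work, rounds, surviving processors, unused crash budget).
    Work charges one step to every processor alive at the start of a round.
    """
    if budget >= len(live):
        raise ValueError("need f < number of live processors to guarantee progress")
    remaining = list(range(n))
    work = rounds = 0
    while remaining:
        groups = assign(remaining, live)
        work += len(live)  # every live processor takes one step this round
        crashed = adversary(groups, budget)
        budget -= len(crashed)
        live = [q for q in live if q not in crashed]
        # A task is done if at least one processor assigned to it survived.
        done = {t for t, procs in groups.items()
                if any(q not in crashed for q in procs)}
        remaining = [t for t in remaining if t not in done]
        rounds += 1
    return work, rounds, live, budget


def iterative_do_all(r: int, n: int, p: int, f: int, adversary: Adversary) -> int:
    """Toy r-iterative Do-All: r instances of n tasks run one after another;
    crashed processors stay crashed and all instances share one budget f."""
    live, budget, total = list(range(p)), f, 0
    for _ in range(r):
        work, _, live, budget = do_all(n, live, budget, adversary)
        total += work
    return total


if __name__ == "__main__":
    print(do_all(10, list(range(4)), 0, greedy_adversary))
    print(do_all(10, list(range(4)), 2, greedy_adversary))
    print(iterative_do_all(3, 10, 4, 2, greedy_adversary))
```

For N = 10 and P = 4, the failure-free run takes 3 rounds and 12 units of work. With f = 2, this greedy adversary spends both crashes in the first round. The run then needs 5 rounds, but work is still 12 because fewer processors are charged afterwards. In the iterative variant, later instances face no further crashes once the shared budget is exhausted. This is a property of the toy model chosen here, not a restatement of the paper's analysis.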
Received: 15 May 2002, Accepted: 15 June 2003, Published online: 6 February 2004
This work is supported in part by the NSF TOC Grants 9988304 and 0311368, and the NSF ITR Grant 0121277. The work of the second author is supported in part by the NSF CAREER Award 0093065. The work of the third author is supported in part by the NSF CAREER Award 9984778.