Volume 17, Number 2, 107-130, DOI: 10.1007/s00446-003-0104-x

Finding missing synchronization in a distributed computation using controlled re-execution

Neeraj Mittal and Vijay K. Garg

View Related Documents

Abstract

Correct distributed programs are hard to write. Not surprisingly, distributed systems are especially vulnerable to software faults. Testing and debugging is an important way to improve the reliability of distributed systems. A distributed debugger equipped with the mechanism to re-execute the traced computation in a controlled fashion can greatly facilitate the detection and localization of bugs. This approach gives rise to a general problem of predicate control, which takes a computation and a safety property specified on the computation as inputs, and produces a controlled computation, with added synchronization, that maintains the given safety property as output. We devise efficient control algorithms for two classes of useful predicates, namely region predicates and disjunctive predicates. For the former, we prove that the control algorithm is optimal in the sense that it guarantees maximum concurrency possible in the controlled computation. For the latter, we prove that our control algorithm generates the least number of synchronization dependencies and therefore has optimal message-complexity. Furthermore, we provide a necessary and sufficient condition under which it is possible to efficiently compute a minimal controlling synchronization for a general predicate. We also give an algorithm to compute such a synchronization under the condition provided.

Keywords:  Distributed system - Debugging - Software-fault tolerance - Controlled re-execution - Predicate control

Received: 19 June 2002, Accepted: 7 November 2003, Published online: 1 March 2004
Vijay K. Garg: Supported in part by the NSF Grants ECS-9907213, CCR-9988225, Texas Education Board Grant ARP-320, an Engineering Foundation Fellowship, and an IBM grant.
A preliminary version of the results in this paper first appeared in [15].

Fulltext Preview

Image of the first page of the fulltext document