In this paper, we have addressed the complex problem of recovery from concurrent failures in a cluster computing environment.
We have proposed a new approach that, unlike existing works, handles both inter-cluster orphan messages and lost messages.
The proposed recovery approach is free from the domino effect and hence guarantees the least amount of re-computation
after recovery. Moreover, each process needs to save only its most recent local checkpoint, and the same holds for each cluster. Hence,
the number of trips to stable storage per process is always one during recovery. The proposed common checkpointing interval is
chosen so that each process logs only the minimum number of messages it has sent. These features make our approach superior
to existing works.