Workflow technology is an emerging paradigm for the systematic modeling and orchestration of job flows in enterprise and scientific
applications. This paper introduces BPEL4Job, a BPEL-based design for fault handling of job flows in a distributed computing
environment. The features of the proposed design include: a two-stage approach for job flow modeling that separates base flow
structure from fault-handling policy, a generic job proxy that isolates the interaction complexity between the flow engine
and the job scheduler, and a method for migrating flow instances between different flow engines for fault handling in a distributed
system. An implementation of the design based on a set of industrial products from IBM is presented and validated using a
Montage application.
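
As an illustration of the first two features only, the following minimal Python sketch separates a base flow from its fault-handling policy and routes every scheduler interaction through a proxy. It is not BPEL and not the BPEL4Job implementation: the names `FaultPolicy`, `JobProxy`, `run_flow`, the job names, and the retry-only policy are all assumptions made for this example, and the migration of flow instances between engines is not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Stage 1: base flow structure only (hypothetical job names, in order).
BASE_FLOW: List[str] = ["job_a", "job_b", "job_c"]


@dataclass
class FaultPolicy:
    """Stage 2: hypothetical per-job policy, defined apart from the flow."""
    max_retries: int = 0


@dataclass
class JobProxy:
    """Hypothetical proxy: the only component that talks to the scheduler."""
    submit: Callable[[str], bool]  # stand-in scheduler call, True on success
    log: List[str] = field(default_factory=list)

    def run(self, job: str, policy: FaultPolicy) -> bool:
        # Try once, then retry up to policy.max_retries times.
        for attempt in range(policy.max_retries + 1):
            self.log.append(f"{job}#{attempt}")
            if self.submit(job):
                return True
        return False


def run_flow(flow: List[str], policies: Dict[str, FaultPolicy],
             proxy: JobProxy) -> bool:
    """Run jobs in order; stop at the first job whose policy is exhausted."""
    default = FaultPolicy()
    for job in flow:
        if not proxy.run(job, policies.get(job, default)):
            return False
    return True


def make_flaky_scheduler() -> Callable[[str], bool]:
    """Fake scheduler for the demo: job_b fails once, then succeeds."""
    failures = {"job_b": 1}

    def submit(job: str) -> bool:
        if failures.get(job, 0) > 0:
            failures[job] -= 1
            return False
        return True

    return submit


proxy = JobProxy(submit=make_flaky_scheduler())
ok = run_flow(BASE_FLOW, {"job_b": FaultPolicy(max_retries=2)}, proxy)
print(ok, proxy.log)
```

In this sketch the policy table can be changed without touching `BASE_FLOW`, and replacing the scheduler only requires a different `submit` callable, which loosely mirrors the separation the design describes.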