The dependability of a system can be experimentally evaluated at different phases of its life cycle. In the design phase, computer-aided design (CAD) environments are used to evaluate the design via simulation, including simulated fault injection.
Such fault injection tests the effectiveness of fault tolerance mechanisms and evaluates system dependability, providing timely
feedback to system designers. Simulation, however, requires accurate input parameters and validation of output results. Although
parameter estimates can be obtained from past measurements, design and technology changes often complicate this.
In the prototype phase, the system runs under controlled workload conditions. In this stage, controlled physical fault injection is used to evaluate
the system behavior under faults, including the detection coverage and the recovery capability of various fault tolerance
mechanisms. Fault injection on the real system can provide information about the failure process, from fault occurrence to
system recovery, including error latency, propagation, detection, and recovery (which may involve reconfiguration). However, this
type of fault injection studies only artificially induced faults, so it cannot provide certain important dependability measures, such
as mean time between failures (MTBF) and availability. In the operational phase, a direct measurement-based approach can be used to measure systems in the field under real workloads. The collected data
contain a large amount of information about naturally occurring errors/failures.
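
As a rough illustration of the measures named above, the following Python sketch estimates detection coverage from a fault injection campaign, and MTBF and availability from field data. It is a minimal sketch under stated assumptions, not a method taken from the text: the function names, the outage-log format (start and end times in hours), and all numbers are hypothetical. MTBF is taken here as total uptime divided by the number of observed failures, which is only one of several conventions in use.

```python
from typing import List, Tuple


def detection_coverage(detected: int, injected: int) -> float:
    """Point estimate of coverage: fraction of injected faults detected."""
    if injected <= 0:
        raise ValueError("at least one injected fault is required")
    if not 0 <= detected <= injected:
        raise ValueError("detected must lie between 0 and injected")
    return detected / injected


def field_measures(
    outages: List[Tuple[float, float]], window_hours: float
) -> Tuple[float, float, float]:
    """Return (mtbf, mttr, availability) for one observation window.

    `outages` holds (start, end) times in hours for each observed failure.
    Assumed convention: MTBF = total uptime / number of failures.
    """
    if not outages:
        raise ValueError("no failures observed; MTBF cannot be estimated")
    downtime = sum(end - start for start, end in outages)
    uptime = window_hours - downtime
    n = len(outages)
    return uptime / n, downtime / n, uptime / window_hours


# Hypothetical numbers, for illustration only.
coverage = detection_coverage(detected=940, injected=1000)
outages = [(100.0, 102.0), (400.0, 401.0), (900.0, 905.0)]
mtbf, mttr, availability = field_measures(outages, window_hours=1000.0)

print(f"coverage={coverage:.3f}")
print(f"MTBF={mtbf:.1f} h, MTTR={mttr:.2f} h, availability={availability:.4f}")
```

The coverage estimate applies to the prototype-phase experiments, where the number of injected faults is known. The MTBF and availability estimates require naturally occurring failures and therefore only make sense for operational-phase data.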