Several recent studies have established that most system outages are due to software faults. Given the ever-increasing complexity
of software and the well-developed techniques and analysis for hardware reliability, this trend is not likely to change in
the near future. In this paper, we classify software faults and discuss various techniques to deal with them in the testing/debugging
phase and the operational phase of the software. We discuss the phenomenon of software aging and a preventive maintenance technique,
called software rejuvenation, to deal with this problem. Stochastic models to evaluate the effectiveness of preventive maintenance
in operational software systems and to determine optimal times to perform rejuvenation for different scenarios are described.
We also present measurement-based methodologies to detect software aging and estimate its effect on various system resources.
These models are intended to help develop software rejuvenation policies. An automated online measurement-based approach has
been used in the software rejuvenation agent implemented in a major commercial server.