We present the design and implementation of an infrastructure that enables monitoring of resources, services, and applications
in a computational grid and provides a toolkit to help manage these entities when faults occur. This infrastructure builds
on three basic monitoring components: sensors to perform measurements, actuators to perform actions, and an event service
to communicate events between remote processes. We describe how we apply our infrastructure to support a grid service and
an application: (1) the Globus Metacomputing Directory Service; and (2) a long-running and coarse-grained parameter study
application. We use these applications to show that our monitoring infrastructure is highly modular, conveniently retargetable,
and extensible.
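
The interplay of the three components can be illustrated with a minimal sketch. It is not the implementation described here: all class, method, and event names (`EventService`, `Sensor`, `Actuator`, `service.down`, `directory-service`) are hypothetical. It also assumes an in-process event service, whereas the event service above communicates between remote processes. A sensor takes a measurement and publishes an event; an actuator subscribed to fault events performs a recovery action.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Event:
    name: str      # hypothetical event type, e.g. "service.down"
    source: str    # identifier of the monitored entity
    value: float   # the measurement that triggered the event


class EventService:
    """In-process stand-in for the event service (assumption: no networking)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[Event], None]]] = {}

    def subscribe(self, name: str, handler: Callable[[Event], None]) -> None:
        self._subscribers.setdefault(name, []).append(handler)

    def publish(self, event: Event) -> None:
        # Deliver the event to every handler registered for its name.
        for handler in self._subscribers.get(event.name, []):
            handler(event)


class Sensor:
    """Performs a measurement and reports the result as an event."""

    def __init__(self, source: str, probe: Callable[[], float],
                 events: EventService) -> None:
        self.source = source
        self.probe = probe
        self.events = events

    def measure(self) -> None:
        value = self.probe()
        # Assumed convention: a positive reading means the entity is healthy.
        name = "service.up" if value > 0 else "service.down"
        self.events.publish(Event(name, self.source, value))


class Actuator:
    """Performs an action in response to an event; here it only records it."""

    def __init__(self) -> None:
        self.actions: List[str] = []

    def restart(self, event: Event) -> None:
        self.actions.append(f"restart {event.source}")


def run_demo() -> List[str]:
    events = EventService()
    actuator = Actuator()
    events.subscribe("service.down", actuator.restart)

    # Simulated readings: healthy, failed, healthy.
    readings = iter([1.0, 0.0, 1.0])
    sensor = Sensor("directory-service", lambda: next(readings), events)
    for _ in range(3):
        sensor.measure()
    return actuator.actions
```

In this sketch, sensors and actuators know only about the event service and not about each other, which is one way a design of this kind could stay modular: a new sensor or actuator is added by publishing or subscribing to an event name.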