The performance of a parallel system with NUMA characteristics depends on the efficient use of local memory accesses. Programming
and tool environments for such DSM systems should enable and exploit data locality.
In this paper, we present an event-driven hybrid monitoring concept for the SMiLE SCI-based PC cluster. The central part of
the hardware monitor consists of a content-addressable counter array managing a small working set of the most recently referenced
memory regions. We show that this approach provides detailed run-time information that performance evaluation and debugging
tools can exploit.
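
To make the idea of a counter array over a small working set concrete, the following software model is a minimal sketch and not the SMiLE hardware design. The class name CounterArray, the region granularity (region_bits), the least-recently-used replacement policy, and the on_evict callback standing in for the event delivered to the software side are all assumptions introduced for illustration.

```python
from collections import OrderedDict


class CounterArray:
    """Software model of an associative counter array (illustrative only).

    Assumptions, not taken from the paper: regions are fixed-size blocks
    selected by dropping the low `region_bits` address bits, replacement is
    least-recently-used, and an evicted counter is handed to `on_evict`,
    which stands in for the event raised towards the software monitor.
    """

    def __init__(self, capacity, region_bits, on_evict):
        self.capacity = capacity
        self.region_bits = region_bits
        self.on_evict = on_evict
        # Maps region tag -> access count, ordered from least to most
        # recently referenced.
        self.counters = OrderedDict()

    def record(self, address):
        """Count one memory reference to the region containing `address`."""
        region = address >> self.region_bits
        if region in self.counters:
            # Hit: bump the counter and mark the region as most recent.
            self.counters[region] += 1
            self.counters.move_to_end(region)
            return
        if len(self.counters) >= self.capacity:
            # Miss with a full array: evict the least recently referenced
            # region and report its count before reusing the slot.
            old_region, count = self.counters.popitem(last=False)
            self.on_evict(old_region, count)
        self.counters[region] = 1


# Hypothetical usage: two counters, 4 KiB regions, evictions collected
# in a list that a tool could later aggregate.
evicted = []
monitor = CounterArray(
    capacity=2,
    region_bits=12,
    on_evict=lambda region, count: evicted.append((region, count)),
)
for addr in (0x1000, 0x1004, 0x2000, 0x1008, 0x3000):
    monitor.record(addr)
```

In this model the array keeps per-region counts only for the regions touched most recently, while evicted counts are passed on to software, so the complete access histogram can be reconstructed outside the monitor.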