Because of the growing gap between processor speed and memory access time, a large fraction of a program’s execution time is
spent accessing the various levels of the memory hierarchy. Hence, cache-aware programming is of prime importance. To use
the memory subsystem efficiently, many architecture-specific characteristics must be taken into account: cache size,
replacement strategy, access latency, number of memory levels, etc.
In this paper, we present a simulator for the accurate performance prediction of sequential and parallel programs on
shared-memory systems. It helps the programmer locate the critical parts of the code that have the greatest impact on
overall performance.
Our simulator is based on the Latency-of-Data-Access Model, which focuses on modeling the access times to
the different memory levels.
We describe the design of our simulator, its configuration, and its use in an example application.
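
The idea of charging each data access the latency of the memory level that serves it can be illustrated with a minimal sketch. It is not the simulator described here: the names MemoryLevel and estimate_access_time, the three levels, the latencies, and the access counts are all hypothetical, and the sketch assumes that the total access time is simply the sum, over all levels, of the number of accesses served by a level times that level's latency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryLevel:
    """One level of the memory hierarchy (hypothetical illustration)."""
    name: str
    latency_cycles: float  # assumed cost of one access served by this level


def estimate_access_time(levels, access_counts):
    """Sum, over all levels, accesses served by the level times its latency.

    Levels missing from access_counts are treated as having no accesses.
    """
    return sum(
        level.latency_cycles * access_counts.get(level.name, 0)
        for level in levels
    )


# Illustrative values only; real latencies depend on the target architecture.
levels = [
    MemoryLevel("L1", 2),
    MemoryLevel("L2", 10),
    MemoryLevel("RAM", 100),
]
counts = {"L1": 900, "L2": 80, "RAM": 20}

total = estimate_access_time(levels, counts)
print(f"estimated memory access time: {total} cycles")
```

With these made-up numbers, the 20 accesses that reach main memory cost more than the 900 accesses served by the first-level cache, which is the kind of imbalance that makes locating the critical parts of a code worthwhile.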