Cache performance significantly influences the computational power of modern processors. As microprocessor design for both
general-purpose and embedded systems moves towards chip multiprocessors, cache performance becomes even more important because an
off-chip access is far more expensive than an on-chip reference. Cache locality optimization therefore remains an active research
area for the next generation of computer architectures.
In this paper we present a tool environment that aims to give programmers sufficient support in optimizing
source code for better runtime cache behavior. The environment comprises a set of tools, ranging from profiling, analysis,
and simulation tools for gathering performance data to visualization tools for graphical presentation and platforms for program
development. Together, these tools establish a feedback loop for tuning cache performance on current and emerging uniprocessor
and multiprocessor systems.