In this paper, we present an advanced multiprocessor cache architecture for chip multiprocessors (CMPs). It is designed for
the scalable GigaNetIC CMP, which is based on massively parallel on-chip computing clusters. Our write-through multiprocessor
cache is configurable with respect to the most relevant design options. It is intended for use in universal co-processors
as well as in network processing units. For early verification of the software and early exploration of various hardware
configurations, we have developed a SystemC-based simulation model for the complete chip multiprocessor. For detailed hardware-software
co-verification, we use our FPGA-based rapid prototyping system RAPTOR2000 to emulate our architecture with near-ASIC performance.
Finally, we demonstrate the performance gains that our multiprocessor cache enables in different application scenarios.