The performance of microprocessors has increased exponentially for over 35 years. However, process technology challenges,
chip power constraints, and difficulty in extracting instruction-level parallelism are conspiring to limit the performance
of future individual processors. To address these limits, the computer industry has embraced chip multiprocessing (CMP), predominantly
in the form of multiple high-performance superscalar processors on the same die. We explore the trade-off between building
CMPs from a few high-performance cores and building CMPs from a large number of lower-performance cores, and argue that CMPs
built from a larger number of lower-performance cores can provide better performance and performance/Watt on many commercial
workloads. We examine two multithreaded CMPs built using a large number of processor cores: Sun’s Niagara and Niagara 2 processors.
We also explore the programming issues for CMPs with a large number of threads. The programming model for these CMPs is similar
to the widely used programming model for symmetric multiprocessors (SMPs), but the greatly reduced cost of communicating
data through the on-chip shared secondary cache allows more fine-grained parallelism to be exploited effectively by the
CMP. Finally, we present performance comparisons between Sun’s Niagara and more conventional dual-core processors built from
large superscalar processor cores. For several key server workloads, Niagara shows significant performance and even more significant
performance/Watt advantages over the CMPs built from traditional superscalar processors.
Keywords: Chip multiprocessing, multithreading, performance, parallel programming