This paper analyzes the impact of hardware multithreading support on the performance of distributed shared-memory (DSM) multiprocessors
built out of heterogeneous, single-chip computing nodes. Area-efficiency arguments motivate a heterogeneous, hierarchical
organization (HDSM). It consists of a few processors with extensive support for instruction-level parallelism and large caches,
and a larger number of simpler processors with smaller caches for efficient execution of thread-parallel code. Such a heterogeneous
machine relies on executing multiple threads per processor to deliver high performance for unmodified applications.
This paper quantitatively studies the performance of HDSMs under software-based and hardware-multithreaded scenarios. The simulation-based
experiments in this paper consider a 16-node multiprocessor, six homogeneous shared-memory benchmarks from the SPLASH-2 suite,
and a decision-support application (C4.5). Simulation results show that a hardware-based, block-multithreaded HDSM configuration
outperforms its software-multithreaded counterpart by 13% on average.
This work was partially funded by the National Science Foundation under grants CCR-9970728 and EIA-9975275. Renato Figueiredo
is also supported by a CAPES scholarship.