The ability to provide uniform shared-memory access to a significant number of processors in a single SMP node brings us much
closer to the ideal PRAM parallel computer. In this paper, we develop new techniques for designing a uniform shared-memory
algorithm from a PRAM algorithm and present the results of an extensive experimental study demonstrating that the resulting
programs scale nearly linearly across a significant range of processors (from 1 to 64) and across the entire range of instance
sizes tested. This linear speedup with the number of processors is, to our knowledge, the first ever attained in practice
for intricate combinatorial problems. The example we present in detail here is a graph decomposition algorithm that also requires
the computation of a spanning tree; this problem is not only of interest in its own right, but is representative of a large
class of irregular combinatorial problems that have simple and efficient sequential implementations and fast PRAM algorithms,
but have no known efficient parallel implementations. Our results thus offer promise for bridging the gap between the theory
and practice of shared-memory parallel algorithms. nt[mis|Supported in part by NSF CAREER 00-93039, NSF ITR 00-81404, NSF
DEB 99-10123, and DOE CSRI-14968
Supported in part by NSF ITR 00-81404
Supported by an NSF Research Experience for Undergraduates (REU)