Welcome!
To use the personalized features of this site, please log in or register.
If you have forgotten your username or password, we can help.
|
 |
Design and Analysis of Algorithms for Shared-Memory Multiprocessors
| |
|
Design and Analysis of Algorithms for Shared-Memory Multiprocessors
Charles E. Leiserson7
| (7) |
MIT Laboratory for Computer Science, USA |
Abstract
Shared-memory multiprocessors feature parallelism and a steep cache hierarchy, two salient characteristics that distinguish
them from the commodity processors of ten years ago. Both of these characteristics can be exploited effectively using the
same general strategy: divide-and-conquer recursion. This talk overviews the Cilk multithreaded programming language being
developed in the MIT Laboratory for Computer Science, which allows a programmer to exploit parallelism through divide-and-conquer.
In addition, I show how divide-and-conquer allows caches to be used effectively.
In the first part of my talk, I introduce the Cilk programming language. Cilk minimally extends the C programming language
to allow interactions among computational threads to be specified in a simple and highlevel fashion. Cilk’s provably efficient
runtime system dynamically maps a user’s program onto available physical resources, freeing the programmer from concerns of
communication protocols and load balancing. In addition, Cilk provides an abstract performance model that a programmer can
use to predict the multiprocessor performance of his application from its execution on a single processor. Not only do Cilk
programs scale up to run efficiently on multiple processors, but unlike existing parallel-programming environments, such as
MPI and HPF, Cilk programs “scale down”: the efficiency of a Cilk program on one processor rivals that of a comparable C program.
I illustrate Cilk programming through the example of divide-and-conquer matrix multiplication.
The second part of my talk presents a strategy for designing algorithms to exploit multiple levels of caching effectively.
Unlike previous algorithms, these algorithms are “cache oblivious”: no variables dependent on hardware parameters, such as
cache size and cache-line length, need to be tuned to achieve optimality. Nevertheless, I show that cache-oblivious algorithms
can be developed that use an optimal amount of work and move data optimally among multiple levels of cache. Problems that
can be solved efficiently by cache-oblivious algorithms include matrix multiplication, FFT, sorting, and matrix transpose,
all of which are solved in a divide-and-conquer fashion.
Together, these two technologies provide a foundation for the design and analysis of efficient algorithms for shared-memory
multiprocessors.
Fulltext Preview (Small, Large)
|
|
|
|
|
|