We address the problem of systematically designing correct parallel programs and developing their efficient implementations
on parallel machines. The design process starts with an intuitive, sequential algorithm and proceeds by expressing it in terms
of well-defined, pre-implemented parallel components called skeletons. We demonstrate the skeleton-based design process using the tridiagonal system solver as our example application. We develop
step by step three provably correct, parallel versions of our application, and finally arrive at a cost-optimal implementation
in MPI (Message Passing Interface). The performance of our solutions is demonstrated experimentally on a Cray T3E machine.