Lecture Notes in Computer Science, 1999, Volume 1685/1999, 1128-1135, DOI: 10.1007/3-540-48311-X_158

A Stable and Efficient Parallel Block Gram-Schmidt Algorithm

Denis Vanderstraeten

View Related Documents

Abstract

The Modified Gram-Schmidt (MGS) orthogonalization process — used for example in the Arnoldi algorithm — constitutes often the bottleneck that limits parallel efficiencies. Indeed, a number of communications, proportional to the square of the problem size, is required to compute the dot-products. A block formulation is attractive but it suffers from potential numerical instability.
In this paper, we address this issue and propose a simple procedure that allows the use of a Block Gram-Schmidt algorithm while guaranteeing a numerical accuracy similar to MGS. The main idea is to dynamically determine the size of the blocks. The main advantage of this dynamic procedure are two-folds: first, high performance matrix-vector multiplications can be used to decrease the execution time. Next, in a parallel environment, the number of communications is reduced. Performance comparisons with the alternative Iterated CGS also show an improvement for moderate number of processors.

Fulltext Preview

Image of the first page of the fulltext document