The Modified Gram-Schmidt (MGS) orthogonalization process, used for example in the Arnoldi algorithm, is often
the bottleneck that limits parallel efficiency. Indeed, a number of communications proportional to the square of the problem
size is required to compute the dot products. A block formulation is attractive, but it suffers from potential numerical instability.
In this paper, we address this issue and propose a simple procedure that allows the use of a Block Gram-Schmidt algorithm
while guaranteeing numerical accuracy similar to that of MGS. The main idea is to dynamically determine the size of the blocks.
The advantages of this dynamic procedure are twofold: first, high-performance matrix-vector multiplications can be used
to decrease the execution time. Second, in a parallel environment, the number of communications is reduced. Performance comparisons
with the alternative Iterated CGS also show an improvement for a moderate number of processors.
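
To make the communication argument concrete, the following is a minimal sketch and not the procedure proposed in this paper. It assumes that each dot product (or each fused group of dot products computed from the same data) costs one global reduction, and it uses a fixed block size, whereas the paper chooses the block size dynamically by a criterion not reproduced here. The names mgs, block_gs, orthogonality_error and the test matrix are hypothetical and chosen only for illustration.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mgs(vectors):
    """Modified Gram-Schmidt. Returns (orthonormal vectors, reduction count)."""
    q = []
    reductions = 0
    for v in vectors:
        w = list(v)
        for qi in q:
            # Each coefficient depends on the updated w, so the dot products
            # cannot be fused: one reduction per previous vector.
            h = dot(qi, w)
            reductions += 1
            w = [wj - h * qj for wj, qj in zip(w, qi)]
        norm = math.sqrt(dot(w, w))
        reductions += 1
        q.append([wj / norm for wj in w])
    return q, reductions


def block_gs(vectors, block_size):
    """Block Gram-Schmidt with a FIXED block size (illustrative assumption)."""
    q = []
    reductions = 0
    for start in range(0, len(vectors), block_size):
        block = [list(v) for v in vectors[start:start + block_size]]
        if q:
            # All coefficients are computed from the unmodified block, so they
            # can be gathered in a single fused reduction.
            coeffs = [[dot(qi, w) for qi in q] for w in block]
            reductions += 1
            for w, c in zip(block, coeffs):
                for h, qi in zip(c, q):
                    for j in range(len(w)):
                        w[j] -= h * qi[j]
        # Orthonormalize the vectors inside the block with MGS.
        inner, r = mgs(block)
        reductions += r
        q.extend(inner)
    return q, reductions


def orthogonality_error(q):
    """Largest deviation of the Gram matrix of q from the identity."""
    return max(
        abs(dot(q[i], q[j]) - (1.0 if i == j else 0.0))
        for i in range(len(q))
        for j in range(len(q))
    )


# Hypothetical, well-conditioned test matrix (6 vectors of length 6).
vectors = [[1.0 if i == j else 1.0 / (i + j + 2) for j in range(6)]
           for i in range(6)]

q_mgs, r_mgs = mgs(vectors)
q_blk, r_blk = block_gs(vectors, block_size=3)
print(r_mgs, r_blk)  # 21 13
```

For k vectors, the MGS sketch performs k(k+1)/2 reductions, which is the quadratic growth mentioned above; the block variant replaces the reductions against all previously computed vectors by one fused reduction per block. On this well-conditioned example both variants return a basis that is orthonormal to rounding error; the numerical risk of the block formulation appears only for ill-conditioned inputs, which this sketch does not attempt to demonstrate.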