We present a simulation-based performance model for analyzing a parallel sparse LU factorization algorithm on modern cache-based,
high-end parallel architectures. We consider a supernodal,
right-looking parallel factorization with static pivoting on a two-dimensional grid of processors. Our model characterizes the algorithmic
behavior by taking into account the underlying processor speed, the memory system performance, and the interconnect speed.
The model is validated using the implementation in the SuperLU_DIST linear system solver, sparse matrices from real applications,
and an IBM POWER3 parallel machine. Our modeling methodology can be adapted to study the performance of other types of sparse
factorizations, such as Cholesky or QR, and to other parallel machines.
Keywords Parallel sparse factorization - Performance modeling - Distributed parallel machine
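
As a rough illustration of the three machine characteristics named in the abstract (processor speed, memory system performance, interconnect speed), the following minimal sketch estimates the time of a single factorization step. It is not the model developed in this paper: the names `Machine` and `step_time` are hypothetical, the purely additive cost formula (no overlap of computation, memory traffic, and communication) is an assumption, and any parameter values supplied by the caller are placeholders rather than measurements.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    """Hypothetical machine description; all fields are illustrative."""
    flop_rate: float      # sustained floating-point rate, flops per second
    mem_bandwidth: float  # memory bandwidth, bytes per second
    net_latency: float    # per-message interconnect latency, seconds
    net_bandwidth: float  # interconnect bandwidth, bytes per second


def step_time(machine: Machine, flops: float, mem_bytes: float,
              messages: int, msg_bytes: float) -> float:
    """Estimate the time of one step as compute + memory + communication.

    Assumes the three components do not overlap, which is a simplification.
    """
    compute = flops / machine.flop_rate
    memory = mem_bytes / machine.mem_bandwidth
    # Latency-bandwidth cost: fixed cost per message plus transfer time.
    comm = messages * machine.net_latency + msg_bytes / machine.net_bandwidth
    return compute + memory + comm
```

A simulation in this spirit would call `step_time` once per panel factorization or trailing-matrix update and combine the per-process estimates along the dependency chain; how the paper actually combines them is not stated in the abstract.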