We investigate the performance benefits of a novel recursive formulation of Strassen’s algorithm over highly tuned matrix-multiply
(MM) routines, such as the widely used ATLAS for high-performance systems.
We combine Strassen’s recursion with a highly tuned version of ATLAS MM, and we present a family of recursive algorithms achieving
up to 15% speed-up over ATLAS alone. We show experimental results for seven different systems.
Keywords: dense kernels, matrix-matrix product, performance optimizations
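
To make the idea of pairing Strassen’s recursion with a tuned MM kernel concrete, the following is a minimal sketch of the classical Strassen scheme with a leaf cutoff. It is not the formulation presented in this paper: it assumes square matrices, falls back to the base kernel for odd sizes, and uses a plain triple loop (`base_mm`, a hypothetical stand-in) where a tuned routine such as ATLAS would be called. The cutoff `LEAF` is likewise an illustrative assumption.

```python
# Illustrative sketch only: classical Strassen with a cutoff to a base kernel.
# base_mm is a hypothetical stand-in for a tuned routine (e.g. ATLAS GEMM).
# LEAF is an assumed cutoff size, not a value taken from the paper.

LEAF = 2


def base_mm(A, B):
    """Plain triple-loop product of list-of-lists matrices."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    for i in range(n):
        for p in range(k):
            a = A[i][p]
            for j in range(m):
                C[i][j] += a * B[p][j]
    return C


def add(A, B):
    return [[x + y for x, y in zip(r, s)] for r, s in zip(A, B)]


def sub(A, B):
    return [[x - y for x, y in zip(r, s)] for r, s in zip(A, B)]


def split(A):
    """Split a square matrix of even size into four quadrants."""
    h = len(A) // 2
    return ([r[:h] for r in A[:h]], [r[h:] for r in A[:h]],
            [r[:h] for r in A[h:]], [r[h:] for r in A[h:]])


def strassen(A, B, leaf=LEAF):
    """Multiply square matrices A and B with 7 recursive products per level."""
    n = len(A)
    # Small or odd-sized problems go straight to the base kernel.
    if n <= leaf or n % 2:
        return base_mm(A, B)
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    M1 = strassen(add(A11, A22), add(B11, B22), leaf)
    M2 = strassen(add(A21, A22), B11, leaf)
    M3 = strassen(A11, sub(B12, B22), leaf)
    M4 = strassen(A22, sub(B21, B11), leaf)
    M5 = strassen(add(A11, A12), B22, leaf)
    M6 = strassen(sub(A21, A11), add(B11, B12), leaf)
    M7 = strassen(sub(A12, A22), add(B21, B22), leaf)
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)
    # Reassemble the four quadrants into one matrix.
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bottom = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bottom
```

In practice the cutoff would be tuned per system, since the recursion only pays off once the saved multiplication outweighs the extra matrix additions and data movement.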