Highly optimized inner kernels are essential to efficient numerical algorithms. Such kernels are often
written by hand. In this paper, however, we present an alternative way to produce efficient matrix multiplication
kernels from a set of simple codes that can be parameterized at compilation time. Using the resulting kernels, we have
been able to produce high-performance sparse and dense linear algebra codes on a variety of platforms.
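As a minimal illustration of what compile-time parameterization can look like, consider the C sketch below. It is not the kernels used in this work. It assumes row-major, contiguous storage, and the names KM, KN, KK and mm_kernel are hypothetical, chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical compile-time parameters (not taken from the paper).
 * Override them on the compiler command line, e.g. -DKM=8 -DKN=8 -DKK=8. */
#ifndef KM
#define KM 4   /* rows of A and C */
#endif
#ifndef KN
#define KN 4   /* columns of B and C */
#endif
#ifndef KK
#define KK 4   /* columns of A, rows of B */
#endif

/* Computes C += A * B for fixed-size blocks:
 * A is KM x KK, B is KK x KN, C is KM x KN, all row-major and contiguous.
 * The loop bounds are compile-time constants, so the compiler may unroll
 * and vectorize the loops for each chosen parameter set. */
static void mm_kernel(const double *a, const double *b, double *c)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < KM; ++i) {
        for (size_t j = 0; j < KN; ++j) {
            double acc = c[i * KN + j];          /* accumulate into C */
            for (size_t k = 0; k < KK; ++k)
                acc += a[i * KK + k] * b[k * KN + j];
            c[i * KN + j] = acc;
        }
    }
}
```

Compiling the same source with different values of the macros yields different fixed-size kernels, so a family of specialized kernels can be generated without writing each one by hand.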
This work was supported by the Ministerio de Ciencia y Tecnología of Spain (TIN2004-07739-C02-01).