A digit-serial, multiplier-accumulator based cryptographic co-processor architecture is proposed, similar to fixed-point DSPs with enhancements, supporting long modular arithmetic and general computations. Several new column-sum variants of popular quadratic-time modular multiplication algorithms are presented (Montgomery and interleaved division-reduction, with or without Quisquater scaling), which are faster than the traditional implementations, need little or no memory beyond the operand storage, and perform squaring about twice as fast as general multiplications or modular reductions. They provide similar advantages in software for general-purpose CPUs.
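As a rough illustration of the column-sum idea and of why squaring can cost about half as much as a general multiplication, the following minimal Python sketch computes a long product one output column at a time, the way a multiply-accumulate unit would. It is a generic product-scanning sketch and not the algorithms of this paper: it includes no modular reduction, and the 16-bit digit size and all function names are assumptions chosen for the example.

```python
BITS = 16                 # assumed digit width, chosen only for illustration
MASK = (1 << BITS) - 1


def to_digits(x, n):
    """Split a non-negative integer into n little-endian digits of BITS bits."""
    return [(x >> (BITS * i)) & MASK for i in range(n)]


def from_digits(d):
    """Recombine little-endian digits into an integer."""
    return sum(v << (BITS * i) for i, v in enumerate(d))


def column_sum_mul(a, b):
    """Multiply digit arrays column by column.

    Returns (product digits, number of digit multiplications).
    """
    n, m = len(a), len(b)
    out, acc, mults = [], 0, 0
    for k in range(n + m - 1):
        # Accumulate every digit product a[i] * b[k - i] of column k.
        for i in range(max(0, k - m + 1), min(k, n - 1) + 1):
            acc += a[i] * b[k - i]
            mults += 1
        out.append(acc & MASK)   # emit one finished result digit
        acc >>= BITS             # the rest carries into the next column
    out.append(acc)              # top digit
    return out, mults


def column_sum_sqr(a):
    """Square a digit array, computing each symmetric product only once.

    Returns (square digits, number of digit multiplications).
    """
    n = len(a)
    out, acc, mults = [], 0, 0
    for k in range(2 * n - 1):
        col = 0
        # Pairs (i, k - i) with i < k - i appear twice in the full column.
        for i in range(max(0, k - n + 1), (k + 1) // 2):
            col += a[i] * a[k - i]
            mults += 1
        col *= 2                 # doubling is a shift, not a multiplication
        if k % 2 == 0:           # diagonal term a[k/2]^2 appears once
            col += a[k // 2] * a[k // 2]
            mults += 1
        acc += col
        out.append(acc & MASK)
        acc >>= BITS
    out.append(acc)
    return out, mults
```

For n-digit operands the general routine performs n^2 digit multiplications and the squaring routine n(n+1)/2, which approaches half as n grows. Only a single running accumulator is needed beyond the operand and result storage.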
Keywords: Computer arithmetic, cryptography, modular multiplication, modular reduction, Montgomery multiplication, Quisquater multiplication, optimization, multiply-accumulate architecture, reciprocal.
This revised version was published online in September 2004.
It contains changes to the mathematical formulae where the <= signs did not appear.