This work describes a fully scalable hardware architecture for modular multiplication that is efficient for arbitrary
bit lengths. The solution uses a systolic array implementation and can be used for arbitrary precision without any modification.
This notion of scalability includes both freedom in the choice of operand precision and adaptability to any desired gate
complexity. We present modular exponentiation based on Montgomery’s method without any modular reduction, achieving the best
possible bound according to C. Walter. Moreover, this tight bound proved to be practical in our architecture. The described
systolic array architecture is unique in being scalable in several parameters, resulting in a class of exponentiation engines.
The data provided in the figures and tables are believed to be new and give this work a practical dimension.
Keywords: Montgomery multiplication, modular exponentiation, systolic array, performance model, scalability
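
To make the idea of Montgomery exponentiation without modular reduction concrete, the following is a minimal software sketch, not the hardware architecture described in this work. It assumes the commonly cited form of Walter's bound: for an odd modulus N and R = 2^r with R > 4N, Montgomery products of operands below 2N stay below 2N, so the conditional final subtraction can be dropped. The function names `mont_mul` and `mont_exp`, and the choice r = bitlength(N) + 2, are assumptions made for illustration.

```python
def mont_mul(a, b, n, r_bits, n_prime):
    """Montgomery product a*b*R^-1 mod n with NO final subtraction.

    Assumes n is odd, R = 2**r_bits > 4*n, and a, b < 2*n.
    Under these assumptions the result is again < 2*n.
    """
    mask = (1 << r_bits) - 1
    t = a * b
    m = (t * n_prime) & mask          # m = t * (-n^-1) mod R
    return (t + m * n) >> r_bits      # exact division by R


def mont_exp(base, exp, n):
    """Left-to-right square-and-multiply using mont_mul (illustrative only)."""
    if n % 2 == 0 or n < 3:
        raise ValueError("modulus must be odd and at least 3")
    r_bits = n.bit_length() + 2       # guarantees R = 2**r_bits > 4*n
    r = 1 << r_bits
    n_prime = (-pow(n, -1, r)) % r    # requires Python 3.8+
    r2 = (r * r) % n                  # precomputed constant R^2 mod n

    x = mont_mul(base % n, r2, n, r_bits, n_prime)   # base in Montgomery form
    acc = mont_mul(1, r2, n, r_bits, n_prime)        # 1 in Montgomery form
    for bit in bin(exp)[2:]:
        acc = mont_mul(acc, acc, n, r_bits, n_prime)
        if bit == "1":
            acc = mont_mul(acc, x, n, r_bits, n_prime)
    # Multiplying by 1 leaves Montgomery form; the result lies in [0, n]
    # and equals n only when the true result is 0 mod n.
    return mont_mul(acc, 1, n, r_bits, n_prime)
```

In this sketch the intermediate values are never reduced into the range below N; they are only kept below 2N, which is what allows the reduction step to be omitted inside the loop.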