We previously proposed a fast parallel algorithm for Montgomery multiplication based on Residue Number Systems (RNS). This
paper describes an implementation of the RSA cryptosystem using this RNS Montgomery multiplication. We discuss how to choose
the base size of the RNS and the number of parallel processing units. An implementation method using the Chinese Remainder
Theorem (CRT) is also presented. An LSI prototype adopting the proposed Cox-Rower Architecture performs a 1024-bit RSA
transaction in 4.2 ms without the CRT and in 2.4 ms with the CRT, at an operating frequency of 80 MHz and with a total of
333K logic gates for 11 parallel processing units.
Keywords RSA cryptography – residue number systems – Montgomery multiplication – modular exponentiation
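The following is a minimal Python sketch (3.8+ for `pow(x, -1, m)`) of the general RNS Montgomery multiplication idea and of CRT-based RSA decryption built on it. It is an illustration under stated assumptions, not the Cox-Rower design: the base extension here is an exact CRT reconstruction (the hardware uses an approximate extension instead), the two bases of small primes are hypothetical toy values, and every function name is hypothetical. The sketch assumes A > 4N, B > 2N, and that all moduli are pairwise coprime and coprime to N, so each product stays below 2N.

```python
from math import gcd, prod

# Hypothetical toy bases of pairwise-coprime moduli (real designs use many word-sized moduli).
BASE_A = [13, 17, 19, 23]
BASE_B = [29, 31, 37, 41]


def to_rns(x, base):
    """Represent integer x by its residues modulo each modulus in base."""
    return [x % m for m in base]


def from_rns(res, base):
    """Exact CRT reconstruction into [0, prod(base))."""
    M = prod(base)
    total = 0
    for r, m in zip(res, base):
        Mi = M // m
        total += (r * pow(Mi, -1, m) % m) * Mi
    return total % M


def base_extend(res, src, dst):
    """Exact base extension (assumption: not the approximate Cox-Rower method)."""
    return to_rns(from_rns(res, src), dst)


def rns_mont_mul(x_a, x_b, y_a, y_b, N, base_a, base_b):
    """Return w = x*y*A^{-1} mod N (possibly plus N, so w < 2N) in both bases."""
    A = prod(base_a)
    # Step 1: s = x*y, computed independently in every channel of both bases.
    s_a = [(xi * yi) % m for xi, yi, m in zip(x_a, y_a, base_a)]
    s_b = [(xi * yi) % m for xi, yi, m in zip(x_b, y_b, base_b)]
    # Step 2: t = -s * N^{-1} mod A, channel-wise in base A.
    t_a = [(si * -pow(N, -1, m)) % m for si, m in zip(s_a, base_a)]
    # Step 3: convert t from base A to base B.
    t_b = base_extend(t_a, base_a, base_b)
    # Step 4: w = (s + t*N) / A in base B; exact because s + t*N = 0 (mod A).
    w_b = [((si + ti * N) * pow(A, -1, m)) % m
           for si, ti, m in zip(s_b, t_b, base_b)]
    # Step 5: convert w back to base A so it can feed the next multiplication.
    w_a = base_extend(w_b, base_b, base_a)
    return w_a, w_b


def rns_modexp(x, e, N, base_a=BASE_A, base_b=BASE_B):
    """Left-to-right square-and-multiply using rns_mont_mul."""
    A, B = prod(base_a), prod(base_b)
    assert A > 4 * N and B > 2 * N and gcd(A * B, N) == 1

    def enc(v):
        return to_rns(v, base_a), to_rns(v, base_b)

    xm_a, xm_b = enc(x * A % N)  # x in Montgomery form: x*A mod N
    r_a, r_b = enc(A % N)        # 1 in Montgomery form
    for bit in bin(e)[2:]:
        r_a, r_b = rns_mont_mul(r_a, r_b, r_a, r_b, N, base_a, base_b)
        if bit == "1":
            r_a, r_b = rns_mont_mul(r_a, r_b, xm_a, xm_b, N, base_a, base_b)
    # Multiplying by plain 1 removes the factor A; the result is at most N.
    one_a, one_b = enc(1)
    r_a, r_b = rns_mont_mul(r_a, r_b, one_a, one_b, N, base_a, base_b)
    return from_rns(r_b, base_b) % N


def rsa_crt_decrypt(c, p, q, d, base_a=BASE_A, base_b=BASE_B):
    """Textbook CRT-RSA with Garner recombination (not necessarily the paper's method)."""
    mp = rns_modexp(c % p, d % (p - 1), p, base_a, base_b)
    mq = rns_modexp(c % q, d % (q - 1), q, base_a, base_b)
    h = (pow(q, -1, p) * (mp - mq)) % p
    return mq + h * q
```

With the textbook toy key N = 61 * 53 = 3233, e = 17, d = 2753, encrypting 65 with `rns_modexp(65, 17, 3233)` agrees with `pow(65, 17, 3233)`, and `rsa_crt_decrypt` recovers 65. The CRT variant runs two exponentiations with half-size moduli and exponents, which is why it is faster than a single full-size exponentiation.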