Student Seminar Report & Project Report With Presentation (PPT,PDF,DOC,ZIP)

Full Version: A Karatsuba-based Montgomery Multiplier
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
Abstract
Modular multiplication of long integers is an important
building block for cryptographic algorithms. Although
several FPGA accelerators have been proposed for large modular
multiplication, previous systems have been based on O(N2)
algorithms. In this paper, we present a Montgomery multiplier
that incorporates the more efficient Karatsuba algorithm which is
O(N(log 3= log 2)). This system is parameterizable to different bitwidths
and makes excellent use of both embedded multipliers and
fine-grained logic. The design has significantly lower LUT-delay
product and multiplier-delay product compared with previous
designs. Initial testing on a Virtex-6 FPGA showed that it is
60-190 times faster than an optimized multi-threaded software
implementation running on an Intel Xeon 2.5 GHz CPU. The
proposed multiplier system is also estimated to be 95-189 times
more energy efficient than the software-based implementation.
This high performance and energy efficiency makes it suitable
for server-side applications running in a datacenter environment.
I. INTRODUCTION
Montgomery multiplication [1] is critical for many cryptographic
algorithms such as RSA, Digital Signature Algorithm
(DSA), Elliptic Curve DSA and other emerging cryptographic
algorithms, such as pairing-based systems. Despite improvements
in the clock frequency and the level of parallelism
of conventional microprocessors, software-based implementations
of Montgomery multiplication remain insufficient, both
in terms of performance and energy efficiency. This has led
to research investigating more efficient hardware accelerators.
We focus on Montgomery multiplication for long integers e.g.
such as required for operations over the finite field GF(p)
where p is a 512-bit prime number. These computations have
long carry chains and their implementations are fundamentally
different from systems that use composite fields with small
characteristic (e.g. GF(2167)).
Existing FPGA Montgomery multiplier implementations for
large integer systems have one major weakness: they all
use algorithms with O(N2) complexity i.e. the area and/or
runtime increases quadratically with the bit-width of the
computation. In this paper, we improve the implementation of
Montgomery multiplication of long integers at the algorithmic
level in order to lower complexity and increase throughput.
We have developed a parameterized Karatsuba multiplier using
a combination of multiple-precision and coarse-grained
carry-save addition techniques. This method has complexity
O(N(log 3= log 2)).
The major contributions of this work are:
 An FPGA-optimized design for irregular, recursive Karatsuba
multiplication that is parameterizable to different
bit-widths and a batch-pipelined Montgomery multiplier
based on the Karastuba multiplier. (Section III)
 We compare the performance and energy efficiency of the
proposed Montgomery multiplier with existing software
and hardware implementations. (Section IV)

Download full report
http://googleurl?sa=t&source=web&cd=2&ve...0_gary.pdf&ei=tDsxTqX0EqfkiAKljqmiBg&usg=AFQjCNEFJ2tezeXrewnSiQTo-FjNSPtfSw&sig2=uTg0tiaUeanreQNUzTQEpA