Implementation of Strassen's Algorithm for Matrix Multiplication
#1

Abstract
In this paper we report on the development of an ecient and portable implementation
of Strassen's matrix multiplication algorithm. Our implementation is designed to be used in
place of DGEMM, the Level 3 BLAS matrix multiplication routine. Ecient performance
will be obtained for all matrix sizes and shapes and the additional memory needed for tem-
porary variables has been minimized. Replacing DGEMM with our routine should provide
a signi cant performance gain for large matrices while providing the same performance for
small matrices. We measure performance of our code on the IBM RS/6000, CRAY YMP
C90, and CRAY T3D single processor, and o er comparisons to other codes. Our perfor-
mance data recon rms that Strassen's algorithm is practical for realistic size matrices. The
usefulness of our implementation is demonstrated by replacing DGEMM with our routine in
a large application code.
Keywords: matrix multiplication, Strassen's algorithm, Winograd variant, Level 3 BLAS
1 Introduction
The multiplication of two matrices is one of the most basic operations of linear algebra and
scienti c computing and has provided an important focus in the search for methods to speed
up scienti c computation. Its central role is evidenced by its inclusion as a key primitive
operation in portable libraries, such as the Level 3 BLAS [7], where it can then be used as
a building block in the implementation of many other routines, as done in LAPACK [1].
Thus, any speedup in matrix multiplication can improve the performance of a wide variety
of numerical algorithms.
Much of the e ort invested in speeding up practical matrix multiplication implementa-
tions has concentrated on the well-known standard algorithm, with improvements seen when
the required inner products are computed in various ways that are better-suited to a given
machine architecture. Much less e ort has been given towards the investigation of alterna-
tive algorithms whose asymptotic complexity is less than the (m3) operations required by
the conventional algorithm to multiply m  m matrices. One such algorithm is Strassen's
algorithm, introduced in 1969 [19], which has complexity (mlg(7)), where lg(7)  2:807 and
lg(x) denotes the base 2 logarithm of x.
Strassen's algorithm has long su ered from the erroneous assumptions that it is not
ecient for matrix sizes that are seen in practice and that it is unstable. Both of these
assumptions have been questioned in recent work. By stopping the Strassen recursions early
and performing the bottom-level multiplications using the traditional algorithm, competitive
performance is seen for matrix sizes in the hundreds in Bailey's FORTRAN implementation
on the CRAY 2 [2], Douglas, et al.'s [8] C implementation of the Winograd variant of
Strassen's algorithm on various machines, and IBM's ESSL library routine [16]. In addition,
the stability analyses of Brent [4] and then Higham [11, 12] show that Strassen's algorithm
is stable enough to be studied further and considered seriously in the development of high-
performance codes for matrix multiplication.
A useful implementation of Strassen's algorithm must rst eciently handle matrices of
arbitrary size. It is well known that Strassen's algorithm can be applied in a straightforward
fashion to square matrices whose order is a power of two, but issues arise for matrices that
are non-square or those having odd dimensions. Second, establishing an appropriate cuto
criterion for stopping the recursions early is crucial to obtaining competitive performance on
matrices of practical size. Finally, excessive amounts of memory should not be required to
store temporary results. Earlier work addressing these issues can be found in
In this paper we report on our development of a general, ecient, and portable imple-
mentation of Strassen's algorithm that is usable in any program in place of calls to DGEMM,
the Level 3 BLAS matrix multiplication routine. Careful consideration has been given to all
of the issues mentioned above. Our analysis provides an improved cuto condition for rect-
angular matrices, a demonstration of the viability of dynamic peeling, a simple technique for
dealing with odd matrix dimensions that had previously been dismissed [8], and a reduction
in the amount of memory required for temporary variables.
We measure performance of our code on the IBM RS/6000, CRAY YMP C90, and CRAY
T3D single processor and examine the results in several ways. Comparisons with machine-
speci c implementations of DGEMM recon rm that Strassen's algorithm can provide an
improvement over the standard algorithm for matrices of practical size. Timings of our code
using several di erent cuto criteria are compared, demonstrating the bene ts of our new
technique. Comparisons to the Strassen routine in the ESSL RS/6000 and the CRAY C90
libraries and the implementation of Douglas, et al., show that competitive performance can
be obtained in a portable code that uses the previously untried dynamic peeling method
for odd-sized matrices. This is especially signi cant since for certain cases our memory
requirements have been reduced by 40 to more than 70 percent over these other codes.
The remainder of this paper is organized as follows. Section 2 reviews Strassen's al-
gorithm. In Section 3 we describe our implementation and address implementation issues
related to cuto , odd dimensions, and memory usage. Performance of our implementation is
examined in Section 4, where we also report on using our Strassen code for the matrix mul-
tiplications in an eigensolver application. We o er a summary and conclusions in Section 5.

DOWNLOAD FULL REPORT
http://javacalculus.googlecodefiles/Stra...orithm.pdf
Reply

Important Note..!

If you are not satisfied with above reply ,..Please

ASK HERE

So that we will collect data for you and will made reply to the request....OR try below "QUICK REPLY" box to add a reply to this page
Popular Searches: strassen matrix multiplication c program diagram, flowchart of booth s multiplication alogrithm, matrix multiplication algorithm and flowchart, jblas matrix multiplication performance java versus c matlab, matrix multiplication in excel, cray 1, the result in multiplication,

[-]
Quick Reply
Message
Type your reply to this message here.

Image Verification
Please enter the text contained within the image into the text box below it. This process is used to prevent automated spam bots.
Image Verification
(case insensitive)

Possibly Related Threads...
Thread Author Replies Views Last Post
  An Efficient Algorithm for Mining Frequent Patterns full report project topics 3 4,748 01-10-2016, 10:02 AM
Last Post: Guest
  watermarking algorithm seminar class 3 2,674 27-04-2016, 11:17 AM
Last Post: dhanabhagya
  DYNAMIC SEARCH ALGORITHM IN UNSTRUCTURED PEER-TO-PEER NETWORKS--PARALLEL AND DISTRIBU electronics seminars 9 7,351 14-07-2015, 02:25 PM
Last Post: seminar report asees
  Design and Implementation of TARF: A Trust-Aware Routing Framework for WSNs Projects9 6 3,546 10-01-2015, 11:13 PM
Last Post: Guest
  TEA ENCRYPTION (ALGORITHM) computer science technology 1 2,656 11-11-2014, 10:45 AM
Last Post: Guest
  Persuasive Cued Click-Points: Design, Implementation, and Evaluation of a Knowledge-B Projects9 3 3,018 15-04-2013, 11:14 AM
Last Post: computer topic
  IMPLEMENTATION OF IMAGE STEGANOGRAPHY USING LSB TECHINQUE seminar presentation 6 5,766 04-02-2013, 04:49 PM
Last Post: seminar details
  Cooperative Caching in Wireless P2P Networks: Design, Implementation, and Evaluation seminar class 2 3,312 02-02-2013, 02:08 PM
Last Post: seminar details
Wink Implementation of a simple device mediation application of a Network Management Syste computer science crazy 2 2,159 06-12-2012, 01:57 PM
Last Post: seminar details
  A TABU SEARCH ALGORITHM FOR CLUSTER BUILDING IN WIRELESS SENSOR NETWORKS- MOBILE COMP electronics seminars 1 2,052 01-12-2012, 12:06 PM
Last Post: seminar details

Forum Jump: