ASK HERE

seminar class · 02-05-2011, 04:42 PM

Abstract
In this paper we report on the development of an ecient and portable implementation
of Strassen's matrix multiplication algorithm. Our implementation is designed to be used in
place of DGEMM, the Level 3 BLAS matrix multiplication routine. Ecient performance
will be obtained for all matrix sizes and shapes and the additional memory needed for tem-
porary variables has been minimized. Replacing DGEMM with our routine should provide
a signicant performance gain for large matrices while providing the same performance for
small matrices. We measure performance of our code on the IBM RS/6000, CRAY YMP
C90, and CRAY T3D single processor, and oer comparisons to other codes. Our perfor-
mance data reconrms that Strassen's algorithm is practical for realistic size matrices. The
usefulness of our implementation is demonstrated by replacing DGEMM with our routine in
a large application code.
Keywords: matrix multiplication, Strassen's algorithm, Winograd variant, Level 3 BLAS
1 Introduction
The multiplication of two matrices is one of the most basic operations of linear algebra and
scientic computing and has provided an important focus in the search for methods to speed
up scientic computation. Its central role is evidenced by its inclusion as a key primitive
operation in portable libraries, such as the Level 3 BLAS [7], where it can then be used as
a building block in the implementation of many other routines, as done in LAPACK [1].
Thus, any speedup in matrix multiplication can improve the performance of a wide variety
of numerical algorithms.
Much of the eort invested in speeding up practical matrix multiplication implementa-
tions has concentrated on the well-known standard algorithm, with improvements seen when
the required inner products are computed in various ways that are better-suited to a given
machine architecture. Much less eort has been given towards the investigation of alterna-
tive algorithms whose asymptotic complexity is less than the (m3) operations required by
the conventional algorithm to multiply m m matrices. One such algorithm is Strassen's
algorithm, introduced in 1969 [19], which has complexity (mlg(7)), where lg(7) 2:807 and
lg(x) denotes the base 2 logarithm of x.
Strassen's algorithm has long suered from the erroneous assumptions that it is not
ecient for matrix sizes that are seen in practice and that it is unstable. Both of these
assumptions have been questioned in recent work. By stopping the Strassen recursions early
and performing the bottom-level multiplications using the traditional algorithm, competitive
performance is seen for matrix sizes in the hundreds in Bailey's FORTRAN implementation
on the CRAY 2 [2], Douglas, et al.'s [8] C implementation of the Winograd variant of
Strassen's algorithm on various machines, and IBM's ESSL library routine [16]. In addition,
the stability analyses of Brent [4] and then Higham [11, 12] show that Strassen's algorithm
is stable enough to be studied further and considered seriously in the development of high-
performance codes for matrix multiplication.
A useful implementation of Strassen's algorithm must rst eciently handle matrices of
arbitrary size. It is well known that Strassen's algorithm can be applied in a straightforward
fashion to square matrices whose order is a power of two, but issues arise for matrices that
are non-square or those having odd dimensions. Second, establishing an appropriate cuto
criterion for stopping the recursions early is crucial to obtaining competitive performance on
matrices of practical size. Finally, excessive amounts of memory should not be required to
store temporary results. Earlier work addressing these issues can be found in
In this paper we report on our development of a general, ecient, and portable imple-
mentation of Strassen's algorithm that is usable in any program in place of calls to DGEMM,
the Level 3 BLAS matrix multiplication routine. Careful consideration has been given to all
of the issues mentioned above. Our analysis provides an improved cuto condition for rect-
angular matrices, a demonstration of the viability of dynamic peeling, a simple technique for
dealing with odd matrix dimensions that had previously been dismissed [8], and a reduction
in the amount of memory required for temporary variables.
We measure performance of our code on the IBM RS/6000, CRAY YMP C90, and CRAY
T3D single processor and examine the results in several ways. Comparisons with machine-
specic implementations of DGEMM reconrm that Strassen's algorithm can provide an
improvement over the standard algorithm for matrices of practical size. Timings of our code
using several dierent cuto criteria are compared, demonstrating the benets of our new
technique. Comparisons to the Strassen routine in the ESSL RS/6000 and the CRAY C90
libraries and the implementation of Douglas, et al., show that competitive performance can
be obtained in a portable code that uses the previously untried dynamic peeling method
for odd-sized matrices. This is especially signicant since for certain cases our memory
requirements have been reduced by 40 to more than 70 percent over these other codes.
The remainder of this paper is organized as follows. Section 2 reviews Strassen's al-
gorithm. In Section 3 we describe our implementation and address implementation issues
related to cuto, odd dimensions, and memory usage. Performance of our implementation is
examined in Section 4, where we also report on using our Strassen code for the matrix mul-
tiplications in an eigensolver application. We oer a summary and conclusions in Section 5.

DOWNLOAD FULL REPORT
http://javacalculus.googlecodefiles/Stra...orithm.pdf

Possibly Related Threads...
Thread		Author	Replies	Views	Last Post
	An Efficient Algorithm for Mining Frequent Patterns full report	project topics	3	4,748	01-10-2016, 10:02 AM Last Post: Guest
	watermarking algorithm	seminar class	3	2,674	27-04-2016, 11:17 AM Last Post: dhanabhagya
	DYNAMIC SEARCH ALGORITHM IN UNSTRUCTURED PEER-TO-PEER NETWORKS--PARALLEL AND DISTRIBU	electronics seminars	9	7,351	14-07-2015, 02:25 PM Last Post: seminar report asees
	Design and Implementation of TARF: A Trust-Aware Routing Framework for WSNs	Projects9	6	3,546	10-01-2015, 11:13 PM Last Post: Guest
	TEA ENCRYPTION (ALGORITHM)	computer science technology	1	2,656	11-11-2014, 10:45 AM Last Post: Guest
	Persuasive Cued Click-Points: Design, Implementation, and Evaluation of a Knowledge-B	Projects9	3	3,018	15-04-2013, 11:14 AM Last Post: computer topic
	IMPLEMENTATION OF IMAGE STEGANOGRAPHY USING LSB TECHINQUE	seminar presentation	6	5,766	04-02-2013, 04:49 PM Last Post: seminar details
	Cooperative Caching in Wireless P2P Networks: Design, Implementation, and Evaluation	seminar class	2	3,312	02-02-2013, 02:08 PM Last Post: seminar details
	Implementation of a simple device mediation application of a Network Management Syste	computer science crazy	2	2,159	06-12-2012, 01:57 PM Last Post: seminar details
	A TABU SEARCH ALGORITHM FOR CLUSTER BUILDING IN WIRELESS SENSOR NETWORKS- MOBILE COMP	electronics seminars	1	2,052	01-12-2012, 12:06 PM Last Post: seminar details

Important Note..!

ASK HERE