I am matrix algorithm representation in verilog code
Posts: 6,843
Threads: 4
Joined: Mar 2015
matrix multiplication in verilog code
Abstract
Digital multipliers are indispensable in the hardware implementation of many important functions such as DCT, IDCT, FFT etc in signal processing. This paper deals with Design and implementation of Vedic Multipler in Image Compression using DCT algorithm. The DCT (Discrete Cosine Transform) performs spatial compression of the data while IDCT performs decompression of the data. Here, matrix multiplication is one of the important step in both the transforms. Hence, to perform these computations, we introduce Vedic multiplier which is based on Urdhava Tiryakbhyam(vertical and crosswise) sutra. In this paper, we have designed DCT algorithm using Verilog and code is written in Xilinx I.S.E 7.1i version, synthesized on Xilinx Synthesis Tool (XST). We retrieved Register Transfer Logic (RTL) and the simulation results are observed on Modelsim 6.0 Simulator. These simulation results were compared with matlab simulation results. From the comparison, we see that DCT using Vedic Multiplier is efficiently implemented and the proposed Vedic multiplier significantly improves the computational speed involved in multiplication operations of the image processing. Hence, Vedic multipliers can find immense use in applications of image processing to save time and area.In this paper we discuss about our experiences in improving the performance of two key algorithms: the single-precision matrix-matrix multiplication subprogram (SGEMM of BLAS) and single-precision FFT using CUDA. The former is computation-intensive, while the latter is memory bandwidth or communication-intensive. A peak performance of 393 Gflops is achieved on NVIDIA GeForce GTX280 for the former, about 5% faster than the CUBLAS 2.0 library. Better FFT performance results are obtained for a range of dimensions. Some common principles are discussed for the design and implementation of many-core algorithms.
Matrix multiplier:
//Module for calculating Res = A*B
//Where A,B and C are 2 by 2 matrices.
module Mat_mult(A,B,Res);
//input and output ports.
//The size 32 bits which is 2*2=4 elements,each of which is 8 bits wide. input [31:0] A;
input [31:0] B;
output [31:0] Res;
//internal variables
reg [31:0] Res;
reg [7:0] A1 [0:1][0:1];
reg [7:0] B1 [0:1][0:1];
reg [7:0] Res1 [0:1][0:1];
integer i,j,k;
always@ (A or B)
begin
//Initialize the matrices-convert 1 D to 3D arrays
{A1[0][0],A1[0][1],A1[1][0],A1[1][1]} = A;
{B1[0][0],B1[0][1],B1[1][0],B1[1][1]} = B;
i = 0;
j = 0;
k = 0;
{Res1[0][0],Res1[0][1],Res1[1][0],Res1[1][1]} = 32'd0; //initialize to zeros.
//Matrix multiplication
for(i=0;i < 2;i=i+1)
for(j=0;j < 2;j=j+1)
for(k=0;k < 2;k=k+1)
Res1[i][j] = Res1[i][j] + (A1[i][k] * B1[k][j]);
//final output assignment - 3D array to 1D array conversion.
Res = {Res1[0][0],Res1[0][1],Res1[1][0],Res1[1][1]};
end
endmodule
Testbench Code:
module tb;
// Inputs
reg [31:0] A;
reg [31:0] B;
// Outputs
wire [31:0] Res;
// Instantiate the Unit Under Test (UUT)
Mat_mult uut (
.A(A),
.B(B),
.Res(Res)
);
initial begin
// Apply Inputs
A = 0; B = 0; #100;
A = {8'd1,8'd2,8'd3,8'd4};
B = {8'd5,8'd6,8'd7,8'd8};
end
endmodule