VLSI Design and Implementation of Low Power MAC Unit with Block Enabling Technique
Abstract
In the majority of digital signal processing (DSP) applications, the critical operations
are multiplication and accumulation. Real-time signal processing requires a high-speed,
high-throughput Multiplier-Accumulator (MAC) unit that consumes low power, which
is always key to achieving a high-performance digital signal processing system. The
purpose of this work is the design and implementation of a low power MAC unit with a block
enabling technique to save power. First, a 1-bit MAC unit is designed with appropriate
geometries that give optimized power, area and delay. The delay of the pipeline stages in
the MAC unit is estimated, and based on it a control unit is designed to control the data
flow between the MAC blocks for low power. Similarly, the N-bit MAC unit is designed
and controlled for low power using control logic that enables the pipelined stages at the
appropriate time. The adder cell designed has the advantages of high operational speed, small
transistor count and low power. The MAC is implemented in a 0.18 µm CMOS technology
using the CADENCE VIRTUOSO tool. This paper also investigates various architectures of
multipliers and adders that are suitable for implementing high-throughput signal
processing while achieving low power consumption. The whole MAC chip
is operated at 125 MHz using a 1.8 V power supply. The power is reduced by 27% using the
block enabling technique compared to the normal design.
Keywords: Low Power, MAC, clock gating, block enable, multiplier.
1. Introduction
In the majority of digital signal processing (DSP) applications, the critical operations usually involve
many multiplications and/or accumulations. For real-time signal processing, a high-speed,
high-throughput Multiplier-Accumulator (MAC) is always key to achieving a high-performance digital
signal processing system. In the last few years, the main consideration in MAC design has been to enhance its
speed, because speed and throughput rate are always a concern in digital signal processing
systems. In the era of personal communication, however, low power design has become another main
design consideration, because the battery energy available in portable products limits the
power consumption of the system. Therefore, the main motivation of this work is to investigate various
pipelined multiplier/accumulator architectures and circuit design techniques that are suitable for
implementing high-throughput signal processing algorithms while achieving low power
consumption. A conventional MAC unit consists of a (fast) multiplier and an accumulator that
holds the sum of the previous consecutive products. The function of the MAC unit is given by the
following equation:
$F = \sum_{i} A_i B_i$ (1.1)
Figure 1: Basic structure of MAC
Figure 2: MAC architecture (blocks: 8-bit Wallace tree multiplier, 17-bit register, 17-bit accumulator, 18-bit accumulator register, 18-bit register; two 8-bit inputs, one output)
The main goal of a DSP processor design is to enhance the speed of the MAC unit and at the
same time limit the power consumption. In a pipelined MAC circuit, the delay of a pipeline stage is the
delay of a 1-bit full adder (Jou, Chen, Yang and Su, 1995). Estimating this delay assists in
identifying the overall delay of the pipelined MAC. In this work, a 1-bit full adder is designed. Area,
power and delay are calculated for the full adder, and based on these the pipelined MAC unit is designed
for low power.
2. Multiplier and Accumulator Unit
A MAC is composed of an adder, a multiplier and an accumulator. The adders used are usually
Carry-Select or Carry-Save adders, as speed is of utmost importance in DSP (Chandrakasan, Sheng, &
Brodersen, 1992; Weste & Harris, 3rd Ed). The multiplier can be implemented, for example, as a
parallel array multiplier. The inputs of the MAC are fetched from memory and fed to
the multiplier block, which performs the multiplication and passes the result to the adder; the adder
accumulates the result and stores it in a memory location. This entire process is
to be completed in a single clock cycle (Weste & Harris, 3rd Ed). Figure 2 shows the architecture of the MAC
unit designed in this work. The design consists of one 17-bit register, one 8-bit Wallace
622 Shanthala S and S. Y. Kulkarni
tree multiplier, a 17-bit accumulator using a ripple carry adder and two 18-bit accumulator registers. To multiply
the values of A and B, a Wallace tree multiplier is used instead of a conventional multiplier because
it can increase the speed of the MAC unit. A Ripple Carry Adder (RCA) is used as
the accumulator in this design. With the combination of the Wallace tree multiplier
approach, a carry save adder in the final stage of the Wallace tree multiplier and a ripple carry adder as
the accumulator, this MAC unit design not only reduces the standby power consumption but also
enhances the MAC unit speed, giving better system performance. The operation of the
designed MAC unit is given in Equation 2.1. Each product Ai x Bi is fed into the 17-bit
ripple carry accumulator, and the accumulated sum is fed back and added to the next product Ai x Bi. This MAC unit is
capable of multiplying and adding to the previous products consecutively up to eight times.
Operation: $\text{Output} = \sum_{i} A_i B_i$ (2.1)
In this paper, an 8x8 multiplier unit is designed whose results are accumulated as a
17-bit number. The MAC unit has an 18-bit output and operates by repeatedly adding the
multiplication results. The total design area is assessed by the total count of
transistors. The power-delay product is calculated by multiplying the power consumption by the
time delay.
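The accumulation in Equation 2.1 can be illustrated with a small behavioural sketch in Python. It is not the authors' implementation: the function name `mac` and its parameters are hypothetical, the operands are assumed to be unsigned 8-bit values, and the limit of eight terms simply mirrors the statement above. Python integers are unbounded, so the register widths of the hardware are not modelled.

```python
def mac(pairs, operand_bits=8, max_terms=8):
    """Behavioural multiply-accumulate: returns the sum of A_i * B_i.

    Assumptions (not taken from the paper): unsigned operands and an
    unbounded accumulator, so hardware register widths are not modelled.
    """
    pairs = list(pairs)
    if len(pairs) > max_terms:
        raise ValueError("too many terms for one accumulation run")
    limit = 1 << operand_bits
    acc = 0
    for a, b in pairs:
        if not (0 <= a < limit and 0 <= b < limit):
            raise ValueError("operand out of range")
        acc += a * b  # multiply, then add to the running sum
    return acc


# Three illustrative operand pairs: 15 + 200 + 65025
result = mac([(3, 5), (10, 20), (255, 255)])
print(result)
```

A separate multiply followed by an add is used here only for readability; in the pipelined hardware these two steps sit in different stages.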
2.1. Wallace tree Multiplier
The design analysis starts with the elementary algorithm for multiplication by a Wallace tree
multiplier. Figure 3 shows the algorithm for 8 x 8 bit multiplication performed by a Wallace tree
multiplier. There are five stages to complete the multiplication process (Weste & Harris,
3rd Ed). Each stage uses half adders and full adders, denoted by a red circle for the 1-bit half
adder and a blue circle for the 1-bit full adder. First, the partial products are reduced using the
half adders and full adders, which are combined to build a carry save adder (CSA), until just
two rows of partial products are left. Next, the remaining two rows are added using a fast carry propagate
adder. In this project, a CSA (carry save adder) using a ripple carry adder is used to get the final product.
Second, the schematic of the conventional 8 bit x 8 bit high speed Wallace tree multiplier is
designed by referring to the algorithm.
Figure 3: Algorithm for 8 bits x 8 bits Wallace tree multiplier (Harun, 2007)
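The reduction described above can be sketched at bit level in Python. This is a generic Wallace-style column reduction written for illustration, not the schematic of Figure 3: the grouping of bits into adders, and therefore the number of stages, may differ from the paper's design. All function names are hypothetical.

```python
def full_adder(a, b, c):
    """1-bit full adder: returns (sum, carry)."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)


def half_adder(a, b):
    """1-bit half adder: returns (sum, carry)."""
    return a ^ b, a & b


def wallace_multiply(x, y, bits=8):
    """Multiply two unsigned numbers by reducing partial-product columns."""
    # Partial products: bit i of x AND bit j of y has weight 2**(i + j).
    cols = [[] for _ in range(2 * bits)]
    for i in range(bits):
        for j in range(bits):
            cols[i + j].append(((x >> i) & 1) & ((y >> j) & 1))

    # Reduce with full and half adders until every column holds at most 2 bits.
    while max(len(col) for col in cols) > 2:
        nxt = [[] for _ in range(len(cols) + 1)]
        for k, col in enumerate(cols):
            i = 0
            while len(col) - i >= 3:
                s, c = full_adder(col[i], col[i + 1], col[i + 2])
                nxt[k].append(s)       # sum keeps the same weight
                nxt[k + 1].append(c)   # carry moves one column up
                i += 3
            if len(col) - i == 2:
                s, c = half_adder(col[i], col[i + 1])
                nxt[k].append(s)
                nxt[k + 1].append(c)
            elif len(col) - i == 1:
                nxt[k].append(col[i])  # a single bit passes through
        cols = nxt

    # Final step: add the two remaining rows with a ripple carry adder.
    result, carry = 0, 0
    for k, col in enumerate(cols):
        pair = col + [0] * (2 - len(col))
        s, carry = full_adder(pair[0], pair[1], carry)
        result |= s << k
    return result | (carry << len(cols))


print(wallace_multiply(200, 123) == 200 * 123)
```

Every full adder replaces three bits of one weight by a sum bit of the same weight and a carry bit of the next weight, so the weighted total of all bits is preserved from stage to stage.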
2.2. Carry Save Adder
When three or more operands are to be added simultaneously using two-operand adders, the
time-consuming carry propagation must be repeated several times. If the number of operands is ‘k’, then
carries have to propagate (k-1) times (Weste & Harris, 3rd Ed). In carry save addition, the
carry propagates only in the last step, while in all the other steps the partial sum and the
sequence of carries are generated separately. A CSA reduces the number of operands to be added from
3 to 2 without any carry propagation. A CSA can be implemented in different ways. In the simplest
implementation, the basic element of the carry save adder is a combination of two half adders or a 1-bit
full adder (Weste & Harris, 3rd Ed).
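The 3-to-2 reduction can be shown on whole words in a few lines of Python. This is a minimal word-level sketch of carry save addition; the names `carry_save_add` and `add_operands` are hypothetical and unsigned integers are assumed.

```python
def carry_save_add(x, y, z):
    """3:2 compression: x + y + z == sum_word + carry_word, no carry ripple."""
    sum_word = x ^ y ^ z                           # per-bit sum
    carry_word = ((x & y) | (y & z) | (x & z)) << 1  # per-bit carry, shifted up
    return sum_word, carry_word


def add_operands(operands):
    """Add k operands: carry save steps first, one carry-propagate add last."""
    ops = list(operands)
    while len(ops) > 2:
        s, c = carry_save_add(ops[0], ops[1], ops[2])
        ops = ops[3:] + [s, c]
    return sum(ops)  # the only step in which carries propagate


s, c = carry_save_add(5, 6, 7)
print(s, c, s + c)
print(add_operands([1, 2, 3, 4, 5]))
```

Each call to `carry_save_add` works on all bit positions independently, which is why only the final two-operand addition has to wait for a carry chain.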
2.3. Block Enabling Technique
In any MAC unit, data flows from the input register to the output register through multiple stages, such
as the multiplier stage, adder stage and accumulator stage, as shown in Figure 4. Within the multiplier
stage, there are in turn multiple stages of addition. During each multiplication
and addition operation, the blocks in the pipeline need not be on or enabled until the actual data
arrives from the previous stage. In the block enabling technique, the delay of each stage is found, and every
block is enabled only after the expected delay. For the entire duration until its inputs are available,
each successive block is disabled, thus saving power. In the next section, we design a 1-bit MAC unit
with a pipeline structure and find its power consumption.
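As a rough illustration of the timing idea, the sketch below computes when each stage may be enabled, namely once all earlier stages have produced their outputs. The stage names follow the text; the delay values and the function name `enable_times` are hypothetical and are not measurements from this work.

```python
from itertools import accumulate


def enable_times(stage_delays_ns):
    """Enable time of each stage = sum of the delays of all earlier stages."""
    if not stage_delays_ns:
        return []
    return [0.0] + list(accumulate(stage_delays_ns))[:-1]


stages = ["multiplier", "adder", "accumulator"]
delays_ns = [2.0, 1.0, 0.5]  # hypothetical stage delays in ns

schedule = dict(zip(stages, enable_times(delays_ns)))
for name, t in schedule.items():
    print(f"{name}: enable at {t} ns")
```

Until its enable time, a stage stays disabled, which is the interval during which the technique saves power.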
Figure 4: General block diagram of a pipelined MAC with block enabling technique (cs = control signal; signals cs_1 to cs_5)
2.4. Pipelined block enabled logic
Figure 5 shows a three-stage pipelined MAC with block enable logic. In this logic, depending on the
delay of the individual blocks, the control logic enables the clock, power and logic pins of each block, thus
saving power. Figure 6 shows the block schematic of the 1-bit full adder circuit with enable. Each of
the blocks in the MAC unit has an enable signal to save power.
Figure 5: MAC with control logic (blocks: adder with enable, register with enable, control logic; signals: reset, a, b, en_0, En_1, En_2)
Figure 6: 1-bit full adder with enable (inputs: a, b, c, enable; outputs: sum, carry)
2.5. Accumulator Register
Figure 7 shows the 1-bit register file cell, which may be represented by a D flip-flop and two gates. Note
that in addition to the clock signal, the cell has 3 inputs and 1 output: the write select, read select and D
inputs and the Q output. Within this cell, the D flip-flop stores the value of the input signal
whenever write select is equal to 1; whenever the read select signal is equal to 1, the D
flip-flop passes its stored value to the output through a tristate buffer.
Figure 7: 1 bit Register cell
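The behaviour of the register cell described above can be modelled in a few lines of Python. This is a behavioural sketch only: the class and method names are hypothetical, and `None` is used here as a stand-in for the high-impedance state of the tristate buffer.

```python
class RegisterCell:
    """Behavioural model of a 1-bit register cell with write and read select."""

    def __init__(self):
        self.q = 0  # value held by the D flip-flop

    def clock_edge(self, d, write_select):
        """On a clock edge, store d only when write select is 1."""
        if write_select:
            self.q = d & 1

    def read(self, read_select):
        """Drive the stored bit out when read select is 1, else high impedance."""
        return self.q if read_select else None  # None models high impedance


cell = RegisterCell()
cell.clock_edge(d=1, write_select=1)  # stored value becomes 1
cell.clock_edge(d=0, write_select=0)  # write select is 0, so the value is kept
print(cell.read(read_select=1), cell.read(read_select=0))
```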
From the observations made, we find that the basic building blocks of any MAC unit are the
multiplier, adder and register. Multiplier and adder blocks require full adders, and registers require
flip-flops or latches. The objective of this work is to find the total area, power and delay of the MAC
unit, which forms the critical part of any DSP application. At the micro level, the power, delay and area
of the basic blocks are calculated from the experimental setup. Based on the results obtained, the
causes of power dissipation and delay are identified at the micro level and measures are taken to minimize the
power. This power reduction technique is then extended to the macro level, which in our design is the
MAC architecture. Section 3 discusses these results.
2.6. Full adder design
Different ways of realizing the full adder (Jou, Chen, Yang & Su, 1997; Suzuki, Ohkubo, Shinbo,
Yamanaka, Shimizu, Sasaki & Nakagome, 1993; Lu & Samueli, 1993) are tabulated in Table 1, and their
results are compared. From the table it is clear that the mux-based full adder
implementation consumes the least power and also has the minimum delay. In this work, the mux-based
full adder is therefore chosen for implementation. The mux-based full adder has a delay of 0.0012 ns, which
means that once the input is applied it takes 0.0012 ns to produce the outputs. The
blocks connected to the output of the full adder can be kept disabled during this time, and hence power is saved. Using this design, a 1-bit
pipelined MAC unit is realized. The basic building blocks of the MAC unit are the flip-flop to store
1-bit data, the 1-bit adder and the AND gate for control activity. These basic building blocks are taken
independently and analyzed for delay and power. The pipelined MAC is provided with an enable
pin to reduce power consumption, i.e. at any given point in time only one of the blocks is enabled to
ensure data flow from one stage to the next. For example, while the adder block is computing, the
register block is disabled to save power, and during the loading operation the adder block is disabled to save
power. This is controlled by an external signal E, which enables the corresponding block or disables it to
keep it idle.
Low power techniques as discussed in (Chandrakasan, Sheng & Brodersen, 1992) are considered in
this work for reducing power.
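The logic function of a mux-based full adder with an enable input can be checked with a short Python sketch. The paper does not give the transistor-level topology, so the gate-level formulation below (an XOR built from a 2:1 mux, with the inputs gated by AND with the enable signal) is an assumption chosen for illustration, and the names `mux` and `mux_full_adder` are hypothetical.

```python
from itertools import product


def mux(sel, d0, d1):
    """2:1 multiplexer: returns d1 when sel is 1, otherwise d0."""
    return d1 if sel else d0


def mux_full_adder(a, b, c, enable=1):
    """Full adder built from 2:1 muxes, with AND-gated inputs (assumed scheme)."""
    # AND gating: with enable = 0 the adder sees all-zero inputs.
    a, b, c = a & enable, b & enable, c & enable
    p = mux(a, b, b ^ 1)      # p = a XOR b: pass b, or its complement when a = 1
    s = mux(p, c, c ^ 1)      # sum = p XOR c
    carry = mux(p, a, c)      # carry = a when a == b, otherwise c
    return s, carry


# Exhaustive check against integer addition for all eight input combinations.
for a, b, c in product((0, 1), repeat=3):
    s, carry = mux_full_adder(a, b, c)
    assert s + 2 * carry == a + b + c
print("truth table verified")
```

With `enable=0` the model returns `(0, 0)` for any inputs, which corresponds to a block held idle.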
Table 1: Full adder design comparison
Full adder using          No. of transistors   Area (µm²)   Power (µW)   Delay (ns)
Only NAND                 36                   507.592      0.01293      0.00987
Only mux                  22                   324.225      0.0001459    0.0012
EXOR, AND, OR             30                   408.127      3.58         0.0065
Conventional CMOS logic   28                   387.548      0.0328       0.01055
Quasi domino              23                   375.124      0.01645      0.00767
Static and dynamic        22                   367.721      46.65        0.0109
EXOR & AND                30                   413.402      3.58         0.0065
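The power-delay product mentioned in Section 2 can be worked out from the Table 1 figures. The sketch below copies the power and delay columns from the table; the variable names are hypothetical, and the unit conversion (µW x ns = fJ) is the only step added here.

```python
# (power in µW, delay in ns) taken from Table 1
full_adders = {
    "Only NAND": (0.01293, 0.00987),
    "Only mux": (0.0001459, 0.0012),
    "EXOR, AND, OR": (3.58, 0.0065),
    "Conventional CMOS logic": (0.0328, 0.01055),
    "Quasi domino": (0.01645, 0.00767),
    "Static and dynamic": (46.65, 0.0109),
    "EXOR & AND": (3.58, 0.0065),
}

# Power-delay product: µW * ns = 1e-6 W * 1e-9 s = 1e-15 J, i.e. femtojoules.
pdp_fj = {name: power * delay for name, (power, delay) in full_adders.items()}

for name, value in sorted(pdp_fj.items(), key=lambda item: item[1]):
    print(f"{name:25s} {value:.4e} fJ")

best = min(pdp_fj, key=pdp_fj.get)
print("lowest power-delay product:", best)
```

On these figures the mux-based design has the lowest product, which agrees with the choice made in the text.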
2.7. AND Gate
The MAC blocks are enabled or disabled through an AND gate, the basic gate required for this control.
The results tabulated in Table 2 were obtained by experiments using Cadence tools with a 180 nm
technology library. The width of the transistors is varied to find its effect on delay and power. Table 3
lists the effect of width variation on power. It is observed that the delay of the AND gate is not constant
and varies with the input signals. From Table 2, we observe that the delay generally reduces as the
width ratio increases; a Wn / Wp ratio of 0.4 is selected as the AND gate geometry that gives a low delay. As the AND gate has a
delay, the blocks connected to its output are disabled for this time and are
enabled only after the outputs are available, hence saving power. From Table 3 we find that the power
also varies with the input; with input a pulsed, the power is maximum for the 0.4 geometry.
Table 2: Delay variations for AND gate
Wn / Wp   Delay td (s), i/p a = pulse, b = 1   Delay td (s), i/p a = 1, b = pulse
0.2       2.187E-10                            2.156E-10
0.3       2.225E-10                            2.192E-10
0.4       1.805E-10                            1.776E-10
0.5       1.745E-10                            1.720E-10
Table 3: Power variations for AND gate
Wn / Wp   Power (W), i/p a = pulse, b = 1   Power (W), i/p a = 1, b = pulse
0.2       4.411E-11                         1.275E-08
0.3       4.457E-11                         2.106E-08
0.4       5.439E-11                         8.919E-09
0.5       2.211E-11                         1.301E-08
2.8. 1-Bit Register
The register forms one of the basic units of the MAC unit. As the register stores data, there is a possibility of
leakage current, which affects power dissipation. The clock connected to the register cell also
keeps changing and hence affects the dynamic power dissipation. In this work, the basic register cell is
analyzed for its power consumption. The register cell is enabled with clock gating, and the power and
delay are calculated. Table 4 shows the power and delay of the basic register cell calculated using
Cadence tools. We find that the power is reduced with enable. Knowing the delay, we enable the
blocks connected at the output of the 1-bit register only after the output is available. This helps in
saving power.
Table 4: Power and delay results for 1-bit register cell (data i/p di = pulse)
Wn / Wp   Power (W), with enable   Power (W), without enable   Delay td (s), with enable   Delay td (s), without enable
0.2       4.029E-09                4.078E-09                   7.881E-10                   7.882E-10
0.3       2.485E-08                3.628E-09                   6.996E-10                   6.985E-10
0.4       3.684E-09                3.712E-09                   6.369E-10                   6.371E-10
0.5       4.636E-09                4.644E-09                   6.167E-10                   6.152E-10
2.9. 1-Bit Full Adder
A mux-based full adder is designed in this work using 180 nm technology, and the results are obtained
using Cadence tools. Table 5 gives the results for power and Table 6 gives the results for the delay of the
1-bit full adder. We find that the power increases with increasing width and then suddenly reduces. This
is attributed to the fact that as the width ratios of the transistors are increased, mobility variations cause
threshold variations, and hence the power reduces. Hence in this work a Wn / Wp ratio of 0.3 is
chosen for better results. The delay of the full adder is 0.39 ns and 0.4317 ns; the maximum delay is
selected, and based on this delay the output blocks connected to the full adder are enabled.
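The delay selection above amounts to a short piece of arithmetic, shown here in Python. The two delays and the 125 MHz clock are taken from the text; relating the full adder delay to the chip clock period is an illustration added here rather than a result reported by the authors, and the variable names are hypothetical.

```python
clock_hz = 125e6                      # operating frequency stated in the abstract
period_ns = 1e9 / clock_hz            # clock period in ns

full_adder_delays_ns = [0.39, 0.4317]  # the two delays quoted for the full adder
enable_after_ns = max(full_adder_delays_ns)  # worst case decides the enable time

fraction_of_period = enable_after_ns / period_ns

print(f"clock period: {period_ns} ns")
print(f"enable downstream blocks after: {enable_after_ns} ns")
print(f"share of one clock period: {fraction_of_period:.2%}")
```

Taking the maximum rather than the minimum delay ensures that the downstream blocks are never enabled before the adder outputs have settled.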
