HIGH PERFORMANCE DSP CAPABILITY WITHIN AN OPTIMIZED LOW-COST FPGA ARCHITECTURE
#1

ABSTRACT
The applications of Digital Signal Processing (DSP) continue to expand,
driven by trends such as the increased use of video and still images
and the demand for increasingly reconfigurable systems such as Software
Defined Radio (SDR). Many of these applications combine the need for
significant DSP processing with cost sensitivity, creating demand for
high-performance, low-cost DSP solutions.
General-purpose DSP chips and FPGAs are two common methods of
implementing DSP functions.
The use of DSP techniques will continue to grow at the expense of
analog implementations. An analysis of the functions typically used in
DSP applications indicates that a combination of multiplier, addition,
subtraction and accumulation elements is required. The Lattice ECP
devices provide a sophisticated DSP block combined with a low-cost FPGA
fabric. Through the implementation of addition, subtraction,
accumulation and pipelining within the sysDSP block, performance and
LUT utilization are considerably higher than those of alternative low-
cost FPGA solutions that provide only basic multiplier capabilities.
The speed and utilization advantages of the sysDSP block help users
reduce costs through the selection of
smaller and lower speed grade devices.
This paper provides an overview of common DSP functions and
then explores the differences between general-purpose DSPs and
FPGAs. This is followed by a description of the LatticeECP™-DSP
(EConomy Plus Digital Signal Processing) architecture and a comparison
of the LatticeECP-DSP to existing FPGA solutions.

Introduction
The increased use of video and still images and the demand for
increasingly reconfigurable systems such as Software Defined Radio
(SDR) continue to expand the applications of DSP. Such applications
combine the need for significant DSP processing with cost sensitivity,
creating demand for high-performance, low-cost DSP solutions.
General-purpose DSP chips and FPGAs are two common methods of
implementing DSP functions. An analysis of the functions typically used
in DSP applications indicates that a combination of multiplier,
addition, subtraction and accumulation elements is required. The
LatticeECP devices provide a sophisticated DSP block combined with a
low-cost FPGA fabric. Because addition, subtraction, accumulation and
pipelining are implemented within the sysDSP block, performance is
considerably higher and LUT usage considerably more efficient than in
alternative low-cost FPGA solutions that provide only basic multiplier
capabilities. The speed and utilization advantages of the sysDSP block
help users reduce costs through the selection of smaller devices in
lower speed grades.
Introduction to Typical DSP Functions
While a vast array of digital signal processing functions are
implemented by designers, Finite Impulse Response (FIR) filters,
Infinite Impulse Response (IIR) filters, Fast Fourier Transforms (FFTs)
and mixers are common to many applications. Each of these functions
requires a combination of multiply elements along with addition,
subtraction and accumulation. This section provides a brief overview of
the algorithms used to implement these functions.
Finite Impulse Response (FIR) Filters
The finite impulse response filter stores a series of n data
elements, each delayed by an additional cycle. These data elements are
commonly referred to as taps. Each tap is multiplied by a coefficient
and the results are summed to produce the output. Some implementations
perform all the multiplications in parallel. More generally, the
implementation is broken down into N stages, with an accumulator
passing the partial result from one stage to the next. This
implementation trades speed for functional resources, taking N
computation stages and requiring n/N multipliers. Depending upon
whether the coefficients are static or dynamic, and on the design of the
coefficient values, there are a number of other design optimizations
commonly used that are beyond the scope of this paper.
Figure 1 shows the implementation of a typical FIR filter.

Figure 1 -- Typical FIR Filter
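
The tap, multiply and sum structure described above can be sketched in a few lines of Python. This is only a behavioral illustration: the function names and the example coefficients are assumptions made for this sketch, and the folded version assumes the tap count n is an exact multiple of the stage count N.

```python
def fir_parallel(taps, coeffs):
    """One output sample: every tap is multiplied and summed at once."""
    return sum(t * c for t, c in zip(taps, coeffs))


def fir_folded(taps, coeffs, stages):
    """Same output computed in `stages` passes with an accumulator.

    Each pass uses len(taps) // stages multipliers (n/N in the text),
    so the tap count is assumed to be a multiple of `stages`.
    """
    n = len(taps)
    if n % stages != 0:
        raise ValueError("tap count must be a multiple of the stage count")
    per_stage = n // stages
    acc = 0
    for s in range(stages):
        lo = s * per_stage
        # Partial sum for this stage, passed on through the accumulator.
        acc += sum(t * c for t, c in zip(taps[lo:lo + per_stage],
                                         coeffs[lo:lo + per_stage]))
    return acc


def fir_filter(samples, coeffs):
    """Run the filter over a stream, shifting a delay line each cycle."""
    delay = [0] * len(coeffs)
    out = []
    for x in samples:
        delay = [x] + delay[:-1]  # newest sample enters, oldest drops out
        out.append(fir_parallel(delay, coeffs))
    return out
```

Feeding a unit impulse through `fir_filter` returns the coefficients themselves, which is a quick sanity check: `fir_filter([1, 0, 0, 0], [1, 2, 3, 4])` gives `[1, 2, 3, 4]`. The folded and parallel versions produce the same sum; only the number of multipliers in use per stage differs.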
Fast Fourier Transform (FFT) Functions
Fast Fourier Transforms are used for a variety of applications, ranging
from image compression to determining the spectral content of a data
sample. There are a variety of methods for implementing the Fast
Fourier Transform. Probably the most common method is the
Cooley-Tukey decimation-in-time approach, which breaks the FFT down
into a number of smaller FFTs. The simplest implementation uses an
element commonly referred to as the Radix-2 butterfly, through which
the input data must be passed multiple times. Figure 2 shows the
Radix-2 butterfly. The calculation is conceptually simple, as shown on
the left of the diagram. However, as all the multiplies and additions
are done with complex numbers, the actual number of real multiplies and
additions required is considerably larger, as shown on the right side
of the diagram.

Figure 2 -- Radix-2 Butterfly Commonly Used For Implementing FFTs
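
As a hedged illustration of the butterfly, the Python sketch below computes the two outputs as the first input plus and minus the second input times the twiddle factor, and spells out the complex multiply in real operations. The function names are assumptions made for this sketch, and the recursive FFT is a textbook decimation-in-time form rather than any particular hardware schedule.

```python
import cmath


def radix2_butterfly(a, c, w):
    """Return (a + c*w, a - c*w) for complex inputs a, c and twiddle w."""
    t = c * w
    return a + t, a - t


def complex_multiply_real_ops(c, d, e, f):
    """(c + jd)(e + jf) using four real multiplies and two add/subtracts."""
    real = c * e - d * f
    imag = c * f + d * e
    return real, imag


def fft_dit(x):
    """Recursive radix-2 decimation-in-time FFT.

    len(x) is assumed to be a power of two.
    """
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_dit(x[0::2])   # smaller FFT over even-indexed samples
    odd = fft_dit(x[1::2])    # smaller FFT over odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)  # twiddle factor
        out[k], out[k + n // 2] = radix2_butterfly(even[k], odd[k], w)
    return out
```

Each butterfly therefore costs one complex multiply (four real multiplies in this straightforward form) plus complex additions, which is why multiplier blocks with built-in add and subtract map well onto it.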
Infinite Impulse Response (IIR) Filters
The Infinite Impulse Response (IIR) filter is similar to the FIR
filter, except that feedback paths are introduced. These feedback paths
make the design and analysis of IIR filters more complex than that of
FIR filters. However, the IIR approach can provide a more powerful
filter for the same silicon area. Although there are several IIR
architectures, one common approach is to build IIR filters out of
second-order bi-quads, as shown in Figure 3.
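
A second-order section can be sketched as follows. The direct form I difference equation used here is one common textbook arrangement and is an assumption of this sketch; the figure may show a different but equivalent structure, and the names `make_biquad` and `cascade` are illustrative.

```python
def make_biquad(b0, b1, b2, a1, a2):
    """Return a stateful second-order section (direct form I, assumed).

    y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]
    """
    x1 = x2 = y1 = y2 = 0.0

    def step(x):
        nonlocal x1, x2, y1, y2
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x   # shift the feed-forward delay line
        y2, y1 = y1, y   # shift the feedback delay line
        return y

    return step


def cascade(sections, samples):
    """Pass each sample through the sections in series."""
    out = []
    for x in samples:
        for section in sections:
            x = section(x)
        out.append(x)
    return out
```

For example, `cascade([make_biquad(1, 0, 0, -0.5, 0)], [1, 0, 0])` returns `[1.0, 0.5, 0.25]`: the feedback term keeps the impulse response going after the input has returned to zero. Higher-order filters are built by passing more sections to `cascade`.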

Figure 3 -- IIR Second Order Bi-quad
[Diagram labels recovered from Figure 2: inputs (A+jB) and (C+jD), twiddle factor (E+jF); outputs A+jB+C*E+C*jF+jD*E+jD*jF and A+jB-C*E-C*jF-jD*E-jD*jF]
Mixer Functions
Many applications use mixers to shift the frequency of a signal. While,
conceptually, just a single multiplier could be used, in digital
applications there are a number of advantages to representing the
numbers in a complex form. Most typically this is done by representing
signals as I and Q components. Figure 4 shows a mixer that would be
used in digital up-conversion.

Figure 4 -- Typical Up Converter Mixer using Complex Arithmetic
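
A minimal Python sketch of such a mixer is given below. It assumes the usual arrangement in which a direct digital synthesizer supplies cosine and sine carrier samples and the real output is I times cosine minus Q times sine; the function names and the floating-point carrier are illustrative assumptions, not a description of a specific device.

```python
import math


def dds(freq, sample_rate, count):
    """Yield (cos, sin) carrier pairs; stands in for the DDS block."""
    for n in range(count):
        phase = 2.0 * math.pi * freq * n / sample_rate
        yield math.cos(phase), math.sin(phase)


def upconvert(i_samples, q_samples, freq, sample_rate):
    """Real part of (I + jQ) * (cos + j*sin): I*cos - Q*sin."""
    carrier = dds(freq, sample_rate, len(i_samples))
    return [i * c - q * s
            for i, q, (c, s) in zip(i_samples, q_samples, carrier)]
```

Each output sample needs two multiplies and one subtraction, which is the multiply followed by add or subtract pattern that recurs throughout these functions.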
General Purpose DSP Solutions Versus FPGA Implementations
As illustrated in the description of common functions, multipliers
followed by addition, subtraction or accumulation are at the heart of
most DSP applications. General-purpose DSP chips combine efficient
implementations of these functions with a general-purpose
microprocessor. The number of multipliers is generally in the range of
one to four, and the microprocessor sequences data to pass it
through the multiply and other functions, storing intermediate results
in memory or accumulators. Performance is increased primarily by
increasing the clock speed used for multiplication. Typical clock
speeds run from tens of MHz to 1 GHz. Performance, as measured in
Millions of Multiply Accumulates (MMAC) per second, typically ranges
from 10 to 4000. Functions requiring higher performance have to be
split across multiple DSP engines. The price of these chips ranges from
a few dollars at the bottom end of the performance range to hundreds of
dollars at the high end. The key advantage of this approach is the
ability to directly implement algorithms written in a high-level
programming language such as C.
DSP-oriented FPGAs provide the ability to implement many functions in
parallel on one chip. General-purpose routing, logic and memory
resources are used to interconnect the functions, perform additional
functions, sequence and, as necessary, store data. Some basic devices
provide multiplier-only support, requiring users to construct all other
functions in logic. More sophisticated devices provide addition,
subtraction and accumulator functions as part of their set of DSP
building blocks. FPGAs typically have tens of multiplier elements and
can operate at clock speeds of hundreds of MHz. For example, the
LatticeECP-DSP 20 FPGA has 28 18x18 multipliers that can run at speeds
up to 250 MHz, delivering performance up to 7,000 MMAC per second.
Table 1 compares the FPGA and general-purpose approaches.
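
The 7,000 MMAC figure follows from simple arithmetic, sketched below in Python. The sketch assumes that every multiplier completes one multiply-accumulate per clock cycle, which is the peak case; the function name is illustrative.

```python
def peak_mmac_per_second(multipliers, clock_mhz):
    """Peak throughput in millions of multiply-accumulates per second.

    Assumes each multiplier finishes one MAC on every clock cycle.
    """
    return multipliers * clock_mhz


# 28 multipliers at 250 MHz, as quoted for the LatticeECP-DSP 20
fpga_peak = peak_mmac_per_second(28, 250)

# A general-purpose DSP at the top of the ranges quoted above:
# four multipliers at 1 GHz (1000 MHz)
dsp_peak = peak_mmac_per_second(4, 1000)
```

Here `fpga_peak` evaluates to 7000 and `dsp_peak` to 4000, matching the upper figures quoted in the text: the FPGA reaches its throughput through parallelism at a moderate clock rate, the DSP chip through clock speed.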

Table 1 -- Comparison of General-Purpose DSP and FPGA approaches
LatticeECP-DSP Architecture
The LatticeECP-DSP devices consist of a low-cost FPGA fabric coupled
with between four and ten sysDSP™ blocks. Figure 5 shows the overall
block diagram of the ECP device. The sysDSP block in the LatticeECP
family supports four functional elements in three data path widths: 9,
18 and 36. The user selects a function element for a DSP block and then
selects the width and type (signed/unsigned) of its operands. The
operands in the sysDSP blocks can be either signed or unsigned, but not
mixed within a function element. Similarly, the operand widths cannot
be mixed within a block. The resources in each sysDSP block can be
configured to support the following four elements:
• MULT (Multiply, Figure 6)
• MAC (Multiply Accumulate, Figure 7)
• MULTADD (Multiply Addition/Subtraction, Figure 8)
• MULTADDSUM (Multiply Addition/Subtraction Summation, Figure 9)
The number of elements available in each block depends upon the width
selected from the three available options: x9, x18, and x36. A number
of these elements are concatenated for highly parallel implementations
of DSP functions. Table 2 shows the capabilities of the block.
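
The following Python sketch gives one plausible behavioral reading of the four element types, based only on their names. The exact data paths are defined by Figures 6 to 9 and are not reproduced here, so the operand grouping in `multadd` and `multaddsum`, the `subtract` flags and all function names are assumptions made for illustration.

```python
def check_operand(value, width, signed):
    """Raise if value does not fit a signed or unsigned field of `width` bits."""
    if signed:
        lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    else:
        lo, hi = 0, (1 << width) - 1
    if not lo <= value <= hi:
        raise ValueError(f"{value} does not fit in {width} bits")
    return value


def mult(a, b):
    """MULT: a single product."""
    return a * b


class Mac:
    """MAC: products are added into a running accumulator."""

    def __init__(self):
        self.acc = 0

    def step(self, a, b):
        self.acc += a * b
        return self.acc


def multadd(a0, b0, a1, b1, subtract=False):
    """MULTADD: sum or difference of two products (assumed grouping)."""
    p0, p1 = a0 * b0, a1 * b1
    return p0 - p1 if subtract else p0 + p1


def multaddsum(a, b, subtract0=False, subtract1=False):
    """MULTADDSUM: two MULTADD results summed (assumed grouping)."""
    return (multadd(a[0], b[0], a[1], b[1], subtract0)
            + multadd(a[2], b[2], a[3], b[3], subtract1))
```

Under this reading, `multaddsum([1, 2, 3, 4], [5, 6, 7, 8])` returns 70, a four-tap sum of products in a single element, and `multadd` with `subtract=True` gives the real part of a complex multiply.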

Table 2 -- Maximum Number of Elements in a sysDSP Block

Figure 5 -- LatticeECP-DSP Block Diagram
The sysDSP block has built-in optional pipelining at the input,
intermediate and output stages. In addition, inputs can be loaded in
parallel or shifted across the array as necessary. Options are also
provided for dynamically switching between signed and unsigned
arithmetic and between subtraction and addition.



Figure 6 -- MULT (Multiplier) Element

Figure 7 -- MAC (Multiply Accumulate) Element

Figure 8 -- MULTADD (Multiplier Addition/Subtraction) Element

Figure 9 -- MULTADDSUM (Multiplier Addition/Subtraction Summation) Element
Performance and Device Utilization Improvements
The availability of pipelining registers, summation, subtraction and
accumulation within the sysDSP blocks increases their utility. As
illustrated, in typical functions it is very common to need to combine
multiplication with addition, subtraction or accumulation. Pipelining
registers, while conceptually simple, rapidly consume significant
resources when implemented on wide data paths. The sysDSP block's
ability to implement these functions results in lower consumption of
general-purpose FPGA resources and higher performance. Both of these
factors translate directly into lower costs, as in many cases they
allow designers to select smaller devices with lower speed grades.
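
To give a rough sense of scale for the register cost mentioned above, the Python sketch below counts the flip-flops needed to pipeline one multiplier in general-purpose fabric. It assumes one flip-flop per registered bit, a full-width product of twice the operand width, and register stages at the input, intermediate and output points; these are illustrative assumptions, not Lattice figures.

```python
def pipeline_flipflops(operand_width, input_stage=True,
                       intermediate_stage=True, output_stage=True):
    """Flip-flops needed to register one multiplier's data path."""
    product_width = 2 * operand_width
    total = 0
    if input_stage:
        total += 2 * operand_width   # both operands are registered
    if intermediate_stage:
        total += product_width       # product registered mid-path
    if output_stage:
        total += product_width       # result registered at the output
    return total


per_multiplier = pipeline_flipflops(18)
for_28_multipliers = 28 * per_multiplier
```

Under these assumptions an 18x18 multiplier needs 108 flip-flops for full pipelining, and 28 such multipliers need 3,024, all of which would otherwise come out of the general-purpose fabric.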
LatticeECP-DSP Design Flow
Lattice provides designers with four simple methods to access the
capabilities of the sysDSP block:
• The Module/IP Manager, a graphical interface provided in the
ispLEVER® tools that allows the rapid creation of modules implementing
DSP elements. These modules can then be used in HDL designs as
appropriate.
• Coding certain functions into a design's HDL and allowing the
synthesis tools to infer the use of a DSP block.
• Implementing designs in The MathWorks' Simulink tool using a
Lattice block set. The ispLeverDSP portion of the ispLEVER tools will
then convert these blocks into HDL as appropriate.
• Instantiating DSP primitives directly in the source code.
The method chosen for any design will depend upon the DSP algorithm
design methodology and the degree of control desired over the physical
implementation.
Low-Cost FPGA Implementations
With the introduction of the LatticeECP/EC devices, users can now
choose among three current-generation, low-cost FPGAs: the Spartan III
devices from Xilinx, Altera's Cyclone family and the LatticeECP/EC
devices. Altera's Cyclone FPGAs contain no DSP-oriented element, making
it challenging to implement large DSP functions in these devices
without consuming a significant number of internal resources.
Naturally, achieving high performance with these implementations is
equally challenging. The Xilinx Spartan III FPGA family does provide
some basic multiplier capability. While this is certainly preferable to
having no DSP capability at all, significant resources must still be
consumed to implement the adders, subtractors, accumulators and
pipeline registers found in typical designs. To measure the effect of
providing these resources, Lattice benchmarked performance and
utilization for a FIR filter and an IIR filter. The FIR filter was a
64-tap filter with 18-bit wide data. The IIR filter was a 4th-order
filter arranged as two biquads with an 18-bit data path.
FIR and IIR implementations in LatticeECP-DSP and Spartan III Devices
Conclusion
The use of DSP techniques will continue to grow at the expense of
analog implementations. An analysis of the functions typically used in
DSP applications indicates that a combination of multiplier, addition,
subtraction and accumulation elements is required. The LatticeECP
devices provide a sophisticated DSP block combined with a low-cost FPGA
fabric. Because addition, subtraction, accumulation and pipelining are
implemented within the sysDSP block, performance is considerably higher
and LUT usage considerably more efficient than in alternative low-cost
FPGA solutions that provide only basic multiplier capabilities.
The speed and utilization advantages of the sysDSP block help users
reduce costs through the selection of smaller devices in lower speed
grades.
#2
presented by:
S.TEJASWI

#3

Hi friend, you can refer to these pages for details on high-performance DSP architectures:

http://studentbank.in/report-high-perfor...chitecture

http://studentbank.in/report-high-perfor...ures--1908

http://studentbank.in/report-high-perfor...ures--3878

http://studentbank.in/report-high-perfor...274#pid274