It is based on our previous SODA (Signal Processing On Demand Architecture) design, a 32-lane, four-processor SIMD architecture optimized for 2 Mbps WCDMA and IEEE 802.11a. SODA has several drawbacks, such as high register-file power consumption and cycles wasted on data alignment, and cannot meet the higher performance and power requirements of emerging standards. We propose SODA-II, which addresses these problems with the following schemes: operation chaining, pipelined execution of SIMD units, staggered memory access, and multicycling of the computation units. Operation chaining links primitive instructions together, eliminating unnecessary register-file accesses and saving energy. Pipelining vector instructions through the SIMD units improves system throughput. Staggered execution of the computation units helps simplify the data alignment networks. It is implemented together with multicycling so that the computation units stay busy most of the time. The proposed architecture is evaluated with an in-house architecture emulator that uses component-level power models built with Synopsys and Artisan tools.
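The effect of operation chaining on register-file traffic can be illustrated with a toy model. The sketch below is not SODA-II's instruction set: the `RegisterFile` class, the register names, and the multiply-then-add pair are all hypothetical, chosen only to show how forwarding an intermediate result avoids a write and a read.

```python
class RegisterFile:
    """Toy register file that counts accesses (hypothetical model)."""

    def __init__(self, values):
        self.regs = dict(values)
        self.reads = 0
        self.writes = 0

    def read(self, name):
        self.reads += 1
        return self.regs[name]

    def write(self, name, value):
        self.writes += 1
        self.regs[name] = value


def mul_then_add_unchained(rf):
    # Two separate primitive instructions: the product makes a round
    # trip through the register file via the temporary register "t".
    rf.write("t", rf.read("a") * rf.read("b"))
    rf.write("d", rf.read("t") + rf.read("c"))


def mul_then_add_chained(rf):
    # Chained pair: the product is forwarded straight to the adder,
    # so the temporary is never written to or read from the file.
    product = rf.read("a") * rf.read("b")
    rf.write("d", product + rf.read("c"))


init = {"a": 3, "b": 4, "c": 5}
unchained = RegisterFile(init)
chained = RegisterFile(init)
mul_then_add_unchained(unchained)
mul_then_add_chained(chained)

print("unchained:", unchained.reads, "reads,", unchained.writes, "writes")
print("chained:  ", chained.reads, "reads,", chained.writes, "writes")
```

Under these assumptions both versions compute the same result, while the chained version needs one fewer read and one fewer write per instruction pair.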
Wireless communications, and more specifically the increasing penetration of cell phones and cellular infrastructure, are the main drivers for the development of new programmable Digital Signal Processors (DSPs). This tutorial gives an overview of recent developments in DSP processor architectures that make them well suited to running the computationally intensive algorithms typically found in communications systems. DSP processors have adapted their instruction sets, memory architectures, and data paths to run compute-intensive communications algorithms efficiently and at low power. Basic building blocks include convolutional decoders (mainly the Viterbi algorithm), turbo coding algorithms, FIR filters, voice coders, and so on. This is illustrated with examples of different commercial and research processors.
It is specifically developed for the next generation of digital wireless systems and voice applications. In addition to providing a basic instruction set similar to that of current 16-bit DSPs, it contains distinctive architectural features and unique instructions that make the engine highly efficient for compute-intensive tasks such as vector quantization and Viterbi operations. The datapath contains two Multiply-Accumulate (MAC) units and one ALU. External memory bandwidth is kept to two data buses and two corresponding address buses. Even so, the internal bus network is designed so that all three units can operate in parallel. This parallelism is reflected in performance benchmarks. For example, an N-tap FIR filter takes N/2 instruction cycles, compared to N for a general-purpose 16-bit DSP, and requires only half as many memory accesses. This efficiency is reflected in the very low MIPS requirement for implementing cellular standards.
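The N/2 figure can be made concrete with a small cycle-counting sketch. The abstract does not say how the two MAC units are scheduled, so the scheme below is an assumption: it is one common dual-MAC arrangement that computes two adjacent outputs per pass and keeps the previous sample in a delay register, which fits the stated limit of two data buses. The function names and the counting model are hypothetical.

```python
def fir_single_mac(h, x, n):
    """One output y[n] on a single-MAC datapath: one tap per cycle."""
    acc, cycles, mem_reads = 0, 0, 0
    for k in range(len(h)):
        acc += h[k] * x[n - k]      # coefficient and sample are both fetched
        mem_reads += 2
        cycles += 1
    return acc, cycles, mem_reads


def fir_dual_mac(h, x, n):
    """Two outputs y[n], y[n+1] on a dual-MAC datapath (assumed scheme)."""
    acc0 = acc1 = 0
    delayed = x[n + 1]              # preload: one extra fetch before the loop
    cycles, mem_reads = 0, 1
    for k in range(len(h)):
        coeff, sample = h[k], x[n - k]   # one fetch on each data bus
        mem_reads += 2
        acc0 += coeff * sample      # MAC unit 0 adds a term of y[n]
        acc1 += coeff * delayed     # MAC unit 1 adds a term of y[n+1]
        delayed = sample            # reuse this sample in the next cycle
        cycles += 1
    return (acc0, acc1), cycles, mem_reads


h = [1, 2, 3, 4]                    # N = 4 taps
x = [1, 2, 3, 4, 5, 6, 7, 8]
n = 4

y4, c_single, m_single = fir_single_mac(h, x, n)
y5, _, _ = fir_single_mac(h, x, n + 1)
(d4, d5), c_dual, m_dual = fir_dual_mac(h, x, n)

print("single MAC, per output:", c_single, "cycles,", m_single, "fetches")
print("dual MAC, two outputs: ", c_dual, "cycles,", m_dual, "fetches")
```

In this model the dual-MAC loop spends N cycles on two outputs, that is N/2 per output, and issues 2N + 1 fetches for the pair against 4N on the single-MAC path, which approaches the halving described above as N grows.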