Optimization and Evaluation of the Reconfigurable Grid Alu Processor

Abstract
Currently, few architectural approaches propose new paths to raise the performance of conventional sequential instruction streams in the billion-transistor era. Many application programs could profit from processors that are able to speed up the execution of sequential applications beyond the performance of current superscalar processors. The Grid Alu Processor (GAP) is a runtime reconfigurable processor designed for the acceleration of a conventional sequential instruction stream without the need for recompilation. It comprises a superscalar processor front-end, a configuration unit, and an array of reconfigurable functional units (FUs), which is fully integrated into the pipeline. The configuration unit maps data-dependent and independent instructions simultaneously at runtime into the array of FUs. This paper evaluates the GAP architecture and optimizes the hardware, the number of FUs, and the configuration layers implemented in the array. The simulations show a significant speedup for sequential applications on GAP in comparison to an out-of-order superscalar simulator (SimpleScalar). GAP outperforms SimpleScalar on average by about 50% on the basic architecture and by about 100% with an extended version including configuration layers.
I. INTRODUCTION
The current development of many-core processors to sustain the performance gain defined by Moore's law [1] is facing many hurdles. The most difficult impediment is accelerating the execution of sequential instruction streams. The huge number of transistors available on a single chip will allow designers to implement dozens of cores for the acceleration of parallel applications. However, such architectures achieve no speedup for single sequential applications. Also, the memory bottleneck tends to become evident for data-intensive applications, since more cores yield more traffic on the bus and result in less performance [2]. An asymmetric multiprocessor (AMP) design could offer an underlying fabric that optimally meets the demands of different applications, but it still needs a lot of complex compiler analysis. Moreover, the application performance depends on the efficiency of the cores. Application-specific integrated circuits (ASICs) allow finding an optimal hardware solution for a special kind of application; unfortunately, they exhibit very poor dynamicity. The reconfiguration of processors on the instruction level therefore came up to offer a single design that effectively executes different applications with different demands [3]. These architectures can reconfigure an array of functional units on the instruction level, offering superior dynamicity, and tend to have the performance of ASICs.
Previous attempts to achieve reconfigurability on the instruction level are based on compiler analysis and profiling of the data flow graphs [4] [5]. A main processor with an extended instruction set architecture (ISA) controls the mapping of specified tiles of program code to an accelerator [6] [7]. Indeed, the cooperation between hardware and software and the need for a controlling processor have decreased the expectation of much better performance [8].
With the Grid Alu Processor (GAP) architecture we address the challenges of accelerating sequential applications, exploiting more instruction-level parallelism (ILP), and parallelizing memory accesses. A very important feature of GAP is that it requires neither a new ISA nor special software to prepare and map the configurations to the hardware. It thereby permits the use of the well-known GCC compiler for superscalar architectures without any modification of the generated binary files, which is not the case for related architectures.
The GAP comprises a superscalar front-end followed by a novel configuration unit and an array of FUs. The configuration unit allows the use of a common RISC ISA of superscalar processors by mapping instructions at runtime into the array of FUs. The in-order fetch, decode, and reconfiguration ability keeps the processor front-end simple and avoids most of the large, unscalable, and power-hungry hardware structures needed by out-of-order processors, such as large issue windows and the hardware needed to operate on them, renaming structures, and the reorder buffer.
It is obvious that simple and complex operations require different execution times in the arithmetic logic units (ALUs). Unfortunately, because of the synchronously clocked pipelines, the execution of operations is restricted to clock cycle boundaries. Globally asynchronous locally synchronous (GALS) architectures remove the delay between different stages in the processor and achieve about 60% more performance compared to similar synchronous architectures [9]. Following a similar technique, we have designed an array of functional units that work asynchronously, whereas all other stages are synchronous.
Most modern processors are highly pipelined to allow a high clock frequency. However, long bypass wires limit the performance improvement by adding a time penalty to the execution time [10]. The 6-way integer execution unit of the Itanium II provides evidence of bypass wire delay, since over half of the critical path of the ALU is spent on the bypass paths [11]. Instructions in the issue stage that have data dependencies on those in the execution stage have to wait for bypassed values, even when the bypass wire is pipelined by dedicating a clock cycle to it.
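The clock-boundary argument made in this introduction can be illustrated with a small calculation. The sketch below is not taken from the paper: the function names, the 1000 ps clock period, and the per-operation delays are all hypothetical values chosen only to show the effect. It compares a chain of dependent operations that each occupy whole clock cycles with the same chain in an asynchronous array, where the delays accumulate and only the total is rounded up to a cycle boundary.

```python
def synchronous_cycles(delays_ps, clock_ps):
    """Cycles needed when every dependent operation is rounded up
    to a whole number of clock cycles on its own."""
    # -(-d // clock_ps) is integer ceiling division.
    return sum(-(-d // clock_ps) for d in delays_ps)


def asynchronous_cycles(delays_ps, clock_ps):
    """Cycles needed when the chain runs asynchronously and only the
    accumulated delay is rounded up to a clock cycle boundary."""
    return -(-sum(delays_ps) // clock_ps)


# Hypothetical delays (in picoseconds) of three dependent operations,
# e.g. a logic operation, an addition, and a slower complex operation.
chain_ps = [300, 400, 900]
clock_ps = 1000  # assumed clock period, not a figure from the paper

sync_cycles = synchronous_cycles(chain_ps, clock_ps)
async_cycles = asynchronous_cycles(chain_ps, clock_ps)

print(f"synchronous pipeline: {sync_cycles} cycles")
print(f"asynchronous array:   {async_cycles} cycles")
```

Under these assumed numbers the synchronous chain needs three cycles while the asynchronous chain needs two. The model ignores bypass and wire delays, so it only indicates the direction of the effect and not its size.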


Download full report
http://ieeexplore.ieeeiel5/5610958/56235...er=5610958