seminar class · 06-05-2011, 12:34 PM

Abstract
Currently, few architectural approaches propose new paths to raise the performance of conventional sequential instruction streams in the billion-transistor era. Many application programs could profit from processors that are able to speed up the execution of sequential applications beyond the performance of current superscalar processors. The Grid Alu Processor (GAP) is a runtime reconfigurable processor designed for the acceleration of a conventional sequential instruction stream without the need for recompilation. It comprises a superscalar processor front-end, a configuration unit, and an array of reconfigurable functional units (FUs), which is fully integrated into the pipeline. The configuration unit maps data-dependent and independent instructions simultaneously at runtime into the array of FUs. This paper evaluates the GAP architecture and optimizes the hardware, the number of FUs, and the configuration layers implemented in the array. The simulations show a significant speed-up for sequential applications on GAP in comparison to an out-of-order superscalar simulator (SimpleScalar). GAP outperforms SimpleScalar on average by about 50% on the basic architecture and by about 100% with an extended version that includes configuration layers.
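
To illustrate what mapping dependent and independent instructions into an array of FUs can look like, here is a minimal sketch in Python. It is not the placement algorithm of the GAP configuration unit, which the text above does not specify. The `Instr` record, the register names, and the `map_to_rows` function are hypothetical, and the sketch tracks only true (read-after-write) dependences: an instruction is placed one row below the latest instruction that produces one of its inputs, so independent instructions end up in the same row.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    """Hypothetical instruction record: one destination, several sources."""
    dest: str
    srcs: tuple


def map_to_rows(stream):
    """Return, for each instruction in program order, the row it is placed in.

    An instruction goes one row below the latest producer of its sources.
    Sources that nothing in the stream has written yet count as available
    at the top of the array.
    """
    producer_row = {}  # register name -> row of the instruction that last wrote it
    rows = []
    for ins in stream:
        row = 1 + max((producer_row.get(r, -1) for r in ins.srcs), default=-1)
        rows.append(row)
        producer_row[ins.dest] = row
    return rows


# Assumed example stream; the registers are made up for illustration.
stream = [
    Instr("r1", ("r2", "r3")),  # no producer in the stream
    Instr("r4", ("r1", "r5")),  # depends on the first instruction
    Instr("r6", ("r7", "r8")),  # independent of the first two
    Instr("r9", ("r4", "r6")),  # depends on the second and third
]
print(map_to_rows(stream))
```

In this stream the first and third instructions are independent and share the top row, while the chain through `r1` and `r4` occupies successive rows.
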
I. INTRODUCTION
The current development of many-core processors, intended to sustain the performance gains predicted by Moore’s law [1], faces many hurdles. The most difficult one is accelerating the execution of sequential instruction streams. The huge number of transistors available on a single chip will allow designers to implement dozens of cores for the acceleration of parallel applications. However, such architectures achieve no speed-up for single sequential applications. Also, the memory bottleneck tends to become evident for data-intensive applications, since more cores yield more traffic on the bus and result in less performance [2]. Asymmetric multiprocessor (AMP) design could offer an underlying fabric that optimally meets the demands of different applications, but it still needs a lot of complex compiler analysis. Moreover, the application performance depends on the efficiency of the cores. Application-specific integrated circuits (ASICs) allow finding an optimal hardware solution for a special kind of application; unfortunately, they exhibit very poor dynamicity. Reconfiguration of processors at the instruction level therefore emerged to offer a single design that effectively executes different applications with different demands [3]. These architectures can reconfigure an array of functional units at the instruction level, offering superior dynamicity, and tend to approach the performance of ASICs.

Previous attempts to achieve reconfigurability at the instruction level are based on compiler analysis and profiling of the data flow graphs [4] [5]. A main processor with an extended instruction set architecture (ISA) controls the mapping of specified tiles of program code to an accelerator [6] [7]. Indeed, the cooperation between hardware and software and the need for a controlling processor have lowered expectations of much better performance [8].

With the Grid Alu Processor (GAP) architecture we address the challenges of accelerating sequential applications, exploiting more instruction level parallelism (ILP), and parallelizing memory accesses. A very important feature of GAP is that it requires neither a new ISA nor special software to prepare and map the configurations to the hardware. It thereby permits the use of the well-known GCC compiler for superscalar architectures without any modification of the generated binary files, which is not the case for related architectures.

The GAP comprises a superscalar front-end followed by a novel configuration unit and an array of FUs. The configuration unit allows the use of a common RISC ISA of superscalar processors by mapping instructions at runtime into the array of FUs. The in-order fetch, decode, and reconfiguration ability keeps the processor front-end simple and avoids the largest unscalable and power-hungry hardware structures needed by out-of-order processors, such as large issue windows and the hardware needed to operate on them, renaming structures, and the reorder buffer.

It is obvious that simple and complex operations require different execution times in the arithmetic logic units (ALUs). Unfortunately, because of the synchronously clocked pipelines, the execution of operations is restricted to the clock cycle boundaries. Globally asynchronous locally synchronous (GALS) architectures remove the delay between different stages in the processor and achieve about 60% more performance compared to similar synchronous architectures [9]. Following a similar technique, we have designed an array of functional units that work asynchronously, whereas all other stages are synchronous.

Most modern processors are highly pipelined to allow a high clock frequency. But long bypass wires limit the performance gain by adding a time penalty to the execution time [10]. Itanium II’s 6-way integer execution unit provides evidence of bypass wire delay, since over half of the critical path of the ALU is spent on the bypass paths [11]. Instructions in the issue stage that have data dependencies on those in the execution stage have to wait for bypassed values, even when the bypass wire is pipelined by dedicating a clock cycle to it.
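
The restriction of operations to clock cycle boundaries mentioned above can be made concrete with a small worked example. The following Python sketch is a simplified model, not the timing model of the GAP paper: the operation delays and the clock period are made-up numbers, the function names are hypothetical, and wire and configuration delays are ignored. It compares a chain of dependent operations that each wait for the next clock edge with the same chain executed back to back, where only the total is rounded up to a clock edge.

```python
def cycles_synchronous(delays_ps, period_ps):
    """Each operation occupies a whole number of clock cycles."""
    # -(-d // p) is integer ceiling division.
    return sum(-(-d // period_ps) for d in delays_ps)


def cycles_asynchronous(delays_ps, period_ps):
    """Operations run back to back; only the chain result waits for a clock edge."""
    return -(-sum(delays_ps) // period_ps)


# Assumed delays in picoseconds for a chain of four dependent operations.
delays_ps = [300, 900, 400, 1000]
period_ps = 1000  # assumed 1 GHz clock

print(cycles_synchronous(delays_ps, period_ps))
print(cycles_asynchronous(delays_ps, period_ps))
```

With these assumed numbers, the synchronous chain needs four cycles because the short operations still consume a full cycle each, while the asynchronous chain finishes its 2600 ps of work within three cycles.
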

Download full report
http://ieeexplore.ieeeiel5/5610958/56235...er=5610958
