13-04-2010, 11:14 AM
• VLIW Architecture
• Increasing Processor Performance
Semiconductor Technology
Parallel Processing
Multiprocessors, Multicomputers
Parallelism within the Processor
Pipelining
ILP
• ILP (Instruction-Level Parallelism)
Parallel execution of instructions.
Overlapping of instructions
ILP processors
Superscalar processors
VLIW processors.
• Scalar Processors
Fetch and execute one instruction at a time.
A program represents a plan of execution.
The processor acts as an interpreter that executes the instructions in the program one at a time.
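To make the "interpreter" view concrete, here is a minimal Python sketch of a scalar fetch-execute loop. The three-operand tuple format, the register names and the `run_scalar` function are hypothetical illustrations, not any real ISA; the only point is that exactly one instruction completes per step.

```python
from typing import Dict, List, Tuple

# Hypothetical instruction format: (opcode, dest, src1, src2)
Instr = Tuple[str, str, str, str]

def run_scalar(program: List[Instr], regs: Dict[str, int]) -> Tuple[Dict[str, int], int]:
    """Execute `program` one instruction per cycle; return final registers and cycle count."""
    ops = {
        "add": lambda a, b: a + b,
        "sub": lambda a, b: a - b,
        "mul": lambda a, b: a * b,
    }
    regs = dict(regs)  # work on a copy of the register file
    pc = 0
    cycles = 0
    while pc < len(program):
        opcode, dest, src1, src2 = program[pc]             # fetch + decode
        regs[dest] = ops[opcode](regs[src1], regs[src2])   # execute + write back
        pc += 1
        cycles += 1  # one instruction completes per cycle
    return regs, cycles
```

For example, running `[("add", "r3", "r1", "r2"), ("mul", "r4", "r1", "r2"), ("sub", "r5", "r3", "r4")]` with r1 = 6 and r2 = 3 takes three cycles even though the first two instructions are independent; exploiting that kind of independence is what ILP processors aim to do.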
• Execution in a Scalar Processor
• Superscalar Processors
Hardware decides which operations to execute in parallel.
More than one instruction is executed at a time.
Dynamic scheduling.
• Basic Superscalar Approach
• Execution in a Superscalar Processor
• Disadvantages of Superscalar Processors
Hardware complexity.
A constrained instruction window size, which limits the ability to detect independent instructions.
Higher power consumption.
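The window-size limitation can be illustrated with a small simulation. This is a rough sketch under simplifying assumptions of our own: every instruction has unit latency, only true (read-after-write) dependences are tracked (as if registers were renamed), and the hypothetical `cycles_with_window` function issues up to `width` ready instructions per cycle but may only inspect the oldest `window` instructions that have not yet issued.

```python
from typing import Dict, List, Set, Tuple

Instr = Tuple[str, List[str]]  # (destination register, source registers)

def cycles_with_window(program: List[Instr], window: int, width: int) -> int:
    """Count cycles when hardware can only inspect the `window` oldest unissued instructions."""
    if window < 1 or width < 1:
        raise ValueError("window and width must be at least 1")
    # deps[i] = indices of earlier instructions whose results instruction i reads (RAW only)
    deps: List[Set[int]] = []
    last_writer: Dict[str, int] = {}
    for i, (dest, srcs) in enumerate(program):
        deps.append({last_writer[s] for s in srcs if s in last_writer})
        last_writer[dest] = i
    done: Set[int] = set()
    pending = list(range(len(program)))  # unissued instructions, oldest first
    cycles = 0
    while pending:
        candidates = pending[:window]  # all the hardware can see this cycle
        issued = [i for i in candidates if deps[i] <= done][:width]
        for i in issued:
            pending.remove(i)
        done.update(issued)  # unit latency: results usable from the next cycle
        cycles += 1
    return cycles
```

With two independent four-instruction dependence chains placed one after the other, a window of 2 (issue width 2) needs 7 cycles while a window of 8 needs 4, because only the larger window can see the second chain while the first is still executing.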
• VLIW
Very Long Instruction Word.
Instructions are hundreds of bits long.
Each long instruction, called a multiop, contains several operations.
Multiple functional units are used concurrently.
The functional units share a common register file.
The compiler performs code compaction, packing independent operations into multiops.
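Code compaction can be sketched as a greedy list scheduler that places each operation in the earliest multiop where its inputs are ready and a slot is free, then fills unused slots with NOPs. This is an illustrative sketch, not Fisher's trace-scheduling compactor: `compact` and the operation tuples are hypothetical, latency is assumed to be one cycle, and every register is assumed to be written only once, so only true dependences have to be respected.

```python
from typing import Dict, List, Tuple

Op = Tuple[str, str, List[str]]  # (name, destination register, source registers)

def compact(ops: List[Op], width: int) -> List[List[str]]:
    """Greedily pack operations into multiops of `width` slots, padding with NOPs."""
    cycle_of: Dict[str, int] = {}     # operation name -> cycle it was placed in
    last_writer: Dict[str, str] = {}  # register -> name of the operation that writes it
    schedule: List[List[str]] = []
    for name, dest, srcs in ops:
        # Earliest cycle after every producer of an input (unit latency assumed).
        earliest = max(
            (cycle_of[last_writer[s]] + 1 for s in srcs if s in last_writer), default=0
        )
        cycle = earliest
        while cycle < len(schedule) and len(schedule[cycle]) >= width:
            cycle += 1  # this multiop is full, try the next one
        while len(schedule) <= cycle:
            schedule.append([])
        schedule[cycle].append(name)
        cycle_of[name] = cycle
        last_writer[dest] = name
    # Unused slots become explicit NOPs, as a plain fixed-width VLIW encoding requires.
    return [multiop + ["nop"] * (width - len(multiop)) for multiop in schedule]
```

With a 3-wide machine and the operations `i1: a = x`, `i2: b = y`, `i3: c = a, b`, `i4: d = z`, the independent `i4` is hoisted into the first multiop next to `i1` and `i2`, while `i3` must wait for `a` and `b` and ends up sharing its multiop with two NOPs.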
• A Brief History
Joseph Fisher developed trace scheduling in 1979.
He coined the acronym VLIW.
In 1984, two VLIW companies were founded:
Multiflow, founded by Joseph Fisher.
Cydrome, founded by Bob Rau.
In 1987, Cydrome delivered its first machine, the 256-bit Cydra 5.
Multiflow delivered:
Trace/200 - 1987
Trace/300 - 1988
Trace/500 - 1990
Cydrome closed in 1988.
Multiflow closed in 1990.
Since then, VLIW machines have seen a revival and some degree of success.
• Basic VLIW Approach
• VLIW Execution
• Case Studies
Defoe.
Intel Itanium Processor.
Transmeta Crusoe Processor.
• Defoe Architecture
• Instruction Encoding
64-bit compressed VLIW architecture.
Uses variable-length multiops.
Individual operations are encoded as 32-bit words.
A special stop bit indicates the end of an instruction word.
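A decoder for this kind of encoding only has to scan 32-bit operation words until it finds one with the stop bit set. The sketch below assumes, purely for illustration, that the stop bit is bit 31 (the most significant bit) of each operation word; the actual bit position in Defoe is not given here, and `split_multiops` is a hypothetical name.

```python
from typing import List

STOP_BIT = 1 << 31  # assumption: bit 31 marks the last operation of a multiop

def split_multiops(words: List[int]) -> List[List[int]]:
    """Group a stream of 32-bit operation words into variable-length multiops."""
    multiops: List[List[int]] = []
    current: List[int] = []
    for w in words:
        if not 0 <= w < (1 << 32):
            raise ValueError(f"not a 32-bit word: {w:#x}")
        current.append(w & ~STOP_BIT)  # keep the operation bits, drop the stop flag
        if w & STOP_BIT:               # end of this instruction word
            multiops.append(current)
            current = []
    if current:
        raise ValueError("stream ended in the middle of a multiop (missing stop bit)")
    return multiops
```

The stream `[0x00000001, 0x80000002, 0x80000003, 0x00000004, 0x00000005, 0x80000006]` decodes into three multiops of two, one and three operations, so a multiop holding a single operation costs only 4 bytes instead of a full-width instruction.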
• Intel Itanium Processor
Intel's first implementation of IA-64.
IA-64 is an ISA for the EPIC (Explicitly Parallel Instruction Computing) style of VLIW, developed jointly by Intel and HP.
64-bit processor with:
4 integer units
4 multimedia units
2 load/store units
2 extended-precision floating-point units
2 single-precision floating-point units
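One way to read the unit list above is as a per-cycle resource budget that the compiler's schedule has to respect. The sketch below checks a group of operations against those counts; the class names are hypothetical labels for the listed units, and it additionally assumes the first Itanium's limit of six instructions issued per cycle (two three-instruction bundles). Real IA-64 issue is further constrained by bundle templates and dispersal rules, which are ignored here.

```python
from collections import Counter
from typing import Dict, Iterable

# Unit counts from the list above; the class names are hypothetical labels.
ITANIUM_UNITS: Dict[str, int] = {
    "integer": 4,
    "multimedia": 4,
    "load_store": 2,
    "fp_extended": 2,
    "fp_single": 2,
}
ISSUE_WIDTH = 6  # assumption: two 3-instruction bundles issued per cycle

def _demand(op_classes: Iterable[str], units: Dict[str, int]) -> Counter:
    """Count how many operations need each unit class, rejecting unknown classes."""
    demand = Counter(op_classes)
    unknown = set(demand) - set(units)
    if unknown:
        raise ValueError(f"unknown unit class(es): {sorted(unknown)}")
    return demand

def can_issue_together(op_classes, units=ITANIUM_UNITS, width=ISSUE_WIDTH) -> bool:
    """True if the operations fit both the issue width and the per-class unit counts."""
    demand = _demand(op_classes, units)
    total = sum(demand.values())
    return total <= width and all(n <= units[c] for c, n in demand.items())

def min_cycles(op_classes, units=ITANIUM_UNITS, width=ISSUE_WIDTH) -> int:
    """Resource-only lower bound on cycles (ignores dependences between operations)."""
    demand = _demand(op_classes, units)
    total = sum(demand.values())
    per_class = [-(-n // units[c]) for c, n in demand.items()]  # ceiling division
    return max([-(-total // width)] + per_class)
```

For instance, four integer operations plus two loads fit in one cycle, but three loads do not (only two load/store units), and five loads plus four integer operations need at least three cycles because of the load/store units alone.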
• Transmeta Crusoe Processor
Designed to reduce power consumption.
Dynamic scheduling hardware consumes more power.
VLIW replaces complex hardware mechanisms for extracting ILP with simpler, more power-efficient ones.
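• Instruction Format
Instructions are either 64 or 128 bits long.
A long instruction is called a molecule; the individual operations inside it are called atoms.
64 general-purpose registers (GPRs).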
• Compiler Support
Instruction scheduling algorithms are critical.
Three important scheduling algorithms are:
Trace scheduling
Trace scheduling-2
Superblock scheduling
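Trace scheduling starts by picking traces, frequently executed paths through the control-flow graph, which are then compacted as if they were straight-line code. The sketch below shows only a simplified, forward-only version of that selection step driven by profile data; `select_traces`, the block names and the branch probabilities are hypothetical, and the later steps (compacting each trace, adding compensation code, or tail duplication when forming superblocks) are not shown.

```python
from typing import Dict, List, Set

def select_traces(counts: Dict[str, int], succ: Dict[str, Dict[str, float]]) -> List[List[str]]:
    """Greedy forward trace selection.

    counts: profiled execution count of each basic block.
    succ:   for each block, a map from successor block to branch probability.
    """
    visited: Set[str] = set()
    traces: List[List[str]] = []
    # Seed each new trace with the hottest block not yet placed in a trace.
    for seed in sorted(counts, key=lambda b: -counts[b]):
        if seed in visited:
            continue
        trace = [seed]
        visited.add(seed)
        block = seed
        while succ.get(block):
            nxt = max(succ[block], key=succ[block].get)  # most likely successor
            if nxt in visited:  # never put a block in two traces
                break
            trace.append(nxt)
            visited.add(nxt)
            block = nxt
        traces.append(trace)
    return traces
```

For a diamond A → {B (0.9), C (0.1)} → D where A runs 100 times, the sketch returns `[["A", "B", "D"], ["C"]]`: the hot path becomes one trace that the compiler can schedule as a single long block, and the rarely taken C is left on its own.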
• Advantages
Lower hardware complexity.
Static scheduling.
More hardware can be devoted to useful computation.
Software (the compiler) has a larger window to look at.
It can therefore find more ILP.
• Shortcomings
Wasteful encoding with NOPs.
Hard to maintain code compatibility between generations.
Increased program size.
The compiler has to explicitly insert NOPs.
New versions of the architecture can force major rewriting of the compiler.
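The cost of NOP padding is simple arithmetic. The sketch below compares a fixed-width encoding, where every slot of every multiop is stored, with a stop-bit encoding in the spirit of the Defoe scheme, where only real operations are stored. It assumes 4-byte operations, treats the string `"nop"` as an empty slot, and assumes that a multiop with no real operations still needs one NOP word to carry its stop bit; `code_size_bytes` is a hypothetical helper.

```python
from typing import List, Tuple

def code_size_bytes(schedule: List[List[str]], op_bytes: int = 4) -> Tuple[int, int]:
    """Return (fixed_width_bytes, stop_bit_bytes) for a schedule of multiops."""
    # Fixed-width VLIW: every slot is encoded, NOPs included.
    fixed = sum(len(multiop) for multiop in schedule) * op_bytes
    # Stop-bit encoding: only real operations, but at least one word per multiop.
    compressed = sum(
        max(1, sum(1 for op in multiop if op != "nop")) for multiop in schedule
    ) * op_bytes
    return fixed, compressed
```

A 3-wide schedule with an idle cycle, `[["i1", "nop", "nop"], ["nop", "nop", "nop"], ["i2", "nop", "nop"]]`, takes 36 bytes in fixed-width form but 12 bytes with stop bits; that gap is the increased program size listed above.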
• Future of VLIW
Newer VLIW processors are mainly used for:
Stream and image processing, e.g. Philips TriMedia.
Digital signal processing, e.g. the TMS320C62x from Texas Instruments.
Mobile computing, e.g. Transmeta Crusoe.
High-end server applications, e.g. Intel Itanium.
Stream and media processing lend themselves to the VLIW style because they offer large amounts of ILP.
Superscalars will be forced to use simpler structures and seek help from software.