Hyper-Threading




ABSTRACT
Intel's Hyper-Threading Technology brings the concept of simultaneous multi-threading to the Intel Architecture. Hyper-Threading Technology makes a single physical processor appear as two logical processors; the physical execution resources are shared and the architecture state is duplicated for the two logical processors. From a software perspective, this means operating systems and user programs can schedule processes or threads to logical processors as they would on multiple physical processors. From a microarchitecture perspective, this means that instructions from both logical processors will persist and execute simultaneously on shared execution resources. This paper describes the Hyper-Threading Technology architecture, and discusses the microarchitecture details of Intel's first implementation on the Intel Xeon processor family. Hyper-Threading Technology is an important addition to Intel's enterprise product line and will be integrated into a wide variety of products.

INTRODUCTION
The amazing growth of the Internet and telecommunications is powered by ever-faster systems demanding increasingly higher levels of processor performance. To keep up with this demand we cannot rely entirely on traditional approaches to processor design. Microarchitecture techniques used to achieve past processor performance improvements (super-pipelining, branch prediction, super-scalar execution, out-of-order execution, caches) have made microprocessors increasingly complex, with more transistors and higher power consumption. In fact, transistor counts and power are increasing at rates greater than processor performance. Processor architects are therefore looking for ways to improve performance at a greater rate than transistor counts and power dissipation. Intel's Hyper-Threading Technology is one solution.
THE TECHNIQUES BEFORE HYPER-THREADING
Traditional approaches to processor design have focused on higher clock speeds, instruction-level parallelism (ILP), and caches. Accesses to DRAM memory are slow compared to execution speeds of the processor. One technique to reduce this latency is to add fast caches close to the processor. Caches can provide fast memory access to frequently accessed data or instructions. However, there will always be times when the data needed will not be in any processor cache. Handling such cache misses requires accessing memory, and the processor is likely to quickly run out of instructions to execute before stalling on the cache miss. The vast majority of techniques to improve processor performance from one generation to the next are complex and often add significant die-size and power costs. These techniques increase performance but not with 100% efficiency; i.e., doubling the number of execution units in a processor does not double the performance of the processor, due to limited parallelism in instruction flows.
A look at today's software trends reveals that server applications consist of multiple threads or processes that can be executed in parallel. On-line transaction processing and Web services have an abundance of software threads that can be executed simultaneously for faster performance. Even desktop applications are becoming increasingly parallel. Intel architects have been trying to leverage this so-called thread-level parallelism (TLP) to gain a better performance vs. transistor count and power ratio. In both the high-end and mid-range server markets, multiprocessors have been commonly used to get more performance from the system. By adding more processors, applications potentially get substantial performance improvement by executing multiple threads on multiple processors at the same time. These threads might be from the same application, from different applications running simultaneously, from operating system services, or from operating system threads doing background maintenance. Multiprocessor systems have been used for many years, and high-end programmers are familiar with the techniques to exploit multiprocessors for higher performance levels. In recent years a number of other techniques to further exploit TLP have been discussed and some products have been announced. One of these techniques is chip multiprocessing (CMP), where two
processors are put on a single die. The two processors each have a full set of execution and architectural resources. The processors may or may not share a large on-chip cache. CMP is largely orthogonal to conventional multiprocessor systems, as you can have multiple CMP processors in a multiprocessor configuration. Recently announced processors incorporate two processors on each die. However, a CMP chip is significantly larger than the size of a single-core chip and therefore more expensive to manufacture; moreover, it does not begin to address the die size and power considerations.
Another approach is to allow a single processor to execute multiple threads by switching between them. Time-slice multithreading is where the processor switches between software threads after a fixed time period. Time-slice multithreading can result in wasted execution slots but can effectively minimize the effects of long latencies to memory. Switch-on-event multithreading would switch threads on long latency events such as cache misses.
Finally, there is simultaneous multi-threading, where multiple threads can execute on a single processor without switching. The threads execute simultaneously and make much better use of the resources. This approach makes the most effective use of processor resources: it maximizes the performance vs. transistor count and power consumption.
WHAT IS HYPER-THREADING?
Hyper-Threading technology is an innovative design from Intel that enables multi-threaded software applications to process threads in parallel within each processor, resulting in increased utilization of processor execution resources. In short, it places two logical processors on a single CPU die. As a result, an average improvement of about 40% in CPU resource utilization yields higher processing throughput.
How Hyper-Threading Works
A form of simultaneous multi-threading technology (SMT), Hyper-Threading
technology allows multiple threads of software applications to be run simultaneously on one processor by duplicating the architectural state on each processor while the same processor execution resources are shared. The figure below shows how a Hyper-Threading based processor differs from a traditional multiprocessor. The left-hand configuration shows a traditional multiprocessor system with two physical processors. Each processor has its own independent execution resources and architectural state. The right-hand configuration represents an Intel Hyper-Threading technology based processor. You can see that the architectural state for each logical processor is duplicated, while the execution resources are shared.
[Figure: a traditional multiprocessor (left), with an architecture state and dedicated execution resources in each physical processor, versus a Hyper-Threading processor (right), with two architecture states sharing one set of execution resources]
For multiprocessor-capable software applications, the Hyper-Threading based processor is considered two separate logical processors on which the software
applications can run without modification. Also, each logical processor responds to interrupts independently. The first logical processor can track one software thread, while the second logical processor tracks another software thread simultaneously. Because the two threads share the same execution resources, the second thread can use resources that would be otherwise idle if only one thread was executing. This results in an increased utilization of the execution resources within each physical processor.
WINDOWS SUPPORT FOR HT TECHNOLOGY
How Do Windows-Based Servers Recognize Processors with Hyper-Threading Technology?
Windows-based servers receive processor information from the BIOS. Each server vendor creates its own BIOS using specifications provided by Intel. Assuming the BIOS is written according to Intel specifications, it begins counting processors using the first logical processor on each physical processor. Once it has counted a logical processor on all of the physical processors, it will count the second logical processor on each physical processor, and so on, as shown in Figure 1.
Figure 1: Numbers indicate the order in which logical processors are recognized by the BIOS when written according to Intel specifications. This example shows a four-way system enabled with Hyper-Threading Technology.
Windows 2000 Server Family and Hyper-threading Technology
Windows 2000 Server does not distinguish between physical and logical processors on systems enabled with Hyper-Threading Technology; Windows 2000 simply fills out the license limit using the first processors counted by the BIOS. For example, when you launch Windows 2000 Server (4-CPU limit) on a four-way system enabled with Hyper-Threading Technology, Windows will use the first logical processor on each of the four physical processors, as shown in Figure 2; the second logical processor on each physical processor will be unused, because of the 4-CPU license limit. (This assumes the BIOS was written according to Intel specifications. Windows uses the processor count and sequence indicated by the BIOS.)
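A minimal sketch of this counting order (illustrative C++ only, not actual firmware or Windows code; the package count, logical-processor count, and license limit are assumed inputs):

```
#include <cstdio>
#include <vector>

// Illustrative sketch: models the BIOS counting order described above.
// The first logical processor on every package is counted first, then the
// second logical processor on every package, and so on. Windows 2000 then
// simply takes the first 'licenseLimit' entries in that order.
int main() {
    const int packages = 4;           // four-way system
    const int logicalPerPackage = 2;  // Hyper-Threading enabled
    const int licenseLimit = 4;       // e.g., Windows 2000 Server

    std::vector<int> bootOrder;
    for (int lp = 0; lp < logicalPerPackage; ++lp)
        for (int pkg = 0; pkg < packages; ++pkg)
            bootOrder.push_back(pkg * logicalPerPackage + lp);

    for (int i = 0; i < licenseLimit; ++i)
        std::printf("CPU %d -> package %d, logical %d\n",
                    bootOrder[i], bootOrder[i] / logicalPerPackage,
                    bootOrder[i] % logicalPerPackage);
}
```

Run against the four-way example above, this prints the first logical processor of each of the four packages, leaving every second logical processor unused.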
Figure 2: Numbers indicate the order in which logical processors are used by Windows 2000 Server (4-CPU limit) on a four-way system enabled with Hyper-Threading Technology. Assumes BIOS is written according to Intel specifications.
However, when you launch Windows 2000 Advanced Server (8-CPU limit) on a four-way system enabled with Hyper-Threading Technology, Windows will use all eight logical processors, as shown in Figure 3.
Figure 3: Numbers indicate the order in which logical processors are used by Windows 2000 Advanced Server (8-CPU limit) on a four-way system enabled with Hyper-Threading Technology. Assumes BIOS is written according to Intel specifications.
Although Windows recognizes all eight logical processors in this example, in most cases performance would be better using eight physical processors.
Windows .NET Server Family and Hyper-threading Technology
When examining the processor count provided by the BIOS, Windows .NET Server distinguishes between logical and physical processors, regardless of how they are counted by the BIOS. This provides a powerful advantage over Windows 2000, in that Windows .NET Server only treats physical processors as counting against the license limit. For example, if you launch Windows .NET Standard Server (2-CPU
limit) on a two-way system enabled with Hyper-Threading Technology, Windows will use all four logical processors, as shown in Figure 4.
Figure 4: Numbers indicate the order in which logical processors are used by Windows .NET Standard Server (2-CPU limit) on a two-way system enabled with Hyper-Threading Technology. Assumes BIOS is written according to Intel specifications.
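On later Windows releases, the same logical/physical distinction is exposed to applications through the Win32 GetLogicalProcessorInformation API. A hedged sketch of counting both (assumes a Windows version that provides this API; the .NET Server generation described above derived the distinction from BIOS-reported topology internally):

```
#include <windows.h>
#include <cstdio>
#include <vector>

// Sketch: count physical cores vs. logical processors with
// GetLogicalProcessorInformation. Each RelationProcessorCore entry is one
// physical core; the bits set in its ProcessorMask are its logical CPUs.
int main() {
    DWORD len = 0;
    GetLogicalProcessorInformation(nullptr, &len);  // query required size
    std::vector<SYSTEM_LOGICAL_PROCESSOR_INFORMATION> info(
        len / sizeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION));
    if (!GetLogicalProcessorInformation(info.data(), &len)) return 1;

    int cores = 0, logical = 0;
    for (const auto& e : info) {
        if (e.Relationship != RelationProcessorCore) continue;
        ++cores;
        for (ULONG_PTR m = e.ProcessorMask; m; m >>= 1)
            logical += static_cast<int>(m & 1);  // one bit per logical CPU
    }
    std::printf("%d physical cores, %d logical processors\n", cores, logical);
}
```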
Windows Server Applications and Hyper-threading Technology
Regardless of whether an application has been specifically designed to take advantage of Hyper-Threading Technology, or even whether the application is multi-threaded, Intel expects the existing body of applications in the market today to run correctly on systems enabled with Hyper-Threading Technology without further modification, and without being recompiled.
THREAD SCHEDULING AND HYPER-THREADING TECHNOLOGY
Operating systems schedule threads on available processors based on a "ready-to-run" criterion. The set of available threads is contained in a thread pool. A thread is ready-to-run if it has all the resources it needs except the processor. Threads that are waiting for disk, memory, or other I/O are not in a ready-to-run state. In general, high-priority threads will be selected over low-priority threads. Over time, a low-priority thread becomes increasingly favored and will eventually be scheduled on an available processor.
In the case where there are more ready-to-run threads than logical processors, the operating system will select higher-priority threads to schedule for each available processor. The lower-priority threads will be delayed to allow the higher priority threads more execution time.
In the case where there are two ready-to-run threads and two logical processors, the operating system schedules each thread on a logical processor. The two threads may contend for the same physical processor resources, because HT Technology shares physical resources without regard to thread priority. As the two threads contend for resources, the high-priority thread will complete instructions more slowly than it would if it had the processor's execution resources to itself.
The result is an implicit thread priority boost: a lower-priority thread ends up consuming the same CPU resources as a higher-priority thread. This can cause inconsistent, sub-optimal, or even degraded performance of higher-priority threads on systems with Hyper-Threading Technology.
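A sketch of how such contention can be reproduced with the Win32 scheduling APIs. The affinity-mask bit positions are an assumption based on the BIOS enumeration order described earlier (bit 0 and bit 4 as the two logical-processor siblings of package 0 on a four-way box); real systems vary.

```
#include <windows.h>
#include <cstdio>

// Busy-work thread body so both threads continuously demand execution
// resources from the shared physical processor.
DWORD WINAPI Spin(LPVOID) { for (volatile int i = 0; i < 100000000; ++i); return 0; }

int main() {
    HANDLE hi = CreateThread(nullptr, 0, Spin, nullptr, CREATE_SUSPENDED, nullptr);
    HANDLE lo = CreateThread(nullptr, 0, Spin, nullptr, CREATE_SUSPENDED, nullptr);

    SetThreadAffinityMask(hi, 1 << 0);  // logical processor 0 of package 0
    SetThreadAffinityMask(lo, 1 << 4);  // its sibling (assumed numbering)
    SetThreadPriority(hi, THREAD_PRIORITY_HIGHEST);
    SetThreadPriority(lo, THREAD_PRIORITY_LOWEST);

    ResumeThread(hi); ResumeThread(lo);
    WaitForSingleObject(hi, INFINITE);  // hi still runs slower than it would
    WaitForSingleObject(lo, INFINITE);  // alone: the siblings share resources
    return 0;
}
```

Despite the priority gap, the low-priority thread keeps consuming shared execution resources, which is exactly the contention the text describes.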
MULTITHREADING, HYPER-THREADING, MULTIPROCESSING: NOW, WHAT'S THE DIFFERENCE?
Hyper-Threading Technology
Today's software consists of multiple threads or processes that can be executed in parallel. A Web server is a good example of an application that benefits from multi-threading, which allows it to serve multiple users concurrently. To fully exploit the benefits of multi-threading, hardware and software engineers have used many techniques over the years. The most straightforward method is to use a multiprocessor system fitted with two or more processors. This method achieves true parallelism, but at the expense of increased cost.
The second technique is chip multi-processing (CMP), where two processors are put on a single die. This technique is very similar to the first in that there is more than one physical processor, except that both processors share a single die. However, this technique is also very expensive to implement, as the cost of manufacturing such a chip is high.
The third method is the most conventional and the least expensive to implement. The technique is to use a single processor and have the operating system make use of time slicing to switch between threads. In most cases, time slicing is adequate and results in improved performance and throughput. However, time slicing can incur a penalty due to inefficient resource usage and the high cost of context switching. Context switching is the task performed by the CPU when it switches threads to execute different instructions. During this switching process, the CPU needs to save the state of the outgoing thread and load the state of the incoming thread. The CPU performs no useful work during the switch, so context switching adds overhead to the execution time.
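A rough way to feel this overhead is to force switches deliberately. The sketch below (standard C++ threads, illustrative only) bounces a token between two threads so that every handoff requires the scheduler to switch; the round-trip time approximates two context switches plus synchronization cost.

```
#include <chrono>
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <thread>

int main() {
    std::mutex m;
    std::condition_variable cv;
    bool turnA = true;
    const int iters = 100000;

    auto player = [&](bool me) {
        for (int i = 0; i < iters; ++i) {
            std::unique_lock<std::mutex> lk(m);
            cv.wait(lk, [&] { return turnA == me; });
            turnA = !me;        // pass the token to the other thread
            cv.notify_one();
        }
    };

    auto t0 = std::chrono::steady_clock::now();
    std::thread a(player, true), b(player, false);
    a.join(); b.join();
    auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
                  std::chrono::steady_clock::now() - t0).count();
    std::printf("~%lld ns per handoff\n", ns / (2LL * iters));
}
```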
The fourth and last technique to be discussed is relatively new - Simultaneous Multi-Threading. Simultaneous multi-threading executes multiple threads on a processor without the need for switching. Hyper-Threading Technology enables software to implement this approach.
To understand how Hyper-Threading Technology works, let's take a look at the architecture of a conventional processor. A processor consists mainly of the Architecture State (Arch State) and the Processor Execution Resources. The Arch State consists of registers, including the general-purpose registers, the control registers, the advanced programmable interrupt controller (APIC) registers, and some machine state registers. The Arch State defines and controls the environment of an executing thread or task. The Processor Execution Resources comprise the remaining resources: caches, execution units, branch predictors, control logic, and buses.
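As a conceptual model only (the field names and sizes are illustrative, not Intel's definitions), the division might be pictured like this:

```
// Conceptual model mirroring the description above: each logical processor
// owns a private copy of the architecture state, while one set of execution
// resources is shared by both.
struct ArchState {
    unsigned long long generalPurpose[8];  // general-purpose registers
    unsigned long long control[5];         // control registers
    unsigned long long apic[16];           // local APIC registers
    unsigned long long machineState[8];    // machine state registers
};

struct ExecutionResources {                // shared between logical processors
    // caches, execution units, branch predictors, control logic, buses ...
};

struct PhysicalProcessor {
    ArchState logical[2];                  // duplicated: one per logical CPU
    ExecutionResources shared;             // single copy, arbitrated per cycle
};
```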
Multithreading
As the demands on software grew, system programmers began to covet the time that processors wasted running a single thread while waiting for certain events to happen. When a program was waiting for a diskette drive to be ready or a user to type some information, programmers began to wonder if the processor could be doing other work. Under MS-DOS, the answer was unequivocally no. Instructions were executed sequentially, and if there was a pause in the thread of instructions, all downstream instructions had to wait for the pause to terminate. No magic or smoke and mirrors could get around this limitation.
To come up with a solution, software architects began writing operating systems that supported running pieces of programs, called threads. These multithreading operating systems made it possible for one thread to run while another was waiting for something to happen. On Intel® processor-based PCs and servers, today's operating systems, such as Windows 2000 and Windows XP, all support multithreading. In fact, the operating systems themselves are multithreaded. Portions of them can run while other portions are stalled.
To benefit from multithreading, programs also need to be multithreaded themselves. That is, rather than being developed as a single long sequence of instructions, they are broken up into logical units whose execution is controlled by the mainline of the program. This allows, for example, Microsoft Word to repaginate a
document while the user is typing. Repagination occurs on one thread and handling keystrokes occurs on another. On single processor systems, these threads are executed sequentially, not concurrently. The processor switches back and forth between the keystroke thread and the repagination thread quickly enough that both processes appear to occur simultaneously.
When dual-threaded programs are executing on a single processor machine, some overhead is incurred when switching between the threads. Because switching between threads costs time, it would appear that running the two threads this way is less efficient than running the two threads in succession. However, if either thread has to wait on a system device or the user, the ability to have the other thread continue operating compensates very quickly for all the overhead of the switching. And since one thread in our example handles user input, there will certainly be frequent periods when it is just waiting. By switching between threads, operating systems that support multithreaded programs can improve performance, even if they are running on a uniprocessor system.
Multiprocessing
Multiprocessing systems have multiple processors running at the same time. Traditional multiprocessing systems have anywhere from 2 to about 128 processors. Beyond that number multiprocessing systems become parallel processors. We will touch on those later.
Multiprocessing systems allow different threads to run on different processors. This capability considerably accelerates program performance. Now two threads can run more or less independently of each other without requiring thread switches to get at the resources of the processor. Multiprocessor operating systems are themselves multithreaded and they too generate threads that can run on the separate processors to best advantage.
n nt nfCXF
In the early days, there were two kinds of multiprocessing:
• Asymmetrical -
On asymmetrical systems, one or more processors were exclusively dedicated to specific tasks, such as running the operating system. The remaining processors were available for all other tasks, generally the user applications. It quickly became apparent that this configuration was not optimal. On some machines, the operating-system processors were running at 100% capacity, while the user-assigned processors were doing nothing.
• Symmetrical -
In short order, system designers came to favor an architecture that balanced the processing load better: Symmetrical Multiprocessing (SMP). The "symmetry" refers to the fact that any thread, whether from the operating system or a user application, can run on any processor. In this way, the total computing load is spread evenly across all computing resources.
Today, symmetrical multiprocessing systems are the norm, and asymmetrical designs have nearly completely disappeared. When an SMP system doubles the number of processors, performance generally jumps by 80% or more. Why doesn't performance jump 100%? Two factors come into play: the first is the way threads interact, the other is system overhead.
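This behavior is consistent with Amdahl's-law-style reasoning. As a back-of-envelope sketch (the 89% parallel fraction is an assumed value chosen to match the quoted figure, not a number from the source):

```
#include <cstdio>

// Amdahl's law: speedup(n) = 1 / ((1 - p) + p / n), where p is the fraction
// of work that can run in parallel. If contention and overhead serialize
// roughly 11% of the work, two processors give about the 1.8x (an "80% jump")
// quoted above.
int main() {
    const double p = 0.89;                     // assumed parallel fraction
    const int n = 2;                           // processors
    const double speedup = 1.0 / ((1.0 - p) + p / n);
    std::printf("speedup ~ %.2fx\n", speedup); // prints ~1.80x
}
```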
Thread interaction has two components:
1. How threads handle competition for the same resources
2. How threads communicate among themselves
When two threads both want access to the same resource, one of them has to wait. The resource can be a disk drive, a record in a database that another thread is writing to, or any of a myriad of other features of the system. The penalties accrued when threads have to wait for each other are so steep that minimizing this delay is a central design issue for hardware installations and the software they run. It is generally the largest factor in preventing perfect scalability of performance of multiprocessing systems, because running threads that never contend for the same resource is effectively impossible.
A second factor is thread synchronization. When a program is designed in threads, there are many occasions where the threads need to interact, and the interaction points require delicate handling. For example, if one thread is preparing data for another thread to process, delays can occur when the first thread does not have data ready when the processing thread needs it. More compelling examples occur when two threads need to share a common area of memory. If both threads can write to the same area in memory, then the thread that wrote first has to check that what it wrote has not been overwritten, or it must lock out other threads until it has finished with the data. This synchronization and inter-thread management is clearly an aspect that does not benefit from having more available processing resources.
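A minimal sketch of the locking case described above (standard C++; the producer/consumer roles are illustrative): the mutex-guarded region is precisely the portion of the program that additional processors cannot accelerate.

```
#include <mutex>
#include <thread>
#include <vector>

// A producer fills a shared buffer while a consumer drains it. The mutex
// serializes every access to the shared memory, so this section runs at
// single-thread speed no matter how many processors are available.
std::mutex m;
std::vector<int> shared_buf;

void producer() {
    for (int i = 0; i < 1000; ++i) {
        std::lock_guard<std::mutex> lk(m);  // lock out the consumer
        shared_buf.push_back(i);            // safe to write while holding lock
    }
}

void consumer(long long& sum) {
    for (int drained = 0; drained < 1000;) {
        std::lock_guard<std::mutex> lk(m);
        while (!shared_buf.empty()) {
            sum += shared_buf.back();
            shared_buf.pop_back();
            ++drained;
        }
    }
}

int main() {
    long long sum = 0;
    std::thread p(producer), c(consumer, std::ref(sum));
    p.join(); c.join();
}
```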
System overhead is the thread management done by the operating system. The more processors are running, the more the operating system has to coordinate. As a result, each new processor adds incrementally to the system management work of the operating system. This means that each new processor will contribute less and less to the overall system performance.
HYPER-THREADING TECHNOLOGY ARCHITECTURE
Hyper-Threading Technology makes a single physical processor appear as multiple logical processors. To do this, there is one copy of the architecture state for each logical processor, and the logical processors share a single set of physical execution resources. From a software or architecture perspective, this means operating systems and user programs can schedule processes or threads to logical processors as they would on conventional physical processors in a multiprocessor system. From a microarchitecture perspective, this means that instructions from logical processors will persist and execute simultaneously on shared execution resources.
Figure 2: Processors without Hyper-Threading (each physical processor has its own Arch State and its own execution resources)
As an example, Figure 2 shows a multiprocessor system with two physical processors that are not Hyper-Threading Technology-capable. Figure 3 shows a multiprocessor system with two physical processors that are Hyper-Threading Technology-capable. With two copies of the architectural state on each physical processor, the system appears to have four logical processors.
Figure 3: Processors with Hyper-Threading Technology (two Arch States per physical processor share one set of execution resources)
FIRST IMPLEMENTATION ON THE INTEL XEON PROCESSOR FAMILY
Several goals were at the heart of the microarchitecture design choices made for the Intel Xeon processor MP implementation of Hyper-Threading Technology. One goal was to minimize the die area cost of implementing Hyper-Threading Technology. Since the logical processors share the vast majority of microarchitecture resources and only a few small structures were replicated, the die area cost of the first implementation was less than 5% of the total die area. A second goal was to ensure that when one logical processor is stalled the other logical processor can continue to make forward progress. A logical processor may be temporarily stalled for a variety of reasons, including servicing cache misses, handling branch mispredictions, or waiting for the results of previous instructions. Independent forward progress was ensured by managing the buffering queues such that no logical processor can use all the entries when two active software threads are executing. This is accomplished by either partitioning or limiting the number of active entries each thread can have. A third goal was to allow a processor running only one active software thread to run at the same speed on a processor with Hyper-Threading Technology as on a processor without this capability. This means that partitioned resources should be recombined when only one software thread is active. A high-level view of the microarchitecture pipeline is shown in Figure 4. As shown, buffering queues separate major pipeline logic blocks. The buffering queues are either partitioned or duplicated to ensure independent forward progress through each logic block.
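A toy sketch of that partitioning policy (illustrative only, not Intel's actual queue logic; the 64-entry size is an assumption):

```
#include <cstddef>

// With two active threads, each thread may own at most half the entries;
// with one active thread the partitions are recombined and the single
// thread may use every entry, matching the goals described above.
class PartitionedQueue {
    static const std::size_t kEntries = 64;
    std::size_t used[2] = {0, 0};
    bool twoThreadsActive = true;

public:
    void setActiveThreads(int n) { twoThreadsActive = (n == 2); }

    bool tryAllocate(int thread) {
        std::size_t limit = twoThreadsActive ? kEntries / 2 : kEntries;
        if (used[thread] >= limit) return false;  // thread hit its partition
        ++used[thread];                           // entry granted
        return true;
    }

    void release(int thread) { if (used[thread]) --used[thread]; }
};
```

Because neither thread can ever exhaust the whole queue, a stalled logical processor cannot block its sibling's forward progress.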
FRONT END OF AN INTEL XEON PROCESSOR PIPELINE
The front end of the pipeline is responsible for delivering instructions to the later pipe stages. As shown in Figure 5a, instructions generally come from the Execution Trace Cache (TC), which is the primary or Level 1 (L1) instruction cache. Figure 5b shows that only when there is a TC miss does the machine fetch and decode instructions from the integrated Level 2 (L2) cache. Near the TC is the Microcode ROM, which stores decoded instructions for the longer and more complex IA-32 instructions.
Execution Trace Cache (TC)
The TC stores decoded instructions, called micro operations or "uops." Most instructions in a program are fetched and executed from the TC. Two sets of next-instruction-pointers independently track the progress of the two software threads executing. The two logical processors arbitrate access to the TC every clock cycle. If both logical processors want access to the TC at the same time, access is granted to one then the other in alternating clock cycles. For example, if one cycle is used to fetch a line for one logical processor, the next cycle would be used to fetch a line for the other logical processor, provided that both logical processors requested access to the trace cache. If one logical processor is stalled or is unable to use the TC, the other logical processor can use the full bandwidth of the trace cache, every cycle. The TC
entries are tagged with thread information and are dynamically allocated as needed. The TC is 8-way set associative, and entries are replaced based on a least-recently-used (LRU) algorithm that considers the full 8 ways. The shared nature of the TC allows one logical processor to have more entries than the other if needed.
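The arbitration policy can be sketched as a small simulation (illustrative pseudologic, not the actual hardware):

```
#include <cstdio>

// Per-cycle trace-cache arbitration as described above: strict alternation
// when both logical processors want to fetch; full bandwidth to whichever
// one is ready when the other is stalled.
int main() {
    bool wants[2] = {true, true};  // does each logical processor want the TC?
    int lastWinner = 1;

    for (int cycle = 0; cycle < 8; ++cycle) {
        if (cycle == 4) wants[0] = false;  // LP0 stalls (e.g., cache miss)
        int winner = -1;
        if (wants[0] && wants[1]) winner = 1 - lastWinner;  // alternate
        else if (wants[0]) winner = 0;     // lone requester gets every cycle
        else if (wants[1]) winner = 1;
        if (winner >= 0) {
            std::printf("cycle %d: fetch for LP%d\n", cycle, winner);
            lastWinner = winner;
        }
    }
}
```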
Microcode ROM
When a complex instruction is encountered, the TC sends a microcode-instruction pointer to the Microcode ROM. The Microcode ROM controller then fetches the uops needed and returns control to the TC. Two microcode instruction pointers are used to control the flows independently if both logical processors are executing complex IA-32 instructions. Both logical processors share the Microcode ROM entries. Access to the Microcode ROM alternates between logical processors just as in the TC.
ADVANTAGES OF HT TECHNOLOGY
Hyper-threading, officially called Hyper-Threading Technology (HTT), is
Intel's trademark for its implementation of simultaneous multithreading technology on the Pentium 4 microarchitecture. It is basically a more advanced form of Super-threading that debuted on the Intel Xeon processors and was later added to Pentium 4 processors. The technology improves processor performance under certain workloads by providing useful work for execution units that would otherwise be idle, for example during a cache miss.
The advantages of HT Technology are: improved support for multithreaded code, allowing multiple threads to run simultaneously; improved reaction and response time; and an increase in the number of users a server can support.
According to Intel, the first implementation only used an additional 5% of the die area over the "normal" processor, yet yielded performance improvements of 15-30%.
Intel claims up to a 30% speed improvement compared against an otherwise identical, non-SMT Pentium 4. The performance improvement seen is very application-dependent, however, and some programs actually slow down slightly when Hyper Threading Technology is turned on. This is due to the replay system of the Pentium 4 tying up valuable execution resources, thereby starving the other thread. However, any performance degradation is unique to the Pentium 4 (due to various architectural nuances), and is not characteristic of simultaneous multithreading in general.
Hyper threading allows the operating system to see two logical processors rather than the one physical processor present.
Hyper-Threading works by duplicating certain sections of the processor (those that store the architectural state) but not duplicating the main execution resources. This allows a Hyper-Threading equipped processor to present itself as two "logical" processors to the host operating system, allowing the operating system to
schedule two threads or processes simultaneously. Where execution resources in a non-Hyper-Threading capable processor are not used by the current task, and especially when the processor is stalled, a Hyper-Threading equipped processor may use those execution resources to execute the other scheduled task. (The processor may stall due to a cache miss, branch misprediction, or data dependency.)
Except for its performance implications, this innovation is transparent to operating systems and programs. All that is required to take advantage of Hyper-Threading is symmetric multiprocessing (SMP) support in the operating system, as the logical processors appear as standard separate processors.
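For example, a program (or an operating system) can ask the processor about this capability through the CPUID instruction. A sketch using compiler intrinsics; note as a caveat that on later multi-core CPUs the HTT flag alone is not conclusive, so treat the result as a hint:

```
#include <cstdio>
#if defined(_MSC_VER)
#include <intrin.h>
#else
#include <cpuid.h>
#endif

// CPUID leaf 1: EDX bit 28 is the HTT flag, and EBX bits 23:16 report the
// number of logical processors per physical package.
int main() {
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
#if defined(_MSC_VER)
    int r[4];
    __cpuid(r, 1);
    eax = r[0]; ebx = r[1]; ecx = r[2]; edx = r[3];
#else
    __get_cpuid(1, &eax, &ebx, &ecx, &edx);
#endif
    bool htt = (edx >> 28) & 1;
    unsigned logicalPerPackage = (ebx >> 16) & 0xFF;
    std::printf("HTT flag: %d, logical processors per package: %u\n",
                htt, htt ? logicalPerPackage : 1);
}
```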
APPLICATIONS OF HYPER-THREADING TECHNOLOGY
Enterprise, e-Business, and gaming software applications continue to put higher demands on processors. To improve performance in the past, threading was enabled in the software by splitting instructions into multiple streams so that multiple processors could act upon them. Hyper-Threading Technology (HT Technology) provides thread-level parallelism on each processor, resulting in more efficient use of processor resources, higher processing throughput, and improved performance on today's multithreaded software. The combination of an Intel® processor and chipset that support HT Technology, an operating system that includes optimizations for HT Technology, and a BIOS that supports HT Technology and has it enabled, delivers increased system performance and responsiveness.
Hyper-Threading Technology for Business Desktop PCs
HT Technology helps desktop users get more performance out of existing software in multitasking environments. Many applications are already multithreaded and will automatically benefit from this technology. Business users can run demanding desktop applications simultaneously while maintaining system responsiveness. IT departments can deploy desktop background services that make their environments more secure, efficient and manageable, while minimizing the impact on end-user productivity and providing headroom for future business growth and new solution capabilities.
Hyper-Threading Technology for Gaming and Video
The Intel® Pentium® processor Extreme Edition combines HT Technology with dual-core processing to give people PCs capable of handling four software threads. HT Technology enables gaming enthusiasts to play the latest titles and experience ultra realistic effects and game play. And multimedia enthusiasts can create, edit, and encode graphically intensive files while running a virus scan in the background.
Hyper-Threading Technology for Servers
With HT Technology, multithreaded server software applications can execute threads in parallel within each processor in a server platform. Select products from the Intel® Xeon® processor family use HT Technology to increase compute power and throughput for today's Web-based and enterprise server applications.
Hyper-Threading Technology Benefits for Enterprise and e-Business
• Enables more user support, improving business productivity.
• Provides faster response times for Internet and e-Business applications, enhancing customer experiences.
• Increases the number of transactions that can be processed.
• Allows compatibility with existing IA-32 applications and operating systems.
• Handles larger workloads.
CONCLUSION
Hyper-Threading Technology is a technology that enables a single processor to run two separate threads simultaneously. Although several chip manufacturers have announced their intentions to ship processors with this capability, only Intel has done so at this time. The reason in part stems from a design change Intel made in the release of the Pentium® Pro processor in 1995. The company added multiple execution units to the processor. Even though the chip could execute only one thread, the multiple execution units enabled some instructions to be executed out of order. As the processor handled the main instructions, a look-ahead capability recognized upcoming instructions that could be executed out of order on the other execution pipelines, and their results were folded back into the stream of executed instructions when their turn came up. This facility made for a more optimized flow of executed instructions. It was also used to speculatively execute instructions from a branch in an upcoming "if" test. When the mainline hit the test, the results of pre-executed instructions from the correct branch would be used. If the speculation had pre-executed the wrong branch, those instructions were simply discarded.
It is important to note that though the Hyper-Threading Technology gives the operating system the impression that it is running on a multi-processor system, its performance does not exactly duplicate a true multi-processor system. However, there is significant improvement in performance over a conventional processor (up to about 30%), and the slight increases in die size (around 5%) and cost make Hyper-Threading Technology a cost-effective solution. The Intel Xeon® processor family is the first to implement the Hyper-Threading Technology. Hyper-Threading Technology is available on Intel desktop platforms as well.
FUTURE SCOPE
Older Pentium 4 based MPUs use Hyper-Threading, but the current-generation cores, Merom, Conroe and Woodcrest, do not. Hyper-Threading is a specialized form of simultaneous multithreading, which has been said to be on Intel roadmaps for the generation after Merom, Conroe and Woodcrest.
While some have alleged that Hyper-Threading is somehow energy inefficient and claim this to be the reason Intel dropped it from their new cores, this is almost certainly not the case. A number of low-power chips do use multithreading, including the PPE from the Cell processor used in the PlayStation 3, Sun Microsystems' Niagara, and the MIPS 34K.
Multiprocessing systems run threads on separate processors. Systems with Hyper-Threading Technology run two threads on one chip. Intel® Xeon® processor-based servers combine both technologies: they run two Hyper-Threading Technology enabled processors in the same machine. This creates a machine with four concurrent threads executing. If the instructions are scheduled correctly, and the operating systems are tuned for hyper-threading, the machines get enormous processing capability, and thread-heavy applications like Java* Virtual Machines run considerably faster.
At the end of 2002, all Intel Xeon processors are implemented with Hyper-Threading Technology, and Intel has announced that its desktop processors will support it next. As a result, the multithreading issues and opportunities that hyper-threading provides will become universal programming aspects during the next few years. The similarities between these technologies are further underscored by the fact that Intel and the vendors behind the OpenMP initiative are porting this parallel processing technology to hyper-threaded systems to extract the greatest possible benefit from the multiple processing pipelines.