ASK HERE

31-12-2009, 06:45 PM

[attachment=942]
ABSTRACT
Hyper-Threading Technology is a groundbreaking innovation from IntelÃ‚Â® Corporation that enables multi-threaded software applications to execute threads in parallel This level of threading technology has never been seen before in a general-purpose microprocessor. Internet, e-Business, and enterprise software applications continue to put higher demands on processors.
To improve performance in the past, threading was enabled in the software by splitting instructions into multiple streams so that multiple processors could act upon them.Today with Hyper-Threading Technology, processor-level threading can be utilized which offers more efficient use of processor resources for greater parallelism and improved performance on today's multi-threaded software. Hyper-Threading Technology provides thread-level-parallelism (TLP) on each processor resulting in increased utilization of processor execution resources. As a result, resource utilization yields higher processing throughput. Hyper-Threading Technology is a form of simultaneous multi-threading technology (SMT) where multiple threads of software applications can be run simultaneously on one processor.

This technology is largely invisible to the platform. In fact, many applications are already multi-threaded and will automatically benefit from this technology. Today's multi-processing aware software is also compatible with Hyper-Threading Technology enabled platforms, but further performance gains can be realized by specifically tuning software for Hyper-Threading Technology. This technology complements traditional multi-processing by providing additional headroom for future software optimizations and business growth.

INTRODUCTION
Hyper-Threading (HT) Technology is ground breaking technology from Intel that allows processors to work more efficiently. This new technology enables the processor to execute two series, or threads, of instructions at the same time, thereby improving performance and system responsiveness while delivering performance headroom for the future.
Intel Hyper-Threading Technology improves the utilization of onboard resources so that a second thread can be processed in the same processor. Hyper-Threading Technology provides two logical processors in a single processor package.

Hyper-Threading Technology offers:

Â¢ improved overall system performance
Â¢ increased number of users a platform can support
Â¢ improved reaction and response time because tasks can be run on separate threads
Â¢ increased number of transaction that can be executed
Â¢ compatibility with existing IA-32 software
Code written for dual-processor (DP) and multi-processor (MP) systems is compatible with Intel Hyper-Threading Technology-enabled platforms. A Hyper-Threading Technology-enabled system will automatically process multiple threads of multi-threaded code.
UTILIZATION OF PROCESSOR RESOURCES
Intel Hyper-Threading Technology improves performance of multi-threaded applications by increasing the utilization of the on-chip resources available in the IntelÃ‚Â® NetBurstâ€žÂ¢ microarchitecture. The Intel NetBurst microarchitecture provides optimal performance when executing a single instruction stream. A typical thread of code with a typical mix of IntelÃ‚Â® IA-32-based instructions, however, utilizes only about 35 percent of the Intel NetBurst microarchitecture execution resources.
By adding the necessary logic and resources to the processor die in order to schedule and control two threads of code, Intel Hyper-Threading Technology makes these underutilized resources available to a second thread of code, offering increased throughput and overall system performance.
Hyper-Threading Technology provides a second logical processor in a single package for higher system performance. Systems containing multiple Hyper-Threading Technology-enabled processors further improve system performance, processing two code threads for each processor.

HYPER-THREADING TECHNOLOGY IMPROVES PERFORMANCE
Intel Hyper-Threading Technology offers better performance improvement as additional processors are added. Multi-processor systems with Hyper-Threading Technology can outperform multi-processor systems without Hyper-Threading Technology.

MULTI-THREADED APPLICATIONS
Virtually all contemporary operating systems (including Microsoft Windows* and Linux*) divide their workload up into processes and threads that can be independently scheduled and dispatched. The same division of workload can be found in many high-performance applications such as database engines, scientific computation programs, engineering-workstation tools, and multi-media programs.

To gain access to increased processing power, programmers design these programs to execute in dual-processor (DP) or multiprocessor (MP) environments. Through the use of symmetric multiprocessing (SMP), processes and threads can be dispatched to run on a pool of several physical processors. With multi-threaded, MP-aware applications, instructions from several threads are simultaneously dispatched for execution by the processors' core. In processors with Hyper-Threading Technology, a single processor core executes these two threads concurrently, using out-of-order instruction scheduling to keep as many of its execution units as possible busy during each clock cycle.
IntelÃ‚Â® NetBurstâ€žÂ¢ Microarchitecture Pipeline
Without Hyper-Threading Technology enabled, the Intel NetBurst microarchitecture processes a single thread through the pipeline. Recall that the typical mix of typical instructions only utilizes about 35% of the resources in the Intel NetBurst microarchitecture.
MULTIPROCESSOR PERFORMANCE ON A SINGLE PROCESSOR
The Intel Xeon processor introduces a new technology called Hyper-Threading (HT) that, to the operating system, makes a single processor behave like two logical processors. When enabled, the technology allows the processor to execute multiple threads simultaneously, in parallel within each processor, which can yield significant performance improvement. We set out to quantify just how much improvement you can expect to see. The current Linux symmetric multiprocessing (SMP) kernel at both the 2.4 and 2.5 versions was made aware of Hyper-Threading, and performance speed-up had been observed in multithreaded benchmarks
This article gives the results of our investigation into the effects of Hyper-Threading (HT) on the Linux SMP kernel. It compares the performance of a Linux SMP kernel that was aware of Hyper-Threading to one that was not. The system under test was a multithreading-enabled, single-CPU Xeon. The benchmarks used in the study covered areas within the kernel that could be affected by Hyper-Threading, such as the scheduler, low-level kernel primitives, the file server, the network, and threaded support.
The results on Linux kernel 2.4.19 show Hyper-Threading technology could improve multithreaded applications by 30%. Current work on Linux kernel 2.5.32 may provide performance speed-up as much as 51%.
HYPER-THREADING SPEEDS LINUX

Intel's Hyper-Threading Technology enables two logical processors on a single physical processor by replicating, partitioning, and sharing the resources within the Intel NetBurst microarchitecture pipeline.
Hyper-Threading support in the Xeon processor
The Xeon processor is the first to implement Simultaneous Multi-Threading (SMT) in a general-purpose processor. To achieve the goal of executing two threads on a single physical processor, the processor simultaneously maintains the context of multiple threads that allow the scheduler to dispatch two potentially independent threads concurrently.
The operating system (OS) schedules and dispatches threads of code to each logical processor as it would in an SMP system. When a thread is not dispatched, the associated logical processor is kept idle.
When a thread is scheduled and dispatched to a logical processor, LP0, the Hyper-Threading technology utilizes the necessary processor resources to execute the thread.
When a second thread is scheduled and dispatched on the second logical processor, LP1, resources are replicated, divided, or shared as necessary in order to execute the second thread. Each processor makes selections at points in the pipeline to control and process the threads. As each thread finishes, the operating system idles the unused processor, freeing resources for the running processor.
The OS schedules and dispatches threads to each logical processor, just as it would in a dual-processor or multi-processor system. As the system schedules and introduces threads into the pipeline, resources are utilized as necessary to process two threads.
Hyper-Threading support in Linux kernel 2.4
Under the Linux kernel, a Hyper-Threaded processor with two virtual processors is treated as a pair of real physical processors. As a result, the scheduler that handles SMP should be able to handle Hyper-Threading as well. The support for Hyper-Threading in Linux kernel 2.4.x began with 2.4.17 and includes the following enhancements:
Kernel performance measurement
To assess the effects of Hyper-Threading on the Linux kernel, we measured the performance of kernel benchmarks on a system containing the Intel Xeon processor with HT. The hardware was a single-CPU, 1.6 GHz Xeon MP processor with SMT, 2.5 GB of RAM, and two 9.2 GB SCSI disk drives. The kernel under measurement was stock version 2.4.19 configured and built with SMP enabled. The existence of Hyper-Threading support can be seen by using the command cat /proc/cpuinfo to show the presence of two processors, processor 0 and processor 1. Note the ht flag in Listing 1 for CPUs 0 and 1. In the case of no Hyper-Threading support, the data will be displayed for processor 0 only.
Linux kernel benchmarks
To measure Linux kernel performance, five benchmarks were used: LMbench, AIM Benchmark Suite IX (AIM9), chat, dbench, and tbench. The LMbench benchmark times various Linux application programming interfaces (APIs), such as basic system calls, context switching latency, and memory bandwidth. The AIM9 benchmark provides measurements of user application workload. The chat benchmark is a client-server workload modeled after a chat room. The dbench benchmark is a file server workload, and tbench is a TCP workload. Chat, dbench, and tbench are multithreaded benchmarks, while the others are single-threaded benchmarks.
Effects of Hyper-Threading on Linux APIs
The effects of Hyper-Threading on Linux APIs were measured by LMbench, which is a microbenchmark containing a suite of bandwidth and latency measurements. Among these are cached file read, memory copy (bcopy), memory read/write (and latency), pipe, context switching, networking, filesystem creates and deletes, process creation, signal handling, and processor clock latency. LMbench stresses the following kernel components: scheduler, process management, communication, networking, memory map, and filesystem. The low level kernel primitives provide a good indicator of the underlying hardware capabilities and performance.

To study the effects of Hyper-Threading, we focused on latency measurements that measure time of message control, (in other words, how fast a system can perform some operation). The latency numbers are reported in microseconds per operation.
Effects of Hyper-Threading on Linux single-user application workload
The AIM9 benchmark is a single user workload designed to measure the performance of hardware and operating systems . Most of the tests in the benchmark performed identically in Hyper-Threading and non-Hyper-Threading, except for the sync file operations and Integer Sieves. The three operations, Sync Random Disk Writes, Sync Sequential Disk Writes, and Sync Disk Copies, are approximately 35% slower in Hyper-Threading. On the other hand, Hyper-Threading provided a 60% improvement over non-Hyper-Threading in the case of Integer Sieves.
Effects of Hyper-Threading on Linux multithreaded application workload

To measure the effects of Hyper-Threading on Linux multithreaded applications, we use the chat benchmark, which is modeled after a chat room. The benchmark includes both a client and a server. The client side of the benchmark will report the number of messages sent per second; the number of chat rooms and messages will control the workload. The workload creates a lot of threads and TCP/IP connections, and sends and receives a lot of messages. It uses the following default parameters:
Number of chat rooms = 10
Number of messages = 100
Message size = 100 bytes
Number of users = 20

By default, each chat room has 20 users. A total of 10 chat rooms will have 20x10 = 200 users. For each user in the chat room, the client will make a connection to the server. So since we have 200 users, we will have 200 connections to the server. Now, for each user (or connection) in the chat room, a "send" thread and a "receive" thread are created. Thus, a 10-chat-room scenario will create 10x20x2 = 400 client threads and 400 server threads, for a total of 800 threads. But there's more.
Each client "send" thread will send the specified number of messages to the server. For 10 chat rooms and 100 messages, the client will send 10x20x100 = 20,000 messages. The server "receive" thread will receive the corresponding number of messages. The chat room server will echo each of the messages back to the other users in the chat room. Thus, for 10 chat rooms and 100 messages, the server "send" thread will send 10x20x100x19 or 380,000 messages. The client "receive" thread will receive the corresponding number of messages.
Effects of Hyper-Threading on Linux multithreaded file server workload
The effect of Hyper-Threading on the file server was measured with dbench and its companion test, tbench. dbench is similar to the well known NetBench benchmark from the Ziff-Davis Media benchmark program, which lets you measure the performance of file servers as they handle network file requests from clients. However, while NetBench requires an elaborate setup of actual physical clients, dbench simulates the 90,000 operations typically run by a NetBench client by sniffing a 4 MB file called client.txt to produce the same workload. The contents of this file are file operation directives such as SMBopenx, SMBclose, SMBwritebraw, SMBgetatr, etc. Those I/O calls correspond to the Server Message Protocol Block (SMB) that the SMBD server in SAMBA would produce in a netbench run. The SMB protocol is used by Microsoft Windows 3.11, NT and 95/98 to share disks and printers.

In our tests, a total of 18 different types of I/O calls were used including open file, read, write, lock, unlock, get file attribute, set file attribute, close, get disk free space, get file time, set file time, find open, find next, find close, rename file, delete file, create new file, and flush file buffer.
dbench can simulate any number of clients without going through the expense of a physical setup. dbench produces only the filesystem load, and it does no networking calls. During a run, each client records the number of bytes of data moved and divides this number by the amount of time required to move the data. All client throughput scores are then added up to determine the overall throughput for the server. The overall I/O throughput score represents the number of megabytes per second transferred during the test. This is a measurement of how well the server can handle file requests from clients.
dbench is a good test for Hyper-Threading because it creates a high load and activity on the CPU and I/O schedulers. The ability of Hyper-Threading to support multithreaded file serving is severely tested by dbench because many files are created and accessed simultaneously by the clients. Each client has to create about 21 megabytes worth of test data files. For a test run with 20 clients, about 420 megabytes of data are expected. dbench is considered a good test to measure the performance of the elevator algorithm used in the Linux filesystem. dbench is used to test the working correctness of the algorithm, and whether the elevator is aggressive enough. It is also an interesting test for page replacement.
tbench
tbench is another file server workload similar to dbench. However, tbench produces only the TCP and process load. tbench does the same socket calls that SMBD would do under a netbench load, but tbench does no filesystem calls. The idea behind tbench is to eliminate SMBD from the netbench test, as though the SMBD code could be made fast. The throughput results of tbench tell us how fast a netbench run could go if we eliminated all filesystem I/O and SMB packet processing. tbench is built as part of the dbench package.
Hyper-Threading support in Linux kernel 2.5.x
Linux kernel 2.4.x was made aware of HT since the release of 2.4.17. The kernel 2.4.17 knows about the logical processor, and it treats a Hyper-Threaded processor as two physical processors. However, the scheduler used in the stock kernel 2.4.x is still considered naive for not being able to distinguish the resource contention problem between two logical processors versus two separate physical processors.
Consider a system with two physical CPUs, each of which provides two virtual processors. If there are two tasks running, the current scheduler would let them both run on a single physical processor, even though far better performance would result from migrating one process to the other physical CPU. The scheduler also doesn't understand that migrating a process from one virtual processor to its sibling (a logical CPU on the same physical CPU) is cheaper (due to cache loading) than migrating it across physical processors.
HT-aware passive load-balancing:
The IRQ-driven balancing has to be per-physical-CPU, not per-logical-CPU. Otherwise, it might happen that one physical CPU runs two tasks while another physical CPU runs no task; the stock scheduler does not recognize this condition as "imbalance." To the scheduler, it appears as if the first two CPUs have 1-1 task running while the second two CPUs have 0-0 tasks running. The stock scheduler does not realize that the two logical CPUs belong to the same physical CPU.
"Active" load-balancing:
This is when a logical CPU goes idle and causes a physical CPU imbalance. This is a mechanism that simply does not exist in the stock 1:1 scheduler. The imbalance caused by an idle CPU can be solved via the normal load-balancer. In the case of HT, the situation is special because the source physical CPU might have just two tasks running, both runnable. This is a situation that the stock load-balancer is unable to handle, because running tasks are hard to migrate away. This migration is essential -- otherwise a physical CPU can get stuck running two tasks while another physical CPU stays idle.
HT-aware task pickup:
When the scheduler picks a new task, it should prefer all tasks that share the same physical CPU before trying to pull in tasks from other CPUs. The stock scheduler only picks tasks that were scheduled to that particular logical CPU.
HT-aware affinity:
Tasks should attempt to "stick" to physical CPUs, not logical CPUs.
HT-aware wakeup:
The stock scheduler only knows about the "current" CPU, it does not know about any sibling. On HT, if a thread is woken up on a logical CPU that is already executing a task, and if a sibling CPU is idle, then the sibling CPU has to be woken up and has to execute the newly woken-up task immediately.

EACH PROGRAM HAS A MIND OF ITS OWN
The OS and system hardware not only cooperate to fool the user about the true mechanics of multi-tasking, but they cooperate to fool each running program as well. While the user thinks that all of the currently running programs are being executed simultaneously, each of those programs thinks that it has a monopoly on the CPU and memory. As far as a running program is concerned, it's the only program loaded in RAM and the only program executing on the CPU. The program believes that it has complete use of the machine's entire memory address space and that the CPU is executing it continuously and without interruption. Of course, none of this is true. The program actually shares RAM with all of the other currently running programs, and it has to wait its turn for a slice of CPU time in order to execute, just like all of the other programs on the system.
A few terms: process, context, and thread
Before continuing our discussion of multiprocessing, let's take a moment to unpack the term "program" a bit more. In most modern operating systems, what users normally call a program would be more technically termed a process. Associated with each process is a context, "context" being just a catch-all term that encompasses all the information that completely describes the process's current state of execution (e.g. the contents of the CPU registers, the program counter, the flags, etc.).
Processes are made up of threads, and each process consists of at least one thread: the main thread of execution. Processes can be made up of multiple threads, and each of these threads can have its own local context in addition to the process's context, which is shared by all the threads in a process. In reality, a thread is just a specific type of stripped-down process, a "lightweight process," and because of this throughout the rest of this article I'll use the terms "process" and "thread" pretty much interchangeably.
Even though threads are bundled together into processes, they still have a certain amount of independence. This independence, when combined with their lightweight nature, gives them both speed and flexibility. In an SMP system like the ones we'll discuss in a moment, not only can different processes run on different processors, but different threads from the same process can run on different processors. This is why applications that make use of multiple threads see performance gains on SMP systems that single-threaded applications don't

IMPLEMENTING HYPER-THREADING
Although hyper-threading might seem like a pretty large departure from the kind of conventional, process-switching multithreading done on a single-threaded CPU, it actually doesn't add too much complexity to the hardware. Intel reports that adding hyper-threading to their Xeon processor added only %5 to its die area.
Intel's Xeon is capable of executing at most two threads in parallel on two logical processors. In order to present two logical processors to both the OS and the user, the Xeon must be able to maintain information for two distinct and independent thread contexts. This is done by dividing up the processor's microarchitectural resources into three types: replicated, partitioned, and shared.

WHAT HYPERTHREADING CAN (AND CAN'T)
DO FOR YOU
Hyper threading . The word alone sounds like a marketing tactic, an esoteric feature designed to convince OEMs and end users to upgrade to the latest and greatest Intel-based systems. And to some extent, hyperthreading is exactly that. With returns diminishing on increased clock speeds and memory caches for the average user, But at the same time, hyperthreading solves a real computing problem. And its implementation in the Pentium 4 and succeeding generations of desktop processors could spark a quiet revolution in how software is designed
What Does It Do?
In a nutshell, hyperthreading, or simultaneous multithreading, makes a single physical processor appear to an operating system as two logical processors. Most software today is threaded -- that is, instructions are split into multiple streams so that multiple processors can act on them. With hyperthreading, a single processor can handle those multiple streams, or threads, as if it were two processors. Without hyperthreading, a chip would have to process the two threads sequentially rather than simultaneously. Or it might perform time-slicing, in which a processor rapidly shifts between running various threads at a fixed interval, Intel spokesperson George Alfs said.

In the Chips
The first Intel chips to take advantage of hyperthreading were Xeon server processors. But in November 2002, Intel brought hyperthreading to the desktop with its 3.06 GHz Pentium 4. "We will be providing this technology in additional SKUs over time," Alfs told NewsFactor. "We intend to have hyperthreading in a majority of our desktop Pentium 4 processors." chief research officer Peter Kastner said he expected such a move from the company. "Intel has hinted that it will push hyperthreading technology throughout its Pentium line, making it available to most PC buyers, not just at the top end," he told NewsFactor.
Software Support
Of course, microprocessor improvements mean nothing without software that can take advantage of them. For hyperthreading, software support is in the early stages. "Buying the Pentium 4 with hyperthreading will be an increasingly smart decision over the life of the desktop," Kastner said. "While many applications are not optimized for hyperthreading today, we expect that as new releases come out, hyperthreading will become a standard feature." For software to benefit from hyperthreading, the program must support multithreaded execution -- that is, it must allow two distinct tasks to be executed at the same time, vice president Steve Kleynhans told NewsFactor.

Two Paths
There are two ways to achieve this goal. The first is to write an application that is specifically designed to be multithreaded. The second is to run two independent applications at the same time. "People are running multiple, mixed loads of applications on their desktops," Kleynhans said. "Many of those are background tasks." Both Home and Professional Editions support hyperthreading out of the box. Numerous other multithreaded applications also can get a boost from Intel's hyperthreading feature, particularly content creation applications, such as Photoshop, and video and audio encoding

INTEL INNOVATION COULD DOUBLE CHIP POWER

Intel showed off a new chip technology will allow one chip to act like two.
Called "hyperthreading," the new technology essentially takes advantage of formerly unused circuitry on the Pentium 4 that lets the chip operate far more efficiently--and almost as well as a dual-processor computer. With it, a desktop can run two different applications simultaneously or run a single application much faster than it would on a standard one-processor box.
"It makes a single processor look like two processors to the operating system," said Shannon Poulin, enterprise launch and disclosure manager at Intel. "It effectively looks like two processors on a chip."
Paul Otellini, general manager of the Intel Architecture Group, demonstrated the hyperthreading technology at the Intel Developer's Forum. They showed off a 3.5GHz Pentium 4 running the computer game "Quake 3" and managing four different video streams simultaneously. The Pentium 4 demonstration didn't depend on Hyper-Threading; instead, it came out as part of Intel's effort to show how consumers and software developers will continue to need faster PCs. "There are a lot of tremendous applications on the horizon that will consume the MIPS (millions of instructions per second)," Otellini said. "Gigahertz are necessary for the evolution and improvement of computing."
Technically, hyperthreading takes advantage of additional registers--circuits that help manage data inside a chip--that come on existing Pentium 4's but aren't used. Through these registers, the processor can handle more tasks at once by taking better advantage of its own resources. The chip can direct instructions from one application on its floating-point unit, which is where the heavy math is done, and run parts of another application through its integer unit. A chip with hyperthreading won't equal the computing power of two Pentium 4's, but the performance boost is substantial, Poulin said. A workstation with hyperthreaded Xeon chips running Alias-Wavefront, a graphics application, has achieved a 30 percent improvement in tests, he said. Servers with hyperthreaded chips can manage 30 percent more users.
Will developers climb aboard?
The open question is whether software developers will latch onto the idea. Software applications will need to be rewritten to take advantage of hyperthreading, and getting developers to tweak their products can take an enormous amount of time. Intel, for instance, has been working for well over a year to get developers to rewrite their programs to take full advantage of the features of the Pentium 4, which has been out for approximately nine months. The company even changed the migration program to speed the process of optimizing Pentium III applications for the Pentium 4.
Still, to date, only 30 applications have been enhanced to take full advantage of the Pentium 4, according to Louis Burns, vice president and general manager of the Desktop Platforms Group at Intel. But more are on the way, he said. Otellini acknowledged that recruiting developers will take time.
"The real key is going to be to get the applications threaded, and that takes a lot of work," he said. Nonetheless, adopting the technology to server and workstations applications should be fairly easy if the application already runs on dual-processor systems, other Intel officials said. "Thread your applications and drivers and OSes to take advantage of this relatively free performance," Otellini asked developers during his speech.
Hyperthreading, which will appear in servers and workstations in 2002 and desktops in 2003, is part of an overall Intel strategy to find new ways to squeeze more performance out of silicon. For years, the company has largely relied on boosting the clock speed and tweaking parts of the chip's architecture to eke out gains. The performance gains to be achieved from boosting the clock speed, however, are limited. In all practicality, most users won't experience that much realistic difference between a 1GHz computer and one that contains a 2GHz chip, according to, among others, Dean McCarron, an analyst at Mercury Research.
Ideally, hyperthreading, which has been under development for four and a half years, will show meatier benefits. An individual could play games while simultaneously downloading multimedia files from the Internet with a computer containing the technology, Poulin predicted. Hyperthreaded chips would also be cheaper than dual-processor computers. "You only need one heat sink, one fan, one cooling solution," he said, along with, of course, one chip. Chips running hyperthreading have been produced, and both Microsoft's Windows XP and Linux can take advantage of the technology, according to Poulin. Computers containing a single hyperthreaded chip differ from dual-processor computers in that two applications can't take advantage of the same processor substructure at the same time. "Only one gets to use the floating point at a single time," Poulin said.
On other fronts, Intel on Tuesday also unveiled Machine Check Architecture, which allows servers to catch data errors more efficiently. The company will also demonstrate McKinley for the first time. McKinley is the code name for the next version of Itanium, Intel's 64-bit chip that competes against Sun's UltraSparc. McKinley is due in demonstration systems by the end of this year.
CONCLUSION
Intel Xeon Hyper-Threading is definitely having a positive impact on Linux kernel and multithreaded applications. The speed-up from Hyper-Threading could be as high as 30% in stock kernel 2.4.19, to 51% in kernel 2.5.32 due to drastic changes in the scheduler run queue's support and Hyper-Threading awareness. Today with Hyper-Threading Technology, processor-level threading can be utilized which offers more efficient use of processor resources for greater parallelism and improved performance on today's multi-threaded software.

REFERENCES
Â¢ google.com
Â¢ intel.com
Â¢ arstechnica.com
CONTENTS
1. INTRODUCTION
2. UTILIZATION OF PROCESSOR RESOURCES
3. HYPER-THREADING TECHNOLOGY IMPROVES PERFORMANCE
4. MULTI-THREADED APPLICATIONS
5. .MULTIPROCESSOR PERFORMANCE ON A SINGLE PROCESSOR
6. HYPER-THREADING SPEEDS LINUX
7. EACH PROGRAM HAS A MIND OF ITS OWN
8. IMPLEMENTING HYPER-THREADING
9. WORKING OF HYPERTHREADING
10. WHAT HYPERTHREADING CAN (AND CAN'T) DO FOR YOU
11. INTEL INNOVATION COULD DOUBLE CHIP POWER
12. CONCLUTION
ACKNOWLEDGEMENTS

I express my sincere thanks to Prof. M.N Agnisarman Namboothiri (Head of the Department, Computer Science and Engineering, MESCE), Mr. Zainul Abid (Staff incharge) for their kind co-operation for presenting the seminars.
I also extend my sincere thanks to all other members of the faculty of Computer Science and Engineering Department and my friends for their co-operation and encouragement.

project report tiger · 01-03-2010, 11:27 PM

A SEMINAR REPORT ON
HYPER THREADING TECHNOLOGY

Submitted By:
Megha
Msc Comp Sc.

Abstract:
Intel Pentium 4 processor that incorporates Hyper-Threading Technology
Hyper-threading works by duplicating certain sections of the processorâ€those that store the architectural stateâ€but not duplicating the main execution resources. This allows a hyper-threading processor to appear as two "logical" Hyper threading technology
Hyper-threading (officially Hyper-Threading Technology, and abbreviated HT Technology, HTT or HT) is Intel's term for its simultaneous multithreading implementation in their Xeon, Pentium 4, Atom, Core i3, Core i5 and Core i7 CPUs.
Hyper-threading is an Intel-proprietary technology used to improve parallelization of computations (doing multiple tasks at once) performed on PC microprocessors. For each processor core that is physically present, the operating system addresses two virtual processors, and shares the workload between them when possible.
Hyper-threading requires only that the operating system support multiple processors, and Intel recommends disabling HTT when using operating systems that have not been optimized for the technology.
Another definition of Hyper Threading Technology
A technology developed by Intel that enables multithreaded software applications to execute threads in parallel on a single multi-core processor instead of processing threads in a linear fashion. Older systems took advantage of dual-processing threading in software by splitting instructions into multiple streams so that more than one processor could act upon them at once.
Performance
The advantages of hyper-threading are listed as: improved support for multi-threaded code, allowing multiple threads to run simultaneously, improved reaction and response time.
According to Intel the first implementation only used 5% more die area than the comparable non-hyperthreaded processor, but the performance was 15â€œ30% better.
Intel claims up to a 30% performance improvement compared with an otherwise identical, non-simultaneous multithreading Pentium 4. Intel also claims significant performance improvements with a hyper-threading-enabled Pentium 4 processor in some artificial intelligence algorithms. The performance improvement seen is very application-dependent, however, and some programs actually slow down slightly when Hyper Threading Technology is turned on. This is due to the replay system of the Pentium 4 tying up valuable execution resources, thereby starving the other thread. (The Pentium 4 Prescott core gained a replay queue, which reduces execution time needed for the replay system, but this is not enough to completely overcome the performance hit.) However, any performance degradation is unique to the Pentium 4 (due to various architectural nuances), and is not characteristic of simultaneous multithreading in general.
Details
Processors to the host operating system, allowing the operating system to schedule two threads or processes simultaneously. When execution resources would not be used by the current task in a processor without hyper-threading, and especially when the processor is stalled, a hyper-threading equipped processor can use those execution resources to execute another scheduled task. (The processor may stall due to a cache miss, branch misprediction, or data dependency.)
This technology is transparent to operating systems and programs. All that is required to take advantage of hyper-threading is symmetric multiprocessing (SMP) support in the operating system, as the logical processors appear as standard separate processors.
It is possible to optimize operating system behavior on multi-processor hyper-threading capable systems. For example, consider an SMP system with two physical processors that are both hyper-threaded (for a total of four logical processors). If the operating system's process scheduler is unaware of hyper-threading it will treat all four processors as being the same. If only two processes are eligible to run it might choose to schedule those processes on the two logical processors that happen to belong to one of the physical processors; that processor would become extremely busy while the other would be idle, leading to poorer performance than is possible with better scheduling. This problem can be avoided by improving the scheduler to treat logical processors differently from physical processors; in a sense, this is a limited form of the scheduler changes that are required for NUMA systems.
Security
In May 2005 Colin Percival demonstrated that a malicious thread operating with limited privileges can monitor the execution of another thread through their influence on a shared data cache, allowing for the theft of cryptographic keys.[3] Note that while the attack described in the paper was demonstrated on an Intel Pentium 4 processor with HTT, the same techniques could theoretically apply to any system where caches are shared between two or more non-mutually-trusted execution threads; see also side channel attack.
Past
Hyper-Threading was first introduced in the Foster MP-based Xeon in 2002. It appeared on the 3.06 GHz Northwood-based Pentium 4 in the same year, and then appeared in every Pentium 4 HT, Pentium 4 Extreme Edition and Pentium Extreme Edition processor. Previous generations of Intel's processors based on the Core microarchitecture do not have Hyper-Threading, because the Core micro architecture is a descendant of the P6 micro architecture used in iterations of Pentium since the Pentium Pro through the Pentium III and the Celeron (Covington, Mendocino, Copper mine and Tualatin-based) and the Pentium II Xeon and Pentium III Xeon models. However, Intel is using the feature in the newer Atom and Core i7 processors.
Inefficiencies
In 2006 hyper-threading was criticized for being energy-inefficient. For example, specialist low-power CPU design company ARM has stated SMT can use up to 46% more power than dual core designs. Dual core processors are different than Dual CPU. Furthermore, they claim SMT increases cache thrashing by 42%, whereas dual core results in a 37% decrease.[4] These considerations are claimed to be the reason Intel dropped SMT from the Core 2 micro architecture.[by whom?]
Present & Future
Intel released the Nehalem (Core i7) in November 2008 in which hyper-threading makes a return. Nehalem contains 4 cores and effectively scales 8 threads.[5]
The Intel Atom is an in-order processor with hyper-threading, for low power mobile PCs and low-price desktop PCs.[6]
The Itanium 9300 launched with eight threads per processor through enhanced hyper-threading technology. Paulson, the next-generation Itanium, is schedule to have additional hyper-threading enhancements.[7]
The Intel Xeon 5500 server chips also utilize two-way hyper-threading[8][9]
Difference
Hyper-threading is using one processor but logically dividing it into two so that it gives the user the benefit of two processors with only using the resources equivalent to almost one. This is achieved by sharing, partitioning and duplicating the various resources almost into two processors. Used by the latest Pentium processors, which are HT enabled, in layman's terms, it allows you to use more than two applications at the same time without slowing down processing speed.
Multi-threading is when various processes are time sliced such that it gives the user the impression that all the programs are being run at the same time. This is what happens on your computer regularly.
Super-threading allows threads from different processes to be executed at the same time unlike Multi-threading where every process has a time slot during which, thread from only one process will be executed. But every time, if for EG, there are four instructions issued to the processor. They will all be from the same process. Hyper-threading takes it a step further. It allows threads from different processes to be issued at the same time, in turn, utilizing the waste cycles of the processor.
Super-threading is a multithreading approach that weaves together the execution of different threads on a single processor without truly executing them at the same time.[1] This qualifies it as time-sliced or temporal multithreading rather than simultaneous multithreading. It is motivated by the observation that the processor is occasionally left idle while executing an instruction from one thread. Super-threading seeks to make use of unused processor cycles by applying them to the execution of an instruction from another thread
Disadvantages
Hyper-Threading is not SMP. (Symmetric Multi-Processing)implies several processors, and we have only one processor. However, it is supplemented with a certain feature which lets it pretend it consists of two processors.
Well, the Hyper-Threading technology allows increasing efficiency of the processor in certain cases. In particular, when applications of different nature are used simultaneously. This is an advantage, but the effect takes place only in certain situations. The classical market principle says: pay more to get more.
It really boosts up performance sometimes. The effect can be much greater than even when we compare two platforms with the same processor but different chipsets. But the effect depends on a style of working with a computer. Note that the classical SMP style is when a user counts on the response of the classical multiprocessor system.
The style of the Hyper-Threading is a combination of entertaining or service processes with "working" processes. You won't get a tangible gain in most classical multiprocessor tasks if you run one application at a time. But you will surely make shorter the time of execution of most background tasks used as a makeweight. Intel has actually reminded us that operating systems we are using are multitask, and it offered a way to speed up fulfillment of a complex of simultaneously executed applications (not a single one). This is a very interesting approach, and we are glad this idea is realized.
See also
Â¢ Multi-core
Â¢ Barrel processor
References
1. ^ Operating Systems that include optimizations for Hyper-Threading Technology
2. ^ Intel Processor Spec Finder: SL6WK
3. ^ Cache Missing for Fun and Profit
4. ^ theinquirer.net
5. ^ Intel explains the new Core i7 CPU
6. ^ http://inteltechnology/atom/microarchitecture.htm
7. ^ http://microelectronics.cbronlinenews/in...sor_100208
8. ^ http://intelp/en_US/products/server/processor
9. ^ http://intelbusiness/resources/demos/xeon5500/performance/demo.htm
External links
Â¢ Intel's high level overview of Hyper-threading
Â¢ Hyper-threading on MSDN Magazine
Â¢ HyperThreading Overview from OSDEV Community
Â¢ An introductory article from Ars Technica
Â¢ Hyper-Threading Technology Architecture and Microarchitecture, technical description of Hyper-Threading (1.2 MB PDF-file)
Â¢ [1] Enter Patent Number 4,847,755
Â¢ Merom, Conroe, Woodcrest lose HyperThreading
Security
Â¢ KernelTrap discussion: Hyper-Threading Vulnerability
Performance
Â¢ ZDnet: Hyperthreading hurts server performance, say developers
Â¢ ARM is no fan of HyperThreading
disadvantages
Hyper-Threading is not SMP. (Symmetric Multi-Processing)implies several processors, and we have only one processor. However, it is supplemented with a certain feature which lets it pretend it consists of two processors.
Well, the Hyper-Threading technology allows increasing efficiency of the processor in certain cases. In particular, when applications of different nature are used simultaneously. This is an advantage, but the effect takes place only in certain situations. The classical market principle says: pay more to get more.
It really boosts up performance sometimes. The effect can be much greater than even when we compare two platforms with the same processor but different chipsets. But the effect depends on a style of working with a computer. Note that the classical SMP style is when a user counts on the response of the classical multiprocessor system.
The style of the Hyper-Threading is a combination of entertaining or service processes with "working" processes. You won't get a tangible gain in most classical multiprocessor tasks if you run one application at a time. But you will surely make shorter the time of execution of most background tasks used as a makeweight. Intel has actually reminded us that operating systems we are using are multitask, and it offered a way to speed up fulfillment of a complex of simultaneously executed applications (not a single one). This is a very interesting approach, and we are glad this idea is realized.

for more please read
http://ixbtlabsarticles2/pentium43ghzht/

seminar topics · 30-03-2010, 12:25 PM

please read http://studentbank.in/report-hyper-threading
and http://studentbank.in/report-hyper-threa...ars-report
and http://studentbank.in/report-hyperthread...ars-report
for more about HYPER THREADING technology , seminar ,technical,presentation details

seminar presentation · 24-05-2010, 12:22 PM

[attachment=3617]

Introduction
Intel's Hyper-Threading Technology brings the concept of simultaneous multi-threading to the Intel Architecture.
Hyper-Threading Technology makes a single physical processor appear as two logical processors; the physical execution resources are shared and the architecture state is duplicated for the two logical processors.
From a software or architecture perspective, this means operating systems and user programs can schedule processes or threads to logical processors as they would on multiple physical processors.
From a microarchitecture perspective, this means that instructions from both logical processors will persist and execute simultaneously on shared execution resources
The first implementation of Hyper-Threading Technology was done on the IntelXeon<=> processor MP.
In this implementation there are two logical processors on each physical processor.
The logical processors have their own independent architecture state, but they share nearly all the physical execution and hardware resources of the processor
The potential for Hyper-Threading Technology is tremendous; our current implementation has only just begun to tap into this potential.
Hyper-Threading Technology is expected to be viable from mobile processors to servers; its introduction into market segments other than servers is only gated by the availability and prevalence of threaded applications and workloads in those markets

Processor Micro-architecture
Traditional approaches to processor design have focused on higher clock speeds, instruction-level parallelism (ILP), and caches.
Techniques to achieve higher clock speeds involve pipelining the microarchitecture to finer granularities, also called super-pipelining.
Higher clock frequencies can greatly improve performance by increasing the number of instructions that can be executed each second
One technique is out-of-order execution where a large window of instructions is simultaneously evaluated and sent to execution units, based on instruction dependencies rather than program order.
Accesses to DRAM memory are slow compared to execution speeds of the processor. One technique to reduce this latency is to add fast caches close to the processor

Conventional Multi-threading
In recent years a number of other techniques to further exploit TLP have been discussed and some products have been announced.
One of these techniques is chip multiprocessing (CMP), where two processors are put on a single die
The two processors each have a full set of execution and architectural resources. The processors may or may not share a large on-chip cache.
CMP is largely orthogonal to conventional multiprocessor systems, as you can have multiple CMP processors in a multiprocessor configuration.
Recently announced processors incorporate two processors on each die.
However, a CMP chip is significantly larger than the size of a single-core chip and therefore more expensive to manufacture; moreover, it does not begin to address the die size and power considerations

Time-slice Multi-threading
Time-slice multithreading is where the processor switches between software threads after a fixed time period. Quite a bit of what a CPU does is illusion.
For instance, modern out-of-order processor architectures don't actually execute code sequentially in the order in which it was written.
It is noted that an OOE(out of order execution)architecture takes code that was written and compiled to be executed in a specific order, reschedules the sequence of instructions (if possible) so that they make maximum use of the processor resources, executes them, and then arranges them back in their original order so that the results can be written out to memory
To the programmer and the user, it looks as if an ordered, sequential stream of instructions went into the CPU and identically ordered, sequential stream of computational results emerged

Concept of Simultaneous Multi-threading
Simultaneous Multi-threading is a processor design that combines hardware multithreading with superscalar processor technology to allow multiple threads to issue instructions each cycle
It is seen that the architecture for simultaneous multi threading achieves three goals:
(1) it minimizes the architectural impact on the conventional superscalar design,
(2) it has minimal performance impact on a single thread executing alone, and
(3) it achieves significant throughput gains when running multiple threads.
The simultaneous multi threading enjoys a 2.5-fold improvement over an unmodified superscalar with the same hardware resources
Each square corresponds to an issue slot, with white squares signifying unutilized slots.
Hardware utilization suffers when a program exhibits insufficient parallelism or when available parallelism is not used effectively.
A superscalar processor achieves low utilization because of low ILP in its single thread.
Multiprocessors physically partition hardware to exploit TLP, and therefore performance suffers when TLP is low (e.g., in sequential portions of parallel programs).
In contrast, simultaneous multithreading avoids resource partitioning.
Because it allows multiple threads to compete for all resources in the same cycle, SMT can cope with varying levels of ILP and TLP; consequently utilization is higher, and performance is better

Base Processor Architecture
The base processor is a sophisticated, out-of-order superscalar processor with a dynamic scheduling core similar to the MIPS R10000.
On each cycle, the processor fetches a block of instructions from the instruction cache.
After decoding these instructions,the register-renaming logic maps the logical registers to a pool of physical renaming registers to remove false dependencies.
Instructions are then fed to either the integer or floating-point instruction queues.
When their operands become available, instructions are issued from these queues to the corresponding functional units. Instructions are retired in order

Power Consumption of a Multithreaded Architecture
The power-energy characteristics of multi threaded execution can be examined.
It examines performance (IPC), energy efficiency (energy per useful instruction executed,E/UI), and power (the average power utilized during each simulation run) for both single-thread and multithread execution.
All results (including IPC) are normalized to a baseline (in each case, the lowest single-thread value).
This is done for two reasons

project report helper · 25-09-2010, 03:35 PM

[attachment=4388]

HYPER THREADING seminar report

abstract

Hyper-threading is a technology developed by Intel Corporation. It is used in certain Pentium 4 processors and all Intel Xeon processors. Hyper-threading technology, commonly referred to as "HT Technology," enables the processor to execute two threads, or sets of instructions, at the same time. Since hyper-threading allows two streams to be executed in parallel, it is almost like having two separate processors working together.

While hyper-threading can improve processing performance, software must support multiple processors to take advantage of the technology. Fortunately, recent versions of both Windows and Linux support multiple processors and therefore benefit from hyper-threading. For example, a video playing in Windows Media Player should not be slowed down by a Web page loading in Internet Explorer. Hyper-threading allows the two programs to be processed as separate threads at the same time. However, individual programs can only take advantage of Intel's HT Technology if they have been programmed to support multiple processors.

**seminar addict** · 04-02-2012, 11:15 AM

to get information about the topic Hyperthreading Download full report ,ppt and related topic refer the page link bellow

http://studentbank.in/report-hyper-threa...ars-report

http://studentbank.in/report-hyperthread...ars-report

http://studentbank.in/report-hyperthread...?pid=65644

http://studentbank.in/report-hyper-threa...sor-family

http://studentbank.in/report-hyper-threa...6#pid30436

http://studentbank.in/report-hyper-threading-technology

Possibly Related Threads...
Thread		Author	Replies	Views	Last Post
	network security seminars report	computer science technology	14	20,880	24-11-2018, 01:19 AM Last Post:
	Modular Computing seminars report	computer science crazy	4	21,722	08-10-2013, 04:32 PM Last Post: Guest
	tele immersion seminars report	computer science technology	9	14,876	20-12-2012, 11:20 AM Last Post: seminar details
	computer science seminars topics	computer science crazy	1	10,096	16-03-2012, 10:38 AM Last Post: seminar paper
	GSM Security And Encryption (download seminars report)	Computer Science Clay	14	14,436	07-03-2012, 07:35 PM Last Post: kushi.8
	wireless lan security seminars report	computer science technology	8	11,832	24-02-2012, 12:21 PM Last Post: seminar paper
	wi-max seminars report	tanaya padhee	9	10,628	23-02-2012, 10:58 AM Last Post: seminar paper
	computer science seminars topics 2012-2011	project topics	2	20,007	21-02-2012, 04:38 PM Last Post: chethana mallya
	2011 seminars topics computer science	project topics	1	2,197	06-02-2012, 09:53 AM Last Post: seminar addict
	Hyper-Threading technology	computer science crazy	1	2,315	04-02-2012, 11:16 AM Last Post: seminar addict

Important Note..!

ASK HERE