Intel’s MMX
Why MMX?

Make the Common Case Fast
Multimedia and communication applications consume significant computing resources.
Providing dedicated hardware support for them makes sense.
Goals
Accelerate multimedia and communications applications.
Maintain full compatibility with existing operating systems and applications.
Exploit the inherent parallelism in multimedia and communication algorithms.
Add new instructions and data types to improve performance.
First Step: examine code
Examined a wide range of applications: graphics, MPEG video, music synthesis, speech compression, speech recognition, image processing, games, video conferencing.
Identified and analyzed the most compute-intensive routines
Common Characteristics
Small integer data types: e.g. 8-bit pixels, 16-bit audio samples
Small, highly repetitive loops
Frequent multiply-and-accumulate
Compute-intensive algorithms
Highly parallel operations
MMX Technology
A set of basic, general-purpose integer instructions:
Single Instruction, Multiple Data (SIMD)
57 new instructions
Eight 64-bit wide MMX registers
Four new data types
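Data Types
Packed byte: eight 8-bit integers
Packed word: four 16-bit integers
Packed doubleword: two 32-bit integers
Quadword: one 64-bit integer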
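The four data types are different ways of slicing the same 64 bits. A minimal Python sketch of those views, assuming x86 little-endian lane order (lane 0 in the least significant bits); `MMX_TYPES`, `split_lanes` and `join_lanes` are hypothetical names used only for illustration:

```python
# Lane width in bits for each MMX data type; every view covers all 64 bits.
MMX_TYPES = {
    "packed byte": 8,         # eight 8-bit lanes
    "packed word": 16,        # four 16-bit lanes
    "packed doubleword": 32,  # two 32-bit lanes
    "quadword": 64,           # one 64-bit lane
}

def split_lanes(value: int, width: int) -> list[int]:
    """Split a 64-bit value into unsigned lanes, lane 0 = least significant bits."""
    mask = (1 << width) - 1
    return [(value >> (i * width)) & mask for i in range(64 // width)]

def join_lanes(lanes: list[int], width: int) -> int:
    """Pack unsigned lanes back into one 64-bit value (inverse of split_lanes)."""
    mask = (1 << width) - 1
    value = 0
    for i, lane in enumerate(lanes):
        value |= (lane & mask) << (i * width)
    return value

reg = 0x0807060504030201
for name, width in MMX_TYPES.items():
    print(f"{name:18} {[hex(lane) for lane in split_lanes(reg, width)]}")
```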
Example
Pixels are generally 8-bit integers. Pack eight pixels into a 64-bit MMX register.
An MMX instruction takes all eight of the pixels at once from the MMX register, performs the arithmetic or logical operation on all eight elements in parallel, and writes the result into an MMX register.
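To show what "all eight at once" means without MMX hardware, the sketch below packs eight 8-bit pixels into one 64-bit Python integer and adds them with a SWAR (SIMD within a register) trick that reproduces the wrap-around packed byte add (PADDB). It is a portable emulation under stated assumptions, not a model of how the MMX unit is built; `pack_pixels`, `unpack_pixels` and `paddb_swar` are hypothetical names:

```python
LOW7 = 0x7F7F7F7F7F7F7F7F   # low 7 bits of every byte lane
HIGH1 = 0x8080808080808080  # top bit of every byte lane

def pack_pixels(pixels: list[int]) -> int:
    """Pack eight 8-bit pixels into a 64-bit integer, pixel 0 in the lowest byte."""
    assert len(pixels) == 8
    value = 0
    for i, p in enumerate(pixels):
        value |= (p & 0xFF) << (8 * i)
    return value

def unpack_pixels(value: int) -> list[int]:
    """Inverse of pack_pixels."""
    return [(value >> (8 * i)) & 0xFF for i in range(8)]

def paddb_swar(a: int, b: int) -> int:
    """Add eight byte lanes in one 64-bit operation, each lane wrapping modulo 256."""
    # Adding only the low 7 bits of each lane can never carry into the next lane.
    low = (a & LOW7) + (b & LOW7)
    # Each lane's top bit is the XOR of both top bits and the carry out of bit 6.
    return low ^ ((a ^ b) & HIGH1)

a = pack_pixels([250, 10, 0, 255, 128, 1, 100, 200])
b = pack_pixels([10, 10, 0, 1, 128, 1, 100, 100])
print(unpack_pixels(paddb_swar(a, b)))
```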
Compatibility
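No new exceptions or architectural state are added.
Aliases to existing FP registers: the eight MMX registers are mapped onto the 64-bit mantissa field of the eight x87 floating-point registers, so the operating system saves and restores them along with the existing FP state. When an MMX instruction writes a register, the exponent field of the corresponding floating-point register (bits 64-78) and the sign bit (bit 79) are set to ones, making the value in the register a NaN (Not a Number) or infinity when viewed as a floating-point value.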
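The aliasing rule can be checked at the bit level. This Python sketch, with hypothetical helpers `alias_as_x87` and `classify_x87`, builds the 80-bit x87 extended-precision image of a register after an MMX write and classifies it. It assumes the standard x87 layout (significand in bits 0-63 with an explicit integer bit at bit 63, exponent in bits 64-78, sign in bit 79). Patterns with the integer bit clear are the pseudo-NaN/pseudo-infinity encodings, which the 387 and later processors treat as invalid operands:

```python
SIGNIFICAND_MASK = (1 << 64) - 1

def alias_as_x87(mmx_value: int) -> int:
    """80-bit image of an x87 register after an MMX write: bits 64-79 forced to ones."""
    return (0xFFFF << 64) | (mmx_value & SIGNIFICAND_MASK)

def classify_x87(reg80: int) -> str:
    """Classify an 80-bit extended-precision bit pattern with an all-ones exponent."""
    sign = "-" if (reg80 >> 79) & 1 else "+"
    exponent = (reg80 >> 64) & 0x7FFF
    significand = reg80 & SIGNIFICAND_MASK
    if exponent != 0x7FFF:
        return "exponent not all ones"  # never produced by an MMX write
    if significand >> 63 == 0:
        return "pseudo-NaN/pseudo-infinity"  # integer bit clear: invalid on 387+
    fraction = significand & ((1 << 63) - 1)
    return sign + ("infinity" if fraction == 0 else "NaN")

print(classify_x87(alias_as_x87(0x8000_0000_0000_0000)))
print(classify_x87(alias_as_x87(0x0807_0605_0403_0201)))
```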
57 Instructions
Basic arithmetic: add, subtract, multiply, arithmetic shift and multiply-add
Comparison
Conversion: pack & unpack
Logical
Shift
Move: register-to-register
Load/Store: 64-bit and 32-bit
Packed Add Word with wrap-around
Saturation
Saturation: if an addition or subtraction overflows or underflows, the result is clamped to the largest or smallest representable value.
This is important for pixel calculations: it prevents a wrap-around add from suddenly turning a bright pixel black, or a wrap-around subtract from turning a dark pixel white.
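The difference between wrap-around and saturating arithmetic on pixels can be shown with a lane-by-lane Python emulation. Each list stands for the eight unsigned byte lanes of one MMX register. The function names mirror the MMX mnemonics PADDB, PSUBB, PADDUSB and PSUBUSB, but the code only models their per-lane results:

```python
def paddb(a, b):
    """Wrap-around byte add: each lane is computed modulo 256."""
    return [(x + y) & 0xFF for x, y in zip(a, b)]

def psubb(a, b):
    """Wrap-around byte subtract: each lane is computed modulo 256."""
    return [(x - y) & 0xFF for x, y in zip(a, b)]

def paddusb(a, b):
    """Unsigned saturating byte add: lanes clamp at 255 (white)."""
    return [min(x + y, 255) for x, y in zip(a, b)]

def psubusb(a, b):
    """Unsigned saturating byte subtract: lanes clamp at 0 (black)."""
    return [max(x - y, 0) for x, y in zip(a, b)]

pixels = [250, 200, 128, 64, 20, 10, 5, 0]
offset = [10] * 8
print(paddb(pixels, offset))    # the brightest lane wraps to near-black
print(paddusb(pixels, offset))  # the brightest lane stops at 255
print(psubb(pixels, offset))    # the darkest lanes wrap to near-white
print(psubusb(pixels, offset))  # the darkest lanes stop at 0
```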
No Mode
There is no "saturation mode bit": a new mode bit would require changes to the operating system. Instead, separate instructions generate wrap-around and saturating results.
Packed Add Word with unsigned saturation
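Because there is no mode bit, the choice between wrap-around and saturation is made per instruction. In software terms, this means calling a different function rather than setting a global flag. The Python sketch below shows the packed-word variants and models four 16-bit lanes per register. The names follow the MMX mnemonics PADDW, PADDUSW and PADDSW. Lane values are unsigned 16-bit patterns, except in the signed variant:

```python
def paddw(a, b):
    """Wrap-around add on four 16-bit lanes (results modulo 2**16)."""
    return [(x + y) & 0xFFFF for x, y in zip(a, b)]

def paddusw(a, b):
    """Unsigned saturating add: lanes clamp to 0xFFFF."""
    return [min(x + y, 0xFFFF) for x, y in zip(a, b)]

def paddsw(a, b):
    """Signed saturating add: lanes clamp to [-32768, 32767]."""
    return [max(-32768, min(x + y, 32767)) for x, y in zip(a, b)]

a = [0xFFF0, 0x8000, 0x0001, 0x1234]
b = [0x0020, 0x8000, 0x0001, 0x0000]
print(paddw(a, b))    # same inputs, wrap-around results
print(paddusw(a, b))  # same inputs, saturated results
```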
Multiply-Accumulate
Multiply-accumulate operations are fundamental to many signal-processing algorithms, such as vector dot products, matrix multiplies, FIR and IIR filters, FFTs and DCTs.
Packed Multiply-Add
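The packed multiply-add (PMADDWD in Intel's mnemonics) multiplies four pairs of signed 16-bit words and adds adjacent products into two 32-bit results. The following Python model reproduces that per-lane behaviour. The helpers `to_s16` and `to_s32` are hypothetical names for two's-complement reinterpretation:

```python
def to_s16(x: int) -> int:
    """Reinterpret the low 16 bits of x as a signed (two's-complement) value."""
    x &= 0xFFFF
    return x - 0x1_0000 if x & 0x8000 else x

def to_s32(x: int) -> int:
    """Reinterpret the low 32 bits of x as a signed (two's-complement) value."""
    x &= 0xFFFF_FFFF
    return x - 0x1_0000_0000 if x & 0x8000_0000 else x

def pmaddwd(a: list[int], b: list[int]) -> list[int]:
    """Four 16x16-bit signed multiplies, then two adds of adjacent products."""
    p = [to_s16(x) * to_s16(y) for x, y in zip(a, b)]
    # Only (-32768 * -32768) twice exceeds the 32-bit range; it wraps, as on hardware.
    return [to_s32(p[0] + p[1]), to_s32(p[2] + p[3])]

print(pmaddwd([1, 2, 3, 4], [5, 6, 7, 8]))  # [1*5 + 2*6, 3*7 + 4*8]
```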
Packed Parallel Compare
No new condition code flags
No existing IA condition code flags are affected by this instruction.
The result can be used as a mask to select elements from different inputs with logical operations, eliminating branches.
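A compare produces an all-ones or all-zeros mask per element rather than setting a flag, and that mask can drive a select built from logical operations. For example, the original MMX set has no packed-maximum instruction, so a branch-free signed maximum of four words can be built from PCMPGTW plus AND, AND NOT and OR. The Python model below works on 16-bit two's-complement patterns. `to_s16`, `select_words` and `max_words` are hypothetical helper names:

```python
def to_s16(x: int) -> int:
    """Reinterpret the low 16 bits of x as a signed (two's-complement) value."""
    x &= 0xFFFF
    return x - 0x1_0000 if x & 0x8000 else x

def pcmpgtw(a, b):
    """Signed greater-than on four 16-bit lanes: 0xFFFF where a > b, else 0x0000."""
    return [0xFFFF if to_s16(x) > to_s16(y) else 0x0000 for x, y in zip(a, b)]

def select_words(mask, a, b):
    """(a AND mask) OR (b AND NOT mask): lane from a where the mask is set, else from b."""
    return [(x & m) | (y & ~m & 0xFFFF) for x, y, m in zip(a, b, mask)]

def max_words(a, b):
    """Branch-free signed maximum of four 16-bit lanes, built from a compare and a select."""
    return select_words(pcmpgtw(a, b), a, b)

print(max_words([1, 0xFFFF, 5, 0x8000], [2, 0xFFFE, 5, 0x7FFF]))
```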
Pack/Unpack
Important when an algorithm needs higher precision in its intermediate calculations, as in image filtering.
For example, image filtering involves a set of intermediate multiply operations between filter coefficients and a set of adjacent image pixels, accumulating all the values together.
Pack
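Unpacking widens data so that intermediate results have room to grow, and packing narrows them back with saturation. The Python model below reproduces the per-lane behaviour of two instruction groups:

* PUNPCKLBW/PUNPCKHBW interleave bytes into words. With a zero second operand, this zero-extends the bytes.
* PACKUSWB converts signed words to unsigned bytes with saturation.

The `brighten` filter, which scales each pixel by 1.5, is a hypothetical example. Its plain Python multiply and shift stand in for the MMX multiply and shift instructions (PMULLW, PSRLW):

```python
def to_s16(x: int) -> int:
    """Reinterpret the low 16 bits of x as a signed (two's-complement) value."""
    x &= 0xFFFF
    return x - 0x1_0000 if x & 0x8000 else x

def punpcklbw(a, b):
    """Interleave the low four bytes of a and b into words: (b[i] << 8) | a[i]."""
    return [(y << 8) | x for x, y in zip(a[:4], b[:4])]

def punpckhbw(a, b):
    """Same as punpcklbw, but for the high four bytes."""
    return [(y << 8) | x for x, y in zip(a[4:], b[4:])]

def packuswb(a, b):
    """Pack four words from a and four from b into bytes, clamping signed values to 0..255."""
    return [max(0, min(to_s16(w), 255)) for w in a + b]

def brighten(pixels):
    """Scale eight 8-bit pixels by 1.5 using 16-bit intermediates, then saturate."""
    zero = [0] * 8
    lo = punpcklbw(pixels, zero)      # zero-extend pixels 0-3 to words
    hi = punpckhbw(pixels, zero)      # zero-extend pixels 4-7 to words
    lo = [(w * 3) >> 1 for w in lo]   # up to 382: too big for a byte, fine for a word
    hi = [(w * 3) >> 1 for w in hi]
    return packuswb(lo, hi)

print(brighten([10, 200, 255, 0, 64, 128, 250, 1]))
```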
Conditional Select
The chroma-keying example demonstrates how conditional selection with MMX instructions removes branch mispredictions while also performing multiple selection operations in parallel. Text overlay on a picture or video background and sprite overlays in games are other operations that would benefit from this technique.
Chroma Keying
Chroma Keying (cont.)
Take pixels from the picture with the woman on a green background.
A compare instruction builds a mask for that data. That mask is a sequence of bytes that are all ones or all zeros.
The mask now tells us which pixels are unwanted background and which we want to keep.
Create Mask
Combine: PANDN (AND NOT), PAND, POR
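The compare, PANDN, PAND and POR steps can be modelled per byte lane in Python. This sketch makes two simplifying assumptions: pixels are 8-bit palette indices, and `KEY_GREEN` is a hypothetical key value for the green backdrop. Real RGB data would need the compare on each colour channel:

```python
KEY_GREEN = 0x3C  # hypothetical palette index of the green backdrop

def chroma_key(foreground, background, key=KEY_GREEN):
    """Branch-free selection: key-coloured foreground pixels are replaced by background pixels."""
    # PCMPEQB: 0xFF where the foreground pixel equals the key colour, else 0x00.
    mask = [0xFF if f == key else 0x00 for f in foreground]
    # PANDN: (NOT mask) AND foreground keeps the non-key pixels (the woman).
    keep = [~m & f & 0xFF for m, f in zip(mask, foreground)]
    # PAND: mask AND background fills in the new background where the key was.
    fill = [m & bg for m, bg in zip(mask, background)]
    # POR: merge the two halves into the final pixels.
    return [k | fl for k, fl in zip(keep, fill)]

print(chroma_key([0x3C, 0x3C, 1, 2, 0x3C, 3, 4, 0x3C], [9] * 8))
```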
Branch Removal
Without MMX technology, each pixel is processed separately and requires a conditional branch. Using MMX instructions, eight 8-bit pixels can be processed in parallel and no conditional branches are involved.
Vector Dot Product
The vector dot product is one of the most basic algorithms in the signal processing of natural data such as images, audio and video.
PMADD performs 4 multiplies and 2 adds at a time. Combined with PADD, eight multiply-accumulate operations take four instructions: 2 PMADDs and 2 PADDs.
Assuming precision is sufficient, a dot-product on an 8-element vector can be completed using 8 MMX instructions: 2 PMADDs, 2 PADDs, two shifts (if needed to fix the precision after the multiply), and 2 loads for one of the vectors (the other vector is loaded by the PMADD instruction which can have one of its operands come from memory).
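The PMADD/PADD sequence can be modelled in Python. The sketch treats each list of four values as one 64-bit MMX register and uses per-lane models of PMADDWD and PADDD (the helper names are hypothetical). It assumes the products fit without the optional precision-fixing shifts. Adding the two 32-bit lanes at the end is not counted among the eight instructions and is done here in plain Python:

```python
def to_s16(x: int) -> int:
    """Reinterpret the low 16 bits of x as a signed (two's-complement) value."""
    x &= 0xFFFF
    return x - 0x1_0000 if x & 0x8000 else x

def to_s32(x: int) -> int:
    """Reinterpret the low 32 bits of x as a signed (two's-complement) value."""
    x &= 0xFFFF_FFFF
    return x - 0x1_0000_0000 if x & 0x8000_0000 else x

def pmaddwd(a, b):
    """Four signed 16-bit multiplies; adjacent products summed into two 32-bit lanes."""
    p = [to_s16(x) * to_s16(y) for x, y in zip(a, b)]
    return [to_s32(p[0] + p[1]), to_s32(p[2] + p[3])]

def paddd(a, b):
    """Add two pairs of 32-bit lanes with wrap-around."""
    return [to_s32(x + y) for x, y in zip(a, b)]

def dot8(u, v):
    """Dot product of two 8-element signed 16-bit vectors: 2 PMADDWD + 2 PADDD."""
    acc = [0, 0]                              # two 32-bit accumulator lanes
    acc = paddd(acc, pmaddwd(u[:4], v[:4]))   # elements 0-3
    acc = paddd(acc, pmaddwd(u[4:], v[4:]))   # elements 4-7
    return to_s32(acc[0] + acc[1])            # horizontal add of the two lanes

print(dot8([1, 2, 3, 4, 5, 6, 7, 8], [1] * 8))
```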
Compare

With MMX technology, only one third as many instructions are needed.
Most MMX instructions execute in one clock cycle, while some of the scalar instructions they replace (such as integer multiplies) take several cycles, so the performance improvement is larger than the ratio of instruction counts alone suggests.
Matrix Multiply
3D games: computations that manipulate 3D objects multiply 4-by-4 matrices by 4-element vectors many times. Each vector holds the X, Y and Z coordinates plus the perspective-correction information of a vertex. The 4-by-4 matrix is used to rotate, scale and translate the vertex and to update its perspective-correction information.
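Each row of the 4-by-4 matrix times the vector is a 4-element dot product. That dot product maps onto one packed multiply-add, which produces two pair sums, plus one add. A Python sketch of that structure, assuming small integer entries. A real 3D pipeline would use fixed-point values and a shift to restore the scale, which is omitted here; `mat4_vec4` and `translate` are hypothetical names:

```python
def mat4_vec4(m, v):
    """Multiply a 4x4 integer matrix by a 4-element vector [x, y, z, w], row by row."""
    result = []
    for row in m:
        # What one PMADDWD produces for this row: two sums of adjacent products.
        pair_sums = [row[0] * v[0] + row[1] * v[1], row[2] * v[2] + row[3] * v[3]]
        result.append(pair_sums[0] + pair_sums[1])  # combine the two partial sums
    return result

translate = [
    [1, 0, 0, 10],
    [0, 1, 0, 20],
    [0, 0, 1, 30],
    [0, 0, 0, 1],
]
print(mat4_vec4(translate, [1, 2, 3, 1]))  # the point moved by (10, 20, 30)
```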
Compare: Matrix Multiply
MMX required half as many instructions.
Image Dissolve Using Alpha Blending
Dissolve a swan into a flower:
Result_pixel = Flower_pixel * (alpha/255) + Swan_pixel * (1 - alpha/255)
Assume 640x480 resolution
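The dissolve formula can be evaluated in integer arithmetic. If the code multiplies first and divides by 255 last, no intermediate value exceeds 255 × 255 = 65,025, which fits in a 16-bit lane. MMX has no packed divide, so the sketch replaces the division with a common shift-and-add approximation. This is a Python model under those assumptions, with hypothetical function names, not Intel's benchmark code:

```python
def div255(x: int) -> int:
    """Shift-and-add approximation of round(x / 255) for 0 <= x <= 255 * 255."""
    t = x + 128
    return (t + (t >> 8)) >> 8

def dissolve(flower, swan, alpha):
    """Blend pixel lists: flower * alpha/255 + swan * (1 - alpha/255), alpha in 0..255."""
    return [div255(f * alpha + s * (255 - alpha)) for f, s in zip(flower, swan)]

# Step alpha from 0 (all swan) to 255 (all flower) over the frames of the dissolve.
print(dissolve([255] * 8, [0] * 8, 128))
```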
Dissolve: instruction count (millions)
1 billion fewer instructions for the 640x480 dissolve
Conclusion
MMX first appeared in 1997 in the Pentium processor with MMX technology, which also doubled the on-chip L1 cache.
According to Intel, an MMX microprocessor runs a multimedia application up to 60% faster, and runs other applications about 10% faster.