09-01-2012, 04:43 PM
Memory Hierarchy Design
1. How to evaluate Cache Performance. Explain various cache optimization categories.
The average memory access time is calculated as follows:
Average memory access time = Hit time + Miss rate × Miss penalty
where Hit time is the time to deliver a block in the cache to the processor (including the time to determine whether the block is in the cache), Miss rate is the fraction of memory references not found in the cache (misses/references), and Miss penalty is the additional time required because of a miss.
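As a quick illustration, the formula can be written as a small Python function. This is only a sketch: the function name and the numbers (1-cycle hit, 5% miss rate, 100-cycle miss penalty) are made up for the example and do not come from the notes.

```python
def average_memory_access_time(hit_time, miss_rate, miss_penalty):
    """AMAT = hit time + miss rate x miss penalty.

    hit_time and miss_penalty must use the same unit (cycles or ns);
    miss_rate is a fraction between 0 and 1 (misses / references).
    """
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be a fraction between 0 and 1")
    return hit_time + miss_rate * miss_penalty


# Hypothetical values, chosen only to show the arithmetic.
amat = average_memory_access_time(hit_time=1, miss_rate=0.05, miss_penalty=100)
print(f"AMAT = {amat:.2f} cycles")
```

With these assumed values a 5% miss rate adds about 5 cycles to every access on average, so the miss term dominates the 1-cycle hit time.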
Average memory access time due to cache misses is a useful but indirect predictor of processor performance, with two points to keep in mind.
First, cache misses are not the only reason for stalls; contention due to I/O devices using memory is another.
Second, the CPU stalls during misses (at least in an in-order processor), so the memory stall time is strongly correlated with average memory access time.
CPU time = (CPU execution clock cycles + Memory stall clock cycles) × Clock cycle time
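The CPU time formula can be sketched the same way. The helper `memory_stall_cycles` below uses the usual relation (memory accesses × miss rate × miss penalty), which is an assumption added for the example since the notes do not spell it out; all numbers are hypothetical.

```python
def memory_stall_cycles(memory_accesses, miss_rate, miss_penalty_cycles):
    """Assumed relation: every miss stalls the CPU for the full miss penalty."""
    return memory_accesses * miss_rate * miss_penalty_cycles


def cpu_time(execution_cycles, stall_cycles, clock_cycle_time):
    """CPU time = (CPU execution cycles + memory stall cycles) x clock cycle time."""
    return (execution_cycles + stall_cycles) * clock_cycle_time


# Hypothetical workload: 1,000,000 execution cycles, 200,000 memory accesses,
# 2% miss rate, 100-cycle miss penalty, 0.5 ns clock cycle.
stalls = memory_stall_cycles(200_000, 0.02, 100)
total_ns = cpu_time(1_000_000, stalls, 0.5)
print(f"stall cycles = {stalls:.0f}, CPU time = {total_ns:.0f} ns")
```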
The 17 cache optimizations are organized into four categories:
1 Reducing the miss penalty: multilevel caches, critical word first, read miss before write miss, merging write buffers, victim caches;
2 Reducing the miss rate: larger block size, larger cache size, higher associativity, pseudo-associativity, and compiler optimizations;
3 Reducing the miss penalty or miss rate via parallelism: nonblocking caches, hardware prefetching, and compiler prefetching;
4 Reducing the time to hit in the cache: small and simple caches, avoiding address translation, and pipelined cache access.
2. Explain various techniques for Reducing Cache Miss Penalty
There are five optimization techniques to reduce the miss penalty.
i) First Miss Penalty Reduction Technique: Multi-Level Caches
The first miss penalty reduction technique adds another level of cache between the original cache and memory. The first-level cache can be small enough to match the clock cycle time of the fast CPU, while the second-level cache can be large enough to capture many accesses that would otherwise go to main memory, thereby lessening the effective miss penalty.
The average memory access time for a two-level cache is defined as follows. Using the subscripts L1 and L2 to refer, respectively, to a first-level and a second-level cache, the formula is
Average memory access time = Hit time_L1 + Miss rate_L1 × Miss penalty_L1
and Miss penalty_L1 = Hit time_L2 + Miss rate_L2 × Miss penalty_L2
so Average memory access time = Hit time_L1 + Miss rate_L1 × (Hit time_L2 + Miss rate_L2 × Miss penalty_L2)
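The substitution above can be checked numerically with a short Python sketch. The function names and the parameter values are hypothetical; the L2 miss rate used here is the rate seen by the second-level cache itself.

```python
def miss_penalty_l1(hit_time_l2, miss_rate_l2, miss_penalty_l2):
    """An L1 miss costs an L2 access plus, on an L2 miss, the trip to memory."""
    return hit_time_l2 + miss_rate_l2 * miss_penalty_l2


def amat_two_level(hit_time_l1, miss_rate_l1,
                   hit_time_l2, miss_rate_l2, miss_penalty_l2):
    """Two-level AMAT: the L1 miss penalty is itself an AMAT for the L2 cache."""
    penalty = miss_penalty_l1(hit_time_l2, miss_rate_l2, miss_penalty_l2)
    return hit_time_l1 + miss_rate_l1 * penalty


# Hypothetical design: 1-cycle L1 hit, 4% L1 miss rate,
# 10-cycle L2 hit, 50% L2 miss rate, 100-cycle memory penalty.
amat = amat_two_level(1, 0.04, 10, 0.5, 100)
print(f"two-level AMAT = {amat:.2f} cycles")
```

With these assumed numbers the L1 miss penalty is 60 cycles and the overall result is about 3.4 cycles, compared with about 5 cycles if every L1 miss went straight to a 100-cycle memory.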
Local miss rate: the number of misses in a cache divided by the total number of memory accesses to this cache. As you would expect, for the first-level cache it is equal to Miss rate_L1 and for the second-level cache it is Miss rate_L2.
Global miss rate: the number of misses in the cache divided by the total number of memory accesses generated by the CPU. Using the terms above, the global miss rate for the first-level cache is still just Miss rate_L1, but for the second-level cache it is Miss rate_L1 × Miss rate_L2.
The local miss rate is large for second-level caches because the first-level cache skims the cream of the memory accesses. This is why the global miss rate is the more useful measure: it indicates what fraction of the memory accesses that leave the CPU go all the way to memory.
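The difference between the two rates is easiest to see from raw miss counts. The sketch below assumes that the second-level cache is accessed only on first-level misses; the function name and the counts (1000 references, 40 L1 misses, 20 L2 misses) are made up for the example.

```python
def miss_rates(cpu_accesses, l1_misses, l2_misses):
    """Return (local L1, local L2, global L2) miss rates from raw counts.

    Assumes every L1 miss becomes exactly one L2 access and nothing else
    accesses L2, so the number of L2 accesses equals l1_misses.
    """
    local_l1 = l1_misses / cpu_accesses   # also the global L1 miss rate
    local_l2 = l2_misses / l1_misses      # misses / accesses seen by L2
    global_l2 = l2_misses / cpu_accesses  # misses / accesses issued by the CPU
    return local_l1, local_l2, global_l2


# Hypothetical counts: 1000 CPU references, 40 miss in L1, 20 of those miss in L2.
local_l1, local_l2, global_l2 = miss_rates(1000, 40, 20)
print(f"L1: {local_l1:.1%}  L2 local: {local_l2:.1%}  L2 global: {global_l2:.1%}")
```

Here the second-level cache looks poor by its local miss rate (half of its accesses miss) even though only about 2% of the CPU's references reach memory, which is the product of the two local rates.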
Here is a place where the misses per instruction metric shines. Instead of confusion about local or global miss rates, we just expand memory stalls per instruction to add the impact of a second-level cache.
Average memory stalls per instruction = Misses per instruction_L1 × Hit time_L2 + Misses per instruction_L2 × Miss penalty_L2
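A minimal Python sketch of this expression follows. The function name and the values (0.06 L1 misses and 0.03 L2 misses per instruction, 10-cycle L2 hit, 100-cycle L2 miss penalty) are hypothetical and only illustrate the arithmetic.

```python
def memory_stalls_per_instruction(misses_per_instr_l1, hit_time_l2,
                                  misses_per_instr_l2, miss_penalty_l2):
    """Each L1 miss pays the L2 hit time; each L2 miss also pays the L2 miss penalty."""
    return (misses_per_instr_l1 * hit_time_l2
            + misses_per_instr_l2 * miss_penalty_l2)


# Hypothetical values, in clock cycles.
stalls = memory_stalls_per_instruction(0.06, 10, 0.03, 100)
print(f"average memory stalls per instruction = {stalls:.2f} cycles")
```

Because both miss counts are expressed per instruction, no decision between local and global miss rates is needed; with these assumed values the result is about 3.6 stall cycles per instruction.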
Now we can consider the parameters of second-level caches. The foremost difference between the two levels is that the speed of the first-level cache affects the clock rate of the CPU, while the speed of the second-level cache only affects the miss penalty of the first-level cache.
The initial decision is the size of a second-level cache. Since everything in the first-level cache is likely to be in the second-level cache, the second-level cache should be much bigger than the first. If second-level caches are just a little bigger, the local miss rate will be high.
Figures 5.10 and 5.11 show how miss rates and relative execution time change with the size of a second-level cache for one design.