
Bank-aware Dynamic Cache Partitioning for Multicore Architectures
A Seminar Report
by
Anoop S S
105102
Department of Computer Science & Engineering
College of Engineering Trivandrum
Kerala - 695016
2010-11


Abstract
As Chip-Multiprocessor (CMP) systems have become the predominant topology for leading microprocessors, critical components of the system are now integrated on a single chip. This enables sharing of computation resources that was not previously possible. In addition, the virtualization of these computational resources exposes the system to a mix of diverse and competing workloads. Cache is a resource of primary concern, as it can be dominant in controlling overall throughput. In order to prevent destructive interference between divergent workloads, the last level of cache must be partitioned. Many solutions have been proposed in the past, but most of them assume either simplified cache hierarchies with no realistic restrictions or complex cache schemes that are difficult to integrate into a real design. To address this problem, a dynamic partitioning strategy based on realistic last-level cache designs of CMP processors was proposed. A cycle-accurate, full-system simulator based on Simics and GEMS is used to evaluate the partitioning scheme on an 8-core DNUCA CMP system. Results for an 8-core system show that the proposed scheme provides on average a 70 percent reduction in misses compared to non-partitioned shared caches, and a 25 percent reduction in misses compared to statically and equally partitioned (private) caches.

3 Bank-Aware Cache Partitioning
This section provides details about the application profiling mechanism, followed by the partitioning algorithm for assigning cache capacity to each core. Finally, the cache partition allocation algorithm for placing the cache partitions on the CMP-baseline system is described.

3.1 Cache Profiling of Applications
In order to dynamically profile the cache requirements of each core, a cache miss prediction model based on Mattson's stack distance algorithm is implemented. Mattson's stack algorithm (MSA) was initially proposed by Mattson et al. for reducing the simulation time of trace-driven caches by determining the miss ratios of all possible cache sizes with a single pass through the trace. The basic idea of the algorithm was later used for efficient trace-driven simulations of a set-associative cache. More recently, hardware-based MSA algorithms have been proposed for CMP system resource management.

MSA is based on the inclusion property of the commonly used Least Recently Used (LRU) cache replacement policy. Specifically, during any sequence of memory accesses, the content of an N-sized cache is a subset of the content of any cache larger than N. To create a profile for a K-way set-associative cache, K+1 counters are needed, named Counter1 to CounterK+1. Every time there is an access to the monitored cache, only the counter that corresponds to the LRU stack distance at which the access took place is incremented. Counters Counter1 up to CounterK correspond to the Most Recently Used (MRU) through the LRU position in the stack, respectively. If an access touches an address in a cache block that was in the i-th position of the LRU stack, Counteri is incremented. Finally, if the access ends up being a miss, CounterK+1 is incremented.
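The MSA counter update described above can be sketched in software. The following is a minimal single-set sketch, not the paper's hardware implementation: the trace, the block addresses, and the helper names are illustrative. By the LRU inclusion property, the hit counters from one pass give the miss count for every possible number of ways.

```python
def msa_profile(trace, K):
    """One pass over a block-address trace for a single set of a
    K-way LRU cache. Returns K+1 counters: counters[i] (0 <= i < K)
    counts hits at LRU stack position i (0 = MRU); counters[K]
    counts misses (stack distance > K, or cold misses)."""
    stack = []                       # LRU stack: index 0 = MRU
    counters = [0] * (K + 1)
    for block in trace:
        if block in stack:
            depth = stack.index(block)   # stack distance of this access
            counters[depth] += 1
            stack.pop(depth)             # reuse: move block back to MRU
        else:
            counters[K] += 1             # miss in the monitored K-way cache
            if len(stack) == K:
                stack.pop()              # evict the LRU block
        stack.insert(0, block)           # accessed block becomes MRU
    return counters

def misses_for_ways(counters, w):
    """By LRU inclusion, a cache with only w ways misses every access
    whose stack distance exceeds w: sum counters[w] .. counters[K]."""
    return sum(counters[w:])

trace = [1, 2, 3, 1, 4, 1, 2, 5, 1, 2]
counters = msa_profile(trace, K=4)
for w in range(1, 5):
    # e.g. with 1 way, all 10 accesses miss; with all 4 ways,
    # only the 5 misses recorded in counters[K] remain
    print(w, misses_for_ways(counters, w))
```

The single pass is what makes the scheme cheap enough for online profiling: one set of K+1 counters yields the miss ratio for every candidate partition size at once.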

1 Introduction
Chip Multiprocessors (CMP) have gradually become an attractive architecture for leveraging system integration by providing capabilities on a single die that would previously have occupied many chips across multiple small systems. This integration has brought abundant on-chip resources that can now be shared at a finer granularity among the multiple cores. Such sharing, though, has introduced chip-level contention, and the need for effective resource management policies is more important than ever.
To efficiently exploit these resources, systems require multiple program contexts, and virtualization has become a key player in this arena. Many small and/or low-utilization servers can now be easily consolidated on a single physical machine, allowing higher utilization of the available resources with significant energy reductions. Such consolidation presents both opportunities and pitfalls to computer architects seeking to best manage these once-isolated resources on large CMP designs.
In such virtualization environments, workloads tend to place dissimilar demands on shared resources and therefore, due to resource contention, are much more likely to destructively interfere in an unfair way. Consequently, contention for shared resources becomes the key performance bottleneck in CMPs. Shared resources include, but are not limited to: main memory bandwidth, main memory capacity, cache capacity, cache bandwidth, memory subsystem interconnection bandwidth, and system power.
Among these resources, several studies have identified the shared last-level cache (here, the L2) of CMPs as a major source of performance loss and execution inconsistency. As a solution, most of the proposed techniques control this contention by partitioning the L2 cache capacity and allocating specific portions of it to each core or execution thread. Both static and dynamic partitioning schemes are available that use workload profiling information to decide on a cache capacity assignment for each core/thread. All of the above techniques are usually based on high-level monitoring of system characteristics, since low-level, activity-based algorithms such as LRU replacement fail to provide a strong barrier among workloads competing for shared resources.
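To illustrate how profiling information can drive a capacity assignment, the sketch below greedily grants ways to whichever core's MSA-style counters predict the largest miss reduction. The greedy policy, the counter values, and all names here are illustrative assumptions, not the paper's actual partitioning algorithm.

```python
def greedy_partition(miss_counters, total_ways):
    """miss_counters: one list of K+1 MSA counters per core, where
    counters[i] counts hits at stack position i and counters[K] counts
    misses. Repeatedly grant one way to the core whose predicted miss
    count drops the most (illustrative greedy policy)."""
    n = len(miss_counters)
    alloc = [0] * n

    def misses(core, w):
        # misses with w ways = accesses at stack distance > w
        return sum(miss_counters[core][w:])

    for _ in range(total_ways):
        # marginal gain of giving core c one more way; a core already
        # holding K ways gets a sentinel gain of -1 and is skipped
        best = max(
            range(n),
            key=lambda c: (misses(c, alloc[c]) - misses(c, alloc[c] + 1))
            if alloc[c] < len(miss_counters[c]) - 1 else -1,
        )
        alloc[best] += 1
    return alloc

# Two hypothetical cores sharing 4 ways: core 0 reuses data heavily
# at shallow stack depths, core 1 barely benefits from extra capacity.
print(greedy_partition([[5, 3, 1, 0, 10], [1, 1, 1, 1, 1]], 4))
```

Because MSA gives the miss count for every candidate size in one pass, the marginal gain of each additional way is just the counter at the next stack position, which is what makes this style of greedy assignment cheap to re-evaluate at runtime.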
