05-06-2012, 03:43 PM
Randomized Protocols for Duplicate Elimination in Peer-to-Peer Storage Systems
![Adobe Acrobat PDF .pdf](https://studentbank.in/images/attachtypes/pdf.gif)
INTRODUCTION
PEER-TO-PEER systems have emerged as cost-effective alternatives
for scalable data sharing, backup, and archival
storage [6], [14]. Peers contribute data and storage and, in
return, gain access to data at other peers. Effective storage
management is an important issue in the deployment of such
systems. Data replication and caching are key enabling
techniques for scalability, performance, and availability. In
this context, an important problem relates to pruning
unwanted copies of data efficiently and safely. Attempts at
aggressive replication may lead to significant overheads
associated with thrashing in resource constrained environments.
Even if replication at peers is controlled, as in systems
such as Samsara [7], the network as a whole must provide
mechanisms for eliminating replicas that are not accessed,
while leaving a minimum number of replicas in the network
to satisfy availability constraints.
DUPLICATE ELIMINATION PROTOCOLS
In this section, we describe the duplicate elimination
protocols. These protocols rely on leader election as the
main kernel, with leaders assigned the responsibility of
preserving specified data items, while all other participants
are free to remove them.
A Probabilistic Quorum (PQ) Approach
Our first protocol is based on ideas proposed for building
probabilistic quorum (PQ) systems. A quorum is defined as
a set of subsets of peers such that any two subsets have a
nonempty intersection. To guarantee nonempty intersection
with high probability, the number of members in each
quorum is equal to where n is the number of
peers in the network.
A Randomized Election (RE) Approach
As an alternative to probabilistic quorum systems, we also
consider a randomized leader election protocol based on a
balls and bins [22] abstraction. The protocol operates in two
phases. In the first phase, the number of nodes holding the
same data item is reduced. In the second phase, the
remaining nodes determine the k leaders that retain the
data item. The main features of the protocol are as follows:
IMPLEMENTATION AND EXPERIMENTAL RESULTS
In this section, we present experimental results for the two
protocols described in the previous section. Our experiments
consider a detailed large-scale simulation, as well as
a prototype implementation on PlanetLab. The simulation
setup allows us to quantify the performance of our
protocols for various system parameters. This is not easily
done for the PlanetLab implementation. For our PlanetLab
experiments, we use file traces from real systems.