HBA: Distributed Metadata Management for Large Cluster-Based Storage Systems
#1

Hi, my project is
HBA: Distributed Metadata Management for Large Cluster-Based Storage Systems. Please help me out with any information about this topic.
Reply
#2
The content of the doc file is posted here for easy navigation. The material was actually added by rajav3[at]yahoo.co.in.

thanks to rajav3[at]yahoo.co.in

HBA: Distributed Metadata Management for
Large Cluster-Based Storage Systems
Scope of the project
To create metadata for all the files in the network and to find a particular file quickly and easily using a search engine.
Introduction
Rapid advances in general-purpose communication networks have motivated the deployment of inexpensive components to build competitive cluster-based storage solutions to meet the increasing demand for scalable computing. In recent years, the bandwidth of these networks has increased by two orders of magnitude, which greatly narrows the performance gap between them and the dedicated networks used in commercial storage systems. Since all I/O requests can be classified into two categories, namely user data requests and metadata requests, the scalability of accessing both data and metadata has to be carefully maintained to avoid any potential performance bottleneck along all data paths. This paper proposes a novel scheme, called Hierarchical Bloom Filter Arrays (HBA), to evenly distribute the tasks of metadata management to a group of metadata servers (MSs). A Bloom filter (BF) is a succinct data structure for probabilistic membership queries. A straightforward extension of the BF approach to decentralizing metadata management onto multiple MSs is to use an array of BFs on each MS. The metadata of each file is stored on some MS, called the home MS.
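As an illustration of the Bloom filter idea referenced above, here is a minimal sketch in C# (the project's stated language). The bit-array size, the hash count, and the MD5-based double hashing are illustrative choices, not taken from the paper.

```csharp
using System;
using System.Collections;
using System.Security.Cryptography;
using System.Text;

// A minimal Bloom filter sketch: a bit array plus k hash positions derived from
// one MD5 digest. Contains() may return false positives but never false negatives.
public class BloomFilter
{
    private readonly BitArray bits;
    private readonly int hashCount;

    public BloomFilter(int sizeInBits, int hashCount)
    {
        this.bits = new BitArray(sizeInBits);
        this.hashCount = hashCount;
    }

    public void Add(string key)
    {
        foreach (int index in Indexes(key))
            bits[index] = true;
    }

    public bool Contains(string key)
    {
        foreach (int index in Indexes(key))
            if (!bits[index]) return false;
        return true;
    }

    // Derive k bit positions from one digest using double hashing: h1 + i * h2.
    private int[] Indexes(string key)
    {
        byte[] digest;
        using (MD5 md5 = MD5.Create())
            digest = md5.ComputeHash(Encoding.UTF8.GetBytes(key));
        int h1 = BitConverter.ToInt32(digest, 0) & 0x7fffffff;
        int h2 = BitConverter.ToInt32(digest, 4) & 0x7fffffff;
        int[] result = new int[hashCount];
        for (int i = 0; i < hashCount; i++)
            result[i] = ((h1 + i * h2) & 0x7fffffff) % bits.Length;
        return result;
    }
}
```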

Modules
1. Login
2. Finding Network Computers
3. Meta Data Creation
4. Searching Files

Module Description
Login
The Login module presents site visitors with a form containing username and password fields. If a user enters a valid username/password combination, they are granted access to additional resources on the website. Which additional resources they have access to can be configured separately.
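A minimal sketch of how such a login check might be done against the SQL Server back end. The Users table and its columns are hypothetical names, not taken from the original project, and in practice the password should be stored hashed.

```csharp
using System.Data.SqlClient;

// Hedged sketch: validate a username/password pair against a hypothetical Users table.
public static class LoginHelper
{
    public static bool IsValidUser(string connectionString, string userName, string password)
    {
        const string sql = "SELECT COUNT(*) FROM Users WHERE UserName = @u AND Password = @p";
        using (SqlConnection conn = new SqlConnection(connectionString))
        using (SqlCommand cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.AddWithValue("@u", userName);
            cmd.Parameters.AddWithValue("@p", password);   // assume hashed in a real deployment
            conn.Open();
            return (int)cmd.ExecuteScalar() > 0;
        }
    }
}
```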
Finding Network Computers
In this module, we find the available computers on the network. Some folders on selected computers are shared, and the module identifies the computers that contain these shared folders. In this way, we gather all the information about the files and use it to build the metadata.
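One possible way to enumerate network computers and probe for a known shared folder in C#, assuming the machines are visible through the Windows WinNT directory provider. The share name used here is a hypothetical placeholder.

```csharp
using System;
using System.DirectoryServices;   // add a reference to System.DirectoryServices
using System.IO;

// Hedged sketch: list computers in the workgroup/domain and check for a shared folder.
public static class NetworkBrowser
{
    public static void ListComputers()
    {
        using (DirectoryEntry root = new DirectoryEntry("WinNT:"))
        {
            foreach (DirectoryEntry domain in root.Children)
            {
                foreach (DirectoryEntry node in domain.Children)
                {
                    if (node.SchemaClassName == "Computer")
                    {
                        // "SharedDocs" is a hypothetical share name used for illustration.
                        string unc = @"\\" + node.Name + @"\SharedDocs";
                        bool hasShare = Directory.Exists(unc);
                        Console.WriteLine("{0}  shared folder found: {1}", node.Name, hasShare);
                    }
                }
            }
        }
    }
}
```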
Meta Data Creation
In this module, we create metadata for all the shared files. The module saves every file name in a database and, in addition, stores some content extracted from each text file. This mechanism is applied to avoid the long-running scan process of the existing system.
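A rough sketch of how the metadata scan might write file information into the database. The FileMetadata table and its columns are hypothetical names chosen for illustration.

```csharp
using System;
using System.Data.SqlClient;
using System.IO;

// Hedged sketch: walk a shared folder and store file name, size, and a short text excerpt.
public static class MetadataBuilder
{
    public static void IndexShare(string connectionString, string machineName, string sharePath)
    {
        using (SqlConnection conn = new SqlConnection(connectionString))
        {
            conn.Open();
            foreach (string path in Directory.GetFiles(sharePath, "*.*", SearchOption.AllDirectories))
            {
                FileInfo info = new FileInfo(path);
                string snippet = "";
                if (info.Extension.ToLower() == ".txt")       // keep a short excerpt of text files
                {
                    string text = File.ReadAllText(path);
                    snippet = text.Length > 200 ? text.Substring(0, 200) : text;
                }

                const string sql =
                    "INSERT INTO FileMetadata (MachineName, FilePath, FileName, SizeBytes, Snippet) " +
                    "VALUES (@m, @p, @n, @s, @t)";
                using (SqlCommand cmd = new SqlCommand(sql, conn))
                {
                    cmd.Parameters.AddWithValue("@m", machineName);
                    cmd.Parameters.AddWithValue("@p", path);
                    cmd.Parameters.AddWithValue("@n", info.Name);
                    cmd.Parameters.AddWithValue("@s", info.Length);
                    cmd.Parameters.AddWithValue("@t", snippet);
                    cmd.ExecuteNonQuery();
                }
            }
        }
    }
}
```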
Searching Files
In this module, the user enters text to search for the required file. The searching mechanism differs from the existing system: whenever the user submits a search term, it is looked up in the database. The search is performed first on the file names and returns the related file names; it then searches the stored file text as a second pass. Finally, it produces a search result listing the files whose names or contents match the given text.
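A hedged sketch of the two-step search described above (file name first, then stored text), using the same hypothetical FileMetadata table as in the metadata-creation sketch.

```csharp
using System.Data;
using System.Data.SqlClient;

// Hedged sketch: search by file name and stored snippet, listing name matches first.
public static class FileSearcher
{
    public static DataTable Search(string connectionString, string term)
    {
        const string sql =
            "SELECT MachineName, FilePath, FileName, SizeBytes FROM FileMetadata " +
            "WHERE FileName LIKE @q OR Snippet LIKE @q " +
            "ORDER BY CASE WHEN FileName LIKE @q THEN 0 ELSE 1 END";   // name matches rank first
        using (SqlConnection conn = new SqlConnection(connectionString))
        using (SqlDataAdapter adapter = new SqlDataAdapter(sql, conn))
        {
            adapter.SelectCommand.Parameters.AddWithValue("@q", "%" + term + "%");
            DataTable results = new DataTable();
            adapter.Fill(results);
            return results;
        }
    }
}
```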

Module I/O
Login
Given Input - Login details
Expected Output - Authenticated users can use the software
Finding Network Computers
Given Input - Click the button to discover the systems on the network.
Expected Output - Shows all the connected nodes in the network.
Meta Data Creation
Given Input - Scan all the files and store the necessary information.
Expected Output - Database updated with the created metadata.
Searching Files
Given Input - File name or file size
Expected Output - Shows a page link to retrieve the particular file.
Module Diagram and UML Diagrams
(The diagrams themselves are not reproduced in this post.) The document includes the module diagram, use case diagram, class diagram, object diagram, state diagram, activity diagram, sequence diagram, collaboration diagram, component diagram, E-R diagram, dataflow diagram, project flow diagram, and system architecture.

Literature review
Many cluster-based storage systems employ centralized metadata management. Experiments in GFS show that a single MS is not a performance bottleneck in a storage cluster with 100 nodes under a read-only Google searching workload. PVFS, which is a RAID-0-style parallel file system, also uses a single-MS design to provide a clusterwide shared namespace. As data throughput is the most important objective of PVFS, some expensive but indispensable functions, such as the concurrent control between data and metadata, are not fully designed and implemented. In CEFT, which is an extension of PVFS that incorporates RAID-10-style fault tolerance and parallel I/O scheduling, the MS synchronizes concurrent updates, which can limit the overall throughput under workloads of intensive concurrent metadata updates. In Lustre, some low-level metadata management tasks are offloaded from the MS to object storage devices, and ongoing efforts are being made to decentralize metadata management to further improve scalability.

Some other systems have addressed metadata scalability in their designs. For example, GPFS [18] uses dynamically elected metanodes to manage file metadata, with the election coordinated by a centralized token server. OceanStore, which is designed for wide-area networked storage, scales the data location scheme by using an array of BFs, in which the ith BF is the union of all the BFs for all of the nodes within i hops. Requests are routed to their destinations by following the path with the maximum probability. Panasas ActiveScale not only uses object storage devices to offload some metadata management tasks but also scales up the metadata services by using a group of directory blades.

Our target systems differ from the three systems above. While GPFS and Panasas ActiveScale rely on specially designed commercial hardware, our target systems consist only of commodity components. Our system is also different from OceanStore in that the latter focuses on geographically distributed storage nodes, whereas our design targets cluster-based storage systems, where all nodes are only one hop away. The following summarizes other research projects in scaling metadata management, including table-based mapping, hash-based mapping, static tree partitioning, and dynamic tree partitioning.
Table-Based Mapping
Globally replicating mapping tables is one approach to decentralizing metadata management. There is a salient trade-off between the space requirement and the granularity and flexibility of distribution. A fine-grained table allows more flexibility in metadata placement. In an extreme case, if the table records the home MS for each individual file, then the metadata of a file can be placed on any MS. However, the memory space requirement for this approach makes it unattractive for large-scale storage systems. A back-of-the-envelope calculation shows that it would take as much as 1.8 Gbytes of memory space to store such a table with 10^8 entries when 16 bytes are used for a filename and 2 bytes for an MS ID. In addition, searching for an entry in such a huge table consumes a large number of precious CPU cycles. To reduce the memory space overhead, xFS proposes a coarse-grained table that maps a group of files to an MS. To keep a good trade-off, it is suggested that in xFS, the number of entries in a table should be an order of magnitude larger than the total number of MSs.
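For reference, the 1.8-Gbyte figure follows directly from the numbers quoted above; a trivial check:

```csharp
using System;

// Quick check of the back-of-the-envelope figure quoted above, using the text's own numbers.
class TableSizeEstimate
{
    static void Main()
    {
        long entries = 100000000L;                    // 10^8 file entries
        int bytesPerEntry = 16 + 2;                   // 16-byte filename + 2-byte MS ID
        double gigabytes = entries * (double)bytesPerEntry / 1e9;
        Console.WriteLine(gigabytes);                 // prints 1.8
    }
}
```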
Hashing-Based Mapping
Modulus-based hashing is another decentralized scheme. This approach hashes a symbolic pathname of a file to a digital value and assigns its metadata to a server according to the modulus value with respect to the total number of MSs. In practice, the likelihood of serious skew of metadata workload is almost negligible in this scheme, since the number of frequently accessed files is usually much larger than the number of MSs. However, a serious problem arises when an upper directory is renamed or the total number of MSs changes: the hashing mapping needs to be reimplemented, and this requires all affected metadata to be migrated among MSs. Although the size of the metadata of a file is small, a large number of files may be involved. In particular, the metadata of all files has to be relocated if an MS joins or leaves. This could lead to both disk and network traffic surges and cause serious performance degradation. LazyHybrid is proposed to reduce the impact of metadata migration by updating lazily and by incorporating a small table that maps disjoint hash ranges to MS IDs. The migration overhead, however, can still outweigh the benefits of load balancing in a heavily loaded system.
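A minimal sketch of modulus-based placement in C#: hash the full pathname and take the remainder modulo the number of MSs. Renaming a parent directory or changing the server count changes the result, which is precisely the migration problem noted above. The MD5-based hash is an illustrative choice, not the scheme used by any particular system.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hedged sketch of modulus-based hashing: map a full pathname to one of serverCount MSs.
public static class HashMapping
{
    public static int HomeServer(string fullPathName, int serverCount)
    {
        // serverCount is assumed to be > 0.
        byte[] digest;
        using (MD5 md5 = MD5.Create())
            digest = md5.ComputeHash(Encoding.UTF8.GetBytes(fullPathName));
        int hash = BitConverter.ToInt32(digest, 0) & 0x7fffffff;
        return hash % serverCount;
    }
}
```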
Static Tree Partitioning
Static namespace partitioning is a simple way of distributing metadata operations to a group of MSs. A common partition technique has been to divide the directory tree during the process of installing or mounting and to store the information at some well-known locations. Some distributed file systems such as NFS, AFS, and Coda follow this approach. This scheme works well only when file access patterns are uniform, resulting in a balanced workload. Unfortunately, access patterns in general file systems are highly skewed and, thus, this partition scheme can lead to a highly imbalanced workload if files in some particular subdirectories become more popular than others.

Dynamic Tree Partitioning
Weil et al. observe the disadvantages of the static tree partition approach and propose to dynamically partition the namespace across a cluster of MSs in order to scale up the aggregate metadata throughput. The key design idea is that initially, the partition is performed by hashing directories near the root of the hierarchy, and when a server becomes heavily loaded, this busy server automatically migrates some subdirectories to other servers with less load. They also propose prefix caching to efficiently utilize available RAM on all servers to further improve performance. This approach has three major disadvantages. First, it assumes that there is an accurate load measurement scheme available on each server and that all servers periodically exchange load information. Second, when an MS joins or leaves due to failure or recovery, all directories need to be rehashed to reflect the change in the server infrastructure, which, in fact, generates a prohibitively high overhead in a petabyte-scale storage system. Third, when the hot spots of metadata operations shift as the system evolves, frequent metadata migration to remove these hot spots may impose a large overhead and offset the benefits of load balancing.
Comparison of Existing Schemes
This section summarizes the existing state-of-the-art approaches to decentralizing metadata management and compares them with the HBA scheme, which will be detailed later in this paper. (The original paper presents this comparison in a table, "Comparison of HBA with Existing Decentralization Schemes", where n and d denote the total number of files and partitioned subdirectories, respectively.) Each existing solution has its own advantages and disadvantages. The hashing-based mapping approach can balance metadata workloads and inherently has fast metadata lookup operations, but it has slow directory operations such as listing the directory contents and renaming directories. In addition, when the total number of MSs changes, rehashing all existing files generates a prohibitive migration overhead. The table-based mapping method does not require any metadata migration, but it fails to balance the load. Furthermore, a back-of-the-envelope calculation shows that it would take as much as 1.8 Gbytes of memory to store such a table with 100 million files. The static tree partition approach has zero migration overhead, small memory overhead, and fast directory operations. However, it cannot balance the load, and it has a medium lookup time, since hot spots usually exist in this approach. Similar to the hashing-based mapping, dynamic tree partitioning has fast lookup operations and small memory overhead. However, this approach relies on load monitors to balance metadata workloads and thus incurs a large migration overhead. To combine their advantages and avoid their disadvantages, a novel approach, called HBA, is proposed in this paper to efficiently route metadata requests within a group of MSs. The detailed design of HBA will be presented later in this project.
Techniques and Algorithm Used

Static Tree Partitioning
Static namespace partitioning is a simple way of distributing metadata operations to a group of MSs. A common partition technique has been to divide the directory tree during the process of installing or mounting and to store the information at some well-known locations. Some distributed file systems such as NFS, AFS, and Coda follow this approach. This scheme works well only when file access patterns are uniform, resulting in a balanced workload. Unfortunately, access patterns in general file systems are highly skewed and, thus, this partition scheme can lead to a highly imbalanced workload if files in some particular subdirectories become more popular than others.
HBA
A straightforward extension of the BF approach to decentralizing metadata management onto multiple MSs is to use an array of BFs on each MS. The metadata of each file is stored on some MS, called the home MS. In this design, each MS builds a BF that represents all files whose metadata is stored locally and then replicates this filter to all other MSs. Including the replicas of the BFs from the other servers, an MS stores all filters in an array. When a client initiates a metadata request, the client randomly chooses an MS and asks this server to perform the membership query against this array. The BF array is said to have a hit if exactly one filter gives a positive response. A miss is said to have occurred whenever no hit or more than one hit is found in the array. The desired metadata can be found on the MS represented by the hit BF with a very high probability.
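A sketch of the Bloom filter array query just described, reusing the BloomFilter sketch given earlier in this post. The hit/miss rule (exactly one positive answer) follows the text; what to do on a miss (e.g., broadcasting to all MSs) is left to the caller.

```csharp
using System.Collections.Generic;

// Hedged sketch of the BF array membership query: filters[i] is the replicated filter of MS i.
public static class BloomFilterArray
{
    // Returns the index of the home MS on a hit, or -1 on a miss
    // (zero or multiple positive answers force a fallback such as a broadcast).
    public static int Lookup(IList<BloomFilter> filters, string fileName)
    {
        int hit = -1;
        int positives = 0;
        for (int i = 0; i < filters.Count; i++)
        {
            if (filters[i].Contains(fileName))
            {
                positives++;
                hit = i;
            }
        }
        return positives == 1 ? hit : -1;
    }
}
```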

Advantages
1. Faster search of files across all the nodes of a network.
2. Good results when searching by file size or by part of a file name.
Applications
This software can be used in any intranet for fast searching of files across the network.
Abstract is:

An efficient and distributed scheme for file mapping or file lookup is critical in decentralizing metadata management within a group of metadata servers. This paper presents a novel technique called Hierarchical Bloom Filter Arrays (HBA) to map filenames to the metadata servers holding their metadata. Two levels of probabilistic arrays, namely, Bloom filter arrays with different levels of accuracies, are used on each metadata server. One array, with lower accuracy and representing the distribution of the entire metadata, trades accuracy for significantly reduced memory overhead, while the other array, with higher accuracy, caches partial distribution information and exploits the temporal locality of file access patterns. Both arrays are replicated to all metadata servers to support fast local lookups. We evaluate HBA through extensive trace-driven simulations and an implementation in Linux. Simulation results show our HBA design to be highly effective and efficient in improving performance and scalability of file systems in clusters with 1,000 to 10,000 nodes (or superclusters) and with the amount of data in the petabyte scale or higher. Our implementation indicates that HBA can reduce metadata operation time of a single-metadata-server architecture by a factor of up to 43.9 when the system is configured with 16 metadata servers.

Software Requirements:

Operating System : Windows XP Professional
Front End : Microsoft Visual Studio .NET 2005
Coding Language : ASP.NET 2.0, C# 2.0
Back End : SQL Server 2000
Please read http://studentbank.in/report-hba--9141 for more information about HBA: Distributed Metadata Management for Large Cluster-Based Storage Systems.
Reply
#3
Hi Raj (rajav3[at]yahoo.co.in),
Can you please advise me on how the table design would look in this HBA project, since I am currently working on it?

Thanks & Regards,

Gayathri.
Reply
#4
Hi friend, please send me the HBA document.
Reply
#5
Hi,
The actual paper is published by IEEE. You can download it from this link:
http://ieeexplore.ieeeXplore/login.jsp?r...ision=-203

You need an IEEE Xplore subscription for this. Ask your friends, or check whether your college has a subscription.

Check out this link:
http://digitalcommons.unl.edu/cgi/viewco...searticles
Reply
#6
[attachment=4568]
HBA: Distributed Metadata Management for
Large Cluster-Based Storage Systems



ABSTRACT

An efficient and distributed scheme for file mapping or file lookup is critical in decentralizing metadata management within a group of metadata servers. The technique used here, called Hierarchical Bloom Filter Arrays (HBA), maps filenames to the metadata servers holding their metadata. Bloom filter arrays with different levels of accuracy are used on each metadata server. One array, with lower accuracy, maintains the destination metadata server information of all files at a reduced memory overhead; the other, with higher accuracy, captures the destination metadata server information of frequently accessed files. Simulation results show the HBA design to be highly effective and efficient in improving the performance and scalability of file systems in clusters with 1,000 to 10,000 nodes (or superclusters) and with the amount of data in the petabyte scale or higher. HBA can reduce the metadata operation time of a single-metadata-server architecture by a factor of up to 43.9 when the system is configured with 16 metadata servers.

Existing System:

File mapping or file lookup is critical in decentralizing metadata management within a group of metadata servers. The following approaches are used in the existing system:
1. Table-Based Mapping: fails to balance the load.
2. Hashing-Based Mapping: has slow directory operations, such as listing directory contents and renaming directories.
3. Static Tree Partitioning: cannot balance the load and has a medium lookup time.
4. Dynamic Tree Partitioning: small memory overhead, but incurs a large migration overhead.

Proposed System:
Here we use a new approach called Hierarchical Bloom Filter Arrays (HBA) to efficiently route metadata requests within a group of metadata servers. Two Bloom filter arrays are used on each server. One array, with higher accuracy, captures only the destination metadata server information of frequently accessed files, exploiting temporal locality to keep lookups efficient; the other, with lower accuracy, maintains the destination metadata information of all files at a significantly reduced memory overhead. Both arrays are replicated to every metadata server to support fast local lookups.
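A sketch of how the two-level lookup could be wired together, reusing the BloomFilter and BloomFilterArray sketches from an earlier post. The class and member names are assumptions for illustration, not the authors' code; the fallback on a double miss (broadcasting to all metadata servers) is only indicated by the return value.

```csharp
using System.Collections.Generic;

// Hedged sketch of the two-level lookup: consult the small, more accurate array of
// recently accessed files first; on a miss, consult the lower-accuracy array covering
// all files; if both miss, the caller falls back to broadcasting the request.
public class HierarchicalBloomFilterArrays
{
    private readonly IList<BloomFilter> hotFileArray;   // higher accuracy, frequently accessed files
    private readonly IList<BloomFilter> allFileArray;   // lower accuracy, all files

    public HierarchicalBloomFilterArrays(IList<BloomFilter> hotFileArray,
                                         IList<BloomFilter> allFileArray)
    {
        this.hotFileArray = hotFileArray;
        this.allFileArray = allFileArray;
    }

    // Returns the likely home MS index, or -1 if the request must be broadcast.
    public int Lookup(string fileName)
    {
        int ms = BloomFilterArray.Lookup(hotFileArray, fileName);
        if (ms >= 0) return ms;
        return BloomFilterArray.Lookup(allFileArray, fileName);
    }
}
```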

Reply
#7
Abstract
An efficient and distributed scheme for file mapping or file lookup is critical in decentralizing metadata management within a group of metadata servers. This paper presents a novel technique called Hierarchical Bloom Filter Arrays (HBA) to map filenames to the metadata servers holding their metadata. Two levels of probabilistic arrays, namely, the Bloom filter arrays with different levels of accuracies, are used on each metadata server. One array, with lower accuracy and representing the distribution of the entire metadata, trades accuracy for significantly reduced memory overhead, whereas the other array, with higher accuracy, caches partial distribution information and exploits the temporal locality of file access patterns. Both arrays are replicated to all metadata servers to support fast local lookups. We evaluate HBA through extensive trace-driven simulations and implementation in Linux. Simulation results show our HBA design to be highly effective and efficient in improving the performance and scalability of file systems in clusters with 1,000 to 10,000 nodes (or superclusters) and with the amount of data in the petabyte scale or higher. Our implementation indicates that HBA can reduce the metadata operation time of a single-metadata-server architecture by a factor of up to 43.9 when the system is configured with 16 metadata servers.
Reply
#8
[attachment=5210]
HBA: Distributed Metadata Management for
Large Cluster-Based Storage Systems



Yifeng Zhu, Member, IEEE, Hong Jiang, Member, IEEE,
Jun Wang, Member, IEEE, and Feng Xian, Student Member, IEEE



Abstract—

An efficient and distributed scheme for file mapping or file lookup is critical in decentralizing metadata management within a group of metadata servers. This paper presents a novel technique called Hierarchical Bloom Filter Arrays (HBA) to map filenames to the metadata servers holding their metadata. Two levels of probabilistic arrays, namely, the Bloom filter arrays with different levels of accuracies, are used on each metadata server. One array, with lower accuracy and representing the distribution of the entire metadata, trades accuracy for significantly reduced memory overhead, whereas the other array, with higher accuracy, caches partial distribution information and exploits the temporal locality of file access patterns. Both arrays are replicated to all metadata servers to support fast local lookups. We evaluate HBA through extensive trace-driven simulations and implementation in Linux. Simulation results show our HBA design to be highly effective and efficient in improving the performance and scalability of file systems in clusters with 1,000 to 10,000 nodes (or superclusters) and with the amount of data in the petabyte scale or higher. Our implementation indicates that HBA can reduce the metadata operation time of a single-metadata-server architecture by a factor of up to 43.9 when the system is configured with 16 metadata servers.


Reply
#9
I would like to know how to develop this system and what programming language is used to develop it. If you don't mind, I would also like to see the detailed implementation of this system, such as source code, for reference in my own system.
Reply
#10
HBA: Distributed Metadata Management for Large Cluster-Based Storage Systems

[attachment=17235]
INTRODUCTION
Rapid advances in general-purpose communication networks have motivated the deployment of inexpensive components to build competitive cluster-based storage solutions to meet the increasing demand of scalable computing [1], [2], [3], [4], [5], [6]. In the recent years, the bandwidth of these networks has been increased by two orders of magnitude [7], [8], [9], which greatly narrows the performance gap between them and the dedicated networks used in commercial storage systems. This significant improvement offers an appealing opportunity to provide cost-effective, high-performance storage services by aggregating existing storage resources on each commodity PC in a computing cluster with such networks, if a scalable scheme is in place to efficiently virtualize these distributed resources into a single-disk image. The key challenge in realizing this objective lies in the potentially huge number of nodes (in thousands) in such a cluster. Currently, clusters with thousands of nodes are already in existence, and clusters with even larger numbers of nodes are expected in the near future.


RELATED WORK AND COMPARISON OF
DECENTRALIZATION SCHEMES

Many cluster-based storage systems employ centralized metadata management. Experiments in GFS show that a single MS is not a performance bottleneck in a storage cluster with 100 nodes under a read-only Google searching workload. PVFS [3], which is a RAID-0-style parallel file system, also uses a single MS design to provide a clusterwide shared namespace. As data throughput is the most important objective of PVFS, some expensive but indispensable functions such as the concurrent control between data and metadata are not fully designed and implemented. In CEFT [6], [10], [13], [17], which is an extension of PVFS to incorporate a RAID-10-style fault tolerance and parallel I/O scheduling, the MS synchronizes concurrent updates, which can limit the overall throughput under the workload of intensive concurrent metadata updates. In Lustre [1], some low-level metadata management tasks are offloaded from the MS to object storage devices, and ongoing efforts are being made to decentralize metadata management to further improve the scalability.


Table-Based Mapping
Globally replicating mapping tables is one approach to decentralizing metadata management. There is a salient trade-off between the space requirement and the granularity and flexibility of distribution. A fine-grained table allows more flexibility in metadata placement. In an extreme case, if the table records the home MS for each individual file, then the metadata of a file can be placed on any MS. However, the memory space requirement for this approach makes it unattractive for large-scale storage systems.


Hashing-Based Mapping
Modulus-based hashing is another decentralized scheme. This approach hashes a symbolic pathname of a file to a digital value and assigns its metadata to a server according to the modulus value with respect to the total number of MSs. In practice, the likelihood of serious skew of metadata workload is almost negligible in this scheme, since the number of frequently accessed files is usually much larger than the number of MSs. However, a serious problem arises when an upper directory is renamed or the total number of MSs changes: the hashing mapping needs to be reimplemented, and this requires all affected metadata to be migrated among MSs.


Dynamic Tree Partitioning
Weil et al. [29] observe the disadvantages of the static tree partition approach and propose to dynamically partition the namespace across a cluster of MSs in order to scale up the aggregate metadata throughput. The key design idea is that initially, the partition is performed by hashing directories near the root of the hierarchy, and when a server becomes heavily loaded, this busy server automatically migrates some subdirectories to other servers with less load. It also proposes prefix caching to efficiently utilize available RAM on all servers to further improve the performance. This approach has three major disadvantages.
Reply
#11
To get information about the topic "distributed metadata management for large cluster-based storage systems" (full report, PPT, and related topics), refer to the page links below:

http://studentbank.in/report-hba-distrib...ge-systems

http://studentbank.in/report-hba-distrib...tems--7758

http://studentbank.in/report-hba-distrib...-system-pa

http://seminarsprojects.in/showthread.ph...9#pid66469

http://studentbank.in/report-hba-distrib...age-system
Reply
