seminar topics · 18-03-2010, 12:38 PM

Flexible Rollback Recovery in Dynamic Heterogeneous Grid Computing
SCOPE OF THE PROJECT
The objective of the project is to recover a lost node or file after it has crashed. The path of the node will vary according to the selection.
INTRODUCTION

Grid and cluster architectures have gained popularity for computationally intensive parallel applications. However, the complexity of the infrastructure, consisting of computational nodes, mass storage, and interconnection networks, poses great challenges with respect to overall system reliability. Simple tools of reliability analysis show that as the complexity of the system increases, its reliability, and thus its Mean Time to Failure (MTTF), decreases. The reliability of the entire system is computed as the product of the reliabilities of all system components. For applications executing on large clusters or a Grid, the long execution times may exceed the MTTF of the infrastructure and thus render the execution infeasible. As an example, consider an execution lasting 10 days in a system without fault tolerance. Under the optimistic assumption that the MTTF of a single node is 2,000 days (with independent, exponentially distributed node failures), the probability that this execution fails when using 100, 200, or 500 nodes is 0.39, 0.63, or 0.91, respectively, rapidly approaching certain failure. The failure probabilities are this high because, in the absence of fault-tolerance mechanisms, the failure of a single node causes the entire execution to fail. Note that this simple example does not even consider network failures, which are typically more likely than computer failures. Fault tolerance is thus a necessity for large applications, such as those found in scientific computing, executing on a Grid or a large cluster. The fault-tolerance mechanisms must also be able to deal with the specific characteristics of a heterogeneous and dynamic environment. Even if individual clusters are homogeneous, heterogeneity in a Grid is mostly unavoidable, since different participating clusters often use diverse hardware or software architectures. One possible solution to address heterogeneity is to use platform-independent abstractions such as the Java Virtual Machine.
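The failure probabilities quoted above can be reproduced with a short calculation. This is a sketch that assumes independent node failures with exponentially distributed lifetimes. Under that assumption, a system of n nodes with no fault tolerance fails during a run of length t with probability 1 - exp(-n t / MTTF). The function name `failure_probability` is illustrative only.

```python
import math


def failure_probability(nodes: int, exec_days: float, mttf_days: float) -> float:
    """Probability that at least one of `nodes` independent nodes fails during
    an execution of `exec_days`, assuming exponential node lifetimes."""
    # Reliability of a single node over the run: R = exp(-t / MTTF).
    node_reliability = math.exp(-exec_days / mttf_days)
    # Without fault tolerance the system is a series system: it survives only
    # if every node survives, so system reliability is the product R ** nodes.
    system_reliability = node_reliability ** nodes
    return 1.0 - system_reliability


for n in (100, 200, 500):
    print(n, round(failure_probability(n, 10, 2000), 4))
```

With these inputs the function gives roughly 0.393, 0.632 and 0.918. The 0.91 quoted in the text corresponds to truncating the last value rather than rounding it.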
However, this does not solve the problem in general. There is a large base of existing applications that have been developed in other languages, and reengineering them may not be feasible for performance or cost reasons. Environments like Microsoft .NET address portability, but only a few scientific applications for Grids or clusters exist on them. Grids and clusters are dominated by Unix operating systems, e.g., Linux or Solaris, whereas Microsoft .NET is Windows-centric with only recent or partial Unix support. Besides heterogeneity, one has to address the dynamic nature of the Grid. Volatility is not only an intracluster issue, i.e., configuration changes within a cluster, but also an intercluster reality. Intracluster volatility may be the result of node failures, whereas intercluster volatility is caused by network disruptions between clusters. From an administrative viewpoint, the realities of Grid operation, such as cluster/node reservations or maintenance, may restrict long executions on fixed topologies, because operations at different sites may be hard to coordinate. It is usually difficult to reserve a large cluster for long executions, let alone to schedule extensive uninterrupted time on multiple, perhaps geographically dispersed, sites. Lastly, configuration changes may be induced by the application itself as a result of changes in runtime-observable quality-of-service (QoS) parameters. To overcome these problems and challenges, we present mechanisms that tolerate faults and operation-induced disruption of parts of the execution of the application, or of the entire execution. We introduce flexible rollback recovery mechanisms that impose no artificial restrictions on the execution. They do not depend on the pre-failure configuration and consider 1) node and cluster failures as well as operation-induced unavailability of resources and 2) dynamic topology reconfiguration in heterogeneous systems.
MODULES
• Analysis of nodes
• Data security using Theft-Induced Checkpoint
  - Crash
  - Checkpoint using Local
  - Checkpoint using Forced
• Data transmission using Systematic Event Logging
• Evaluating Theft-Induced Checkpoint
• Evaluating Systematic Event Logging
MODULES DESCRIPTION
Analysis of nodes
The nodes are analyzed to identify which of them have failed and to determine how the data or the node can be recovered using the different techniques.
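The source does not say how failed nodes are identified. One common approach is a heartbeat timeout, shown here as a sketch. It assumes that each node periodically reports a heartbeat and that any node silent for longer than a chosen timeout is treated as failed. The function `find_failed_nodes` and its `timeout` parameter are hypothetical and do not come from the project.

```python
from typing import Dict, List


def find_failed_nodes(last_heartbeat: Dict[str, float], now: float,
                      timeout: float) -> List[str]:
    """Hypothetical failure detector.

    `last_heartbeat` maps a node id to the time of its most recent heartbeat.
    A node is reported as failed if it has been silent for more than `timeout`.
    """
    return sorted(node for node, ts in last_heartbeat.items()
                  if now - ts > timeout)


heartbeats = {"node1": 100.0, "node2": 91.0, "node3": 99.5}
print(find_failed_nodes(heartbeats, now=101.0, timeout=5.0))  # ['node2']
```

A timeout-based detector cannot tell a crashed node apart from a very slow one. In practice, the timeout is a trade-off between detecting failures quickly and avoiding false suspicions.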
Data Security using Theft-Induced Checkpoint
This module detects intruders by means of a specific anomaly-detection mechanism. The detected members are then eliminated from the network.
Crash
When one of the client processes fails, a crash occurs. Part of the current processing is lost and the system fails, which can also affect the other processors.
Checkpoint using Local
The local checkpoint is used to recover a processor after a crash. With local checkpointing, the Theft-Induced Checkpointing protocol recovers all the processes whose state was stored at periodic time intervals. These processes are then sent to the client with the help of the Systematic Event Logging protocol.
Checkpoint using Forced
The forced checkpoint is the other recovery mechanism of the Theft-Induced Checkpointing (TIC) protocol used when the system crashes. With forced checkpointing, TIC recovers only the current events/processes and sends them to the client.
Data transmission using Systematic Event Logging
This module transmits data using particular protocols. Each protocol defines a set of procedures for transmitting the data.
Evaluating Theft-Induced Checkpoint
This module evaluates how intruders are detected by TIC.
Evaluating Systematic Event Logging
This module evaluates how data are transmitted using Systematic Event Logging.
MODULES I/O:
Analysis of nodes
Input: n nodes.
Output: Failed nodes.
Data Security using Theft-Induced Checkpoint
Input: Intruders.
Output: Identified intruders.
Crash
Input: Nodes.
Output: Crashed nodes.
Checkpoint using Local
Input: Data and nodes.
Output: Recovered nodes and data.
Checkpoint using Forced
Input: Data and nodes.
Output: Recovered nodes and data.
Data transmission using Systematic Event Logging
Input: Data.
Output: Transmitted data.
Evaluating Theft-Induced Checkpoint
Input: Existing results.
Output: Compared results.
Evaluating Systematic Event Logging
Input: Existing results.
Output: Compared results.
ALGORITHMS/TECHNIQUES USED
THEFT-INDUCED CHECKPOINTING
The creation of checkpoints can be initiated 1) by work-stealing or 2) at the end of specific checkpointing periods. We first describe the protocol with respect to work-stealing, since it is the only source of communication (and thus of dependencies) between processes. Checkpoints resulting from work-stealing are called forced checkpoints. We then consider the periodic checkpoints, called local checkpoints, which are stored after the expiration of predefined periods.
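The two checkpoint triggers can be illustrated with a simplified Python sketch. This is not the TIC protocol itself. The sketch assumes the following:

- Each process holds a list of pending tasks.
- A theft removes the oldest task from the victim.
- Both victim and thief take a forced checkpoint when a theft happens. Which side or sides checkpoint is an assumption here.
- A local checkpoint is taken once a fixed period has elapsed since the previous checkpoint.

The class `Process`, its methods and the `period` field are all hypothetical.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Process:
    """Hypothetical process illustrating forced vs. local checkpoint triggers."""
    name: str
    period: float                                  # local checkpoint period
    tasks: List[str] = field(default_factory=list)
    last_checkpoint_time: float = 0.0
    # Each checkpoint is (kind, time, snapshot of the task list).
    checkpoints: List[Tuple[str, float, List[str]]] = field(default_factory=list)

    def _checkpoint(self, kind: str, now: float) -> None:
        # Keep a copy of the state; a real system would write to stable storage.
        self.checkpoints.append((kind, now, copy.deepcopy(self.tasks)))
        self.last_checkpoint_time = now

    def tick(self, now: float) -> None:
        # Local checkpoint: taken when the predefined period has expired.
        if now - self.last_checkpoint_time >= self.period:
            self._checkpoint("local", now)

    def steal_from(self, victim: "Process", now: float) -> Optional[str]:
        # Work-stealing is the only interprocess communication, so the
        # resulting dependency is recorded with forced checkpoints.
        if not victim.tasks:
            return None
        task = victim.tasks.pop(0)                 # steal the oldest task
        self.tasks.append(task)
        victim._checkpoint("forced", now)
        self._checkpoint("forced", now)
        return task

    def rollback(self) -> None:
        # Recovery: restore the most recent checkpoint, whatever its kind.
        if self.checkpoints:
            self.tasks = copy.deepcopy(self.checkpoints[-1][2])


p1 = Process("p1", period=10.0, tasks=["t1", "t2", "t3"])
p2 = Process("p2", period=10.0)
stolen = p2.steal_from(p1, now=3.0)   # p2 steals "t1"; forced checkpoints on both
p1.tick(now=8.0)                      # 5 units since last checkpoint: none taken
p1.tick(now=14.0)                     # 11 units elapsed: local checkpoint
print([kind for kind, _, _ in p1.checkpoints])  # ['forced', 'local']
p1.tasks.clear()                      # simulate a crash losing in-memory state
p1.rollback()
print(p1.tasks)                       # ['t2', 't3']
```

In this sketch a forced checkpoint also resets the local-checkpoint timer. As a result, a process that steals or is stolen from often rarely needs to take periodic checkpoints.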
SYSTEMATIC EVENT LOGGING
Systematic Event Logging (SEL) is derived from log-based methods. The motivation for SEL is to reduce the amount of computation that can be lost, which is bounded by the execution time of a single failed task. In case of a fault, task duplication must be avoided during rollback. Specifically, the implementation has to guarantee that only one instance of any given task can exist. Without such a guarantee, a task could, during rollback, recreate other tasks or data objects that already exist from earlier failed executions. Depending on the timing of the fault, this could result in a significant number of duplicated nodes, since each duplicated task may itself be the initiator of a significant portion of the computation. In our implementation of SEL, duplication is avoided by a unique and reproducible identification method for all vertices in the graph.
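Duplication avoidance depends on one property: every task must receive the same identifier whether it is created in the original run or re-created during rollback. The source states only that the identification is unique and reproducible. The naming scheme below, parent identifier plus the child's creation index, is an assumption chosen for illustration, as are the names `TaskRegistry`, `child_ids` and `run_task`. The sketch also assumes that the children of the failed task survived elsewhere.

```python
from typing import List, Set


class TaskRegistry:
    """Hypothetical registry enforcing 'at most one instance of any task'."""

    def __init__(self) -> None:
        self.existing: Set[str] = set()   # identifiers of tasks that exist
        self.created: List[str] = []      # log of actual creations, in order

    def spawn(self, task_id: str) -> bool:
        """Create `task_id` unless it already exists; return True if created."""
        if task_id in self.existing:
            return False                  # duplicate from a re-executed parent
        self.existing.add(task_id)
        self.created.append(task_id)
        return True


def child_ids(parent_id: str, n_children: int) -> List[str]:
    # Deterministic naming: the k-th child of a parent always gets the same ID,
    # so a re-executed parent regenerates exactly the same identifiers.
    return [f"{parent_id}.{k}" for k in range(n_children)]


def run_task(registry: TaskRegistry, parent_id: str, n_children: int) -> List[str]:
    """Execute a hypothetical task that spawns children; return newly created IDs."""
    return [cid for cid in child_ids(parent_id, n_children) if registry.spawn(cid)]


registry = TaskRegistry()
registry.spawn("root")
first = run_task(registry, "root", 3)    # original execution
# Suppose "root" fails after spawning and is rolled back: re-executing it
# regenerates the same IDs, so none of its children are duplicated.
second = run_task(registry, "root", 3)
print(first, second)  # ['root.0', 'root.1', 'root.2'] []
```

The scheme works only because identifiers are derived from the position of a task in the graph and not from run-time information such as a global counter or timestamp. A re-execution would produce different values for those.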
ADVANTAGES
Efficient theft detection.
Rollback recovery.
Reduced time consumption.
APPLICATION
QUADRATIC ASSIGNMENT PROBLEM
The local experiments were conducted on the iCluster2, which consists of 104 nodes interconnected by a 100-Mbps Ethernet network, each node featuring two Itanium-2 processors (900 MHz) and 3 GB of local memory. The intercluster experiments were conducted on Grid5000, which consists of clusters located at nine French institutions.
