Fault Tolerance in a Virtual Machine Environment

1. Introduction to Fault Tolerance:
1.1 Definition of Fault Tolerance:-
Fault-Tolerance or graceful degradation is the property that enables a system to continue operating properly in the event of the failure of (or one or more faults within) some of its components. If its operating quality decreases at all, the decrease is proportional to the severity of the failure, as compared to a naïvely-designed system in which even a small failure can cause total breakdown. Fault-tolerance is particularly sought-after in high-availability or life-critical systems.
Fault-tolerance is not just a property of individual machines; it may also characterize the rules by which they interact. For example, the Transmission Control Protocol (TCP) is designed to allow reliable two-way communication in a packet-switched network, even in the presence of communications links which are imperfect or overloaded. It does this by requiring the endpoints of the communication to expect packet loss, duplication, reordering and corruption, so that these conditions do not damage data integrity, and only reduce throughput by a proportional amount.
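As a rough illustration of this idea (not TCP's actual implementation), the following Python sketch shows how sequence numbers, cumulative acknowledgements and retransmission let an endpoint tolerate a lossy link; the loss rate, retry limit and helper names are made up for illustration:

    import random

    def lossy_send(packet, loss_rate=0.3):
        """Simulate an unreliable link that sometimes drops packets."""
        return None if random.random() < loss_rate else packet

    class Receiver:
        def __init__(self):
            self.expected_seq = 0
            self.data = []

        def receive(self, packet):
            if packet is None:                 # packet was lost in transit
                return None
            seq, payload = packet
            if seq == self.expected_seq:       # new, in-order packet
                self.data.append(payload)
                self.expected_seq += 1
            # Duplicates and stale packets are simply ignored.
            return self.expected_seq           # cumulative acknowledgement

    def send_reliably(payloads, receiver, max_retries=10):
        for seq, payload in enumerate(payloads):
            for _ in range(max_retries):
                ack = receiver.receive(lossy_send((seq, payload)))
                if ack is not None and ack > seq:
                    break                      # receiver confirmed this packet
            else:
                raise RuntimeError("link too unreliable, giving up")

    receiver = Receiver()
    send_reliably(["a", "b", "c"], receiver)
    print(receiver.data)                       # ['a', 'b', 'c'] despite losses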
Recovery from errors in fault-tolerant systems can be characterized as either roll-forward or roll-back. When the system detects that it has made an error, roll-forward recovery takes the system state at that time and corrects it, to be able to move forward. Roll-back recovery reverts the system state back to some earlier, correct version, for example using checkpointing, and moves forward from there. Roll-back recovery requires that the operations between the checkpoint and the detected erroneous state can be made idempotent. Some systems make use of both roll-forward and roll-back recovery for different errors or different parts of one error.
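A minimal sketch of roll-back recovery, assuming a simple in-memory state and an invariant check standing in for error detection (all names are illustrative):

    import copy

    class CheckpointedCounter:
        """Illustrative roll-back recovery: snapshot the state, apply a batch of
        operations, and revert to the last checkpoint if an error is detected."""

        def __init__(self):
            self.state = {"total": 0}
            self.checkpoint = copy.deepcopy(self.state)

        def take_checkpoint(self):
            self.checkpoint = copy.deepcopy(self.state)

        def apply(self, operations):
            self.take_checkpoint()
            try:
                for op in operations:
                    self.state["total"] += op
                    if self.state["total"] < 0:
                        raise ValueError("invariant violated: negative total")
            except Exception:
                # Roll back: discard the erroneous state, restore the checkpoint.
                self.state = copy.deepcopy(self.checkpoint)
                raise

    counter = CheckpointedCounter()
    counter.apply([1, 2, 3])          # total == 6
    try:
        counter.apply([4, -100])      # violates the invariant
    except ValueError:
        pass
    print(counter.state)              # {'total': 6} -- state was rolled back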
Within the scope of an individual system, fault-tolerance can be achieved by anticipating exceptional conditions and building the system to cope with them, and, in general, aiming for self-stabilization so that the system converges towards an error-free state. However, if the consequences of a system failure are catastrophic, or the cost of making it sufficiently reliable is very high, a better solution may be to use some form of duplication. In any case, if the consequence of a system failure is catastrophic, the system must be able to use reversion to fall back to a safe mode. This is similar to roll-back recovery but can be a human action if humans are present in the loop.
1.2. Fault Tolerance Requirements:-
The basic characteristics of fault tolerance require:
1. No single point of failure.
2. Fault isolation to the failing component.
3. Fault containment to prevent propagation of the failure.
4. Availability of reversion modes.
In addition, fault tolerant systems are characterized in terms of both planned service outages and unplanned service outages. These are usually measured at the application level and not just at a hardware level. The figure of merit is called availability and is expressed as a percentage. Fault-tolerant systems are typically based on the concept of redundancy.
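For example, availability is commonly estimated from mean time between failures (MTBF) and mean time to repair (MTTR) as availability = MTBF / (MTBF + MTTR); the figures in the sketch below are made up for illustration:

    def availability(mtbf_hours, mttr_hours):
        """Steady-state availability from mean time between failures (MTBF)
        and mean time to repair (MTTR), expressed as a percentage."""
        return 100.0 * mtbf_hours / (mtbf_hours + mttr_hours)

    # Illustrative figures: a server that fails once every 2,000 hours and
    # takes 4 hours to repair is about 99.8% available.
    print(f"{availability(2000, 4):.3f}%")   # 99.800%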
1.3 Fault-tolerant Computer Systems:-
Fault-tolerant computer systems are systems designed around the concepts of fault tolerance. In essence, they have to be able to keep working to a level of satisfaction in the presence of faults. Most fault-tolerant computer systems are designed to be able to handle several possible failures, including:
• Hardware-related faults, such as hard disk failures, input or output device failures, or other temporary or permanent failures.
• Software bugs and errors.
• Interface errors between the hardware and software, including driver failures.
• Operator errors, such as erroneous keystrokes, bad command sequences, or installing unexpected software.
• Physical damage or other flaws introduced to the system from an outside source.
Hardware fault-tolerance is the most common application of these systems, designed to prevent failures due to hardware components. Typically, components have multiple backups and are separated into smaller "segments" that act to contain a fault, and extra redundancy is built into all physical connectors, power supplies, fans, etc. Special software and instrumentation packages are designed to detect failures, and techniques such as fault masking allow faults to be ignored by seamlessly preparing a backup component to execute an instruction as soon as it is issued, using a sort of voting protocol: if the main component and its backups do not give the same results, the flawed output is ignored.
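The voting idea behind fault masking can be sketched as a simple majority voter over redundant replicas (a hypothetical Python illustration, not tied to any particular product):

    from collections import Counter

    def tmr_vote(results):
        """Majority voter for triple modular redundancy: the output produced by
        the majority of replicas is accepted, masking a single faulty replica."""
        value, count = Counter(results).most_common(1)[0]
        if count < 2:
            raise RuntimeError("no majority -- the fault cannot be masked")
        return value

    # Two healthy replicas and one faulty replica computing the same function:
    replica_outputs = [42, 42, 17]      # the third replica is faulty
    print(tmr_vote(replica_outputs))    # 42 -- the flawed output is ignored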
Software fault-tolerance is based more around nullifying programming errors using real-time redundancy, or static "emergency" subprograms to fill in for programs that crash. There are many ways to conduct such fault-regulation, depending on the application and the available hardware.
1.4 Fault Tolerance Verification and Validation:- The most important requirement of design in a fault tolerant computer system is making sure it actually meets its requirements for reliability. This is done by using various failure models to simulate various failures, and analyzing how well the system reacts. These statistical models are very complex, involving probability curves and specific fault rates, latency curves, error rates, and the like. The most commonly used models are HARP, SAVE, and SHARPE in the USA, and SURF or LASS in Europe.
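Returning to the "emergency subprogram" idea above, a recovery-block style sketch might look like the following; the routines fast_sqrt and safe_sqrt and the acceptance test are made up for illustration:

    def recovery_block(primary, alternates, acceptance_test, *args):
        """Try the primary routine first; on an exception or a failed
        acceptance test, fall back to progressively simpler alternates."""
        for routine in [primary] + list(alternates):
            try:
                result = routine(*args)
                if acceptance_test(result):
                    return result
            except Exception:
                pass                      # this version failed; try the next one
        raise RuntimeError("all versions failed")

    def fast_sqrt(x):
        if x > 1e6:
            raise MemoryError("simulated crash in the primary routine")
        return x ** 0.5

    def safe_sqrt(x):
        # Simpler, slower "emergency" alternate: bisection on non-negative inputs.
        lo, hi = 0.0, max(1.0, float(x))
        for _ in range(200):
            mid = (lo + hi) / 2
            lo, hi = (mid, hi) if mid * mid < x else (lo, mid)
        return lo

    ok = lambda r: isinstance(r, float) and r >= 0
    print(recovery_block(fast_sqrt, [safe_sqrt], ok, 2.0))        # primary succeeds
    print(recovery_block(fast_sqrt, [safe_sqrt], ok, 4_000_000))  # alternate fills in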
2. Introduction to the Virtual Machine Environment:
A virtual machine (VM) is a software implementation of a machine (i.e. a computer) that executes programs like a physical machine. Virtual machines are separated into two major categories, based on their use and degree of correspondence to any real machine. A system virtual machine provides a complete system platform which supports the execution of a complete operating system (OS).
In contrast, a process virtual machine is designed to run a single program, which means that it supports a single process. An essential characteristic of a virtual machine is that the software running inside is limited to the resources and abstractions provided by the virtual machine; it cannot break out of its virtual world.
System Virtual Machines: System virtual machines (sometimes called hardware virtual machines) allow the sharing of the underlying physical machine resources between different virtual machines, each running its own operating system. The software layer providing the virtualization is called a virtual machine monitor.
The main advantages of VMs are:
• Multiple OS environments can co-exist on the same computer, in strong isolation from each other.
• The virtual machine can provide an instruction set architecture (ISA) that is somewhat different from that of the real machine.
• Application provisioning, maintenance, high availability and disaster recovery.
The main disadvantages of VMs are:
• A virtual machine is less efficient than a real machine when it accesses the hardware indirectly.
• When multiple VMs are concurrently running on the same physical host, each VM may exhibit varying and unstable performance (speed of execution, not results), which depends heavily on the workload imposed on the system by other VMs, unless proper techniques are used for temporal isolation among virtual machines.
Process virtual machines: A process VM, sometimes called an application virtual machine, runs as a normal application inside an OS and supports a single process. It is created when that process is started and destroyed when it exits. Its purpose is to provide a platform-independent programming environment that abstracts away details of the underlying hardware or operating system, and allows a program to execute in the same way on any platform. A process VM provides a high-level abstraction: that of a high-level programming language.
VMware, a vendor of virtual machine environments, states the following:-
“Maximize uptime in your datacenter and reduce downtime management costs by enabling VMware Fault Tolerance for your virtual machines. VMware Fault Tolerance, based on vLockstep technology, provides zero downtime, zero data loss continuous availability for your applications, without the cost and complexity of traditional hardware or software clustering solutions”.
3. Introduction to VMware Fault Tolerance:-
VMware Fault Tolerance is a pioneering new component of VMware vSphere that provides continuous availability to applications, preventing downtime and data loss in the event of server failures. VMware Fault Tolerance, built using VMware vLockstep technology, provides operational continuity and high levels of uptime in VMware vSphere environments, with simplicity and at a low cost.
VMware Fault Tolerance provides zero-downtime, zero-data-loss continuous availability for any application, without the cost or complexity of traditional solutions.
Benefits:-
• Eliminate expensive downtime or data loss due to server failures.
• Provide continuous service to any application, regardless of operating system.
• Provide uninterrupted service through an intuitive administrative interface.
With VMware Fault Tolerance, IT organizations can:
• Eliminate even the smallest of disruptions due to server hardware failures. VMware Fault Tolerance provides instantaneous, nondisruptive failover in the event of server failures, protecting organizations from even the smallest disruption or data loss when downtime costs can run into thousands of dollars in lost business.
• Provide continuous availability to any critical application. All applications that run inside a VMware virtual machine can be protected by VMware Fault Tolerance, allowing continuous levels of availability to be possible even for homegrown or custom applications. Automatic detection of failures and seamless failover ensure that applications continue to run without interruptions, user disconnects or data loss during hardware failures.
• Deliver uninterrupted service with simplicity and low cost. VMware Fault Tolerance works with existing VMware High Availability (HA) or VMware Distributed Resource Scheduler (DRS) clusters and can be simply turned on or turned off for virtual machines. When applications require operational continuity during critical periods, such as month-end or quarter-end periods for financial applications, VMware Fault Tolerance can be turned on with the click of a button to provide extra assurance. The operational simplicity of VMware Fault Tolerance is matched by its low cost. VMware Fault Tolerance is simply included as a component in VMware vSphere, and requires no specialized dedicated hardware.
3.1. How Fault Tolerance Works in VMware:-
VMware Fault Tolerance provides continuous availability for virtual machines by creating and maintaining a Secondary VM that is identical to, and continuously available to replace, the Primary VM in the event of a failover situation. You can enable Fault Tolerance for most mission critical virtual machines. A duplicate virtual machine, called the Secondary VM, is created and runs in virtual lockstep with the Primary VM. VMware vLockstep captures inputs and events that occur on the Primary VM and sends them to the Secondary VM, which is running on another host. Using this information, the Secondary VM's execution is identical to that of the Primary VM. Because the Secondary VM is in virtual lockstep with the Primary VM, it can take over execution at any point without interruption, thereby providing fault tolerant protection.
The Primary and Secondary VMs continuously exchange heartbeats. This allows the virtual machine pair to monitor the status of one another to ensure that Fault Tolerance is continually maintained. A transparent failover occurs if the host running the Primary VM fails, in which case the Secondary VM is immediately activated to replace the Primary VM. A new Secondary VM is started and Fault Tolerance redundancy is reestablished within a few seconds. If the host running the Secondary VM fails, it is also immediately replaced. In either case, users experience no interruption in service and no loss of data.
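The heartbeat-and-failover behaviour can be sketched as a toy model (this is not VMware's implementation; the timeout value and callback names are assumptions for illustration):

    import time

    class HeartbeatMonitor:
        """Toy heartbeat monitor: if no heartbeat from the peer arrives within
        the timeout, the peer's host is presumed to have failed."""

        def __init__(self, timeout_seconds=2.0):
            self.timeout = timeout_seconds
            self.last_heartbeat = time.monotonic()

        def record_heartbeat(self):
            self.last_heartbeat = time.monotonic()

        def peer_alive(self):
            return (time.monotonic() - self.last_heartbeat) < self.timeout

    def secondary_loop(monitor, promote_to_primary, spawn_new_secondary):
        """Run on the Secondary: poll the monitor and fail over if the Primary dies."""
        while monitor.peer_alive():
            time.sleep(0.1)               # in reality, heartbeats are received here
        promote_to_primary()              # transparent failover to the Secondary
        spawn_new_secondary()             # redundancy is re-established

    monitor = HeartbeatMonitor(timeout_seconds=0.5)
    secondary_loop(monitor,
                   promote_to_primary=lambda: print("Secondary promoted to Primary"),
                   spawn_new_secondary=lambda: print("new Secondary VM started"))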
A fault tolerant virtual machine and its secondary copy are not allowed to run on the same host. Fault Tolerance uses anti-affinity rules, which ensure that the two instances of the fault tolerant virtual machine are never on the same host. This ensures that a host failure cannot result in the loss of both virtual machines. Fault Tolerance avoids "split-brain" situations, which can lead to two active copies of a virtual machine after recovery from a failure. Atomic file locking on shared storage is used to coordinate failover so that only one side continues running as the Primary VM and a new Secondary VM is respawned automatically.
Note: The anti-affinity check is performed when the Primary VM is powered on. It is possible that the Primary and Secondary VMs can be on the same host when they are both in a powered-off state. This is normal behavior, and when the Primary VM is powered on, the Secondary VM is started on a different host at that time.
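The split-brain avoidance described above can be illustrated with atomic file creation (a sketch only; it assumes an ordinary shared filesystem path and is not VMware's actual locking mechanism on shared storage):

    import os

    def try_acquire_primary_lock(lock_path):
        """Atomically create a lock file on shared storage.  Only one side can
        succeed, so only one copy continues running as the Primary VM."""
        try:
            # O_CREAT | O_EXCL makes creation atomic: it fails if the file exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.write(fd, str(os.getpid()).encode())
            os.close(fd)
            return True
        except FileExistsError:
            return False

    lock_file = "/tmp/ft_primary.lock"   # stand-in for a file on shared storage
    if try_acquire_primary_lock(lock_file):
        print("continuing as Primary VM")
    else:
        print("lock held elsewhere -- remaining Secondary / respawning")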
3.2. vLockstep Technology:-
vLockstep technology was developed to deliver architectural guarantees that the states of the primary and secondary virtual machines are identical at any point in the execution of instructions running in the virtual machine. vLockstep accomplishes this by having the primary and the secondary execute identical sequences of x86 instructions. The primary captures all nondeterminism from within the processor as well as from virtual I/O devices.
Examples of nondeterminism include events received from virtual network interface cards, network packets destined for the primary virtual machine, user inputs, and timer events. The captured nondeterminism is sent across a logging network to the secondary. The secondary virtual machine uses the logs received over the logging network to replay the nondeterminism in a manner identical to the actions of the primary. The secondary thus executes the same series of instructions as the primary. See Figure 3.2: vLockstep architecture.
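A highly simplified sketch of the record/replay idea behind vLockstep (purely illustrative; the real mechanism operates at the hypervisor and instruction level, not in application code):

    import random

    def run_vm(rng, event_log=None, record=False):
        """Execute a tiny 'workload'.  On the primary, nondeterministic inputs
        (here, timer values) are recorded; on the secondary they are replayed
        from the log so that both sides compute an identical state."""
        state = 0
        log = [] if record else None
        for step in range(5):
            if record:
                timer = rng.randint(1, 100)   # nondeterministic input on the primary
                log.append(timer)             # sent over the logging network
            else:
                timer = event_log[step]       # secondary replays the logged value
            state += timer                    # deterministic computation
        return state, log

    primary_state, log = run_vm(random.Random(), record=True)
    secondary_state, _ = run_vm(None, event_log=log, record=False)
    assert primary_state == secondary_state   # identical execution on both sides
    print(primary_state, secondary_state)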
Because both the primary and secondary virtual machines execute the same instruction sequence, both initiate I/O operations. The main difference between them is the treatment of the outputs. The output of the primary always takes effect: disk writes are committed to disk and network packets are transmitted, for example. All output of the secondary is suppressed by the hypervisor. The external world cannot detect the existence of the secondary and, at all times, treats the fault tolerant virtual machine as a single unit executing the workload.
vLockstep technology provides full system guarantees: at each guest instruction boundary, the primary and the secondary are identical, including all guest operating system and guest application state as well as all virtual hardware state. Because the secondary needs to see only nondeterministic inputs, the logging network can use conventional 1Gbps NICs and switches. Because both the primary and secondary are active and execute the same instruction stream at similar speeds, the overall performance impact is minimal. vLockstep technology requires physical processor extensions and was developed in collaboration with Intel and AMD. All currently shipping x86 server processors are vLockstep capable. vLockstep technology is fully integrated into the ESX hypervisor.