22-03-2011, 04:53 PM
FEATURES
Features of the RAIN system include scalability, dynamic reconfigurability, and high availability. Through software-implemented fault tolerance, the system tolerates multiple node, link, and switch failures, with no single point of failure. In addition to reliability, the RAIN architecture permits efficient use of network resources, such as multiple data paths and redundant storage, with graceful degradation in the presence of faults. A diagram of the RAIN testbed at Caltech is shown in Figure 3.1. We have 10 Intel Pentium workstations running the Linux operating system, each with two network interfaces. These are connected via four eight-way Myrinet switches. The RAIN technology has been transferred to RAINfinity, a start-up company that builds clustered solutions to improve the performance and availability of Internet data centers.
We have identified the following key building blocks for distributed computing systems.
Communication: fault-tolerant interconnect topologies and reliable communication protocols. We describe network topologies that are resistant to partitioning, and a protocol guaranteeing a consistent history of link failures. We also describe an implementation of the MPI standard on the RAIN communication layer.
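To make "resistant to partitioning" concrete, the sketch below checks whether all surviving compute nodes can still reach one another after a given set of switch, node, or link failures. It assumes a hypothetical layout, not necessarily the Caltech wiring: the switches form a ring, and each node's two interfaces attach to two switches on opposite sides of that ring. It also assumes that compute nodes do not forward traffic for each other. The names `ring_topology`, `nodes_connected`, and `survives_any` are illustrative only, and the consistent-history link-failure protocol itself is not modeled.

```python
from __future__ import annotations
from itertools import combinations

def ring_topology(num_switches: int, num_nodes: int) -> set[tuple[str, str]]:
    """Hypothetical layout: switches S0..S(n-1) form a ring; node Ni attaches
    to switch i mod n and to the switch diametrically opposite it."""
    edges = set()
    for s in range(num_switches):
        edges.add((f"S{s}", f"S{(s + 1) % num_switches}"))
    for i in range(num_nodes):
        a = i % num_switches
        b = (a + num_switches // 2) % num_switches
        edges.add((f"N{i}", f"S{a}"))
        edges.add((f"N{i}", f"S{b}"))
    return edges

def nodes_connected(edges, failed=frozenset()) -> bool:
    """True if every surviving compute node can reach every other one.
    `failed` may hold vertex names (nodes, switches) or edge tuples (links)."""
    adj: dict[str, set[str]] = {}
    for u, v in edges:
        if u in failed or v in failed or (u, v) in failed or (v, u) in failed:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    compute = {x for e in edges for x in e if x.startswith("N") and x not in failed}
    if not compute:
        return True
    start = next(iter(compute))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        if u.startswith("N") and u != start:
            continue  # assumption: compute nodes do not relay traffic
        for w in adj.get(u, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return compute <= seen

def survives_any(edges, components, k: int) -> bool:
    """True if the compute nodes stay connected for every k-subset of failures."""
    return all(nodes_connected(edges, set(c)) for c in combinations(components, k))

edges = ring_topology(4, 10)
switches = [f"S{s}" for s in range(4)]
print(survives_any(edges, switches, 1))  # True
print(survives_any(edges, switches, 2))  # False
```

With 4 switches and 10 nodes, this layout survives any single switch failure but not every pair: losing S0 and S2 isolates the even-numbered nodes. Each switch uses 7 ports (5 nodes plus 2 ring links), which fits an eight-way switch.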
Fault Management: techniques based on group membership. We describe an efficient token-based protocol that tolerates node and link failures.
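The general idea of token-based membership can be illustrated with a small simulation: a token carrying the current membership view circulates around a logical ring, and a holder that cannot reach its successor drops that node from the view. This is a minimal sketch under stated assumptions, not the RAIN protocol itself. It assumes that a send to a crashed node fails immediately. The names `Token` and `pass_token` are hypothetical, and token loss, token regeneration, and node rejoin are not modeled.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Token:
    seq: int            # incremented on every successful hop
    members: list[int]  # current membership view, in ring order

def pass_token(token: Token, holder: int, alive: set[int]) -> int:
    """Forward the token from `holder` to the next reachable member.

    Assumption: a send to a crashed or unreachable node fails immediately,
    and the holder then removes that node from the membership view.
    """
    while True:
        idx = token.members.index(holder)
        succ = token.members[(idx + 1) % len(token.members)]
        if succ == holder:          # holder is the only member left
            return holder
        if succ in alive:
            token.seq += 1
            return succ
        token.members.remove(succ)  # failed node dropped from the group

token = Token(seq=0, members=[0, 1, 2, 3, 4])
alive = {0, 1, 3, 4}                # node 2 has crashed
holder = 0
for _ in range(3):
    holder = pass_token(token, holder, alive)
print(holder, token.seq, token.members)  # 4 3 [0, 1, 3, 4]
```

Because the membership view travels with the token, every node that receives the token after the failure sees the same updated group.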
Storage: distributed data storage schemes based on error-control codes. We describe schemes that are optimal in terms of storage as well as encoding/decoding complexity.
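As a much simpler stand-in for the error-control codes mentioned above, the sketch below splits data into k blocks and adds one XOR parity block, so any single lost block can be rebuilt from the other k. This is not one of the codes the text refers to. It only shows the principle of trading a small amount of extra storage for erasure tolerance. The function names `encode`, `recover`, and `decode` are hypothetical.

```python
from __future__ import annotations
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal blocks (zero-padded) and append one XOR parity block."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    blocks = [padded[i * size:(i + 1) * size] for i in range(k)]
    return blocks + [reduce(xor_bytes, blocks)]

def recover(blocks: list[bytes | None]) -> list[bytes]:
    """Rebuild at most one missing block (None) as the XOR of all survivors."""
    missing = [i for i, b in enumerate(blocks) if b is None]
    if len(missing) > 1:
        raise ValueError("single parity tolerates only one erasure")
    blocks = list(blocks)
    if missing:
        blocks[missing[0]] = reduce(xor_bytes, [b for b in blocks if b is not None])
    return blocks

def decode(blocks: list[bytes | None], k: int, length: int) -> bytes:
    """Recover, keep the k data blocks, and strip the padding."""
    return b"".join(recover(blocks)[:k])[:length]

data = b"RAIN distributed storage"
shards = encode(data, k=4)           # 4 data blocks + 1 parity block
shards[2] = None                     # one storage node lost
print(decode(shards, 4, len(data)) == data)  # True
```

The reconstruction works because the XOR of all k + 1 blocks is zero, so a missing block equals the XOR of the remaining ones. Storing k + 1 blocks is the minimum for tolerating one erasure, and encoding and decoding are linear in the data size. Tolerating more simultaneous failures requires stronger codes.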
We present three proof-of-concept applications based on the RAIN building blocks:
• A video server based on the RAIN communication and data storage components.
• A Web server based on the RAIN fault management component.
• A distributed checkpointing system based on the RAIN storage component and a leader election protocol.