article
Performance comparison of hierarchical checkpoint protocols grid computing
Authors
Massata NDIAYE, Ndeye
SENS, Pierre
THIARE, Ousmane
Institution
Abstract
A grid infrastructure is a large set of geographically
distributed nodes connected by a communication network. In
this context, fault tolerance is a necessity imposed by
distribution, which poses a number of problems: the
heterogeneity of hardware, operating systems, networks,
middleware, and applications; dynamic resources; scalability;
the lack of common memory; the lack of a common clock; and
asynchronous communication between processes. To improve the
robustness of supercomputing applications in the presence of
failures, many techniques have been developed to make the
system resilient to these faults. Fault tolerance is intended
to allow the system to provide service as specified despite the
occurrence of faults. It is an indispensable element of
distributed systems. To meet this need, several techniques have
been proposed in the literature. We study protocols based
on rollback recovery. These protocols fall into two
categories: coordinated checkpointing and rollback protocols, and
log-based independent checkpointing protocols (also called message
logging protocols). However, the performance of a protocol
depends on the characteristics of the system, the network, and
the running applications. Faced with the constraints of large-scale
environments, many algorithms from the literature have proved
inadequate. Given an application environment and a system, it is
not easy to identify the recovery protocol that is most appropriate
for a cluster or hierarchical environment, like grid computing.
While some protocols have been used successfully at small scale,
they are not suitable for use at large scale. Hence there is a need
to implement these protocols in a hierarchical fashion and compare
their performance in grid computing. In this paper, we propose
hierarchical versions of four well-known protocols. We have
implemented these protocols and compared their performance in
clusters and grid computing using the OMNeT++ simulator.