Conference object
H-RADIC: The Fault Tolerance Framework for Virtual Clusters on Multi-Cloud Environments
Authors
Royo, Ambrosio
Villamayor, Jorge
Castro-León, Marcela
Rexachs del Rosario, Dolores
Luque Fadón, Emilio
Institution
Facultad de Informática
Abstract
Even though cloud platforms promise to be reliable, several availability incidents show that they are not. How can we be sure that a parallel application finishes its execution even if a site is affected by a failure? This paper presents H-RADIC, an approach based on the RADIC architecture that executes a parallel application on at least 3 different virtual clusters, or sites. The execution state of each site is saved periodically at another site and is recovered from there in case of failure. The paper details the configuration of the architecture and the experimental results obtained with 3 virtual clusters running NAS parallel applications protected with DMTCP, a well-known distributed multi-threaded checkpointing tool. Our experiments show that execution time increased by 5% to 36% without failures and by 27% to 66% when failures occurred.