Conference proceedings
Non-blocking Synchronous Checkpointing Based On Rollback-dependency Trackability
Identifiers:
0769526772; 9780769526775
Proceedings of the IEEE Symposium on Reliable Distributed Systems, pp. 411-420, 2006.
10609857
10.1109/SRDS.2006.34
2-s2.0-38949211046
Authors
Sakata T.C.
Garcia I.C.
Abstract
This article proposes an original approach that applies the Rollback-Dependency Trackability (RDT) property to implement a new non-blocking synchronous checkpointing protocol, called RDT-NBS, that takes mutable checkpoints and efficiently supports concurrent initiators. Mutable checkpoints can be saved in non-stable storage and make it possible for non-blocking synchronous checkpointing protocols to save a minimal number of checkpoints in stable storage during the construction of a consistent global checkpoint. We prove that this minimality property does not hold in the presence of concurrent checkpoint initiations. Nevertheless, RDT-NBS uses mutable checkpoints to reduce the use of stable storage while ensuring that a consistent global checkpoint exists in stable storage. We also present simulation results that compare RDT-NBS with quasi-synchronous RDT checkpointing. © 2006 IEEE.
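RDT-based protocols reason about consistency using dependency information that processes track as they exchange messages. The Python sketch that follows is a generic illustration of that idea, not RDT-NBS itself. It assumes the common transitive dependency-vector scheme: each process piggybacks its vector on outgoing messages, merges received vectors entry-wise, and stamps each local checkpoint with its current vector before opening the next checkpoint interval. It then checks whether a chosen set of local checkpoints (one per process) forms a consistent global checkpoint. The class `Process` and the function `is_consistent` are hypothetical names, and the sketch does not model mutable checkpoints, stable versus non-stable storage, forced checkpoints, or concurrent initiators.

```python
from __future__ import annotations


class Process:
    """Hypothetical process that tracks a transitive dependency vector."""

    def __init__(self, pid: int, n: int) -> None:
        self.pid = pid
        # dv[j] = highest checkpoint interval of p_j this process depends on;
        # dv[pid] is the index of this process's current interval.
        self.dv = [0] * n
        # checkpoints[x] is the vector stamped on checkpoint C_{pid,x}.
        self.checkpoints: list[list[int]] = []
        self.take_checkpoint()  # initial checkpoint C_{pid,0}

    def take_checkpoint(self) -> int:
        """Stamp a new checkpoint with the current vector, then open the next interval."""
        self.checkpoints.append(list(self.dv))
        self.dv[self.pid] += 1
        return len(self.checkpoints) - 1

    def send(self) -> list[int]:
        """Return a copy of the vector to piggyback on an outgoing message."""
        return list(self.dv)

    def receive(self, piggyback: list[int]) -> None:
        """Merge a piggybacked vector entry-wise (transitive dependency tracking)."""
        self.dv = [max(a, b) for a, b in zip(self.dv, piggyback)]


def is_consistent(procs: list[Process], cut: list[int]) -> bool:
    """Check the global checkpoint {C_{j,cut[j]}} for all processes j.

    If the checkpoint chosen for p_i depends on an interval of p_j that starts
    after p_j's chosen checkpoint, some message would be an orphan.
    """
    for i, proc in enumerate(procs):
        stamped = proc.checkpoints[cut[i]]
        for j in range(len(procs)):
            if stamped[j] > cut[j]:
                return False
    return True


if __name__ == "__main__":
    procs = [Process(0, 2), Process(1, 2)]
    p0, p1 = procs
    p1.receive(p0.send())  # m: sent by p0 after C_{0,0}, received by p1
    p1.take_checkpoint()  # C_{1,1}, stamped [1, 1]
    print(is_consistent(procs, [0, 1]))  # False: m would be an orphan
    p0.take_checkpoint()  # C_{0,1}, stamped [1, 0]
    print(is_consistent(procs, [1, 1]))  # True
```

The check compares the dependency on p_j recorded by each chosen checkpoint with the index of the checkpoint chosen for p_j; a larger recorded value means a causal path leaves p_j after its chosen checkpoint, so the pair cannot belong to the same consistent global checkpoint. In a checkpoint pattern that satisfies RDT, every rollback dependency is visible in such vectors, which is what makes them useful to RDT-based protocols; the sketch itself does not verify that a pattern satisfies RDT.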