Conference proceedings
An Efficient Checkpointing Protocol For The Minimal Characterization Of Operational Rollback-dependency Trackability
Record in:
769522394
Proceedings of the IEEE Symposium on Reliable Distributed Systems, pp. 126-135, 2004.
10609857
10.1109/RELDIS.2004.1353013
2-s2.0-16244388928
Authors
Garcia I.C.
Buzato L.E.
Institution
Abstract
A checkpointing protocol that enforces rollback-dependency trackability (RDT) during the progress of a distributed computation must induce processes to take forced checkpoints to avoid the formation of non-trackable rollback dependencies. A protocol based on the minimal characterization of RDT tests only the smallest set of non-trackable dependencies. The literature indicated that this approach would require the processes to maintain and propagate O(n^2) control information, where n is the number of processes in the computation. In this paper, we present a protocol that implements this approach using only O(n) control information. © 2004 IEEE.
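This record does not reproduce the paper's protocol. For orientation only, the sketch below illustrates what "forced checkpoints" and "O(n) control information" can look like, using a simpler and more conservative rule from the literature, fixed-dependency-after-send (FDAS), which is not the protocol of this paper and will generally force more checkpoints than one based on the minimal characterization. Each process piggybacks a dependency vector of n integers on every message. The names `Process`, `dv`, `send`, and `receive` are illustrative assumptions.

```python
class Process:
    """Illustrative FDAS-style process; NOT the protocol of the paper."""

    def __init__(self, pid, n):
        self.pid = pid
        self.dv = [0] * n       # dependency vector: O(n) integers
        self.sent = False       # sent a message in the current interval?
        self.checkpoints = []   # list of (saved dv, forced flag)
        self.checkpoint(forced=False)  # initial checkpoint

    def checkpoint(self, forced):
        # Save the current vector, then open a new checkpoint interval.
        self.checkpoints.append((list(self.dv), forced))
        self.dv[self.pid] += 1
        self.sent = False

    def send(self):
        # The piggybacked control information is a copy of the vector.
        self.sent = True
        return list(self.dv)

    def receive(self, msg_dv):
        # FDAS rule: once a message has been sent in this interval, the
        # vector must not change; take a forced checkpoint before
        # delivering a message that would bring a new dependency.
        brings_new = any(m > d for m, d in zip(msg_dv, self.dv))
        if self.sent and brings_new:
            self.checkpoint(forced=True)
        self.dv = [max(m, d) for m, d in zip(msg_dv, self.dv)]
```

With two processes, if `p0` sends to `p1` and `p1` then replies, the reply carries a dependency that `p0` has not yet recorded. Because `p0` has already sent in its current interval, it takes a forced checkpoint before delivering the reply.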