Conference object
Some Issues to Consider in the Management of Energy Consumption in HPC Systems with Fault Tolerance
Registered in:
isbn:978-950-34-2126-0
Authors
Morán, Marina
Balladini, Javier
Rexachs del Rosario, Dolores
Rucci, Enzo
Institution
Abstract
Investigating ways to reduce energy consumption during the execution of large-scale applications is essential to sustaining and increasing the enormous computing power achieved by HPC systems.
Fault tolerance methods can affect power consumption. In particular, rollback-recovery methods based on uncoordinated checkpoints mean that not every process has to re-execute when a failure occurs. In this context, actions can be taken on the nodes hosting the processes that do not re-execute in order to reduce energy consumption. In this work, we describe some issues to consider when extending the application of energy-saving strategies beyond the nodes that communicate directly with the failed one.
Instituto de Investigación en Informática