Doctoral thesis
Improving heterogenous storage performance in HPC Cloud systems using efficient storage algorithms informed by statistical models
Date
2022-02-14
Registered in:
Universidad Autónoma de Occidente
Repositorio Educativo Digital
Author
Marquez Franco, Jack Daniels
Institution
Abstract
Scientific applications are widely used to solve complex problems in many domains. These applications usually have demanding computational requirements, so they are typically executed on HPC clusters to ensure that they complete successfully and reach an optimal solution. In recent years, researchers have looked to cloud computing as an alternative platform for running their applications. Recent works have attempted to migrate applications to the cloud because its flexibility and scalability can benefit them. The cloud's economic model, in which users pay only for what they use, can reduce acquisition, maintenance, and upgrade costs compared with an HPC cluster. However, deploying HPC applications on cloud computing clusters presents several challenges that have yet to be resolved. One potential problem concerns storage systems and file systems, because cloud clusters do not use the same storage and file systems as HPC clusters. As a result, HPC applications incur overheads introduced by the different technologies and by the environment as a whole. This dissertation seeks to reduce that overhead and thereby improve the performance of applications running on heterogeneous storage systems in the HPC Cloud. To do so, it characterizes the performance of High Performance Computing applications that use heterogeneous storage technologies in cloud computing clusters. It also presents and validates an Extreme Value Theory-based model to characterize, analyze, and predict the performance of these applications. Finally, it presents a genetic algorithm that takes the proposed model as input to solve an Integer Linear Programming problem formulated for placing the files used by the applications on the heterogeneous storage devices of an HPC cloud system.
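
The abstract does not give the dissertation's formulation, so the following is only an illustrative sketch of the general idea: a genetic algorithm searching for a file-to-device assignment that minimizes predicted I/O time under capacity constraints. Every name and number here (`SIZES`, `CAPACITY`, `COST`, `PENALTY`, `evolve`) is hypothetical; in particular, the `COST` table merely stands in for whatever per-device predictions a performance model would supply, and the operators (tournament selection, one-point crossover, elitism) are generic choices rather than the author's.

```python
import random

# Hypothetical toy instance (not from the dissertation).
SIZES = [4, 2, 6, 3, 5]        # file sizes in GB
CAPACITY = [8, 10, 20]         # device capacities in GB (fast, medium, slow)
COST = [                       # assumed predicted I/O time (s) of file i on device j
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [1.5, 3.0, 6.0],
    [0.8, 1.5, 3.0],
    [1.2, 2.5, 5.0],
]
PENALTY = 100.0                # cost added per GB of capacity overflow


def overflow(placement):
    """Total GB by which a placement exceeds the device capacities."""
    used = [0] * len(CAPACITY)
    for i, dev in enumerate(placement):
        used[dev] += SIZES[i]
    return sum(max(0, u - c) for u, c in zip(used, CAPACITY))


def fitness(placement):
    """Predicted total I/O time plus an overflow penalty (lower is better)."""
    io_time = sum(COST[i][dev] for i, dev in enumerate(placement))
    return io_time + PENALTY * overflow(placement)


def evolve(generations=200, pop_size=30, mutation_rate=0.2, seed=0):
    """Return the best placement found: placement[i] is the device index of file i."""
    rng = random.Random(seed)
    n_files, n_dev = len(SIZES), len(CAPACITY)
    # Start from a known feasible placement: in this toy instance the last
    # device is large enough to hold every file.
    baseline = [n_dev - 1] * n_files
    pop = [baseline] + [
        [rng.randrange(n_dev) for _ in range(n_files)] for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        pop.sort(key=fitness)
        nxt = pop[:2]                              # elitism: keep the two best unchanged
        while len(nxt) < pop_size:
            a = min(rng.sample(pop, 3), key=fitness)   # tournament selection
            b = min(rng.sample(pop, 3), key=fitness)
            cut = rng.randrange(1, n_files)            # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < mutation_rate:           # move one file to a random device
                child[rng.randrange(n_files)] = rng.randrange(n_dev)
            nxt.append(child)
        pop = nxt
    return min(pop, key=fitness)


if __name__ == "__main__":
    best = evolve()
    print("placement:", best, "cost:", fitness(best), "overflow:", overflow(best))
```

Handling capacity through a penalty term keeps the crossover and mutation operators simple; a repair step that moves files off overfull devices would be an equally reasonable alternative.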