Learning budget assignment policies for autoscaling scientific workflows in the cloud

Garí Núñez, Yisel; Monge Bosdari, David Antonio; Mateos Diaz, Cristian Maximiliano; Garcia Garino, Carlos Gabriel

info:eu-repo/semantics/article

Date

2019-02

Record in:

Garí Núñez, Yisel; Monge Bosdari, David Antonio; Mateos Diaz, Cristian Maximiliano; Garcia Garino, Carlos Gabriel; Learning budget assignment policies for autoscaling scientific workflows in the cloud; Springer; Cluster Computing-the Journal Of Networks Software Tools And Applications; 23; 1; 2-2019; 87-105

1386-7857

http://hdl.handle.net/11336/126104

CONICET Digital

CONICET

https://repositorioslatinoamericanos.uchile.cl/handle/2250/4406686

Author

Garí Núñez, Yisel

Monge Bosdari, David Antonio

Mateos Diaz, Cristian Maximiliano

Garcia Garino, Carlos Gabriel

Institution

Consejo Nacional de Investigaciones Científicas y Tecnológicas (Argentina)

Abstract

Autoscalers exploit cloud-computing elasticity to cope with the dynamic computational demands of scientific workflows. They constantly acquire or terminate virtual machines (VMs) on the fly to execute workflows while minimizing makespan and economic cost. One key problem of workflow autoscaling under budget constraints (i.e., with a maximum limit on cost) is determining the right proportion of (a) expensive but reliable VMs, called on-demand instances, and (b) cheaper but failure-prone VMs, called spot instances. Spot instances can potentially provide massive parallelism at low cost, but they must be used wisely because they can fail unexpectedly, which hurts makespan. Given the unpredictability of failures and the inherent performance variability of clouds, designing a policy for assigning the budget to each kind of instance is not a trivial task. For this reason, we formalize the described problem as a Markov decision process, which allows us to learn near-optimal policies from the experience of other baseline policies. Experiments over four well-known scientific workflows demonstrate that the learned policies outperform the baseline policies in terms of the aggregated relative percentage difference of makespan and execution cost. These promising results encourage the future study of new strategies for finding optimal budget policies for the execution of workflows in the cloud.

Subjects

AUTOSCALING

CLOUD COMPUTING

MARKOV DECISION PROCESS

SCIENTIFIC WORKFLOWS

SPOT INSTANCES
