Dissertation
Adaptive scheduling for Apache Hadoop (Escalonamento adaptativo para o Apache Hadoop)
Author
Cassales, Guilherme Weigert
Institution
Abstract
Many alternatives have been employed to process the data generated by current applications in a timely manner. One of them, Apache Hadoop, combines parallel and distributed processing with the MapReduce paradigm to provide an environment capable of processing huge data volumes through a simple programming model. However, Apache Hadoop was designed for dedicated, homogeneous clusters, a limitation that creates challenges for those who wish to use the framework in other circumstances. Acquiring a dedicated cluster is often impracticable due to its cost, and the acquisition of replacement parts can threaten a cluster's homogeneity. In these cases, companies commonly resort to the idle computing resources in their networks; the original Hadoop distribution, however, shows serious performance issues under such conditions. This study therefore aimed to improve Hadoop's capacity to adapt to pervasive and shared environments, where resource availability varies during execution. Context-awareness techniques were used to collect information about the available capacity of each worker node, and distributed communication techniques were used to keep this information up to date in the scheduler. The joint use of both techniques aimed to minimize or eliminate the overload that would otherwise occur on shared nodes, yielding a performance improvement of up to 50% in a shared cluster compared with the original distribution and indicating that a simple solution can positively impact scheduling, broadening the variety of environments where Hadoop can be used.
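The capacity-collection idea described in the abstract can be sketched as a simple heuristic: a worker node estimates how many task slots it can safely advertise from its current CPU load and reports that number to the scheduler. The sketch below is a hypothetical illustration only; the function name `available_slots` and its parameters are assumptions for this example, not the dissertation's actual implementation or Hadoop's API.

```python
import os

def available_slots(total_slots, loadavg, cpus):
    """Hypothetical heuristic: scale the node's configured task slots
    by its idle CPU fraction, derived from the 1-minute load average."""
    idle_fraction = max(0.0, 1.0 - loadavg / cpus)
    # Keep at least one slot so the node remains schedulable.
    return max(1, int(total_slots * idle_fraction))

if __name__ == "__main__":
    # On Unix, read this node's real 1-minute load average and CPU count.
    load = os.getloadavg()[0]
    cpus = os.cpu_count() or 1
    print(available_slots(8, load, cpus))
```

Under this heuristic, a half-loaded 4-CPU node configured with 8 slots would advertise `available_slots(8, 2.0, 4) == 4` slots, while a fully loaded one falls back to a single slot, which is the kind of adjustment that avoids overloading shared nodes.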