Doctoral thesis
Mutação de transformações para teste de programas Spark
Date
2020-07-31
Registered in:
SOUZA NETO, João Batista de. Mutação de transformações para teste de programas Spark. 2020. 231f. Tese (Doutorado em Ciência da Computação) - Centro de Ciências Exatas e da Terra, Universidade Federal do Rio Grande do Norte, Natal, 2020.
Author
Souza Neto, João Batista de
Abstract
The growth in the volume of data generated in recent years, a phenomenon known
as Big Data, has presented a series of challenges for data collection, storage and, especially,
processing, because these tasks require substantial computational resources and adapted execution
environments. Different parallel and distributed processing systems are used for Big Data
processing. Some systems adopt a control flow model, such as Hadoop, that applies the
MapReduce programming style, while others adopt a data flow model, such as Apache
Spark. Because large-scale data processing programs consume large amounts of
computational resources, their reliability matters: they should be tested
before they run in production on an expensive distributed
computing infrastructure. This thesis proposes a mutation testing approach for programs
that follow a data flow model, such as that of Apache Spark. Mutation testing is a testing technique
that relies on simulating faults by modifying a program to create faulty versions called
mutants. The generation of mutants is carried out by mutation operators that are able to
simulate specific faults in the program. Mutants are used in the test design and evaluation
process in order to have a test set capable of identifying the faults simulated by the
mutants. In order to apply the mutation testing process to Big Data processing programs,
it is important to be aware of the types of faults that can be found in this context to design
mutation operators that can simulate them. Thus, we conducted a study to characterize
faults and problems that can appear in Spark programs. Based on this study, we designed
a set of mutation operators for programs that follow a data flow model. These operators
simulate faults in the program through changes in its data flow and operations. The
mutation operators were formalized with a model we propose to represent data processing
programs based on data flow. To support the application of our mutation operators, we
developed the tool TRANSMUT-Spark, which automates the main steps of the mutation
testing process in Spark programs. We conducted experiments to evaluate the mutation
operators and the tool in terms of cost and effectiveness. The results of these experiments
showed the feasibility of applying the mutation testing process to Spark programs and
its contribution to the testing process, supporting the development of more reliable programs.
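
The mutation testing idea described in the abstract can be illustrated with a minimal sketch in plain Python, with no Spark dependency. Everything in it is an assumption made for illustration: the pipeline stages, the "delete one transformation" operator, and the use of the original program's output as the test oracle are hypothetical and are not taken from the thesis or from TRANSMUT-Spark, whose actual operators are not listed in this abstract.

```python
from typing import Callable, List

# A transformation maps a dataset (here a plain list) to a new dataset.
Transformation = Callable[[List[int]], List[int]]


def keep_even(data: List[int]) -> List[int]:
    return [x for x in data if x % 2 == 0]


def double(data: List[int]) -> List[int]:
    return [x * 2 for x in data]


def distinct(data: List[int]) -> List[int]:
    return sorted(set(data))


def run_pipeline(stages: List[Transformation], data: List[int]) -> List[int]:
    """Apply each transformation in order, as in a linear data flow."""
    for stage in stages:
        data = stage(data)
    return data


def delete_transformation_mutants(
    stages: List[Transformation],
) -> List[List[Transformation]]:
    """Hypothetical mutation operator: each mutant omits exactly one stage."""
    return [stages[:i] + stages[i + 1:] for i in range(len(stages))]


def mutation_score(
    stages: List[Transformation],
    mutants: List[List[Transformation]],
    test_inputs: List[List[int]],
) -> float:
    """Fraction of mutants killed, i.e. distinguished from the original
    by at least one test input (original output is the assumed oracle)."""
    killed = 0
    for mutant in mutants:
        if any(
            run_pipeline(mutant, t) != run_pipeline(stages, t)
            for t in test_inputs
        ):
            killed += 1
    return killed / len(mutants)


original = [keep_even, double, distinct]
mutants = delete_transformation_mutants(original)

weak_tests = [[2, 4]]          # no odd values, no duplicates
strong_tests = [[1, 2, 2, 3]]  # exercises filtering and de-duplication

weak_score = mutation_score(original, mutants, weak_tests)
strong_score = mutation_score(original, mutants, strong_tests)
```

With the weak input, only the mutant that drops `double` is detected, so two of three mutants survive; the stronger input distinguishes all three. This is the sense in which mutants guide test design: surviving mutants point to behaviour the test set does not yet check.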