Conference proceedings
Assessing the influence of multiple test case selection on mutation experiments
Date
2014-03-31
Registered in:
IEEE International Conference on Software Testing, Verification, and Validation Workshops, 7, 2014, Cleveland, Ohio.
9780769551944
Author
Delamaro, Márcio Eduardo
Offutt, Jeff
Institution
Abstract
Mutation testing is widely used in experiments. Some papers experiment with mutation directly, while others use it to introduce faults to measure the effectiveness of tests created by other methods. There is some random variation in the mutation score depending on the specific test values used. When generating tests for experiments, a common, although not universal, practice is to generate multiple sets of tests that satisfy the same criterion or follow the same procedure, and then to compute their average performance. Averaging over multiple test sets is thought to reduce the variation in the mutation score. This practice is extremely expensive when tests are generated by hand (as is common) and as the number of programs increases (a current positive trend in software engineering experimentation).
The research reported in this short paper asks a simple and direct question: do we need to generate multiple sets of test cases? That is, how do different test sets influence the cost and effectiveness results? In a controlled experiment, we generated 10 different test sets adequate for the Statement Deletion (SSDL) mutation operator for 39 small programs and functions, and then evaluated how they differ in terms of cost and effectiveness. We found that averaging over multiple programs was effective in reducing the variance in the mutation scores introduced by specific tests.
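The averaging practice the abstract discusses can be sketched as follows. The kill matrix and the three test sets below are purely hypothetical illustrations, not data from the paper; the sketch only shows how a per-set mutation score and its cross-set mean and variance would be computed.

```python
from statistics import mean, pvariance

# Hypothetical kill matrix: kills[m][t] is True if test t kills mutant m.
kills = [
    [True,  False, True ],
    [False, False, True ],
    [True,  True,  False],
    [False, True,  False],
]

def mutation_score(test_set):
    """Fraction of mutants killed by at least one test in the set."""
    killed = sum(1 for row in kills if any(row[t] for t in test_set))
    return killed / len(kills)

# Three hypothetical test sets assumed to satisfy the same criterion.
test_sets = [{0, 1}, {1, 2}, {0, 2}]
scores = [mutation_score(ts) for ts in test_sets]

print(scores)            # per-set mutation scores
print(mean(scores))      # averaged score reported for the program
print(pvariance(scores)) # variation introduced by the choice of test set
```

The variance printed last is exactly the quantity the paper asks about: if it is small, a single test set would have given nearly the same score as the average over many.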