Master's Thesis
Assessing the Reliability of Visual Explanations of Deep Models through Adversarial Perturbation
Date
2019-03-27
Author
Dan Nascimento Gomes do Valle
Institution
Abstract
The increasing interest in complex deep neural networks for new applications demands transparency in their decisions, which leads to the need for reliable explanations of such models. Recent works have proposed new explanation methods to present interpretable visualizations of the relevance of input instances. These methods calculate relevance maps that often focus on different pixel regions and are commonly compared by visual inspection, which means that evaluations are based on human expectation instead of actual feature importance. In this work, we propose an effective metric for evaluating the reliability of model explanations. This metric is based on changes in the network's output resulting from adversarial perturbations of the input images. The perturbations take into account every relevance value as well as its inversion (irrelevance), so that the metric exhibits characteristics of both precision and recall. We also propose a direct application of this metric to filter relevance maps in order to create more interpretable images without any loss of essential explanatory information. We present a comparison of several widely known explanation methods and their results under the proposed metric. We also extend the results into a discussion of visualization techniques and the amount of information that is lost to make them more interpretable, and then show the results of our filtering method, which tackles this problem. Finally, we present an in-depth analysis of the properties that make the metric appropriate for a variety of tasks: the importance of using irrelevance, its robustness to random values and misclassified images, and the correlation between the metric and the loss of the evaluated model.
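The sketch below illustrates the general idea of a perturbation-based reliability score with precision- and recall-like components; it is not the metric defined in the thesis. The function name `perturbation_score`, the fixed `fraction` of perturbed pixels, the baseline-replacement perturbation (standing in for the adversarial perturbation the thesis actually uses), and the toy linear model are all illustrative assumptions.

```python
import numpy as np

def perturbation_score(model, image, relevance, fraction=0.1, baseline=0.0):
    """Hypothetical sketch: score an explanation by perturbing pixels.

    model:     callable mapping an image to the class score of interest.
    relevance: per-pixel relevance map with the same shape as `image`.
    """
    original = model(image)

    flat = relevance.flatten()
    k = max(1, int(fraction * flat.size))
    order = np.argsort(flat)  # ascending relevance

    # Perturb the k MOST relevant pixels: a reliable map should cause
    # a large drop in the score (precision-like behaviour).
    perturbed = image.copy().reshape(-1)
    perturbed[order[-k:]] = baseline
    drop_relevant = original - model(perturbed.reshape(image.shape))

    # Perturb the k LEAST relevant pixels (the "irrelevance"): a
    # reliable map should barely change the score (recall-like).
    perturbed = image.copy().reshape(-1)
    perturbed[order[:k]] = baseline
    drop_irrelevant = original - model(perturbed.reshape(image.shape))

    # Higher is better: relevant pixels matter, irrelevant ones do not.
    return drop_relevant - drop_irrelevant

if __name__ == "__main__":
    # Toy demonstration with a linear "network" and gradient*input
    # relevance; both choices are assumptions for illustration only.
    rng = np.random.default_rng(0)
    w = rng.normal(size=(8, 8))
    model = lambda x: float((w * x).sum())
    x = rng.normal(size=(8, 8))
    print(perturbation_score(model, x, relevance=w * x))
```

In this sketch, scoring both directions is what gives the metric its precision/recall flavour: perturbing highly relevant pixels tests whether the map points at features the model truly uses, while perturbing the irrelevant remainder tests whether it missed any features that matter.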