Master's thesis
Formal robust explanations for deep reinforcement learning models
Date
2023-07-10
Registered in:
instname:Universidad de los Andes
reponame:Repositorio Institucional Séneca
Author
Patiño Sáenz, Michel Andrés
Institution
Abstract
Deep neural networks are black-box models, and there is no established formal method for interpreting their behavior. Abductive explanations are formal explanations that entail an observation within a logical system and satisfy certain minimality criteria. Such explanations have been computed for deep neural networks with binary input features and for neural networks with continuous input features. It is not currently known whether a "deletion" algorithm designed to compute abductive explanations can be modified and extended to reinforcement learning tasks with continuous input features. Here, we show evidence that explanations generated by this algorithm may be biased: the algorithm favors the inclusion of features deleted later in the execution, a so-called "order effect". We propose a fix for this problem and design an elementary algorithm to compute robust, formal, and "non-biased" explanations of deep reinforcement learning model predictions. Our results suggest that this bias may be present in other implementations of the deletion algorithm for machine learning models in general, including those with discrete input features, and that it affects models with larger input dimensions more strongly. In the future, new methods to compute abductive explanations or other types of formal explanations should be explored for deep reinforcement learning and machine learning in general.
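To make the "deletion" idea and its order dependence concrete, the following is a minimal sketch and not the thesis implementation. It assumes a toy linear classifier with box-bounded continuous features, for which entailment can be checked exactly; a real deep reinforcement learning policy would need a neural network verifier in place of the hypothetical `entails` function. All names, weights, and bounds here are illustrative assumptions.

```python
from typing import List, Sequence, Set


def entails(weights: Sequence[float], bias: float, x: Sequence[float],
            fixed: Set[int], lo: Sequence[float], hi: Sequence[float]) -> bool:
    """Return True if fixing the features in `fixed` to their values in x
    guarantees the same predicted class for every value of the remaining
    features inside their [lo, hi] bounds (toy linear classifier only)."""
    score = bias + sum(w * v for w, v in zip(weights, x))
    positive = score >= 0
    worst = bias
    for i, w in enumerate(weights):
        if i in fixed:
            worst += w * x[i]
        else:
            # Free feature: pick the bound that pushes the score
            # furthest towards the opposite class.
            candidates = (w * lo[i], w * hi[i])
            worst += min(candidates) if positive else max(candidates)
    return worst >= 0 if positive else worst < 0


def deletion_explanation(weights: Sequence[float], bias: float,
                         x: Sequence[float], lo: Sequence[float],
                         hi: Sequence[float], order: Sequence[int]) -> List[int]:
    """Start from all features and try to drop them one by one in `order`,
    keeping a feature only if dropping it would break entailment."""
    fixed = set(range(len(x)))
    for i in order:
        trial = fixed - {i}
        if entails(weights, bias, x, trial, lo, hi):
            fixed = trial
    return sorted(fixed)


# Assumed toy instance: three symmetric features, any two of which
# are enough to guarantee the positive class.
weights, bias = [1.0, 1.0, 1.0], -1.5
x, lo, hi = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]

forward = deletion_explanation(weights, bias, x, lo, hi, order=[0, 1, 2])
backward = deletion_explanation(weights, bias, x, lo, hi, order=[2, 1, 0])
print(forward)   # [1, 2]
print(backward)  # [0, 1]
```

In this toy instance the three features are interchangeable, yet the feature visited first is always the one dropped and the features visited later are retained, so the explanation returned depends on the traversal order. This is only an illustration of how an order effect can arise; it makes no claim about the size of the effect reported in the thesis.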