Formal robust explanations for deep reinforcement learning models

Patiño Sáenz, Michel Andrés

Master's thesis

Date

2023-07-10

Registered in:

http://hdl.handle.net/1992/68513

instname:Universidad de los Andes

reponame:Repositorio Institucional Séneca

repourl:https://repositorio.uniandes.edu.co/

https://repositorioslatinoamericanos.uchile.cl/handle/2250/8726784

Author

Patiño Sáenz, Michel Andrés

Institution

Universidad de los Andes (Colombia)

Abstract

Deep neural networks are black-box models, and there is no established formal method for interpreting their behavior. Abductive explanations are formal explanations that entail an observation within a logical system and satisfy certain minimality criteria. Such explanations have been computed for deep neural networks with binary input features and for neural networks with continuous input features. It is not currently known whether a "deletion" algorithm designed to compute abductive explanations can be modified and extended to reinforcement learning tasks with continuous input features. Here, we show evidence that the explanations generated by this algorithm may be biased: the algorithm favors the inclusion of features whose deletion is attempted later in its execution, a so-called "order effect". We propose a solution to this problem and design an elementary algorithm to compute robust, formal, and "non-biased" explanations for the predictions of deep reinforcement learning models. Our results suggest that this bias may also be present in other implementations of the deletion algorithm for machine learning models in general, including models with discrete input features, and that it affects models with larger input dimensions more strongly. Future work should explore new methods for computing abductive explanations, or other types of formal explanations, for deep reinforcement learning and for machine learning in general.
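To illustrate the "order effect" described in the abstract, the following is a minimal Python sketch of a generic greedy deletion procedure; it is not the thesis's implementation. The names `deletion_explanation` and `toy_oracle` and the oracle interface are hypothetical. The toy oracle stands in for the formal verification query (for example, over a range of continuous inputs) that a real implementation would have to solve.

```python
from typing import Callable, FrozenSet, Sequence

# Hypothetical oracle interface: returns True if fixing only the features in
# `fixed` is enough to guarantee the model's prediction. A real implementation
# would answer this with a formal verification query (assumption).
Oracle = Callable[[FrozenSet[int]], bool]


def deletion_explanation(order: Sequence[int], is_sufficient: Oracle) -> FrozenSet[int]:
    """Greedy deletion: start with every feature fixed, then try to free each
    feature in `order`, keeping it only if freeing it breaks sufficiency."""
    fixed = set(order)
    for feature in order:
        candidate = frozenset(fixed - {feature})
        if is_sufficient(candidate):
            fixed.discard(feature)  # not needed: leave this feature free
    return frozenset(fixed)


def toy_oracle(fixed: FrozenSet[int]) -> bool:
    # Toy model (assumption): the prediction is guaranteed as long as
    # feature 0 OR feature 1 stays fixed; feature 2 is irrelevant.
    return 0 in fixed or 1 in fixed


if __name__ == "__main__":
    print(sorted(deletion_explanation([0, 1, 2], toy_oracle)))  # [1]
    print(sorted(deletion_explanation([1, 0, 2], toy_oracle)))  # [0]
```

Both results are valid subset-minimal explanations for the toy model, yet which one is returned depends only on the processing order: among the interchangeable features 0 and 1, the one whose deletion is attempted later is the one kept.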

Subjects
