dc.contributorCardozo Álvarez, Nicolás
dc.contributorDusparic, Ivana
dc.contributorGauthier Umaña, Valerie Elisabeth
dc.contributorFLAG research lab
dc.creatorPatiño Sáenz, Michel Andrés
dc.date.accessioned2023-07-18T14:09:20Z
dc.date.accessioned2023-09-06T23:44:10Z
dc.date.available2023-07-18T14:09:20Z
dc.date.available2023-09-06T23:44:10Z
dc.date.created2023-07-18T14:09:20Z
dc.date.issued2023-07-10
dc.identifierhttp://hdl.handle.net/1992/68513
dc.identifierinstname:Universidad de los Andes
dc.identifierreponame:Repositorio Institucional Séneca
dc.identifierrepourl:https://repositorio.uniandes.edu.co/
dc.identifier.urihttps://repositorioslatinoamericanos.uchile.cl/handle/2250/8726784
dc.description.abstractDeep neural networks are black-box models for which there is no established formal approach to interpreting their behavior. Abductive explanations are formal explanations that entail an observation within a logical system and satisfy certain minimality criteria. Such explanations have previously been computed for deep neural networks with binary input features and for neural networks with continuous input features, but it was not known whether a "deletion" algorithm designed to compute abductive explanations could be adapted and extended to reinforcement learning tasks with continuous input features. Here, we show evidence that explanations generated by this algorithm can be biased: the algorithm favors the inclusion of features deleted later in the execution, a so-called "order effect". We propose a way to correct this problem and design an elementary algorithm to compute robust, formal, and unbiased explanations of deep reinforcement learning model predictions. Our results suggest that this bias may be present in other implementations of the deletion algorithm for machine learning models in general, including those with discrete input features, and that it affects models with larger input dimensions more strongly. In the future, new methods for computing abductive explanations and other types of formal explanations should be explored for deep reinforcement learning and machine learning in general.
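The deletion algorithm referred to in the abstract starts from the full set of input features and tentatively frees them one at a time, keeping a feature only if the prediction is no longer guaranteed without it. The following is a minimal sketch of that loop, assuming a hypothetical verification oracle entails(model, fixed, x, y); the function name and signature are illustrative and are not the implementation developed in this work.

def deletion_explanation(model, x, y, entails, feature_order=None):
    """Sketch of the deletion algorithm for a subset-minimal (abductive) explanation.

    Assumes a hypothetical oracle `entails(model, fixed, x, y)` that returns True
    when fixing only the features indexed by `fixed` to their values in `x` still
    guarantees the prediction `y`, regardless of the values of the freed features.
    """
    n = len(x)
    order = list(feature_order) if feature_order is not None else list(range(n))
    fixed = set(range(n))      # start with every input feature fixed
    for i in order:            # the processing order is what induces the "order effect"
        fixed.remove(i)        # tentatively free feature i
        if not entails(model, fixed, x, y):
            fixed.add(i)       # prediction no longer guaranteed: feature i is kept
    return sorted(fixed)       # indices of a subset-minimal explanation

Because each decision to keep a feature is tested after earlier features may already have been freed, features processed later face a harder entailment check and are more likely to be retained, which is consistent with the order effect described in the abstract.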
dc.languageeng
dc.publisherUniversidad de los Andes
dc.publisherMaestría en Ingeniería de Sistemas y Computación
dc.publisherFacultad de Ingeniería
dc.publisherDepartamento de Ingeniería de Sistemas y Computación
dc.relation[1] A. Ignatiev, N. Narodytska, and J. Marques-Silva, "Abduction-Based Explanations for Machine Learning Models," Nov. 2018, doi: 10.48550/arXiv.1811.10656.
dc.relation[2] C. Liu, T. Arnon, C. Lazarus, C. Strong, C. Barrett, and M. J. Kochenderfer, "Algorithms for Verifying Deep Neural Networks," Mar. 2019, doi: 10.48550/arXiv.1903.06758.
dc.relation[3] J. Marques-Silva and A. Ignatiev, "Delivering Trustworthy AI through Formal XAI," Proc. AAAI Conf. Artif. Intell., vol. 36, no. 11, Art. no. 11, Jun. 2022, doi: 10.1609/aaai.v36i11.21499.
dc.relation[4] A. Heuillet, F. Couthouis, and N. Díaz-Rodríguez, "Explainability in Deep Reinforcement Learning," Aug. 2020, doi: 10.48550/arXiv.2008.06693.
dc.relation[5] P. Linardatos, V. Papastefanopoulos, and S. Kotsiantis, "Explainable AI: A Review of Machine Learning Interpretability Methods," Entropy, vol. 23, no. 1, Art. no. 1, Jan. 2021, doi: 10.3390/e23010018.
dc.relation[6] A. Krajna, M. Brcic, T. Lipic, and J. Doncevic, "Explainability in reinforcement learning: perspective and position," Mar. 2022, doi: 10.48550/arXiv.2203.11547.
dc.relation[7] E. Puiutta and E. M. Veith, "Explainable Reinforcement Learning: A Survey," arXiv:2005.06247 [cs, stat], May 2020, Accessed: Jun. 26, 2021. [Online]. Available: http://arxiv.org/abs/2005.06247
dc.relation[8] A. Albarghouthi, "Introduction to Neural Network Verification," Sep. 2021, doi: 10.48550/arXiv.2109.10317.
dc.relation[9] E. La Malfa, A. Zbrzezny, R. Michelmore, N. Paoletti, and M. Kwiatkowska, "On Guaranteed Optimal Robust Explanations for NLP Models," May 2021, doi: 10.24963/ijcai.2021/366.
dc.relation[10] A. Ignatiev, N. Narodytska, N. Asher, and J. Marques-Silva, "On Relating 'Why?' and 'Why Not?' Explanations." arXiv, Dec. 20, 2020. doi: 10.48550/arXiv.2012.11067.
dc.relation[11] T. Eiter and G. Gottlob, "The complexity of logic-based abduction," J. ACM, vol. 42, no. 1, pp. 3-42, Jan. 1995, doi: 10.1145/200836.200838.
dc.relation[12] A. Ignatiev, N. Narodytska, and J. Marques-Silva, "On Relating Explanations and Adversarial Examples," in Advances in Neural Information Processing Systems, Curran Associates, Inc., 2019. Accessed: Dec. 12, 2022. [Online]. Available: https://papers.nips.cc/paper/2019/hash/7392ea4ca76ad2fb4c9c3b6a5c6e31e3-Abstract.html
dc.relation[13] "α,β-CROWN (alpha-beta-CROWN): A Fast and Scalable Neural Network Verifier using the Bound Propagation Framework." Verified Intelligence, Jun. 22, 2023. Accessed: Jun. 26, 2023. [Online]. Available: https://github.com/Verified-Intelligence/alpha-beta-CROWN
dc.relation[14] H. Zhang, T.-W. Weng, P.-Y. Chen, C.-J. Hsieh, and L. Daniel, "Efficient Neural Network Robustness Certification with General Activation Functions." arXiv, Nov. 02, 2018. doi: 10.48550/arXiv.1811.00866.
dc.relation[15] H. Salman, G. Yang, H. Zhang, C.-J. Hsieh, and P. Zhang, "A Convex Relaxation Barrier to Tight Robustness Verification of Neural Networks." arXiv, Jan. 09, 2020. doi: 10.48550/arXiv.1902.08722.
dc.relation[16] K. Xu et al., "Automatic Perturbation Analysis for Scalable Certified Robustness and Beyond." arXiv, Oct. 25, 2020. doi: 10.48550/arXiv.2002.12920.
dc.relation[17] K. Xu et al., "Fast and Complete: Enabling Complete Neural Network Verification with Rapid and Massively Parallel Incomplete Verifiers." arXiv, Mar. 16, 2021. doi: 10.48550/arXiv.2011.13824.
dc.relation[18] H. Zhang et al., "General Cutting Planes for Bound-Propagation-Based Neural Network Verification." arXiv, Dec. 04, 2022. doi: 10.48550/arXiv.2208.05740.
dc.relation[19] C. Brix, "vnncomp2022_benchmarks." Apr. 28, 2023. Accessed: Jun. 26, 2023. [Online]. Available: https://github.com/ChristopherBrix/vnncomp2022_benchmarks
dc.relation[20] M. N. Müller, C. Brix, S. Bak, C. Liu, and T. T. Johnson, "The Third International Verification of Neural Networks Competition (VNN-COMP 2022): Summary and Results." arXiv, Feb. 16, 2023. doi: 10.48550/arXiv.2212.10376.
dc.relation[21] OpenAI, "Gym: A toolkit for developing and comparing reinforcement learning algorithms." https://gym.openai.com (accessed Jul. 05, 2021).
dc.relation[22] U. J. Ravaioli, J. Cunningham, J. McCarroll, V. Gangal, K. Dunlap, and K. L. Hobbs, "Safe Reinforcement Learning Benchmark Environments for Aerospace Control Systems," in 2022 IEEE Aerospace Conference (AERO), Mar. 2022, pp. 1-20. doi: 10.1109/AERO53065.2022.9843750.
dc.relation[23] T. Freiesleben, "The Intriguing Relation Between Counterfactual Explanations and Adversarial Examples," Minds Mach., vol. 32, no. 1, pp. 77-109, Mar. 2022, doi: 10.1007/s11023-021-09580-9.
dc.relation[24] X. Ma et al., "Understanding Adversarial Attacks on Deep Learning Based Medical Image Analysis Systems," Pattern Recognit., vol. 110, p. 107332, Feb. 2021, doi: 10.1016/j.patcog.2020.107332.
dc.relation[25] A. Barto and S. Mahadevan, "Recent Advances in Hierarchical Reinforcement Learning," Discrete Event Dyn. Syst. Theory Appl., vol. 13, Dec. 2002, doi: 10.1023/A:1025696116075.
dc.relation[26] V. Mnih et al., "Playing Atari with Deep Reinforcement Learning." arXiv, Dec. 19, 2013. doi: 10.48550/arXiv.1312.5602.
dc.relation[27] M. Fischetti and J. Jo, "Deep neural networks and mixed integer linear optimization," Constraints, vol. 23, no. 3, pp. 296-309, Jul. 2018, doi: 10.1007/s10601-018-9285-6.
dc.relation[28] D. Shriver, S. Elbaum, M. Dwyer, A. Silva, and K. R. M. Leino, "DNNV: A Framework for Deep Neural Network Verification," Computer Aided Verification - 33rd International Conference, CAV 2021, Virtual Event, July 20-23, 2021, Proceedings, Part I. in Computer Aided Verification. Springer International Publishing, pp. 137-150, Jul. 2021. doi: 10.1007/978-3-030-81685-8_6.
dc.relation[29] S. Bak, C. Liu, and T. Johnson, "The Second International Verification of Neural Networks Competition (VNN-COMP 2021): Summary and Results." arXiv, Aug. 30, 2021. doi: 10.48550/arXiv.2109.00498.
dc.relation[30] "Cart Pole - Gym Documentation." https://www.gymlibrary.dev/environments/classic_control/cart_pole/ (accessed Jun. 26, 2023).
dc.relation[31] "Playing CartPole with the Actor-Critic method | TensorFlow Core," TensorFlow. https://www.tensorflow.org/tutorials/reinforcement_learning/actor_critic (accessed Jun. 26, 2023).
dc.relation[32] "Lunar Lander - Gym Documentation." https://www.gymlibrary.dev/environments/box2d/lunar_lander/ (accessed Jun. 26, 2023).
dc.relation[33] "openai/gym." OpenAI, Jun. 27, 2023. Accessed: Jun. 27, 2023. [Online]. Available: https://github.com/openai/gym/blob/dcd185843a62953e27c2d54dc8c2d647d604b635/gym/envs/box2d/lunar_lander.py
dc.relation[34] F. Doshi-Velez and B. Kim, "Towards A Rigorous Science of Interpretable Machine Learning." arXiv, Mar. 02, 2017. doi: 10.48550/arXiv.1702.08608.
dc.relation[35] G. Vilone and L. Longo, "Explainable Artificial Intelligence: a Systematic Review," arXiv, arXiv:2006.00093, Oct. 2020. doi: 10.48550/arXiv.2006.00093.
dc.relation[36] S. Huang, N. Papernot, I. Goodfellow, Y. Duan, and P. Abbeel, "Adversarial Attacks on Neural Network Policies." arXiv, Feb. 07, 2017. doi: 10.48550/arXiv.1702.02284.
dc.relation[37] G. Amir, M. Schapira, and G. Katz, "Towards Scalable Verification of Deep Reinforcement Learning." arXiv, Aug. 13, 2021. doi: 10.48550/arXiv.2105.11931.
dc.relation[38] L. Wells and T. Bednarz, "Explainable AI and Reinforcement Learning - A Systematic Review of Current Approaches and Trends," Front. Artif. Intell., vol. 4, 2021, Accessed: Jun. 05, 2022. [Online]. Available: https://www.frontiersin.org/article/10.3389/frai.2021.550030
dc.relation[39] Z. Juozapaitis, A. Koul, A. Fern, M. Erwig, and F. Doshi-Velez, "Explainable Reinforcement Learning via Reward Decomposition," 2019.
dc.relation[40] M. T. Ribeiro, S. Singh, and C. Guestrin, ""Why Should I Trust You?": Explaining the Predictions of Any Classifier." arXiv, Aug. 09, 2016. doi: 10.48550/arXiv.1602.04938.
dc.relation[41] M. T. Ribeiro, S. Singh, and C. Guestrin, "Anchors: High-Precision Model-Agnostic Explanations," Proc. AAAI Conf. Artif. Intell., vol. 32, no. 1, Art. no. 1, Apr. 2018, doi: 10.1609/aaai.v32i1.11491.
dc.rightsAtribución-NoComercial 4.0 Internacional
dc.rightshttp://creativecommons.org/licenses/by-nc/4.0/
dc.rightsinfo:eu-repo/semantics/openAccess
dc.rightshttp://purl.org/coar/access_right/c_abf2
dc.titleFormal robust explanations for deep reinforcement learning models
dc.typeTrabajo de grado - Maestría

