dc.contributorCardozo Álvarez, Nicolás
dc.contributorDusparic, Ivana
dc.contributorGauthier Umaña, Valerie Elisabeth
dc.contributorFLAG research lab
dc.creatorPatiño Sáenz, Michel Andrés
dc.date.accessioned2023-07-18T14:09:20Z
dc.date.accessioned2023-09-06T23:44:10Z
dc.date.available2023-07-18T14:09:20Z
dc.date.available2023-09-06T23:44:10Z
dc.date.created2023-07-18T14:09:20Z
dc.date.issued2023-07-10
dc.identifierhttp://hdl.handle.net/1992/68513
dc.identifierinstname:Universidad de los Andes
dc.identifierreponame:Repositorio Institucional Séneca
dc.identifierrepourl:https://repositorio.uniandes.edu.co/
dc.identifier.urihttps://repositorioslatinoamericanos.uchile.cl/handle/2250/8726784
dc.description.abstractDeep neural networks are black-box models for which there is no established formal approach to interpreting their behavior. Abductive explanations are formal explanations that entail an observation within a logical system and satisfy certain minimality criteria. Such explanations have previously been computed for deep neural networks with binary input features and for neural networks with continuous input features, but it was not known whether a "deletion" algorithm designed to compute abductive explanations could be adapted and extended to reinforcement learning tasks with continuous input features. Here, we show evidence that explanations generated by this algorithm can be biased: the algorithm favors the inclusion of features deleted later in the execution, a so-called "order effect". We propose a way to correct this problem and design an elementary algorithm to compute robust, formal, and unbiased explanations of deep reinforcement learning model predictions. Our results suggest that this bias may be present in other implementations of the deletion algorithm for machine learning models in general, including those with discrete input features, and that it affects models with larger input dimensions more strongly. In the future, new methods for computing abductive explanations and other types of formal explanations should be explored for deep reinforcement learning and machine learning in general.
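The deletion algorithm referred to in the abstract starts from the full set of input features and tentatively frees them one at a time, keeping a feature only if the prediction is no longer guaranteed without it. The following is a minimal sketch of that loop, assuming a hypothetical verification oracle entails(model, fixed, x, y); the function name and signature are illustrative and are not the implementation developed in this work.

def deletion_explanation(model, x, y, entails, feature_order=None):
    """Sketch of the deletion algorithm for a subset-minimal (abductive) explanation.

    Assumes a hypothetical oracle `entails(model, fixed, x, y)` that returns True
    when fixing only the features indexed by `fixed` to their values in `x` still
    guarantees the prediction `y`, regardless of the values of the freed features.
    """
    n = len(x)
    order = list(feature_order) if feature_order is not None else list(range(n))
    fixed = set(range(n))      # start with every input feature fixed
    for i in order:            # the processing order is what induces the "order effect"
        fixed.remove(i)        # tentatively free feature i
        if not entails(model, fixed, x, y):
            fixed.add(i)       # prediction no longer guaranteed: feature i is kept
    return sorted(fixed)       # indices of a subset-minimal explanation

Because each decision to keep a feature is tested after earlier features may already have been freed, features processed later face a harder entailment check and are more likely to be retained, which is consistent with the order effect described in the abstract.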
dc.languageeng
dc.publisherUniversidad de los Andes
dc.publisherMaestría en Ingeniería de Sistemas y Computación
dc.publisherFacultad de Ingeniería
dc.publisherDepartamento de Ingeniería de Sistemas y Computación
dc.relation[1] A. Ignatiev, N. Narodytska, and J. Marques-Silva, "Abduction-Based Explanations for Machine Learning Models," Nov. 2018, doi: 10.48550/arXiv.1811.10656.
dc.relation[2] C. Liu, T. Arnon, C. Lazarus, C. Strong, C. Barrett, and M. J. Kochenderfer, "Algorithms for Verifying Deep Neural Networks," Mar. 2019, doi: 10.48550/arXiv.1903.06758.
dc.relation[3] J. Marques-Silva and A. Ignatiev, "Delivering Trustworthy AI through Formal XAI," Proc. AAAI Conf. Artif. Intell., vol. 36, no. 11, Art. no. 11, Jun. 2022, doi: 10.1609/aaai.v36i11.21499.
dc.relation[4] A. Heuillet, F. Couthouis, and N. Díaz-Rodríguez, "Explainability in Deep Reinforcement Learning," Aug. 2020, doi: 10.48550/arXiv.2008.06693.
dc.relation[5] P. Linardatos, V. Papastefanopoulos, and S. Kotsiantis, "Explainable AI: A Review of Machine Learning Interpretability Methods," Entropy, vol. 23, no. 1, Art. no. 1, Jan. 2021, doi: 10.3390/e23010018.
dc.relation[6] A. Krajna, M. Brcic, T. Lipic, and J. Doncevic, "Explainability in reinforcement learning: perspective and position," Mar. 2022, doi: 10.48550/arXiv.2203.11547.
dc.relation[7] E. Puiutta and E. M. Veith, "Explainable Reinforcement Learning: A Survey," arXiv:2005.06247 [cs, stat], May 2020, Accessed: Jun. 26, 2021. [Online]. Available: http://arxiv.org/abs/2005.06247
dc.relation[8] A. Albarghouthi, "Introduction to Neural Network Verification," Sep. 2021, doi: 10.48550/arXiv.2109.10317.
dc.relation[9] E. La Malfa, A. Zbrzezny, R. Michelmore, N. Paoletti, and M. Kwiatkowska, "On Guaranteed Optimal Robust Explanations for NLP Models," May 2021, doi: 10.24963/ijcai.2021/366.
dc.relation[10] A. Ignatiev, N. Narodytska, N. Asher, and J. Marques-Silva, "On Relating 'Why?' and 'Why Not?' Explanations." arXiv, Dec. 20, 2020. doi: 10.48550/arXiv.2012.11067.
dc.relation[11] T. Eiter and G. Gottlob, "The complexity of logic-based abduction," J. ACM, vol. 42, no. 1, pp. 3-42, Jan. 1995, doi: 10.1145/200836.200838.
dc.relation[12] A. Ignatiev, N. Narodytska, and J. Marques-Silva, "On Relating Explanations and Adversarial Examples," in Advances in Neural Information Processing Systems, Curran Associates, Inc., 2019. Accessed: Dec. 12, 2022. [Online]. Available: https://papers.nips.cc/paper/2019/hash/7392ea4ca76ad2fb4c9c3b6a5c6e31e3-Abstract.html
dc.relation[13] "α,β-CROWN (alpha-beta-CROWN): A Fast and Scalable Neural Network Verifier using the Bound Propagation Framework." Verified Intelligence, Jun. 22, 2023. Accessed: Jun. 26, 2023. [Online]. Available: https://github.com/Verified-Intelligence/alpha-beta-CROWN
dc.relation[14] H. Zhang, T.-W. Weng, P.-Y. Chen, C.-J. Hsieh, and L. Daniel, "Efficient Neural Network Robustness Certification with General Activation Functions." arXiv, Nov. 02, 2018. doi: 10.48550/arXiv.1811.00866.
dc.relation[15] H. Salman, G. Yang, H. Zhang, C.-J. Hsieh, and P. Zhang, "A Convex Relaxation Barrier to Tight Robustness Verification of Neural Networks." arXiv, Jan. 09, 2020. doi: 10.48550/arXiv.1902.08722.
dc.relation[16] K. Xu et al., "Automatic Perturbation Analysis for Scalable Certified Robustness and Beyond." arXiv, Oct. 25, 2020. doi: 10.48550/arXiv.2002.12920.
dc.relation[17] K. Xu et al., "Fast and Complete: Enabling Complete Neural Network Verification with Rapid and Massively Parallel Incomplete Verifiers." arXiv, Mar. 16, 2021. doi: 10.48550/arXiv.2011.13824.
dc.relation[18] H. Zhang et al., "General Cutting Planes for Bound-Propagation-Based Neural Network Verification." arXiv, Dec. 04, 2022. doi: 10.48550/arXiv.2208.05740.
dc.relation[19] C. Brix, "vnncomp2022_benchmarks." Apr. 28, 2023. Accessed: Jun. 26, 2023. [Online]. Available: https://github.com/ChristopherBrix/vnncomp2022_benchmarks
dc.relation[20] M. N. Müller, C. Brix, S. Bak, C. Liu, and T. T. Johnson, "The Third International Verification of Neural Networks Competition (VNN-COMP 2022): Summary and Results." arXiv, Feb. 16, 2023. doi: 10.48550/arXiv.2212.10376.
dc.relation[21] OpenAI, "Gym: A toolkit for developing and comparing reinforcement learning algorithms." https://gym.openai.com (accessed Jul. 05, 2021).
dc.relation[22] U. J. Ravaioli, J. Cunningham, J. McCarroll, V. Gangal, K. Dunlap, and K. L. Hobbs, "Safe Reinforcement Learning Benchmark Environments for Aerospace Control Systems," in 2022 IEEE Aerospace Conference (AERO), Mar. 2022, pp. 1-20. doi: 10.1109/AERO53065.2022.9843750.
dc.relation[23] T. Freiesleben, "The Intriguing Relation Between Counterfactual Explanations and Adversarial Examples," Minds Mach., vol. 32, no. 1, pp. 77-109, Mar. 2022, doi: 10.1007/s11023-021-09580-9.
dc.relation[24] X. Ma et al., "Understanding Adversarial Attacks on Deep Learning Based Medical Image Analysis Systems," Pattern Recognit., vol. 110, p. 107332, Feb. 2021, doi: 10.1016/j.patcog.2020.107332.
dc.relation[25] A. Barto and S. Mahadevan, "Recent Advances in Hierarchical Reinforcement Learning," Discrete Event Dyn. Syst. Theory Appl., vol. 13, Dec. 2002, doi: 10.1023/A:1025696116075.
dc.relation[26] V. Mnih et al., "Playing Atari with Deep Reinforcement Learning." arXiv, Dec. 19, 2013. doi: 10.48550/arXiv.1312.5602.
dc.relation[27] M. Fischetti and J. Jo, "Deep neural networks and mixed integer linear optimization," Constraints, vol. 23, no. 3, pp. 296-309, Jul. 2018, doi: 10.1007/s10601-018-9285-6.
dc.relation[28] D. Shriver, S. Elbaum, M. Dwyer, A. Silva, and K. R. M. Leino, "DNNV: A Framework for Deep Neural Network Verification," Computer Aided Verification - 33rd International Conference, CAV 2021, Virtual Event, July 20-23, 2021, Proceedings, Part I. in Computer Aided Verification. Springer International Publishing, pp. 137-150, Jul. 2021. doi: 10.1007/978-3-030-81685-8_6.
dc.relation[29] S. Bak, C. Liu, and T. Johnson, "The Second International Verification of Neural Networks Competition (VNN-COMP 2021): Summary and Results." arXiv, Aug. 30, 2021. doi: 10.48550/arXiv.2109.00498.
dc.relation[30] "Cart Pole - Gym Documentation." https://www.gymlibrary.dev/environments/classic_control/cart_pole/ (accessed Jun. 26, 2023).
dc.relation[31] "Playing CartPole with the Actor-Critic method | TensorFlow Core," TensorFlow. https://www.tensorflow.org/tutorials/reinforcement_learning/actor_critic (accessed Jun. 26, 2023).
dc.relation[32] "Lunar Lander - Gym Documentation." https://www.gymlibrary.dev/environments/box2d/lunar_lander/ (accessed Jun. 26, 2023).
dc.relation[33] "openai/gym." OpenAI, Jun. 27, 2023. Accessed: Jun. 27, 2023. [Online]. Available: https://github.com/openai/gym/blob/dcd185843a62953e27c2d54dc8c2d647d604b635/gym/envs/box2d/lunar_lander.py
dc.relation[34] F. Doshi-Velez and B. Kim, "Towards A Rigorous Science of Interpretable Machine Learning." arXiv, Mar. 02, 2017. doi: 10.48550/arXiv.1702.08608.
dc.relation[35] G. Vilone and L. Longo, "Explainable Artificial Intelligence: a Systematic Review," arXiv, arXiv:2006.00093, Oct. 2020. doi: 10.48550/arXiv.2006.00093.
dc.relation[36] S. Huang, N. Papernot, I. Goodfellow, Y. Duan, and P. Abbeel, "Adversarial Attacks on Neural Network Policies." arXiv, Feb. 07, 2017. doi: 10.48550/arXiv.1702.02284.
dc.relation[37] G. Amir, M. Schapira, and G. Katz, "Towards Scalable Verification of Deep Reinforcement Learning." arXiv, Aug. 13, 2021. doi: 10.48550/arXiv.2105.11931.
dc.relation[38] L. Wells and T. Bednarz, "Explainable AI and Reinforcement Learning - A Systematic Review of Current Approaches and Trends," Front. Artif. Intell., vol. 4, 2021, Accessed: Jun. 05, 2022. [Online]. Available: https://www.frontiersin.org/article/10.3389/frai.2021.550030
dc.relation[39] Z. Juozapaitis, A. Koul, A. Fern, M. Erwig, and F. Doshi-Velez, "Explainable Reinforcement Learning via Reward Decomposition," 2019.
dc.relation[40] M. T. Ribeiro, S. Singh, and C. Guestrin, ""Why Should I Trust You?": Explaining the Predictions of Any Classifier." arXiv, Aug. 09, 2016. doi: 10.48550/arXiv.1602.04938.
dc.relation[41] M. T. Ribeiro, S. Singh, and C. Guestrin, "Anchors: High-Precision Model-Agnostic Explanations," Proc. AAAI Conf. Artif. Intell., vol. 32, no. 1, Art. no. 1, Apr. 2018, doi: 10.1609/aaai.v32i1.11491.
dc.rightsAtribución-NoComercial 4.0 Internacional
dc.rightshttp://creativecommons.org/licenses/by-nc/4.0/
dc.rightsinfo:eu-repo/semantics/openAccess
dc.rightshttp://purl.org/coar/access_right/c_abf2
dc.titleFormal robust explanations for deep reinforcement learning models
dc.typeTrabajo de grado - Maestría

