dc.contributor | Takahashi Rodríguez, Silvia | |
dc.creator | Bayona Latorre, Andrés Leonardo | |
dc.date.accessioned | 2023-07-27T13:39:23Z | |
dc.date.accessioned | 2023-09-07T02:16:33Z | |
dc.date.available | 2023-07-27T13:39:23Z | |
dc.date.available | 2023-09-07T02:16:33Z | |
dc.date.created | 2023-07-27T13:39:23Z | |
dc.date.issued | 2023-07-25 | |
dc.identifier | http://hdl.handle.net/1992/68811 | |
dc.identifier | instname:Universidad de los Andes | |
dc.identifier | reponame:Repositorio Institucional Séneca | |
dc.identifier | repourl:https://repositorio.uniandes.edu.co/ | |
dc.identifier.uri | https://repositorioslatinoamericanos.uchile.cl/handle/2250/8729119 | |
dc.description.abstract | This document presents a comparative study of the Soft Actor-Critic (SAC) and Proximal Policy Optimization (PPO) algorithms in the context of Multi-Agent Reinforcement Learning (MARL) using the Unity ML-Agents framework. The objective is to investigate the performance and adaptability of these algorithms in dynamic environments. A collaborative-competitive multi-agent problem is formulated around a food-gathering task. The proposed solution includes a dynamic environment generator and reward-shaping training techniques. The results showcase the effectiveness of SAC and PPO in learning complex behaviors and strategies on the target MARL task. Using dynamic environments and reward shaping enables the agents to exhibit intelligent and adaptive behaviors. This study highlights the potential of MARL algorithms for addressing real-world challenges and their suitability for training agents in dynamic environments with the Unity ML-Agents framework. | |
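The agents described in the abstract are trained with Unity ML-Agents' PPO and SAC trainers. The sketch below is illustrative only, not the thesis code: it assumes a built environment executable (the name "FoodGathering" is a hypothetical placeholder) and uses only the documented low-level Python API of the mlagents_envs package to reset, step, and read rewards from such a multi-agent environment; a trained PPO or SAC policy would replace the random actions.

# Minimal sketch, assuming a hypothetical "FoodGathering" build of the environment.
from mlagents_envs.environment import UnityEnvironment

env = UnityEnvironment(file_name="FoodGathering", seed=1, side_channels=[])
env.reset()
behavior_name = list(env.behavior_specs)[0]        # behavior shared by the agents
spec = env.behavior_specs[behavior_name]

for episode in range(3):
    env.reset()
    decision_steps, terminal_steps = env.get_steps(behavior_name)
    episode_reward = 0.0
    while len(terminal_steps) == 0:                # until some agent terminates
        # Sample a random action for every agent requesting a decision;
        # a trained PPO/SAC policy would be queried here instead.
        action = spec.action_spec.random_action(len(decision_steps))
        env.set_actions(behavior_name, action)
        env.step()
        decision_steps, terminal_steps = env.get_steps(behavior_name)
        episode_reward += float(decision_steps.reward.sum())
    print(f"episode {episode}: reward collected = {episode_reward:.2f}")

env.close()

Actual training with ML-Agents typically goes through the mlagents-learn CLI and a trainer configuration file, where PPO or SAC is selected and hyperparameters are set (see the Training-Configuration-File.md entry among the relations below).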
dc.language | eng | |
dc.publisher | Universidad de los Andes | |
dc.publisher | Ingeniería de Sistemas y Computación | |
dc.publisher | Facultad de Ingeniería | |
dc.publisher | Departamento de Ingeniería de Sistemas y Computación | |
dc.relation | Bengio, Y., Louradour, J., Collobert, R., & Weston, J. (2009). Curriculum learning. Proceedings of the 26th Annual International Conference on Machine Learning, 41-48. https://doi.org/10.1145/1553374.1553380 | |
dc.relation | Busoniu, L., Babuska, R., & De Schutter, B. (2008). A Comprehensive Survey of Multiagent Reinforcement Learning. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 38(2), 156-172. https://doi.org/10.1109/TSMCC.2007.913919 | |
dc.relation | Haarnoja, T., Zhou, A., Abbeel, P., & Levine, S. (2018). Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor (arXiv:1801.01290). arXiv. http://arxiv.org/abs/1801.01290 | |
dc.relation | Juliani, A., Berges, V.-P., Teng, E., Cohen, A., Harper, J., Elion, C., Goy, C., Gao, Y., Henry, H., Mattar, M., & Lange, D. (2020). Unity: A General Platform for Intelligent Agents (arXiv:1809.02627). arXiv. http://arxiv.org/abs/1809.02627 | |
dc.relation | Samek, W., Montavon, G., Lapuschkin, S., Anders, C. J., & Muller, K.-R. (2021). Explaining Deep Neural Networks and Beyond: A Review of Methods and Applications. Proceedings of the IEEE, 109(3), 247-278. https://doi.org/10.1109/JPROC.2021.3060483 | |
dc.relation | Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal Policy Optimization Algorithms (arXiv:1707.06347). arXiv. http://arxiv.org/abs/1707.06347 | |
dc.relation | Wong, A., Bäck, T., Kononova, A. V., & Plaat, A. (2023). Deep multiagent reinforcement learning: Challenges and directions. Artificial Intelligence Review, 56(6), 5023-5056. https://doi.org/10.1007/s10462-022-10299-x | |
dc.relation | Unity Technologies. (2022, December 14). ml-agents/Training-Configuration-File.md at develop · Unity-Technologies/ml-agents. GitHub. https://github.com/Unity-Technologies/ml-agents | |
dc.relation | Neumann, C., Duboscq, J., Dubuc, C., Ginting, A., Irwan, A. M., Agil, M., Widdig, A., & Engelhardt, A. (2011). Assessing dominance hierarchies: Validation and advantages of progressive evaluation with Elo-rating. Animal Behaviour, 82(4), 911-921. ISSN 0003-3472 | |
dc.relation | ABL. (2023, May 30). PPOvsSAC [Video]. YouTube. https://www.youtube.com/watch?v=ZtdtpRmoFSE | |
dc.relation | ABL. (2023a, May 30). PPOvsRandom [Video]. YouTube. https://www.youtube.com/watch?v=N-aRvKfYnpI | |
dc.relation | ABL. (2023c, May 30). SACvsRandom [Video]. YouTube. https://www.youtube.com/watch?v=744kTLEubK0 | |
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International | |
dc.rights | https://repositorio.uniandes.edu.co/static/pdf/aceptacion_uso_es.pdf | |
dc.rights | info:eu-repo/semantics/openAccess | |
dc.rights | http://purl.org/coar/access_right/c_abf2 | |
dc.title | Comparative study of SAC and PPO in multi-agent reinforcement learning using Unity ML-Agents | |
dc.type | Trabajo de grado - Pregrado (undergraduate degree project) | |