dc.contributorJunca Peláez, Mauricio José
dc.contributorQuiroz Salazar, Adolfo José
dc.creatorPatrón Piñerez, Ana María
dc.date.accessioned2023-07-11T17:00:25Z
dc.date.accessioned2023-09-07T02:25:47Z
dc.date.available2023-07-11T17:00:25Z
dc.date.available2023-09-07T02:25:47Z
dc.date.created2023-07-11T17:00:25Z
dc.date.issued2023-06-06
dc.identifierhttp://hdl.handle.net/1992/68318
dc.identifierinstname:Universidad de los Andes
dc.identifierreponame:Repositorio Institucional Séneca
dc.identifierrepourl:https://repositorio.uniandes.edu.co/
dc.identifier.urihttps://repositorioslatinoamericanos.uchile.cl/handle/2250/8729313
dc.description.abstractThis work studies policy evaluation in Reinforcement Learning (RL) in high-dimensional or uncertain settings. Here, the value of the policy under evaluation is approximated linearly, and the analysis is developed using Linear Stochastic Approximation (LSA) with Markovian noise. The classical methods, Temporal Difference and Gradient Temporal Difference learning, are inefficient when estimating the value function. For this reason, the alternative offered by the Online Bootstrap Inference algorithm is studied, which promises to improve on the existing methods.
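
For orientation, below is a minimal Python sketch of the kind of procedure the abstract describes: a TD(0) linear-stochastic-approximation recursion, with bootstrap replicates perturbed online by mean-one random multipliers in the spirit of Ramprasad et al. (2022). The names (td0_online_bootstrap, env_step, phi), the exponential multipliers, the step-size schedule, and the toy Markov chain are illustrative assumptions, not the thesis's exact construction.

import numpy as np

def td0_online_bootstrap(env_step, phi, theta0, gamma=0.95, n_steps=20000,
                         n_boot=100, seed=0):
    """TD(0) with linear value approximation plus online bootstrap
    replicates: each replicate applies the same TD update reweighted
    by an i.i.d. multiplier with mean 1 and variance 1 (a sketch in
    the spirit of Ramprasad et al., 2022, not their exact algorithm).

    env_step(s) -> (reward, next_state) samples one transition under
    the target policy; phi(s) -> feature vector of state s.
    """
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)    # main LSA/TD iterate
    boots = np.tile(theta, (n_boot, 1))      # bootstrap replicates
    s = 0                                    # assumed integer-coded start state
    for k in range(n_steps):
        alpha = 1.0 / (k + 1) ** 0.75        # Robbins-Monro step size
        r, s_next = env_step(s)
        f, f_next = phi(s), phi(s_next)
        # TD(0) update: theta <- theta + alpha * delta * phi(s)
        delta = r + gamma * f_next @ theta - f @ theta
        theta += alpha * delta * f
        # perturb each replicate: exponential(1) weights have mean 1, var 1
        w = rng.exponential(1.0, size=n_boot)
        deltas = r + gamma * boots @ f_next - boots @ f
        boots += alpha * (w * deltas)[:, None] * f[None, :]
        s = s_next
    return theta, boots

# Usage on a toy 3-state Markov reward process with tabular features
# (also an illustrative assumption):
P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
chain_rng = np.random.default_rng(1)
env_step = lambda s: (float(s == 2), chain_rng.choice(3, p=P[s]))
phi = lambda s: np.eye(3)[s]
theta, boots = td0_online_bootstrap(env_step, phi, np.zeros(3))
lo, hi = np.percentile(boots @ phi(0), [2.5, 97.5])  # bootstrap interval for V(0)

The spread of the replicates boots @ phi(s) around theta @ phi(s) yields a bootstrap confidence interval for the value of state s, which is the kind of online uncertainty quantification that the classical TD iterate alone does not provide.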
dc.languagespa
dc.publisherUniversidad de los Andes
dc.publisherMatemáticas
dc.publisherFacultad de Ciencias
dc.publisherDepartamento de Matemáticas
dc.relationArfé, A. (2017). Stochastic approximation and martingale methods. Lecture notes.
dc.relationBach, F., Liu, Y., and Li, R. (2016). Statistical machine learning and convex optimization. Département d'Informatique de l'ENS (DI ENS).
dc.relationBercu, B. (2019a). Asymptotic behavior of stochastic algorithms with statistical applications. Part I. University of Bordeaux. ETICS Annual Research School, Fréjus.
dc.relationBercu, B. (2019b). Asymptotic behavior of stochastic algorithms with statistical applications. Part II. University of Bordeaux. ETICS Annual Research School, Fréjus.
dc.relationBorkar, V. S. (2006). Stochastic approximation with controlled Markov noise. Systems and Control Letters, 55(2), pp. 139-145.
dc.relationBorkar, V. S. (2008). Stochastic approximation: A dynamical systems viewpoint. Cambridge University Press, second edition.
dc.relationHaskell, W. B. (2018). Introduction to dynamic programming. National University of Singapore. ISE 6509: Theory and Algorithms for Dynamic Programming.
dc.relationKarmakar, P. (2020). Stochastic approximation with Markov noise: Analysis and applications in reinforcement learning. CoRR, abs/2012.00805.
dc.relationKushner, H. and Yin, G. (2003). Stochastic approximation and recursive algorithms and applications. Springer New York.
dc.relationLevin, D., Peres, Y., and Wilmer, E. L. (2017). Markov chains and mixing times. American Mathematical Society.
dc.relationLiang, F. (2010). Trajectory averaging for stochastic approximation MCMC algorithms. The Annals of Statistics, 38(5), pp. 2823-2856.
dc.relationMaei, H. R. (2011). Gradient temporal-difference learning algorithms. University of Alberta. Department of Computing Science.
dc.relationNIHMS (2023). Sepsis. U.S. Department of Health and Human Services, National Institutes of Health.
dc.relationOberst, M. and Sontag, D. (2019). Counterfactual off-policy evaluation with Gumbel-max structural causal models. Proceedings of the 36th International Conference on Machine Learning, pp. 4881-4890.
dc.relationRamprasad, P., Li, Y., Yang, Z., Wang, Z., Sun, W., and Cheng, G. (2022). Online bootstrap inference for policy evaluation in reinforcement learning. Journal of the American Statistical Association, pp. 1-14.
dc.relationRobbins, H. and Monro, S. (1951). A stochastic approximation method. The Annals of Mathematical Statistics, 22(3), pp. 400-407.
dc.relationSutton, R. and Barto, A. (2018). Reinforcement learning: An introduction. The MIT Press.
dc.relationXu, T., Wang, Z., Zhou, Y., and Liang, Y. (2020). Reanalysis of variance reduced temporal difference learning. International Conference on Learning Representations. CoRR, abs/2001.01898.
dc.rightsAttribution-NoDerivatives 4.0 International
dc.rightshttp://creativecommons.org/licenses/by-nd/4.0/
dc.rightsinfo:eu-repo/semantics/openAccess
dc.rightshttp://purl.org/coar/access_right/c_abf2
dc.titleEvaluación de políticas bajo ruido Markoviano mediante el algoritmo de Online Bootstrap Inference
dc.typeTrabajo de grado - Pregrado

