dc.contributor | Junca Peláez, Mauricio José | |
dc.contributor | Quiroz Salazar, Adolfo José | |
dc.creator | Patrón Piñerez, Ana María | |
dc.date.accessioned | 2023-07-11T17:00:25Z | |
dc.date.accessioned | 2023-09-07T02:25:47Z | |
dc.date.available | 2023-07-11T17:00:25Z | |
dc.date.available | 2023-09-07T02:25:47Z | |
dc.date.created | 2023-07-11T17:00:25Z | |
dc.date.issued | 2023-06-06 | |
dc.identifier | http://hdl.handle.net/1992/68318 | |
dc.identifier | instname:Universidad de los Andes | |
dc.identifier | reponame:Repositorio Institucional Séneca | |
dc.identifier | repourl:https://repositorio.uniandes.edu.co/ | |
dc.identifier.uri | https://repositorioslatinoamericanos.uchile.cl/handle/2250/8729313 | |
dc.description.abstract | This work studies policy evaluation in Reinforcement Learning (RL) in high-dimensional or uncertain settings. Here, the value of the policy to be evaluated is approximated linearly, and the development uses Linear Stochastic Approximation (LSA) with Markovian noise. The classical methods, Temporal Difference and Gradient Temporal Difference learning, are inefficient when estimating the value function. For this reason, the alternative offered by the Online Bootstrap Inference algorithm is studied, which promises to improve on the existing methods. | |
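The abstract describes TD-type policy evaluation with linear function approximation and the online multiplier bootstrap of Ramprasad et al. (2022, cited below). As a minimal illustrative sketch only — the 3-state Markov reward process, step sizes, and all numbers are invented here, not taken from the thesis's experiments:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-state Markov reward process (numbers invented for illustration).
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])   # transition matrix
r = np.array([1.0, 0.0, 2.0])    # expected reward on leaving each state
gamma = 0.9
phi = np.eye(3)                  # tabular features: a special case of linear approximation

B = 20                           # number of bootstrap replicates
theta = np.zeros(3)              # TD point estimate of the value-function weights
boot = np.zeros((B, 3))          # randomly perturbed replicates

s = 0
for t in range(1, 50001):
    s_next = rng.choice(3, p=P[s])
    alpha = 1.0 / t ** 0.7       # Robbins-Monro style decaying step size
    # Classical TD(0) update with linear features.
    td = r[s] + gamma * phi[s_next] @ theta - phi[s] @ theta
    theta += alpha * td * phi[s]
    # Online bootstrap: the same update scaled by i.i.d. multipliers W with E[W] = 1,
    # so each replicate follows a randomly perturbed version of the TD recursion.
    w = rng.exponential(1.0, size=B)
    td_b = r[s] + gamma * boot @ phi[s_next] - boot @ phi[s]
    boot += alpha * (w * td_b)[:, None] * phi[s]
    s = s_next

v_true = np.linalg.solve(np.eye(3) - gamma * P, r)  # exact value function, for reference
se = boot.std(axis=0)            # spread of replicates -> plug-in standard errors
```

The point of the multiplier scheme is that the replicates are updated online alongside the point estimate, so interval estimates come at the cost of a few extra vector operations per step rather than storing or replaying the Markovian trajectory.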
dc.language | spa | |
dc.publisher | Universidad de los Andes | |
dc.publisher | Matemáticas | |
dc.publisher | Facultad de Ciencias | |
dc.publisher | Departamento de Matemáticas | |
dc.relation | Arfé, A. (2017). Stochastic approximation and martingale methods. Lecture notes. | |
dc.relation | Bach, F., Liu, Y., and Li, R. (2016). Statistical machine learning and convex optimization. Département d'Informatique de l'ENS (DI ENS). | |
dc.relation | Bercu, B. (2019a). Asymptotic behavior of stochastic algorithms with statistical applications, Part I. University of Bordeaux. ETICS Annual Research School, Fréjus. | |
dc.relation | Bercu, B. (2019b). Asymptotic behavior of stochastic algorithms with statistical applications, Part II. University of Bordeaux. ETICS Annual Research School, Fréjus. | |
dc.relation | Borkar, V. S. (2006). Stochastic approximation with controlled Markov noise. Systems and Control Letters, 55(2), pp. 139-145. | |
dc.relation | Borkar, V. S. (2008). Stochastic approximation: A dynamical systems viewpoint. Cambridge University Press, second edition. | |
dc.relation | Haskell, W. B. (2018). Introduction to dynamic programming. National University of Singapore. ISE 6509: Theory and Algorithms for Dynamic Programming. | |
dc.relation | Karmakar, P. (2020). Stochastic approximation with Markov noise: Analysis and applications in reinforcement learning. CoRR, abs/2012.00805. | |
dc.relation | Kushner, H. and Yin, G. (2003). Stochastic approximation and recursive algorithms and applications. Springer New York. | |
dc.relation | Levin, D., Peres, Y., and Wilmer, E. L. (2017). Markov chains and mixing times. American Mathematical Society. | |
dc.relation | Liang, F. (2010). Trajectory averaging for stochastic approximation MCMC algorithms. The Annals of Statistics, 38(5), pp. 2823-2856. | |
dc.relation | Maei, H. R. (2011). Gradient temporal-difference learning algorithms. University of Alberta, Department of Computing Science. | |
dc.relation | NIHMS (2023). Sepsis. U.S. Department of Health and Human Services, National Institutes of Health. | |
dc.relation | Oberst, M. and Sontag, D. (2019). Counterfactual off-policy evaluation with Gumbel-max structural causal models. Proceedings of the 36th International Conference on Machine Learning, pp. 4881-4890. | |
dc.relation | Ramprasad, P., Li, Y., Yang, Z., Wang, Z., Sun, W., and Cheng, G. (2022). Online bootstrap inference for policy evaluation in reinforcement learning. Journal of the American Statistical Association, pp. 1-14. | |
dc.relation | Robbins, H. and Monro, S. (1951). A stochastic approximation method. The Annals of Mathematical Statistics, 22(3), pp. 400-407. | |
dc.relation | Sutton, R. and Barto, A. (2018). Reinforcement learning: An introduction. The MIT Press. | |
dc.relation | Xu, T., Wang, Z., Zhou, Y., and Liang, Y. (2020). Reanalysis of variance reduced temporal difference learning. International Conference on Learning Representations. CoRR, abs/2001.01898. | |
dc.rights | Attribution-NoDerivatives 4.0 Internacional | |
dc.rights | http://creativecommons.org/licenses/by-nd/4.0/ | |
dc.rights | info:eu-repo/semantics/openAccess | |
dc.rights | http://purl.org/coar/access_right/c_abf2 | |
dc.title | Evaluación de políticas bajo ruido Markoviano mediante el algoritmo de Online Bootstrap Inference | |
dc.type | Trabajo de grado - Pregrado | |