dc.contributor | Junca Peláez, Mauricio José | |
dc.contributor | Quiroz Salazar, Adolfo José | |
dc.creator | Patrón Piñerez, Ana María | |
dc.date.accessioned | 2023-07-11T17:00:25Z | |
dc.date.accessioned | 2023-09-07T02:25:47Z | |
dc.date.available | 2023-07-11T17:00:25Z | |
dc.date.available | 2023-09-07T02:25:47Z | |
dc.date.created | 2023-07-11T17:00:25Z | |
dc.date.issued | 2023-06-06 | |
dc.identifier | http://hdl.handle.net/1992/68318 | |
dc.identifier | instname:Universidad de los Andes | |
dc.identifier | reponame:Repositorio Institucional Séneca | |
dc.identifier | repourl:https://repositorio.uniandes.edu.co/ | |
dc.identifier.uri | https://repositorioslatinoamericanos.uchile.cl/handle/2250/8729313 | |
dc.description.abstract | This work studies policy evaluation in Reinforcement Learning (RL) in high-dimensional or uncertain settings. Here, the value of the policy to be evaluated is approximated linearly, and the development uses Linear Stochastic Approximation (LSA) with Markovian noise. The classical methods, Temporal Difference and Gradient Temporal Difference learning, are inefficient when estimating the value function. For this reason, the alternative offered by the Online Bootstrap Inference algorithm is studied, which promises to improve on the existing methods. | |
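The abstract describes TD-type policy evaluation with linear function approximation and the online multiplier bootstrap of Ramprasad et al. (2022, cited below). As a minimal illustrative sketch only — the 3-state Markov reward process, step sizes, and all numbers are invented here, not taken from the thesis's experiments:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-state Markov reward process (numbers invented for illustration).
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])   # transition matrix
r = np.array([1.0, 0.0, 2.0])    # expected reward on leaving each state
gamma = 0.9
phi = np.eye(3)                  # tabular features: a special case of linear approximation

B = 20                           # number of bootstrap replicates
theta = np.zeros(3)              # TD point estimate of the value-function weights
boot = np.zeros((B, 3))          # randomly perturbed replicates

s = 0
for t in range(1, 50001):
    s_next = rng.choice(3, p=P[s])
    alpha = 1.0 / t ** 0.7       # Robbins-Monro style decaying step size
    # Classical TD(0) update with linear features.
    td = r[s] + gamma * phi[s_next] @ theta - phi[s] @ theta
    theta += alpha * td * phi[s]
    # Online bootstrap: the same update scaled by i.i.d. multipliers W with E[W] = 1,
    # so each replicate follows a randomly perturbed version of the TD recursion.
    w = rng.exponential(1.0, size=B)
    td_b = r[s] + gamma * boot @ phi[s_next] - boot @ phi[s]
    boot += alpha * (w * td_b)[:, None] * phi[s]
    s = s_next

v_true = np.linalg.solve(np.eye(3) - gamma * P, r)  # exact value function, for reference
se = boot.std(axis=0)            # spread of replicates -> plug-in standard errors
```

The point of the multiplier scheme is that the replicates are updated online alongside the point estimate, so interval estimates come at the cost of a few extra vector operations per step rather than storing or replaying the Markovian trajectory.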
dc.language | spa | |
dc.publisher | Universidad de los Andes | |
dc.publisher | Matemáticas | |
dc.publisher | Facultad de Ciencias | |
dc.publisher | Departamento de Matemáticas | |
dc.relation | Arfé, A. (2017). Stochastic approximation and martingale methods. Lecture notes. | |
dc.relation | Bach, F., Liu, Y., and Li, R. (2016). Statistical machine learning and convex optimization. Département d'Informatique de l'ENS (DI ENS). | |
dc.relation | Bercu, B. (2019a). Asymptotic behavior of stochastic algorithms with statistical applications, Part I. University of Bordeaux. ETICS Annual Research School, Fréjus. | |
dc.relation | Bercu, B. (2019b). Asymptotic behavior of stochastic algorithms with statistical applications, Part II. University of Bordeaux. ETICS Annual Research School, Fréjus. | |
dc.relation | Borkar, V. S. (2006). Stochastic approximation with controlled Markov noise. Systems and Control Letters, 55(2), pp. 139-145. | |
dc.relation | Borkar, V. S. (2008). Stochastic approximation: A dynamical systems viewpoint. Cambridge University Press, second edition. | |
dc.relation | Haskell, W. B. (2018). Introduction to dynamic programming. National University of Singapore. ISE 6509: Theory and Algorithms for Dynamic Programming. | |
dc.relation | Karmakar, P. (2020). Stochastic approximation with Markov noise: Analysis and applications in reinforcement learning. CoRR, abs/2012.00805. | |
dc.relation | Kushner, H. and Yin, G. (2003). Stochastic approximation and recursive algorithms and applications. Springer New York. | |
dc.relation | Levin, D., Peres, Y., and Wilmer, E. L. (2017). Markov chains and mixing times. American Mathematical Society. | |
dc.relation | Liang, F. (2010). Trajectory averaging for stochastic approximation MCMC algorithms. The Annals of Statistics, 38(5), pp. 2823-2856. | |
dc.relation | Maei, H. R. (2011). Gradient temporal-difference learning algorithms. University of Alberta, Department of Computing Science. | |
dc.relation | NIHMS (2023). Sepsis. U.S. Department of Health and Human Services, National Institutes of Health. | |
dc.relation | Oberst, M. and Sontag, D. (2019). Counterfactual off-policy evaluation with Gumbel-max structural causal models. Proceedings of the 36th International Conference on Machine Learning, pp. 4881-4890. | |
dc.relation | Ramprasad, P., Li, Y., Yang, Z., Wang, Z., Sun, W., and Cheng, G. (2022). Online bootstrap inference for policy evaluation in reinforcement learning. Journal of the American Statistical Association, pp. 1-14. | |
dc.relation | Robbins, H. and Monro, S. (1951). A stochastic approximation method. The Annals of Mathematical Statistics, 22(3), pp. 400-407. | |
dc.relation | Sutton, R. and Barto, A. (2018). Reinforcement learning: An introduction. The MIT Press. | |
dc.relation | Xu, T., Wang, Z., Zhou, Y., and Liang, Y. (2020). Reanalysis of variance reduced temporal difference learning. International Conference on Learning Representations. CoRR, abs/2001.01898. | |
dc.rights | Attribution-NoDerivatives 4.0 Internacional | |
dc.rights | http://creativecommons.org/licenses/by-nd/4.0/ | |
dc.rights | info:eu-repo/semantics/openAccess | |
dc.rights | http://purl.org/coar/access_right/c_abf2 | |
dc.title | Evaluación de políticas bajo ruido Markoviano mediante el algoritmo de Online Bootstrap Inference | |
dc.type | Trabajo de grado - Pregrado | |