Conference proceedings
A Neural Architecture To Address Reinforcement Learning Problems
Record in:
ISBN: 9781457710865
Proceedings of the International Joint Conference on Neural Networks, pp. 2930-2935, 2011.
DOI: 10.1109/IJCNN.2011.6033606
Scopus ID: 2-s2.0-80054769191
Authors
De Arruda R.L.S.
Von Zuben F.J.
Institution
Abstract
In this paper, the Reinforcement Learning problem is formulated as a Markov Decision Process. We address the solution of such a problem with a novel Adaptive Dynamic Programming algorithm, based on a Multilayer Perceptron Neural Network coupled with a parameterized function approximator called Wire-Fitting. Extending this established model, the present work employs eligibility traces to obtain faster learning algorithms. The advantage of the proposed approach lies in its capability to handle continuous environments and to learn a better policy while following another. Simulation results involving the automatic control of an inverted pendulum are presented to indicate the effectiveness of the proposed algorithm. © 2011 IEEE.
Sponsors: International Neural Network Society (INNS), IEEE Computational Intelligence Society (CIS), National Science Foundation (NSF), Cognimem Technologies, Inc., University of Cincinnati College of Engineering and Applied Science