Conference proceedings
A Neural Architecture To Address Reinforcement Learning Problems
Record in:
ISBN: 9781457710865
Proceedings of the International Joint Conference on Neural Networks, pp. 2930-2935, 2011.
DOI: 10.1109/IJCNN.2011.6033606
Scopus ID: 2-s2.0-80054769191
Authors
De Arruda R.L.S.
Von Zuben F.J.
Institution
Abstract
In this paper, the Reinforcement Learning problem is formulated as a Markov Decision Process. We address the solution of such a problem with a novel Adaptive Dynamic Programming algorithm, based on a Multilayer Perceptron Neural Network coupled with a parameterized function approximator called Wire-Fitting. Extending this established model, the present work employs eligibility traces to obtain faster learning algorithms. The advantage of the proposed approach lies in its capability to handle continuous environments and to learn a better policy while following another. Simulation results involving the automatic control of an inverted pendulum are presented to indicate the effectiveness of the proposed algorithm. © 2011 IEEE.
Sponsors: International Neural Network Society (INNS), IEEE Computational Intelligence Society (CIS), National Science Foundation (NSF), Cognimem Technologies, Inc., University of Cincinnati College of Engineering and Applied Science