Conference proceedings
An Application of Convex Optimization Concepts to Approximate Dynamic Programming
Record in:
ISBN: 9781424420797
Proceedings of the American Control Conference, pp. 4238-4243, 2008.
7431619
DOI: 10.1109/ACC.2008.4587159
Scopus: 2-s2.0-52449115940
Author
Arruda E.F.
Fragoso M.D.
Do Val J.B.R.
Institution
Abstract
This paper deals with approximate value iteration (AVI) algorithms applied to discounted dynamic programming (DP) problems. The so-called Bellman residual is shown to be convex in the Banach space of candidate solutions to the DP problem. This fact motivates the introduction of an AVI algorithm with local search that seeks an approximate solution in a lower-dimensional space called the approximation architecture. The optimality of a point in the approximation architecture is characterized by means of convex optimization concepts, and necessary and sufficient conditions for global optimality are derived. To illustrate the method, two examples previously explored in the literature are presented. ©2008 AACC.
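As a rough illustration of the idea described in the abstract, the sketch below minimizes the sup-norm Bellman residual R(V) = ||V - TV||, where (TV)(s) = min_a { c(s,a) + γ Σ_{s'} p(s'|s,a) V(s') } is the standard DP operator, over a candidate value function constrained to a two-dimensional linear approximation architecture. The MDP data, the feature matrix Phi, and the use of Nelder-Mead for the local search are all illustrative assumptions for this sketch; the paper itself characterizes optimality through convex optimization concepts rather than this particular solver.

```python
import numpy as np
from scipy.optimize import minimize

# --- Illustrative 3-state, 2-action discounted MDP (made-up data) ---
gamma = 0.9
P = np.array([  # P[a, s, s']: transition probabilities under action a
    [[0.7, 0.2, 0.1],
     [0.1, 0.8, 0.1],
     [0.2, 0.3, 0.5]],
    [[0.3, 0.3, 0.4],
     [0.5, 0.4, 0.1],
     [0.1, 0.1, 0.8]],
])
c = np.array([[1.0, 2.0],   # c[s, a]: one-stage costs
              [0.5, 1.5],
              [2.0, 0.3]])

def bellman(V):
    """Bellman operator: (TV)(s) = min_a [ c(s,a) + gamma * E[V(s')] ]."""
    expected = np.einsum('ast,t->sa', P, V)  # expected[s, a] = sum_t P[a,s,t] V[t]
    return np.min(c + gamma * expected, axis=1)

# Linear approximation architecture: V_theta = Phi @ theta (assumed features).
Phi = np.array([[1.0, 0.0],
                [1.0, 1.0],
                [1.0, 2.0]])

def residual(theta):
    """Sup-norm Bellman residual of the candidate value function V_theta."""
    V = Phi @ theta
    return np.max(np.abs(V - bellman(V)))

# Local search over the architecture's parameters; Nelder-Mead is used
# here because the sup-norm residual is nonsmooth.
res = minimize(residual, x0=np.zeros(2), method='Nelder-Mead')
print('theta* =', res.x, '  residual =', res.fun)
```

A near-zero residual at the optimum would indicate that the chosen architecture can represent the fixed point of T well; a large residual signals that the architecture is too coarse, which is precisely the quantity the paper's convexity result lets one reason about globally.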