Conference proceedings
An Application of Convex Optimization Concepts to Approximate Dynamic Programming
Record in:
ISBN: 9781424420797
Proceedings of the American Control Conference, pp. 4238-4243, 2008.
7431619
DOI: 10.1109/ACC.2008.4587159
Scopus: 2-s2.0-52449115940
Author
Arruda E.F.
Fragoso M.D.
Do Val J.B.R.
Institution
Abstract
This paper deals with approximate value iteration (AVI) algorithms applied to discounted dynamic programming (DP) problems. The so-called Bellman residual is shown to be convex in the Banach space of candidate solutions to the DP problem. This fact motivates the introduction of an AVI algorithm with local search that seeks an approximate solution in a lower-dimensional space called the approximation architecture. The optimality of a point in the approximation architecture is characterized by means of convex optimization concepts, and necessary and sufficient conditions for global optimality are derived. To illustrate the method, two examples previously explored in the literature are presented. ©2008 AACC.
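As a rough illustration of the idea described in the abstract, the sketch below minimizes the sup-norm Bellman residual R(V) = ||V - TV||, where (TV)(s) = min_a { c(s,a) + γ Σ_{s'} p(s'|s,a) V(s') } is the standard DP operator, over a candidate value function constrained to a two-dimensional linear approximation architecture. The MDP data, the feature matrix Phi, and the use of Nelder-Mead for the local search are all illustrative assumptions for this sketch; the paper itself characterizes optimality through convex optimization concepts rather than this particular solver.

```python
import numpy as np
from scipy.optimize import minimize

# --- Illustrative 3-state, 2-action discounted MDP (made-up data) ---
gamma = 0.9
P = np.array([  # P[a, s, s']: transition probabilities under action a
    [[0.7, 0.2, 0.1],
     [0.1, 0.8, 0.1],
     [0.2, 0.3, 0.5]],
    [[0.3, 0.3, 0.4],
     [0.5, 0.4, 0.1],
     [0.1, 0.1, 0.8]],
])
c = np.array([[1.0, 2.0],   # c[s, a]: one-stage costs
              [0.5, 1.5],
              [2.0, 0.3]])

def bellman(V):
    """Bellman operator: (TV)(s) = min_a [ c(s,a) + gamma * E[V(s')] ]."""
    expected = np.einsum('ast,t->sa', P, V)  # expected[s, a] = sum_t P[a,s,t] V[t]
    return np.min(c + gamma * expected, axis=1)

# Linear approximation architecture: V_theta = Phi @ theta (assumed features).
Phi = np.array([[1.0, 0.0],
                [1.0, 1.0],
                [1.0, 2.0]])

def residual(theta):
    """Sup-norm Bellman residual of the candidate value function V_theta."""
    V = Phi @ theta
    return np.max(np.abs(V - bellman(V)))

# Local search over the architecture's parameters; Nelder-Mead is used
# here because the sup-norm residual is nonsmooth.
res = minimize(residual, x0=np.zeros(2), method='Nelder-Mead')
print('theta* =', res.x, '  residual =', res.fun)
```

A near-zero residual at the optimum would indicate that the chosen architecture can represent the fixed point of T well; a large residual signals that the architecture is too coarse, which is precisely the quantity the paper's convexity result lets one reason about globally.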