dc.contributorGeraldo Robson Mateus
dc.contributorhttp://lattes.cnpq.br/6289602045034353
dc.contributorAndré Carlos Ponce de Leon Ferreira de Carvalho
dc.contributorAdriano Alonso Veloso
dc.contributorCristiano Arbex Valle
dc.contributorDilson Lucas Pereira
dc.creatorJulio César Alves
dc.date.accessioned2021-10-30T19:47:03Z
dc.date.accessioned2022-10-03T22:21:50Z
dc.date.available2021-10-30T19:47:03Z
dc.date.available2022-10-03T22:21:50Z
dc.date.created2021-10-30T19:47:03Z
dc.date.issued2021-10-06
dc.identifierhttp://hdl.handle.net/1843/38570
dc.identifierhttps://orcid.org/0000-0002-4848-9453
dc.identifier.urihttp://repositorioslatinoamericanos.uchile.cl/handle/2250/3800350
dc.description.abstractDeep Reinforcement Learning (DRL) methods have been increasingly used in several areas of knowledge and, recently, this interest has also grown in the Optimization community. In this work, we apply and compare Policy Gradient methods on the problem of planning the production and distribution of products in a supply chain with multiple stages. Most previous works that use similar methods consider only serial supply chains or only two echelons, generally limiting the solution possibilities, and none of them consider stochastic lead times. We consider a chain with four echelons and two nodes per echelon, with uncertainties regarding seasonal demands from customers and lead times of production at suppliers and transport along the chain. To our knowledge, this work is the first to apply DRL methods to such a chain configuration using a centralized approach to the problem, in which all decisions are taken by a single agent based on the uncertain demands of the end customers. We propose a Markov Decision Process (MDP) formulation and a Linear Programming (LP) model with uncertain parameters. The MDP formulation is adapted to obtain good results with the application of Policy Gradient methods. In the first phase, after an initial case study, we applied the Proximal Policy Optimization (PPO) algorithm in 17 experimental scenarios, considering seasonal and regular uncertain demands (with different levels of uncertainty) and constant and stochastic lead times. In this phase, an agent built from the solution of a Linear Programming model (obtained by considering expected demands and average lead times) is used as a baseline. In the second phase, we compared five algorithms, Advantage Actor-Critic (A2C), Deep Deterministic Policy Gradient (DDPG), Proximal Policy Optimization (PPO), Soft Actor-Critic (SAC), and Twin Delayed DDPG (TD3), in 8 of the 17 previous scenarios, using statistical tools for proper comparison of the algorithms.
The PPO and SAC algorithms had the best performance in the experiments, with PPO having the shorter execution time. Experimental results indicate that Policy Gradient methods, especially PPO, are suitable and competitive tools for the proposed problem. In the third phase, we started to work with a multi-product version of the problem, generalizing the MDP formulation and the LP model. Experiments were carried out with the PPO algorithm in 16 multi-product scenarios, considering two and three products and different cost and demand configurations. The results indicate that, as in the original problem, PPO performs better than the baseline in scenarios with stochastic lead times.
dc.publisherUniversidade Federal de Minas Gerais
dc.publisherBrazil
dc.publisherPrograma de Pós-Graduação em Ciência da Computação
dc.publisherUFMG
dc.rightshttp://creativecommons.org/licenses/by-nc-nd/3.0/pt/
dc.rightsOpen Access
dc.subjectMulti-stage supply chains
dc.subjectSequential decision making under uncertainty
dc.subjectReinforcement learning
dc.subjectDeep learning
dc.subjectPolicy gradient methods
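dc.titleAplicação e comparação de métodos policy gradient em problema de cadeias de suprimentos multiestágio com incertezas [Application and comparison of policy gradient methods in a multi-stage supply chain problem with uncertainties]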
dc.typeThesis

