Evaluación comparativa de modelos estadísticos y de aprendizaje automático para el pronóstico de series de tiempo

Mosquera Collazos, Cristian Camilo

dc.contributor	Toro Ocampo, Eliana Mirledy
dc.creator	Mosquera Collazos, Cristian Camilo
dc.date	2023-02-15T20:32:28Z
dc.date	2022
dc.date.accessioned	2023-06-05T15:18:27Z
dc.date.available	2023-06-05T15:18:27Z
dc.identifier	Universidad Tecnológica de Pereira
dc.identifier	Repositorio Institucional Universidad Tecnológica de Pereira
dc.identifier	https://repositorio.utp.edu.co/home
dc.identifier	https://hdl.handle.net/11059/14541
dc.identifier.uri	https://repositorioslatinoamericanos.uchile.cl/handle/2250/6645797
dc.description	En el trabajo se aborda un experimento mediante el cual se explora la idea de un aproximador universal de series de tiempo, comparando tres diferentes algoritmos de aprendizaje automático y dos estadísticos. Para esto, primero se generan diferentes series de tiempo utilizando un algoritmo de caminata aleatoria con el fin de generar datos sintéticos que presentan distribuciones de probabilidad escogidas en la investigación. Con esos datos generados se analizan los resultados del pronóstico de los modelos a diferentes fronteras de tiempo. Al final se presentan tres casos de uso para los pronósticos de series de tiempo: la optimización de portafolios de inversión, la predicción del precio de la TRM y la predicción del comportamiento de un índice económico.
dc.description	This work presents an experiment that explores the idea of a universal approximator for time series by comparing three machine learning algorithms and two statistical models. First, several time series are generated with a random walk algorithm, producing synthetic data that follow probability distributions chosen for the study. The models' forecasts on these generated data are then analyzed at different forecast horizons. Finally, three use cases for time series forecasting are presented: optimizing investment portfolios, predicting the price of the TRM, and predicting the behavior of an economic index.
dc.description	Pregrado
dc.description	Ingeniero(a) Industrial
dc.description	Table of contents
  Introduction
  Research subjects
  Limits and scope
  Research problem
    1. Problem statement
    2. Problem formulation
    3. Problem systematization
    4. Benefits
  Justification
  Objectives
    5. General objective
    6. Specific objectives
  Reference frameworks
    7. Theoretical framework
      7.1. Time series forecasting
      7.2. Data science and time series forecasting
      7.3. Statistical forecasting models
      7.4. Machine learning models
      7.5. Data preprocessing
      7.6. Time series simulation using the Metropolis–Hastings algorithm
      7.7. Goodness-of-fit tests
    8. Conceptual framework
    9. Spatial framework
    10. Temporal framework
  Research methodology
    11. Type of research
    12. Research design
  Experiment design
    13. Generating samples with the Metropolis algorithm
    14. Statistical and machine learning models
      14.1. Sliding window method
      14.2. ARIMA model
      14.3. Exponential smoothing model
      14.4. Multilayer perceptron model
      14.5. kNN regression model
      14.6. LSTM model
    15. Preprocessing
  Experiment results
    16. ARIMA
      16.1. ARIMA parameter results
      16.2. ARIMA results
    17. Exponential smoothing
      17.1. Exponential smoothing parameter results
      17.2. Exponential smoothing results
      17.3. Computational complexity of exponential smoothing
    18. kNN regression
      18.1. kNN regression results without preprocessing
      18.2. kNN results with normalization
      18.3. Computational complexity of kNN
      18.4. kNN conclusions
    19. MLP
      19.1. MLP results
      19.2. Computational complexity of MLP
      19.3. MLP conclusions
    20. LSTM
      20.1. LSTM results
      20.2. Computational complexity of LSTM
      20.3. LSTM conclusions
    21. Conclusions of the experiment
  Application cases
    22. Energy production
    23. TRM case
    24. Investment portfolio case
  A. Metropolis algorithm - KStest
  B. Statistical models
    1. Sliding window
    2. ARIMA
      2.1. Parameter calculation
    3. Exponential smoothing
      3.1. Genetic algorithm
  C. Machine learning models
    1. kNN
    2. MLP
    3. LSTM
  D. Application case
    1. IPG
    2. TRM
    3. Investment portfolio
dc.format	138 pages
dc.format	application/pdf
dc.language	spa
dc.publisher	Universidad Tecnológica de Pereira
dc.publisher	Facultad de Ciencias Empresariales
dc.publisher	Ingeniería Industrial
dc.rights	I (we) hereby state my (our) willingness to authorize the Biblioteca Jorge Roa Martínez of the Universidad Tecnológica de Pereira to publish, in the institutional repository (http://biblioteca.utp.edu.co), the electronic version of the WORK titled: ________ The Universidad Tecnológica de Pereira, a non-profit academic institution, is therefore empowered to fully exercise the authorization described above in its ordinary activities of research, teaching, and publication. The authorization granted complies with the provisions of Law 23 of 1982. Nevertheless, as author(s), I (we) reserve the moral rights to the aforementioned WORK pursuant to Article 30 of
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	http://purl.org/coar/access_right/c_abf2
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
dc.rights	https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject	650 - Gerencia y servicios auxiliares::658 - Gerencia general
dc.subject	Análisis de series de tiempo
dc.subject	Pronóstico del tiempo por estadística
dc.subject	Análisis de varianza
dc.subject	Pronósticos de series de tiempo
dc.subject	Metropolis-Hastings Algorithm
dc.subject	Datos sintéticos
dc.title	Evaluación comparativa de modelos estadísticos y de aprendizaje automático para el pronóstico de series de tiempo
dc.type	Trabajo de grado - Pregrado
dc.type	http://purl.org/coar/resource_type/c_7a1f
dc.type	http://purl.org/coar/version/c_ab4af688f83e57aa
dc.type	Text
dc.type	info:eu-repo/semantics/bachelorThesis
dc.type	info:eu-repo/semantics/acceptedVersion

This item belongs to the following institution

Universidad Tecnológica de Pereira (Colombia)