Development of a type two diabetes predictive model for Mexicans applying to electronic health records dataset retrieved from National Public Data (ENSANUT 2018)

Fregoso Aparicio, Luis Martín

dc.contributor	Noguez Monroy, Juana Julieta
dc.contributor	School of Engineering and Sciences
dc.contributor	Cantú Ortiz, Francisco Javier
dc.contributor	González Mendoza, Miguel
dc.contributor	García García, José Antonio
dc.contributor	Montesinos Silva, Luis Arturo
dc.contributor	Campus Estado de México
dc.contributor	puemcuervo
dc.creator	Fregoso Aparicio, Luis Martín
dc.date.accessioned	2022-05-31T17:22:58Z
dc.date.accessioned	2022-10-13T20:20:29Z
dc.date.available	2022-05-31T17:22:58Z
dc.date.available	2022-10-13T20:20:29Z
dc.date.created	2022-05-31T17:22:58Z
dc.date.issued	2021-12-02
dc.identifier	Fregoso Aparicio, L. M. (2021). Development of a type two diabetes predictive model for Mexicans applying to Electronic Health Records dataset retrieved from National Public Data (ENSANUT 2018) [Unpublished master's thesis]. Instituto Tecnológico y de Estudios Superiores de Monterrey.
dc.identifier	https://hdl.handle.net/11285/648435
dc.identifier	https://orcid.org/0000-0003-4986-5745
dc.identifier	962778
dc.identifier.uri	https://repositorioslatinoamericanos.uchile.cl/handle/2250/4211960
dc.description.abstract	Diabetes mellitus is a chronic and severe disease that occurs when blood glucose levels rise above normal limits because the patient's body cannot produce insulin or produces an insufficient amount, or because the body cannot use the insulin it produces efficiently. According to the American Diabetes Association, diabetes is diagnosed when HbA1c is greater than or equal to 6.5%, when basal fasting blood glucose (GB) is greater than or equal to 126 mg/dL, or when blood glucose 2 hours after an oral glucose tolerance test with 75 g of glucose (SOG) is greater than or equal to 200 mg/dL. Type 2 diabetes (T2D), formerly known as adult-onset diabetes, is a form of diabetes characterized by high blood sugar, insulin resistance, and a relative lack of insulin. In Mexico, 10.4% of the population had diabetes in 2016, compared with 7% in 2006. In recent years, machine learning has been used to create predictive models for the onset of type 2 diabetes, which makes it feasible to develop one for the Mexican population. Such a model should be able to detect undiagnosed diabetics using national public data on type 2 diabetes mellitus in Mexico (ENSANUT 2018). The objective is to develop a predictive model of type 2 diabetes for Mexicans as a support tool that helps primary care physicians make a timely diagnosis, preventing the onset of diabetes or its complications by detecting diabetes early and with higher accuracy than the few existing Mexican models. To identify machine learning techniques and features suited to novel type 2 diabetes predictive models, a systematic review of 91 studies was performed, based on the PRISMA methodology combined with the methodology of Keele University and Durham University. The review of related work found that tree-based machine learning algorithms produced the best predictive models.
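The diagnostic thresholds above can be written as a simple screening rule. The following is a minimal Python sketch, not a clinical tool: the function and parameter names are hypothetical, it checks only the three criteria named in the abstract (the ADA also accepts a random plasma glucose of at least 200 mg/dL with classic symptoms), and it ignores the ADA requirement that, without unequivocal hyperglycemia, an abnormal result be confirmed by repeat testing. The abstract does not state exactly how the target HEMGLICLASS was derived, so this is not presented as the thesis's labeling rule.

```python
from typing import Optional


def meets_ada_diabetes_criteria(
    hba1c_pct: Optional[float] = None,
    fasting_glucose_mg_dl: Optional[float] = None,
    ogtt_2h_mg_dl: Optional[float] = None,
) -> bool:
    """Return True if any available measurement reaches an ADA diagnostic threshold.

    Hypothetical helper: missing measurements (None) are simply skipped,
    and no confirmation by repeat testing is modeled.
    """
    # HbA1c >= 6.5 %
    if hba1c_pct is not None and hba1c_pct >= 6.5:
        return True
    # Fasting plasma glucose >= 126 mg/dL
    if fasting_glucose_mg_dl is not None and fasting_glucose_mg_dl >= 126:
        return True
    # Plasma glucose 2 h after a 75 g oral glucose tolerance test >= 200 mg/dL
    if ogtt_2h_mg_dl is not None and ogtt_2h_mg_dl >= 200:
        return True
    return False


# True: the fasting glucose value reaches 126 mg/dL even though HbA1c does not reach 6.5 %
print(meets_ada_diabetes_criteria(hba1c_pct=6.1, fasting_glucose_mg_dl=131))
```

The criteria are combined with a logical OR, since meeting any single threshold is sufficient under the rule as stated.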
Five candidate models were considered for diabetes classification: Decision Tree, Random Forest, Gradient Boosting Tree, K-Nearest Neighbors, and Logistic Regression. The database selected for the model is the National Health and Nutrition Survey (ENSANUT 2018), a tool that describes the general health and nutrition conditions of a representative sample of the Mexican population. It is divided into several datasets that are joined by a unique ID built from the values of their variables. The target (HEMGLICLASS) is a binary categorical variable in which zero corresponds to a healthy person and one to a diabetic person; the complete database has 11639 samples and 55 attributes. After cleaning and balancing the samples of diabetic and healthy respondents, the final database has 21696 observations and 26 variables, consisting of the respondents' categorized eating habits and their corresponding blood chemistry test values. After model selection and optimization on the ENSANUT database among the techniques described in the systematic review, the Random Forest Classifier achieved the best prediction metrics and can be interpreted by physicians. The proposed model is a Random Forest with default hyperparameter values and fifteen attributes from the original ENSANUT database. As in classical models, the attributes include blood test measurements, and the model adds new features such as weekly intake of vegetables and fruits as a protective factor, or excessive intake of meat, dairy products, or candies as a risk enhancer. Once built, the model was validated with the actual data to ensure that accuracy and AUC (ROC) stay at or above 90 percent; three further metrics were also estimated. The results are accuracy: (0.90 ± 0.154), F1-score: (0.86 ± 0.286), precision: (0.94 ± 0.069), sensitivity: (0.87 ± 0.294), and AUC (ROC): (0.92 ± 0.191).
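The five reported metrics are given as a central value ± a spread. The standard-library Python sketch below shows one way such figures could be produced, under assumptions the abstract does not confirm: that the ± values are sample standard deviations across cross-validation folds, and that AUC is computed from predicted probabilities using the rank (Mann-Whitney) formulation. The function names are hypothetical, and the fold data in the usage lines are made up for illustration only.

```python
from statistics import mean, stdev


def roc_auc(y_true, y_score):
    """AUC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fold_metrics(y_true, y_pred, y_score):
    """Metrics for one fold: y_pred holds 0/1 labels, y_score holds the predicted P(diabetic)."""
    pairs = list(zip(y_true, y_pred))
    tp = pairs.count((1, 1))  # diabetic, predicted diabetic
    tn = pairs.count((0, 0))  # healthy, predicted healthy
    fp = pairs.count((0, 1))  # healthy, predicted diabetic
    fn = pairs.count((1, 0))  # diabetic, predicted healthy
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * sensitivity / (precision + sensitivity) if precision + sensitivity else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "f1": f1,
        "precision": precision,
        "sensitivity": sensitivity,
        "auc": roc_auc(y_true, y_score),
    }


def summarize(folds):
    """Mean and sample standard deviation of each metric across folds (needs at least 2 folds)."""
    return {name: (mean(f[name] for f in folds), stdev(f[name] for f in folds)) for name in folds[0]}


# Made-up labels, predictions, and scores for two folds
folds = [
    fold_metrics([1, 1, 0, 0], [1, 0, 0, 0], [0.9, 0.4, 0.3, 0.1]),
    fold_metrics([1, 0, 1, 0], [1, 1, 1, 0], [0.8, 0.6, 0.7, 0.2]),
]
for name, (m, s) in summarize(folds).items():
    print(f"{name}: {m:.2f} ± {s:.2f}")
```

In practice a library implementation would normally be used; the explicit confusion-matrix counts here only make the definition of each metric visible.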
To demonstrate the superior predictive capacity of the new model over the Olimpia Arellano-Campos model, a test of equality of means with unknown variances was performed, using Student's t as the statistic and the p-value as the rejection criterion. The resulting p-value of 0.00572 demonstrates the improvement in the model's predictive capacity. Finally, the relevance of this model lies in the possibility of anticipating a diagnosis before symptoms appear and, in the long term, anticipating the development of chronic complications. The model reflects the complexity inherent in detecting diabetes while remaining as simple a tool as possible to support physicians in making a diagnosis. Ideally, onset would be predicted before a prediabetic stage can even be identified; this model makes it possible to generate a diagnosis close to that stage.
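The abstract does not state whether equal variances were assumed, whether the test was one- or two-sided, or which samples were compared. The Python sketch below assumes Welch's unequal-variance t-test with a one-sided alternative (the first sample's mean is higher), applied to per-fold scores of two models. To stay within the standard library, the p-value is obtained by Simpson integration of the Student's t density rather than a library call such as `scipy.stats.ttest_ind(..., equal_var=False)`. All function names are hypothetical, and the scores in the usage line are made up.

```python
import math
from statistics import mean, variance


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_sf(t: float, df: float, steps: int = 2000) -> float:
    """P(T >= t), from Simpson integration of the density over [0, |t|]."""
    upper = abs(t)
    n = max(steps, int(upper * 200))
    n += n % 2  # Simpson's rule needs an even number of intervals
    h = upper / n
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |t|)
    return 0.5 - area if t >= 0 else 0.5 + area


def welch_t_test(a, b):
    """One-sided Welch test of H1: mean(a) > mean(b). Returns (t, df, p)."""
    va = variance(a) / len(a)  # squared standard error of mean(a), sample variance
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df, t_sf(t, df)


# Made-up per-fold scores for a new model (a) and a reference model (b)
t, df, p = welch_t_test([0.93, 0.91, 0.95, 0.90, 0.92], [0.85, 0.88, 0.84, 0.87, 0.86])
print(f"t = {t:.3f}, df = {df:.2f}, one-sided p = {p:.5f}")
```

For a two-sided test, the p-value would be twice the one-sided tail probability of |t|.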
dc.language	eng
dc.publisher	Instituto Tecnológico y de Estudios Superiores de Monterrey
dc.relation	published version
dc.rights	http://creativecommons.org/licenses/by/4.0
dc.rights	openAccess
dc.title	Development of a type two diabetes predictive model for Mexicans applying to electronic health records dataset retrieved from National Public Data (ENSANUT 2018)
dc.type	Master's thesis

This item belongs to the following institution

Instituto Tecnológico de Monterrey (México)