SMOTEMD: A mixed-data balancing algorithm for Big Data in R.

Morales Oñate, Víctor Hugo; Moreta, Luis; Morales Oñate, Bolívar

dc.creator	Morales Oñate, Víctor Hugo
dc.creator	Moreta, Luis
dc.creator	Morales Oñate, Bolívar
dc.date.accessioned	2021-09-03T13:33:44Z
dc.date.accessioned	2022-10-20T19:22:50Z
dc.date.available	2021-09-03T13:33:44Z
dc.date.available	2022-10-20T19:22:50Z
dc.date.created	2021-09-03T13:33:44Z
dc.date.issued	2020-04-24
dc.identifier	http://dspace.espoch.edu.ec/handle/123456789/14586
dc.identifier.uri	https://repositorioslatinoamericanos.uchile.cl/handle/2250/4590022
dc.description.abstract	Analyzing samples with unbalanced data is a challenge for anyone who has to model them. One setting where this happens is a binary response variable in which one of the classes makes up a very small proportion of the total. Binary variables are usually modeled with probability models such as logit or probit. However, these models run into problems when the sample is unbalanced and the confusion matrix, from which the model's predictive power is evaluated, has to be built. One technique for balancing the observed data is the SMOTE algorithm, which works exclusively with numerical data. This work extends SMOTE so that it can handle mixed data (numerical and categorical). By working with mixed data, the proposal also overcomes the 65536-observation barrier that R has when computing distances on categorical data. A simulation study is used to verify the benefits of the proposed algorithm, SMOTEMD, for mixed data.
dc.language	spa
dc.publisher	Escuela Superior Politécnica de Chimborazo
dc.rights	https://creativecommons.org/licenses/by-nc-sa/3.0/ec/
dc.rights	info:eu-repo/semantics/openAccess
dc.subject	SMOTE
dc.subject	CLASSIFICATION
dc.subject	UNBALANCED SAMPLES
dc.title	SMOTEMD: A mixed-data balancing algorithm for Big Data in R.
dc.type	Journal articles
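The abstract describes SMOTE, which creates synthetic minority-class records by interpolating between a record and one of its nearest minority neighbours, and an extension of it to mixed numerical and categorical data. The record does not spell out how SMOTEMD itself works, so the following is only a minimal Python sketch of the general idea under two assumptions that are not taken from the article: records are compared with a Gower-style distance (range-normalized absolute difference for numeric features, 0/1 mismatch for categorical ones), and categorical values of a synthetic record are set by a SMOTE-NC-style majority vote among the neighbours. The names `gower_distance` and `smote_mixed` are hypothetical.

```python
import random
from collections import Counter


def gower_distance(a, b, ranges):
    """Gower-style distance between two mixed records (assumed metric).

    Each record is a (numeric, categorical) pair of tuples. Numeric features
    contribute |x - y| / range; categorical features contribute 0 on a match
    and 1 on a mismatch. The result is the mean over all features.
    """
    num_a, cat_a = a
    num_b, cat_b = b
    total = 0.0
    for x, y, r in zip(num_a, num_b, ranges):
        total += abs(x - y) / r if r > 0 else 0.0
    total += sum(1 for x, y in zip(cat_a, cat_b) if x != y)
    return total / (len(num_a) + len(cat_a))


def smote_mixed(minority, n_new, k=5, seed=None):
    """Generate n_new synthetic minority records (hypothetical SMOTE-NC-style sketch)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority records")
    rng = random.Random(seed)
    n_num = len(minority[0][0])
    n_cat = len(minority[0][1])
    # Range of each numeric feature, used to normalize the distance.
    ranges = [
        max(r[0][j] for r in minority) - min(r[0][j] for r in minority)
        for j in range(n_num)
    ]
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Brute-force search for the k nearest minority neighbours of base.
        others = minority[:i] + minority[i + 1:]
        neighbours = sorted(others, key=lambda r: gower_distance(base, r, ranges))[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()
        # Numeric part: a random point on the segment between base and partner.
        num = tuple(x + gap * (y - x) for x, y in zip(base[0], partner[0]))
        # Categorical part: most frequent value among the k neighbours.
        cat = tuple(
            Counter(r[1][j] for r in neighbours).most_common(1)[0][0]
            for j in range(n_cat)
        )
        synthetic.append((num, cat))
    return synthetic
```

Interpolating the numeric part keeps each synthetic point between two real minority records, as in SMOTE; interpolation has no meaning for categories, which is why the sketch takes the most frequent neighbouring value instead. The neighbour search is brute force and quadratic in the number of minority records, so this sketch says nothing about how SMOTEMD handles the 65536-observation limit mentioned in the abstract.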

This item belongs to the following institution

Escuela Superior Politécnica de Chimborazo (Ecuador)