Classification on imbalanced data sets, taking advantage of errors to improve performance

Cervantes Canales, Jair (101829); García Lamont, Farid (216477); LOPEZ CHAU, ASDRUBAL (100664)

dc.creator	Cervantes Canales, Jair; 101829
dc.creator	García Lamont, Farid; 216477
dc.creator	LOPEZ CHAU, ASDRUBAL; 100664
dc.date	2016-05-11T16:14:29Z
dc.date	2015
dc.identifier	978-3-319-22052-9
dc.identifier	0302-9743
dc.identifier	http://hdl.handle.net/20.500.11799/41187
dc.description	Classification methods usually perform poorly when applied to imbalanced data sets. Several algorithms have been proposed over the last decade to overcome this problem. Most of them generate synthetic instances to balance the data set, regardless of the classification algorithm. These methods work reasonably well in most cases; however, they tend to cause over-fitting. In this paper, we propose a method to address the imbalance problem. Our approach, which is very simple to implement, works in two phases. The first phase detects the instances that classification methods find difficult to predict correctly. These instances are then categorized as “noisy” or “secure”, where noisy instances are those most of whose nearest neighbors belong to the opposite class. The second phase generates a number of synthetic instances for each of the instances that are difficult to predict correctly. After our method is applied to the data sets, the AUC of the classifiers improves dramatically. We compare our method with other state-of-the-art methods on more than 10 data sets.
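The abstract describes the two phases only in outline. The following is a minimal Python sketch under explicit assumptions, not the authors' implementation: a nearest-centroid classifier stands in for "classification methods" in phase 1, "difficult" means misclassified on the training data, and phase 2 uses SMOTE-style interpolation towards same-class neighbours. The function names (`find_difficult`, `oversample_difficult`) and the parameters `k` and `n_synthetic` are hypothetical. The abstract does not say how the noisy/secure label changes the generation step, so the sketch only reports it.

```python
import math
import random
from collections import Counter


def centroids(X, y):
    """Mean vector of each class (used by the stand-in nearest-centroid classifier)."""
    counts = Counter(y)
    sums = {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for d, value in enumerate(x):
            acc[d] += value
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}


def neighbours(X, i, k, candidates):
    """Indices of the k candidates closest to X[i] (Euclidean distance), excluding i."""
    ranked = sorted((math.dist(X[i], X[j]), j) for j in candidates if j != i)
    return [j for _, j in ranked[:k]]


def find_difficult(X, y, k=5):
    """Phase 1 (sketch). Assumption: an instance is 'difficult' when a
    nearest-centroid classifier trained on all the data mislabels it.
    Each difficult instance is tagged 'noisy' if most of its k nearest
    neighbours belong to another class, otherwise 'secure'."""
    cents = centroids(X, y)
    difficult = {}
    for i, x in enumerate(X):
        predicted = min(cents, key=lambda c: math.dist(x, cents[c]))
        if predicted == y[i]:
            continue  # predicted correctly, so not difficult
        near = neighbours(X, i, k, range(len(X)))
        opposite = sum(1 for j in near if y[j] != y[i])
        difficult[i] = "noisy" if opposite > len(near) / 2 else "secure"
    return difficult


def oversample_difficult(X, y, difficult, n_synthetic=2, k=5, seed=0):
    """Phase 2 (sketch). Assumption: SMOTE-style interpolation, creating
    n_synthetic points per difficult instance on the segment towards a
    randomly chosen same-class neighbour. The noisy/secure tag is not used
    here because the abstract does not say how it affects generation."""
    rng = random.Random(seed)
    new_X, new_y = [], []
    for i in difficult:
        same_class = [j for j in range(len(X)) if y[j] == y[i]]
        near = neighbours(X, i, k, same_class)
        if not near:
            continue  # no other instance of this class to interpolate towards
        for _ in range(n_synthetic):
            target = X[rng.choice(near)]
            gap = rng.random()  # uniform in [0, 1)
            new_X.append([a + gap * (b - a) for a, b in zip(X[i], target)])
            new_y.append(y[i])
    return X + new_X, y + new_y


if __name__ == "__main__":
    X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
         [1.0, 1.0], [0.9, 1.1], [0.15, 0.15], [1.1, 0.9]]
    y = [0, 0, 0, 1, 1, 1, 1]
    difficult = find_difficult(X, y)
    print(difficult)  # {5: 'noisy'}
    X_aug, y_aug = oversample_difficult(X, y, difficult)
    print(len(X_aug), y_aug[-2:])  # 9 [1, 1]
```

In the toy data, the class-1 point at (0.15, 0.15) sits inside the class-0 cluster, so the stand-in classifier mislabels it and most of its neighbours are class 0, which makes it "noisy". Interpolating only towards same-class neighbours keeps the synthetic points on segments between members of the same class, which is the usual SMOTE rationale; whether the authors' generator does the same is not stated in the abstract.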
dc.language	eng
dc.publisher	Springer
dc.relation	10.1007/978-3-319-22053-6_8
dc.rights	openAccess
dc.rights	http://creativecommons.org/licenses/by-nc-nd/4.0
dc.subject	Imbalanced
dc.subject	Classification
dc.subject	Synthetic instances
dc.subject	ENGINEERING AND TECHNOLOGY
dc.title	Classification on imbalanced data sets, taking advantage of errors to improve performance
dc.type	Book chapters

This item belongs to the following institution:

Universidad Autónoma del Estado de México