Reducing the overfitting in the gROC curve estimation

Martínez-Camblor, Pablo; Díaz-Coto, Susana

Article

Registered in:

10.1007/s00180-023-01344-6

09434062

https://hdl.handle.net/20.500.12728/11407

https://repositorioslatinoamericanos.uchile.cl/handle/2250/9509329

Authors

Martínez-Camblor, Pablo

Díaz-Coto, Susana

Institution

Universidad Autónoma de Chile

Abstract

The generalized receiver-operating characteristic (gROC) curve describes the classification ability of a diagnostic test when both higher and lower values of the marker are associated with a higher probability of being positive. Its empirical estimation involves selecting the best classification subsets among those satisfying a particular condition. Both strong and weak consistency have already been proved. However, using the same data both to select the classification subsets and to compute the gROC curve leads to an over-optimistic estimate of the real performance of the diagnostic criteria on future samples. In this work, the bias of the empirical gROC curve estimator is explored through Monte Carlo simulations, and two cross-validation-based algorithms are proposed for reducing the overfitting. The practical application of the proposed algorithms is illustrated through the analysis of a real-world dataset. Simulation results suggest that the empirical gROC curve estimator returns optimistic approximations, especially when the diagnostic capacity of the marker is poor and the sample size is small. The proposed algorithms improve the estimation of the actual diagnostic test accuracy and yield almost unbiased gAUCs in most of the considered scenarios. However, the cross-validation-based algorithms report larger L1-errors than the standard empirical estimators and increase the computational cost of the procedures. As online supplementary material, this manuscript includes an R function that wraps the implemented routines. © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2023.
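The optimism described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' R function and does not reproduce either of their two algorithms, whose details are not given in this record. It assumes classification subsets of the form "positive if x <= xl or x > xu", a brute-force search over observed cut points, and a plain K-fold scheme. All function names and parameters (rule_rates, best_two_sided_rule, cv_rates, t, k, seed) are hypothetical.

```python
import numpy as np


def rule_rates(neg, pos, xl, xu):
    """Empirical (FPR, TPR) of the rule: positive if x <= xl or x > xu."""
    neg = np.asarray(neg, dtype=float)
    pos = np.asarray(pos, dtype=float)
    fpr = float(np.mean((neg <= xl) | (neg > xu)))
    tpr = float(np.mean((pos <= xl) | (pos > xu)))
    return fpr, tpr


def best_two_sided_rule(neg, pos, t):
    """Brute-force search for (xl, xu), xl <= xu, with empirical FPR <= t
    and maximal empirical TPR. Returns (xl, xu, tpr)."""
    pooled = np.unique(np.concatenate((np.asarray(neg, float), np.asarray(pos, float))))
    cuts = np.concatenate(([-np.inf], pooled, [np.inf]))
    # (-inf, +inf) classifies nobody as positive, so it is always feasible.
    best = (-np.inf, np.inf, 0.0)
    for i, xl in enumerate(cuts):
        for xu in cuts[i:]:
            fpr, tpr = rule_rates(neg, pos, xl, xu)
            if fpr <= t and tpr > best[2]:
                best = (float(xl), float(xu), tpr)
    return best


def cv_rates(neg, pos, t, k=5, seed=0):
    """K-fold estimate: select the rule on k-1 folds, evaluate it on the
    held-out fold, and average the held-out (FPR, TPR).
    Assumes k <= min(len(neg), len(pos)) so that no fold is empty."""
    rng = np.random.default_rng(seed)
    neg_folds = np.array_split(rng.permutation(np.asarray(neg, float)), k)
    pos_folds = np.array_split(rng.permutation(np.asarray(pos, float)), k)
    fprs, tprs = [], []
    for j in range(k):
        train_neg = np.concatenate([f for i, f in enumerate(neg_folds) if i != j])
        train_pos = np.concatenate([f for i, f in enumerate(pos_folds) if i != j])
        xl, xu, _ = best_two_sided_rule(train_neg, train_pos, t)
        fpr, tpr = rule_rates(neg_folds[j], pos_folds[j], xl, xu)
        fprs.append(fpr)
        tprs.append(tpr)
    return float(np.mean(fprs)), float(np.mean(tprs))


if __name__ == "__main__":
    # Simulated marker with no diagnostic capacity: both groups are N(0, 1).
    rng = np.random.default_rng(1)
    neg, pos = rng.normal(size=30), rng.normal(size=30)
    xl, xu, apparent_tpr = best_two_sided_rule(neg, pos, t=0.2)
    print("apparent TPR at FPR <= 0.2:", apparent_tpr)
    print("cross-validated (FPR, TPR):", cv_rates(neg, pos, t=0.2))
```

In this sketch the apparent TPR is computed on the same data used to choose the thresholds, whereas the cross-validated rates come from observations that played no part in that choice. With an uninformative marker and a small sample, the first figure would typically be expected to exceed the second, which is the kind of overfitting the abstract refers to.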

Ministerio de Ciencia e Innovación, MICINN

Subjects

Binary classification problem
