An interpretable autoencoder for semi-supervised anomaly detection

MEDINA PEREZ, MIGUEL ANGEL; 388892; Aguilar, Diana Laura

Master's Thesis

Date

2021-10

Record at:

Aguilar, D. L. (2021). An interpretable autoencoder for semi-supervised anomaly detection [Unpublished master's thesis]. Instituto Tecnológico y de Estudios Superiores de Monterrey. Retrieved from: https://hdl.handle.net/11285/650859

https://hdl.handle.net/11285/650859

https://orcid.org/0000-0003-2178-4193

1006864

https://repositorioslatinoamericanos.uchile.cl/handle/2250/7716124

Author

MEDINA PEREZ, MIGUEL ANGEL; 388892

Aguilar, Diana Laura

Institution

Instituto Tecnológico de Monterrey (México)

Abstract

Anomaly detection is a continuing concern in the machine learning community, and several attempts have been made to address it. Most research, nonetheless, has focused only on accuracy and has not taken interpretability into account. When a model is interpretable, it can provide the explanations behind its classification decisions. As reported in the literature, interpretability grows in importance when the application domain is high-stakes, that is, when decisions can severely impact people's lives. This is the case for many application domains of anomaly detection. This dissertation addresses the issue by proposing an interpretable autoencoder for semi-supervised anomaly detection. As far as is known, it is the first interpretable autoencoder based on decision trees to be used for this purpose. This study assesses the performance of the proposal against other state-of-the-art one-class classifiers on two types of data: nominal and numerical. There were 123 datasets, of which 37 were nominal and the remaining 86 were numerical. The nominal experiments included nine one-class classifiers, while the numerical experiments encompassed 13 state-of-the-art classifiers. AUC was used as the evaluation criterion, and statistical tests were conducted to check for significance. The results show that the proposal achieves competitive performance against its counterparts from the literature on nominal data; according to the statistical tests, there were no significant differences between the results of the proposal and those of the best-ranked benchmark classifier. Nevertheless, the findings also indicate that the performance of the proposal on numerical data is not satisfactory, as it was outperformed by several benchmark classifiers.
