dc.creatorReis, Denis dos
dc.creatorFlach, Peter
dc.creatorMatwin, Stan
dc.creatorBatista, Gustavo Enrique de Almeida Prado Alves
dc.date.accessioned2016-10-20T17:33:47Z
dc.date.accessioned2018-07-04T17:12:21Z
dc.date.available2016-10-20T17:33:47Z
dc.date.available2018-07-04T17:12:21Z
dc.date.created2016-10-20T17:33:47Z
dc.date.issued2016-08
dc.identifierACM SIGKDD International Conference on Knowledge Discovery and Data Mining, XXII, 2016, San Francisco.
dc.identifier9781450342322
dc.identifierhttp://www.producao.usp.br/handle/BDPI/51024
dc.identifierhttp://dx.doi.org/10.1145/2939672.2939836
dc.identifier.urihttp://repositorioslatinoamericanos.uchile.cl/handle/2250/1646097
dc.description.abstractData stream research has grown rapidly over the last decade. Two major features distinguish data stream learning from batch learning: stream data are generated on the fly, possibly at a fast and variable rate; and the underlying data distribution can be non-stationary, leading to a phenomenon known as concept drift. Therefore, most of the research on data stream classification focuses on proposing efficient models that can adapt to concept drifts and maintain a stable performance over time. However, specifically for the classification task, the majority of such methods rely on the instantaneous availability of true labels for all already classified instances. This is a strong assumption that is rarely fulfilled in practical applications. Hence there is a clear need for efficient methods that can detect concept drifts in an unsupervised way. One possibility is the well-known Kolmogorov-Smirnov test, a statistical hypothesis test that checks whether two samples differ. This work has two main contributions. The first one is the Incremental Kolmogorov-Smirnov algorithm, which allows performing the Kolmogorov-Smirnov hypothesis test instantly using two samples that change over time, where a change is an insertion and/or removal of an observation. Our algorithm employs a randomized tree and is able to perform the insertion and removal operations in O(log N) with high probability and to calculate the Kolmogorov-Smirnov test in O(1), where N is the number of sample observations. This is a significant speed-up compared to the O(N log N) cost of the non-incremental implementation. The second contribution is the use of the Incremental Kolmogorov-Smirnov test to detect concept drifts without true labels. Classification algorithms adapted to use the test rely on a limited portion of those labels only to update the classification model after a concept drift is detected.
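To make the quantity concrete, below is a minimal sketch of the classical (non-incremental) two-sample Kolmogorov-Smirnov statistic: the supremum distance between the two empirical CDFs, computed in O(N log N) by sorting and merging. This is the baseline cost the abstract refers to; the paper's treap-based structure maintains the same statistic under O(log N) insertions/removals with O(1) queries. The function name and samples here are illustrative, not from the paper.

```python
def ks_statistic(a, b):
    """Two-sample KS statistic: sup |F_a(x) - F_b(x)| over the
    empirical CDFs of samples a and b. O(N log N) due to sorting."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        # Advance past the smallest remaining value (handling ties),
        # then compare the two empirical CDFs at that point.
        x = min(a[i], b[j])
        while i < n and a[i] == x:
            i += 1
        while j < m and b[j] == x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d

# Disjoint supports -> maximal distance 1.0; identical samples -> 0.0.
print(ks_statistic([0.1, 0.2, 0.3], [0.7, 0.8, 0.9]))  # 1.0
print(ks_statistic([1, 2, 3], [1, 2, 3]))              # 0.0
```

For drift detection, the statistic is compared against the KS critical value at a chosen significance level; a value exceeding it signals that the two windows likely come from different distributions.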
dc.languageeng
dc.publisherAssociation for Computing Machinery - ACM
dc.publisherSan Francisco
dc.relationACM SIGKDD International Conference on Knowledge Discovery and Data Mining, XXII
dc.rightsCopyright ACM
dc.rightsclosedAccess
dc.subjectKolmogorov-Smirnov
dc.subjectData Stream
dc.subjectConcept Drift
dc.subjectCartesian Tree
dc.subjectTreap
dc.subjectLazy Propagation
dc.titleFast unsupervised online drift detection using incremental Kolmogorov-Smirnov test
dc.typeConference proceedings