On the effect of endpoints on dynamic time warping

Silva, Diego Furtado; Batista, Gustavo Enrique de Almeida Prado Alves; Keogh, Eamonn

Conference proceedings

Date

2016-08

Recorded in:

SIGKDD Workshop on Mining and Learning from Time Series, II, 2016, San Francisco.

http://www.producao.usp.br/handle/BDPI/51206

http://www-bcf.usc.edu/~liu32/milets16/paper/MiLeTS_2016_paper_7.pdf

http://repositorioslatinoamericanos.uchile.cl/handle/2250/1646174

Authors

Silva, Diego Furtado

Batista, Gustavo Enrique de Almeida Prado Alves

Keogh, Eamonn

Institution

Universidade de São Paulo (Brazil)

Abstract

While there exists a plethora of classification algorithms for most data types, there is an increasing acceptance that the unique properties of time series mean that the combination of nearest neighbor classifiers and Dynamic Time Warping (DTW) is very competitive across a host of domains, from medicine to astronomy to environmental sensors. While there has been significant progress in improving the efficiency and effectiveness of DTW in recent years, in this work we demonstrate that an underappreciated issue can significantly degrade the accuracy of DTW in real-world deployments. This issue has probably escaped the attention of the very active time series research community because of its reliance on static, highly contrived benchmark datasets, rather than the dynamic real-world datasets where the problem tends to manifest itself. In essence, the issue is that DTW’s eponymous invariance to warping holds only for the main “body” of the two time series being compared. For the “head” and “tail” of the time series, the DTW algorithm affords no warping invariance, because the first points of the two series must be matched to each other, as must the last points. The effect of this is that tiny differences at the beginning or end of the time series (which may be either consequential or simply the result of poor “cropping”) will tend to contribute disproportionately to the estimated similarity, producing incorrect classifications. In this work, we show that this effect is real and that it reduces the performance of the algorithm. We further show that we can fix the issue with a subtle redesign of the DTW algorithm, and that we can learn an appropriate setting for the extra parameter we introduce. Finally, we demonstrate that our generalization is amenable to all the optimizations that make DTW tractable for large datasets.
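The endpoint effect described in the abstract can be illustrated with a small sketch. The code below is not the authors' implementation: the function name `dtw_distance`, the parameter name `relax`, and the choice of a squared pointwise cost are assumptions made here for illustration. It shows one plausible way to relax the endpoint constraint, by letting the warping path skip up to `relax` points at the head or tail of either series; with `relax=0` it reduces to classic DTW.

```python
import math


def dtw_distance(x, y, relax=0):
    """Sum-of-squares DTW cost between two non-empty numeric sequences.

    relax is a hypothetical parameter: the number of leading or trailing
    points of either series that the warping path is allowed to skip.
    relax=0 gives classic DTW with fixed endpoints.
    """
    n, m = len(x), len(y)
    inf = math.inf
    # D[i][j] holds the best cost of aligning x[:i] with y[:j].
    D = [[inf] * (m + 1) for _ in range(n + 1)]

    # Free starting cells: skip up to `relax` leading points of x or of y.
    for i in range(min(relax, n) + 1):
        D[i][0] = 0.0
    for j in range(min(relax, m) + 1):
        D[0][j] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])

    # Free ending cells: skip up to `relax` trailing points of x or of y.
    last_row = [D[n][j] for j in range(max(1, m - relax), m + 1)]
    last_col = [D[i][m] for i in range(max(1, n - relax), n + 1)]
    return min(last_row + last_col)


if __name__ == "__main__":
    clean = [0, 1, 2, 1, 0]
    badly_cropped = [9, 0, 1, 2, 1, 0]  # same shape, one spurious leading point
    print(dtw_distance(clean, badly_cropped, relax=0))  # 81.0
    print(dtw_distance(clean, badly_cropped, relax=1))  # 0.0
```

In this toy example the two series differ only by one spurious leading point. Classic DTW is forced to match that point against the first point of the other series, so the whole distance comes from the head, while allowing a single skipped endpoint removes the penalty entirely.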

Subjects

Time Series

Dynamic Time Warping

Similarity Measures
