’HALITE IND. DS’: fast and scalable subspace clustering for multidimensional data streams

Silva, Afonso E. da; Sanches, Lucas L.; Fraideinberze, Antonio C.; Cordeiro, Robson Leonardo Ferreira

dc.creator	Silva, Afonso E. da
dc.creator	Sanches, Lucas L.
dc.creator	Fraideinberze, Antonio C.
dc.creator	Cordeiro, Robson Leonardo Ferreira
dc.date.accessioned	2016-10-19T21:48:32Z
dc.date.accessioned	2018-07-04T17:12:04Z
dc.date.available	2016-10-19T21:48:32Z
dc.date.available	2018-07-04T17:12:04Z
dc.date.created	2016-10-19T21:48:32Z
dc.date.issued	2016-05
dc.identifier	SIAM International Conference on Data Mining, XVI, 2016, Miami.
dc.identifier	9781611974348
dc.identifier	2167-0102
dc.identifier	http://www.producao.usp.br/handle/BDPI/51004
dc.identifier	http://dx.doi.org/10.1137/1.9781611974348.40
dc.identifier.uri	http://repositorioslatinoamericanos.uchile.cl/handle/2250/1646031
dc.description.abstract	Given a data stream with many attributes and high frequency of events, how to cluster similar events? Can it be done in real time? For example, how to cluster decades of frequent measurements of tens of climatic attributes to aid real time alert systems in forecasting extreme climatic events, such as floods and hurricanes? The task of clustering data with many attributes is known as subspace clustering. Today, there exists a need for algorithms of this type well-suited to process multidimensional data streams, for which real time processing is highly desirable. This paper proposes the new algorithm 'HALITE IND. DS' - a fast, scalable and highly accurate subspace clustering algorithm for multidimensional data streams. It improves upon an existing technique that was originally designed to process static (non-stream) data. Our main contributions are: (1) Analysis of Data Streams: the new algorithm takes advantage of the knowledge obtained from clustering past data to ease the clustering of data in the present. This fact allows our 'HALITE IND. DS' to be considerably faster than its base algorithm, while obtaining the same accuracy of results; (2) Real Time Processing: as opposed to the state-of-the-art, 'HALITE IND. DS' is fast and scalable, making it feasible to analyze streams with many attributes and high frequency of events in real time; (3) Experiments: we ran experiments using synthetic data and a real multidimensional stream with almost one century of climatic data. Our 'HALITE IND. DS' was up to 217 times faster than 5 representative works, i.e., its base algorithm plus 4 others from the state-of-the-art, always presenting highly accurate results.
dc.language	eng
dc.publisher	Society for Industrial and Applied Mathematics - SIAM
dc.publisher	Miami
dc.relation	SIAM International Conference on Data Mining, XVI
dc.rights	Copyright SIAM
dc.rights	closedAccess
dc.subject	subspace clustering
dc.subject	moderate-to-high dimensional data streams
dc.subject	real time processing
dc.subject	climatic streams
dc.title	’HALITE IND. DS’: fast and scalable subspace clustering for multidimensional data streams
dc.type	Conference proceedings

This item belongs to the following institution

Universidade de São Paulo (Brazil)