dc.creatorSilva, Afonso E. da
dc.creatorSanches, Lucas L.
dc.creatorFraideinberze, Antonio C.
dc.creatorCordeiro, Robson Leonardo Ferreira
dc.date.accessioned2016-10-19T21:48:32Z
dc.date.accessioned2018-07-04T17:12:04Z
dc.date.available2016-10-19T21:48:32Z
dc.date.available2018-07-04T17:12:04Z
dc.date.created2016-10-19T21:48:32Z
dc.date.issued2016-05
dc.identifierSIAM International Conference on Data Mining, XVI, 2016, Miami.
dc.identifier9781611974348
dc.identifier2167-0102
dc.identifierhttp://www.producao.usp.br/handle/BDPI/51004
dc.identifierhttp://dx.doi.org/10.1137/1.9781611974348.40
dc.identifier.urihttp://repositorioslatinoamericanos.uchile.cl/handle/2250/1646031
dc.description.abstractGiven a data stream with many attributes and a high frequency of events, how can similar events be clustered? Can it be done in real time? For example, how can decades of frequent measurements of tens of climatic attributes be clustered to aid real-time alert systems in forecasting extreme climatic events, such as floods and hurricanes? The task of clustering data with many attributes is known as subspace clustering. Today, there is a need for algorithms of this type that are well suited to processing multidimensional data streams, for which real-time processing is highly desirable. This paper proposes the new algorithm 'HALITE IND. DS', a fast, scalable and highly accurate subspace clustering algorithm for multidimensional data streams. It improves upon an existing technique that was originally designed to process static (non-stream) data. Our main contributions are: (1) Analysis of Data Streams: the new algorithm takes advantage of the knowledge obtained from clustering past data to ease the clustering of present data. This allows our 'HALITE IND. DS' to be considerably faster than its base algorithm while obtaining the same accuracy of results; (2) Real Time Processing: as opposed to the state of the art, 'HALITE IND. DS' is fast and scalable, making it feasible to analyze streams with many attributes and a high frequency of events in real time; (3) Experiments: we ran experiments using synthetic data and a real multidimensional stream with almost one century of climatic data. Our 'HALITE IND. DS' was up to 217 times faster than 5 representative works, i.e., its base algorithm plus 4 others from the state of the art, always presenting highly accurate results.
dc.languageeng
dc.publisherSociety for Industrial and Applied Mathematics - SIAM
dc.publisherMiami
dc.relationSIAM International Conference on Data Mining, XVI
dc.rightsCopyright SIAM
dc.rightsclosedAccess
dc.subjectsubspace clustering
dc.subjectmoderate-to-high dimensional data streams
dc.subjectreal time processing
dc.subjectclimatic streams
dc.title'HALITE IND. DS': fast and scalable subspace clustering for multidimensional data streams
dc.typeConference proceedings

