Evolutionary k-means for distributed data sets

Naldi, M. C.; Campello, Ricardo José Gabrielli Barreto

dc.creator	Naldi, M. C.
dc.creator	Campello, Ricardo José Gabrielli Barreto
dc.date.accessioned	2014-05-26T20:20:28Z
dc.date.accessioned	2018-07-04T16:48:22Z
dc.date.available	2014-05-26T20:20:28Z
dc.date.available	2018-07-04T16:48:22Z
dc.date.created	2014-05-26T20:20:28Z
dc.date.issued	2014-03-15
dc.identifier	Neurocomputing, Amsterdam, v.127, p.30-42, 2014
dc.identifier	http://www.producao.usp.br/handle/BDPI/45049
dc.identifier	10.1016/j.neucom.2013.05.046
dc.identifier	http://dx.doi.org/10.1016/j.neucom.2013.05.046
dc.identifier.uri	http://repositorioslatinoamericanos.uchile.cl/handle/2250/1640612
dc.description.abstract	One of the challenges for clustering resides in dealing with data distributed in separate repositories, because most clustering techniques require the data to be centralized. One of them, k-means, has been elected as one of the most influential data mining algorithms for being simple, scalable and easily modifiable to a variety of contexts and application domains. Although distributed versions of k-means have been proposed, the algorithm is still sensitive to the selection of the initial cluster prototypes and requires the number of clusters to be specified in advance. In this paper, we propose the use of evolutionary algorithms to overcome the k-means limitations and, at the same time, to deal with distributed data. Two different distribution approaches are adopted: the first obtains a final model identical to the centralized version of the clustering algorithm; the second generates and selects clusters for each distributed data subset and combines them afterwards. The algorithms are compared from two perspectives: the theoretical one, through asymptotic complexity analyses; and the experimental one, through a comparative evaluation of results obtained from a collection of experiments and statistical tests. The obtained results indicate which variant is more adequate for each application scenario.
dc.language	eng
dc.publisher	Elsevier
dc.publisher	Amsterdam
dc.relation	Neurocomputing
dc.rights	Copyright Elsevier B.V.
dc.rights	restrictedAccess
dc.subject	Distributed clustering
dc.subject	Evolutionary k-means
dc.subject	Distributed data mining
dc.title	Evolutionary k-means for distributed data sets
dc.type	Journal articles
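The abstract says the first distribution approach yields a final model identical to the centralized algorithm. One standard way to obtain that property for a single k-means (Lloyd) update is for each site to send only per-cluster coordinate sums and point counts for the current prototypes. A coordinator then adds these statistics and divides, which gives exactly the centroids a centralized pass would compute. The sketch below illustrates only that generic idea under this assumption. It is not the authors' evolutionary algorithm: it has no population, mutation operators, or estimation of the number of clusters. All function names are hypothetical.

```python
def nearest(point, centroids):
    # Index of the closest centroid under squared Euclidean distance.
    return min(range(len(centroids)),
               key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])))


def local_stats(site_data, centroids):
    """Runs at each site: per-cluster coordinate sums and counts.

    Only these aggregates leave the site, not the raw points.
    """
    k, d = len(centroids), len(centroids[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    for x in site_data:
        j = nearest(x, centroids)
        counts[j] += 1
        for i in range(d):
            sums[j][i] += x[i]
    return sums, counts


def merge_and_update(stats, centroids):
    """Runs at the coordinator: adds site statistics and recomputes centroids."""
    k, d = len(centroids), len(centroids[0])
    total_sums = [[0.0] * d for _ in range(k)]
    total_counts = [0] * k
    for sums, counts in stats:
        for j in range(k):
            total_counts[j] += counts[j]
            for i in range(d):
                total_sums[j][i] += sums[j][i]
    # A cluster that received no points at any site keeps its previous centroid.
    return [tuple(s / total_counts[j] for s in total_sums[j]) if total_counts[j]
            else tuple(centroids[j])
            for j in range(k)]


def distributed_kmeans(sites, centroids, iterations=10):
    # Each iteration: local statistics at every site, then one global update.
    for _ in range(iterations):
        stats = [local_stats(site, centroids) for site in sites]
        centroids = merge_and_update(stats, centroids)
    return centroids


sites = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 10.0), (10.0, 11.0)]]
init = [(0.0, 0.0), (10.0, 10.0)]
print(distributed_kmeans(sites, init))                    # [(0.0, 0.5), (10.0, 10.5)]
print(distributed_kmeans([sites[0] + sites[1]], init))    # same result, centralized
```

Passing all data as a single site reproduces the centralized run. With the same initial prototypes, the two runs agree up to floating-point summation order. The sensitivity to those initial prototypes and the need to fix k in advance remain in this sketch. Those are the limitations the paper addresses with evolutionary search.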

This item belongs to the following institution

Universidade de São Paulo (Brazil)