Uma arquitetura para análise de agrupamentos sobre bases de dados distribuídas [An architecture for cluster analysis over distributed databases]

Gorgônio, Flavius da Luz e

doctoralThesis

Date

2009-03-06

Record in:

GORGÔNIO, Flavius da Luz e. Uma arquitetura para análise de agrupamentos sobre bases de dados distribuídas. 2009. 156f. Tese (Doutorado em Engenharia Elétrica e Computação) - Centro de Tecnologia, Universidade Federal do Rio Grande do Norte, Natal, 2009.

https://repositorio.ufrn.br/jspui/handle/123456789/28672

http://repositorioslatinoamericanos.uchile.cl/handle/2250/3954176

Author

Gorgônio, Flavius da Luz e

Institution

Universidade Federal do Rio Grande do Norte (Brazil)

Abstract

Data mining can be defined as a set of techniques for extracting knowledge and searching for useful, previously unknown patterns in large multidimensional databases. Clustering is the process of discovering groups of data within high-dimensional databases, based on similarities, with minimal knowledge of their structure. Distributed data clustering is a recent approach for dealing with distributed databases, since traditional clustering algorithms require all databases to be centralized in a single dataset. Moreover, current privacy requirements in distributed databases demand algorithms that can perform clustering securely. Thus, an increasing need for methods to mine data stored in a distributed way has motivated the development of algorithms that analyze each database separately and combine the partial results to obtain a final result. This thesis presents a framework for cluster analysis in distributed databases using traditional algorithms, such as K-means and self-organizing maps. This approach significantly reduces the amount of data transferred between the remote units and the central unit. The framework includes a strategy, based on vector quantization, that extracts a subset of representatives in order to obtain partial views of the clusters existing in each horizontal and/or vertical partition of the database. The representatives of each local unit are then sent to the central unit, which combines the partial results by applying a clustering algorithm to all the representative subsets. Experimental results on different datasets show that the proposed framework achieves results very close to, and with effectiveness comparable to, conventional data mining techniques in which all databases are transferred to a central unit in the pre-processing stage.
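The two-stage flow described in the abstract (local quantization, then central clustering of the representatives) can be illustrated with a minimal sketch. This is not the thesis implementation: it assumes horizontal partitions only, uses plain K-means as the vector quantizer in the local units, and runs K-means again in the central unit. The function names (`kmeans`, `local_representatives`, `central_clustering`), the synthetic data, and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def kmeans(data, k, iters=50, seed=0):
    """Plain Lloyd's K-means on an (n, d) array; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points picked at random.
    centroids = data[rng.choice(len(data), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Distance from every point to every centroid, shape (n, k).
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = data[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, dists.argmin(axis=1)

def local_representatives(partition, n_reps, seed=0):
    """Local unit (assumption): quantize a partition into n_reps prototypes."""
    centroids, _ = kmeans(partition, n_reps, seed=seed)
    return centroids

def central_clustering(rep_sets, k, seed=0):
    """Central unit (assumption): cluster the pooled representatives only."""
    all_reps = np.vstack(rep_sets)
    centroids, _ = kmeans(all_reps, k, seed=seed)
    return centroids, all_reps

# Synthetic data: two well-separated groups, split into three horizontal partitions.
rng = np.random.default_rng(42)
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.3, size=(300, 2))
blob_b = rng.normal(loc=(5.0, 5.0), scale=0.3, size=(300, 2))
full = np.vstack([blob_a, blob_b])
rng.shuffle(full)  # shuffle rows so each partition sees both groups
partitions = np.array_split(full, 3)

# Each local unit sends 10 representatives instead of its 200 records.
rep_sets = [local_representatives(p, n_reps=10, seed=i)
            for i, p in enumerate(partitions)]
global_centroids, all_reps = central_clustering(rep_sets, k=2)

print("records held locally:", len(full))
print("vectors transferred: ", len(all_reps))
print("global centroids:\n", global_centroids)
```

In this sketch only 30 prototype vectors reach the central unit instead of 600 records, which is the kind of transfer reduction the abstract refers to. Replacing the local quantizer with a self-organizing map, or handling vertical partitions, would require changes not shown here.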

Subjects

Distributed cluster analysis

Cluster ensembles

K-means

Self-organizing maps
