Conference proceedings
ClusMAM: fast and effective unsupervised clustering of large complex datasets using metric access methods
Date
2016-04
Registered in:
Symposium on Applied Computing, 31st, 2016, Pisa.
9781450337397
Authors
Souza, Jessica Andressa de
Cazzolato, Mirela Teixeira
Traina, Agma Juci Machado
Institution
Abstract
An efficient and effective clustering process is a core task of data mining analysis, and it has become even more important in today's big data scenario, where scalability is an issue. In this paper we present ClusMAM, a method that introduces a new strategy for clustering large complex datasets through metric access methods. ClusMAM aims to accelerate relational partitional clustering by taking advantage of the inherent node separations of metric access methods. Compared with other methods from the literature, ClusMAM is up to four orders of magnitude faster than its competitors while maintaining clustering quality. Additionally, ClusMAM exploits the datasets to find compact and coherent clusters, suggesting the number of clusters k present in the data. The method was evaluated on synthetic and real datasets, and its behavior was consistent with respect to both the number of distance calculations and the time required for the clustering process.