Doctoral thesis
A methodology for defining the number of clusters and the set of initial centers for partitional algorithms
Date
2021-02-05
Registered in:
SILVA, Huliane Medeiros da. Uma metodologia para definição do número de grupos e do conjunto de centros iniciais para algoritmos particionais. 2021. 100f. Tese (Doutorado em Ciência da Computação) - Centro de Ciências Exatas e da Terra, Universidade Federal do Rio Grande do Norte, Natal, 2021.
Author
Silva, Huliane Medeiros da
Abstract
Data clustering consists of grouping similar objects according to some characteristic. In
the literature, there are several clustering algorithms, among which Fuzzy C-Means (FCM) stands out as one of the most discussed, being used in different applications.
Although it is a simple and easy-to-use clustering method, FCM requires the number of
clusters as an initial parameter. Usually, this information is not known beforehand,
and this becomes a relevant problem in the cluster analysis process. Moreover, the
performance of the FCM algorithm depends strongly on the selection of the initial cluster
centers. In general, the initial set of centers is selected at random, which may
compromise the performance of FCM and, consequently, of the cluster analysis process.
In this context, this work proposes a new methodology to determine the number of clusters
and the set of initial centers for partitional algorithms, using the FCM algorithm and some
of its variants as a case study. The idea is to use a subset of the original data to define
the number of clusters and to determine the set of initial centers through a method based
on mean-type functions. With this new methodology, we intend to reduce the side effects
of the cluster definition phase, possibly shortening the processing time and lowering
the computational cost. To evaluate the proposed methodology, different cluster validation
indices will be used to assess the quality of the clusters obtained by the FCM algorithm
and some of its variants when applied to different datasets.
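
For readers unfamiliar with FCM, the following is a minimal sketch of the standard algorithm, written only to show where the two inputs discussed in the abstract enter: the number of clusters (here, the number of rows of `init_centers`) and the initial centers themselves. It is not the methodology proposed in the thesis. The function name `fuzzy_c_means`, its parameters, and the default fuzzifier `m=2.0` are assumptions chosen for illustration.

```python
import numpy as np


def fuzzy_c_means(X, init_centers, m=2.0, max_iter=100, tol=1e-6):
    """Standard FCM sketch (illustrative, not the thesis method).

    The number of clusters is implied by the number of rows in init_centers.
    Returns the final centers and the membership matrix used in the last update.
    """
    X = np.asarray(X, dtype=float)
    centers = np.asarray(init_centers, dtype=float).copy()
    for _ in range(max_iter):
        # Euclidean distance from every point to every center: (n_points, n_clusters).
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # guard against division by zero
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1)).
        inv = dist ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
        # Center update: mean of the points weighted by u_ik^m.
        W = U ** m
        new_centers = (W.T @ X) / W.sum(axis=0)[:, None]
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:
            break
    return centers, U


# Usage with two small, well-separated groups and hand-picked initial centers.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
centers, U = fuzzy_c_means(X, init_centers=[[1.0, 1.0], [9.0, 9.0]])
```

Because the iteration starts from `init_centers`, a different starting set can lead to a different final partition, which is the sensitivity the abstract refers to.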