Artículos de revistas
Evolving clusters in gene-expression data
Registro en:
Information Sciences. Elsevier Science Inc, v. 176, n. 13, n. 1898, n. 1927, 2006.
0020-0255
WOS:000237857700005
10.1016/j.ins.2005.07.015
Autor
Hruschka, ER
Campello, RJGB
de Castro, LN
Institución
Resumen
Clustering is a useful. exploratory tool for gene-expression data. Although successful applications of clustering techniques have been reported in the literature, there is no method of choice in the gene-expression analysis community. Moreover, there are only a few works that deal with the problem of automatically estimating the number of clusters in bioinformatics datasets. Most clustering methods require the number k of clusters to be either specified in advance or selected a posteriori from a set of clustering solutions over a range of k. In both cases, the user has to select the number of clusters. This paper proposes improvements to a clustering genetic algorithm that is capable of automatically discovering an optimal number of clusters and its corresponding optimal partition based upon numeric criteria. The proposed improvements are mainly designed to enhance the efficiency of the original clustering genetic algorithm, resulting in two new clustering genetic algorithms and an evolutionary algorithm for clustering (EAC). The original clustering genetic algorithm and its modified versions are evaluated in several runs using six gene-expression datasets in which the right clusters are known a priori. The results illustrate that all the proposed algorithms perform well in gene-expression data, although statistical comparisons in terms of the computational efficiency of each algorithm point out that EAC outperforms the others. Statistical evidence also shows that EAC is able to outperform a traditional method based on multiple runs of k-means over a range of k. (C) 2005 Elsevier Inc. All rights reserved. 176 13 1898 1927