Working paper
Clustering binary data by application of combinatorial optimization heuristics
Date
2019-08-09
Author
Trejos Zelaya, Javier
Amaya Briceño, Luis Eduardo
Jiménez Romero, Alejandra
Murillo Fernández, Alex
Piza Volio, Eduardo
Villalobos Arias, Mario Alberto
Institution
Abstract
We study clustering methods for binary data, first defining aggregation criteria that measure the compactness of clusters. We introduce five new methods based on two families of combinatorial optimization metaheuristics, neighborhood-based and population-based: the former are simulated annealing, threshold accepting and tabu search, and the latter are a genetic algorithm and ant colony optimization.
The methods are implemented, and the parameters of the heuristics are calibrated to ensure good results. On a set of 16 data tables generated by a quasi-Monte Carlo experiment, the methods are compared, for one of the aggregation criteria using the L1 dissimilarity, with hierarchical clustering and with a version of k-means: partitioning around medoids (PAM).
Simulated annealing performs very well, especially compared to classical methods.
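To make the idea concrete, the following is a minimal sketch of simulated annealing applied to partitioning binary data. It is not the authors' implementation: the abstract does not state the aggregation criteria, so the criterion here (the sum of L1 distances from each row to its cluster's majority-vote binary center) is an assumption, as are the function names `within_cost` and `anneal`, the single-object move, the geometric cooling schedule, and all parameter values.

```python
import math
import random


def l1(a, b):
    """L1 dissimilarity between two binary vectors (number of disagreements)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def within_cost(data, labels, k):
    """Assumed criterion: sum of L1 distances to each cluster's binary center."""
    total = 0
    for c in range(k):
        members = [row for row, lab in zip(data, labels) if lab == c]
        if not members:
            continue
        # Majority vote per variable; ties are resolved to 1.
        center = [1 if 2 * sum(col) >= len(members) else 0
                  for col in zip(*members)]
        total += sum(l1(row, center) for row in members)
    return total


def anneal(data, k, t0=2.0, cooling=0.95, steps_per_temp=50,
           t_min=1e-3, seed=0):
    """Simulated annealing over partitions into k >= 2 clusters (illustrative)."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in data]
    cost = within_cost(data, labels, k)
    best_labels, best_cost = labels[:], cost
    t = t0
    while t > t_min:
        for _ in range(steps_per_temp):
            # Neighbor: move one random object to a different cluster.
            i = rng.randrange(len(data))
            old = labels[i]
            new = rng.randrange(k - 1)
            if new >= old:
                new += 1
            labels[i] = new
            new_cost = within_cost(data, labels, k)
            delta = new_cost - cost
            # Metropolis rule: always accept improvements, sometimes accept worse.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost
                if cost < best_cost:
                    best_labels, best_cost = labels[:], cost
            else:
                labels[i] = old  # reject the move
        t *= cooling  # geometric cooling (an assumed schedule)
    return best_labels, best_cost


if __name__ == "__main__":
    toy = [[1, 1, 0, 0]] * 3 + [[0, 0, 1, 1]] * 3
    print(anneal(toy, k=2))
```

Recomputing the full cost at every step keeps the sketch short; a practical version would update the cost incrementally for the two clusters affected by a move.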