Conference proceedings
Model selection for semi-supervised clustering
Date
2014-03
Registered in:
International Conference on Extending Database Technology, 17, 2013, Athens.
9783893180653
Authors
Pourrajabi, Mojgan
Moulavi, Davoud
Campello, Ricardo José Gabrielli Barreto
Zimek, Arthur
Sander, Jörg
Goebel, Randy
Institution
Abstract
Although there is a large and growing literature that tackles the semi-supervised clustering problem (i.e., using some labeled objects or cluster-guiding constraints like "must-link" or "cannot-link"), the evaluation of semi-supervised clustering approaches has rarely been discussed. The application of cross-validation techniques, for example, is far from straightforward in the semi-supervised setting, and the problems associated with evaluation have not yet been addressed. Here we summarize these problems and provide a solution.
Furthermore, to demonstrate the practical applicability of semi-supervised clustering methods, we provide a method for model selection in semi-supervised clustering based on this sound evaluation procedure. Our method allows the user to select, based on the available information (labels or constraints), the most appropriate clustering model (e.g., number of clusters, density parameters) for a given problem.
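
The abstract does not spell out the scoring procedure, so the following is only a minimal sketch of the general idea of constraint-based model selection, not the authors' method. It assumes that labels of held-out objects are turned into must-link / cannot-link pairs and that each candidate clustering is scored by a pairwise F-measure; the function names (`pairwise_constraints`, `constraint_f_measure`, `select_model`), the choice of F-measure, and the toy data are all hypothetical.

```python
from itertools import combinations


def pairwise_constraints(labels):
    """Turn {object_id: class_label} into {(a, b): True for must-link, False for cannot-link}."""
    return {(a, b): labels[a] == labels[b] for a, b in combinations(sorted(labels), 2)}


def constraint_f_measure(assignment, constraints):
    """Score a clustering as a pair classifier: a pair is predicted must-link
    when both objects fall into the same cluster (assumed scoring rule)."""
    tp = fp = fn = 0
    for (a, b), must_link in constraints.items():
        same_cluster = assignment[a] == assignment[b]
        if same_cluster and must_link:
            tp += 1
        elif same_cluster and not must_link:
            fp += 1
        elif not same_cluster and must_link:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def select_model(candidates, held_out_labels):
    """Pick the candidate clustering that best agrees with the held-out constraints."""
    constraints = pairwise_constraints(held_out_labels)
    scores = {name: constraint_f_measure(assignment, constraints)
              for name, assignment in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy example (hypothetical data): four held-out labeled objects and three
# candidate models that differ in the number of clusters.
held_out = {0: "a", 1: "a", 2: "b", 3: "b"}
candidates = {
    "k=1": {0: 0, 1: 0, 2: 0, 3: 0},
    "k=2": {0: 0, 1: 0, 2: 1, 3: 1},
    "k=4": {0: 0, 1: 1, 2: 2, 3: 3},
}
best, scores = select_model(candidates, held_out)
print(best, scores)
```

In a cross-validation setting, one would presumably repeat this scoring over several folds of the labeled objects, keep the constraints used to guide the clustering separate from those used to score it, and average the scores per candidate model.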