On biclusters aggregation and its benefits for enumerative solutions = Agregação de biclusters e seus benefícios para soluções enumerativas

Oliveira, Saullo Haniell Galvão de, 1988-

Agregação de biclusters e seus benefícios para soluções enumerativas

dc.creator	Oliveira, Saullo Haniell Galvão de, 1988-
dc.date	2015
dc.date	2017-04-02T15:23:34Z
dc.date	2017-07-13T19:38:54Z
dc.date.accessioned	2018-03-29T03:47:15Z
dc.date.available	2018-03-29T03:47:15Z
dc.identifier	OLIVEIRA, Saullo Haniell Galvão de. On biclusters aggregation and its benefits for enumerative solutions = Agregação de biclusters e seus benefícios para soluções enumerativas. 2015. 494 p. Dissertação (mestrado) - Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação, Campinas, SP. Disponível em: <http://www.bibliotecadigital.unicamp.br/document/?code=000947203>. Acesso em: 2 abr. 2017.
dc.identifier	http://repositorio.unicamp.br/jspui/handle/REPOSIP/259072
dc.identifier.uri	http://repositorioslatinoamericanos.uchile.cl/handle/2250/1336274
dc.description	Advisor: Fernando José Von Zuben
dc.description	Dissertation (master's) - Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação
dc.description	Resumo: Biclusterização envolve a clusterização simultânea de objetos e seus atributos, definindo modelos locais de relacionamento entre os objetos e seus atributos. Assim como a clusterização, a biclusterização tem uma vasta gama de aplicações, desde suporte a sistemas de recomendação, até análise de dados de expressão gênica. Inicialmente, diversas heurísticas foram propostas para encontrar biclusters numa base de dados numérica. No entanto, tais heurísticas apresentam alguns inconvenientes, como não encontrar biclusters relevantes na base de dados e não maximizar o volume dos biclusters encontrados. Algoritmos enumerativos são uma proposta recente, especialmente no caso de bases numéricas, cuja solução é um conjunto de biclusters maximais e não redundantes. Contudo, a habilidade de enumerar biclusters trouxe mais um cenário desafiador: em bases de dados ruidosas, cada bicluster original se fragmenta em vários outros biclusters com alto nível de sobreposição, o que impede uma análise direta dos resultados obtidos. Essa fragmentação irá ocorrer independente da definição escolhida de coerência interna no bicluster, sendo mais relacionada com o próprio nível de ruído. Buscando reverter essa fragmentação, nesse trabalho propomos duas formas de agregação de biclusters a partir de resultados que apresentem alto grau de sobreposição: uma baseada na clusterização hierárquica com single linkage, e outra explorando diretamente a taxa de sobreposição dos biclusters. Em seguida, um passo de poda é executado para remover objetos ou atributos indesejados que podem ter sido incluídos como resultado da agregação. As duas propostas foram comparadas entre si e com o estado da arte, em diversos experimentos, incluindo bases de dados artificiais e reais. Essas duas novas formas de agregação não só reduziram significativamente a quantidade de biclusters, essencialmente defragmentando os biclusters originais, mas também aumentaram consistentemente a qualidade da solução, medida em termos de precisão e recuperação, quando os biclusters são conhecidos previamente.
dc.description	Abstract: Biclustering involves the simultaneous clustering of objects and their attributes, thus defining local models for the two-way relationship between objects and attributes. Like clustering, biclustering has a broad range of applications, from support for recommender systems to data mining techniques for gene expression data analysis. Initially, heuristics were proposed to find biclusters; their main drawbacks are that they may miss existing biclusters and cannot maximize the volume of the biclusters obtained. More recently, efficient algorithms were conceived to enumerate all biclusters, particularly in numerical datasets, so that the result is a complete set of maximal and non-redundant biclusters. However, the ability to enumerate biclusters revealed a challenging scenario: in noisy datasets, each true bicluster is fragmented into many highly overlapping biclusters, which prevents a direct analysis of the results. Fragmentation happens regardless of the condition adopted to specify the internal coherence of valid biclusters, though its degree depends on the noise level. To revert the fragmentation, we propose two approaches for aggregating a set of highly overlapping biclusters: one based on hierarchical clustering with single linkage and the other directly exploiting the overlap rate. A pruning step is then applied to remove intruder objects and/or attributes added as a side effect of aggregation. Both proposals were compared with each other and with the state of the art in several experiments on artificial and real datasets. The two aggregation mechanisms not only significantly reduced the number of biclusters, essentially defragmenting the true biclusters, but also consistently increased the quality of the whole solution, measured in terms of precision and recall when the true biclusters are known a priori.
dc.description	Master's
dc.description	Computer Engineering
dc.description	Master in Electrical Engineering
dc.format	494 p. : il.
dc.format	application/pdf
dc.language	English
dc.publisher	[s.n.]
dc.subject	Aprendizado de máquina
dc.subject	Análise de Cluster
dc.subject	Mineração de dados (Computação)
dc.subject	Outliers (Estatística)
dc.subject	Problemas de enumeração combinatória
dc.subject	Machine learning
dc.subject	Cluster analysis
dc.subject	Data mining and knowledge discovery
dc.subject	Outliers (statistics)
dc.subject	Combinatorial enumeration problems
dc.title	On biclusters aggregation and its benefits for enumerative solutions = Agregação de biclusters e seus benefícios para soluções enumerativas
dc.title	Agregação de biclusters e seus benefícios para soluções enumerativas
dc.type	Thesis

This item belongs to the following institution

Universidade Estadual de Campinas (Brazil)