dc.contributorDouglas Eduardo Valente Pires
dc.contributorhttp://lattes.cnpq.br/2675409574553301
dc.contributorhttp://lattes.cnpq.br/8989178759075946
dc.contributorGlaura da Conceição Franco
dc.contributorLaurence Rodrigues do Amaral
dc.contributorFabíola Souza Fernandes Pereira
dc.creatorPâmela Marinho Rezende
dc.date.accessioned2023-05-16T15:17:27Z
dc.date.accessioned2023-06-16T15:26:07Z
dc.date.available2023-05-16T15:17:27Z
dc.date.available2023-06-16T15:26:07Z
dc.date.created2023-05-16T15:17:27Z
dc.date.issued2022-07-25
dc.identifierhttp://hdl.handle.net/1843/53447
dc.identifier.urihttps://repositorioslatinoamericanos.uchile.cl/handle/2250/6678976
dc.description.abstractThe exponential growth in the generation and availability of biological data in recent decades has increased the importance of databases as a resource to guide innovation and the generation of new biological insights. Broad experimental characterization of these data is generally unfeasible given their complexity and scale, which makes automatic data classification using Machine Learning an essential, faster, and cheaper alternative. Biological datasets are often hierarchical in nature, with varying degrees of complexity, which poses different challenges for training, testing, and validating accurate and generalizable classification models. Although some approaches to classifying hierarchical data have been proposed, no guidelines regarding their utility, applicability, and limitations have been explored or implemented. The proposed approaches include Local approaches, which consider the hierarchy and build models per level or per node, and Global hierarchical classification, which uses a flat classification approach. To fill this gap, we systematically contrasted the performance of the Local per Level and Local per Node approaches with a Global approach on two different hierarchical datasets: BioLiP and CATH. The results show how different components of hierarchical datasets, such as variation coefficient and prediction by depth, can guide the choice of appropriate classification schemes. Finally, we provide guidelines to support this choice when embarking on a hierarchical classification task, which will help optimize computational resources and predictive performance.
dc.publisherUniversidade Federal de Minas Gerais
dc.publisherBrazil
dc.publisherICB - INSTITUTO DE CIÊNCIAS BIOLÓGICAS
dc.publisherPrograma de Pós-Graduação em Bioinformática
dc.publisherUFMG
dc.rightshttp://creativecommons.org/licenses/by-nc-nd/3.0/pt/
dc.rightsOpen Access
dc.subjectBiological database
dc.subjectClass hierarchy
dc.subjectHierarchical classification
dc.subjectProtein function prediction
dc.subjectProtein structural classification
dc.titleAvaliação de abordagens hierárquicas de aprendizado de máquina aplicadas a bancos de dados biológicos [Evaluation of hierarchical machine learning approaches applied to biological databases]
dc.typeThesis