Árvore de predição semi-supervisionada para predição de localização subcelular de proteínas

Alcantara, Leonardo Utida

dc.contributor	Cerri, Ricardo
dc.contributor	http://lattes.cnpq.br/6266519868438512
dc.contributor	Velázquez, Isaac Triguero
dc.contributor	http://lattes.cnpq.br/7631837051398731
dc.creator	Alcantara, Leonardo Utida
dc.date.accessioned	2022-04-21T13:37:40Z
dc.date.accessioned	2022-10-10T21:39:37Z
dc.date.available	2022-04-21T13:37:40Z
dc.date.available	2022-10-10T21:39:37Z
dc.date.created	2022-04-21T13:37:40Z
dc.date.issued	2021-11-19
dc.identifier	ALCANTARA, Leonardo Utida. Árvore de predição semi-supervisionada para predição de localização subcelular de proteínas. 2021. Trabalho de Conclusão de Curso (Graduação em Engenharia de Computação) – Universidade Federal de São Carlos, São Carlos, 2021. Disponível em: https://repositorio.ufscar.br/handle/ufscar/15893.
dc.identifier	https://repositorio.ufscar.br/handle/ufscar/15893
dc.identifier.uri	http://repositorioslatinoamericanos.uchile.cl/handle/2250/4045941
dc.description.abstract	Protein subcellular localization is an important classification task because the location of a protein inside a cell is directly related to its function. Because many proteins reside in two or more locations at the same time or move between locations, the problem is usually addressed with supervised multi-label classification methods. This approach is well established in the literature; however, it has some disadvantages: (i) it needs a large number of labeled instances to train the classifier; (ii) it ignores the fact that unlabeled instances can provide valuable information for the classification; and (iii) in many areas unlabeled data is abundant, but manually labeling an instance is too expensive and time-consuming. Semi-Supervised Learning (SSL) is a subfield of machine learning in which the learner tries to exploit both labeled and unlabeled data at the same time. Semi-Supervised Classification is a subcategory of SSL that uses the available unlabeled data to improve the performance of a classifier that already uses labeled data. The main goal of this project was to develop a semi-supervised multi-label classifier able to use the abundant unlabeled proteins to improve the prediction of protein subcellular localization. The SSL algorithm developed in this work is based on the predictive clustering tree framework. It was built, tested, and analyzed in many SSL scenarios to determine whether the classifier could use the unlabeled instances to help the classification process on a set of multi-label protein subcellular localization datasets from three different taxonomic groups: Viridiplantae, Virus, and Fungi.
dc.language	por
dc.publisher	Universidade Federal de São Carlos
dc.publisher	UFSCar
dc.publisher	Câmpus São Carlos
dc.publisher	Engenharia de Computação - EC
dc.rights	http://creativecommons.org/licenses/by-nc-nd/3.0/br/
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 Brazil
dc.subject	Aprendizado de máquina
dc.subject	Classificação
dc.subject	Bioinformática
dc.subject	Aprendizado de máquina multirrótulo
dc.subject	Aprendizado de máquina semi-supervisionado
dc.subject	Machine learning
dc.subject	Classification
dc.subject	Bioinformatics
dc.subject	Multi-label machine learning
dc.subject	Semi-supervised machine learning
dc.title	Árvore de predição semi-supervisionada para predição de localização subcelular de proteínas
dc.type	Otros

This item belongs to the following institution

Universidade Federal de São Carlos (Brasil)