masterThesis
Feature extraction and selection analysis in biological sequence: a case study with metaheuristics and mathematical models
Date
2020-02-12
Record in:
BONIDIA, Robson Parmezan. Feature extraction and selection analysis in biological sequence: a case study with metaheuristics and mathematical models. 2020. Dissertação (Mestrado em Bioinformática) - Universidade Tecnológica Federal do Paraná, Cornélio Procópio, 2020.
Author
Bonidia, Robson Parmezan
Abstract
The number of available biological sequences has grown substantially in recent years as a result of various genomic sequencing projects, creating a huge volume of data. Consequently, new computational methods are needed to analyze these sequences and extract information from them. Machine learning methods have shown broad applicability in computational biology and bioinformatics, where they have helped to extract relevant information from various biological datasets. However, several challenging problems remain that motivate new algorithms and pipeline proposals. Therefore, this work proposes a generic machine learning pipeline for biological sequence analysis with two main steps: (1) feature extraction and (2) feature selection. We focus on the study of dimensionality reduction and feature extraction techniques, using metaheuristics and mathematical models. As a case study, we analyze long non-coding RNA sequences. The dissertation is divided into two parts: Experimental Test I (feature selection) and Experimental Test II (feature extraction). The experimental results indicate four main contributions: (1) a pipeline with five distinct metaheuristics, using a voting scheme and execution rounds, for the feature selection problem in biological sequences; (2) the efficiency of the metaheuristics, which provide competitive classification performance; (3) a feature extraction pipeline using nine mathematical models; and (4) its generalization and robustness for the classification of distinct biological sequences.
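The two pipeline steps named in the abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation: k-mer frequency is used here only as a stand-in for a feature extraction model, the selectors passed to the voting function are placeholders for metaheuristics, and the names `kmer_frequencies` and `vote_select`, the `ACGT` alphabet, and the simple-majority threshold are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=2, alphabet="ACGT"):
    """Feature extraction stand-in: relative frequency of each k-mer.

    Assumption: sequences use the given alphabet; k-mers containing other
    symbols (e.g. N) are counted in the total but produce no feature.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = max(len(seq) - k + 1, 1)  # avoid division by zero on short input
    return [counts[km] / total for km in kmers]


def vote_select(selectors, n_features, rounds=5, min_votes=None):
    """Feature selection by voting over several selectors and rounds.

    Each selector is a hypothetical callable that takes a round index
    (usable as a random seed) and returns an iterable of feature indices.
    A feature is kept when it collects at least `min_votes` votes; by
    default this is a simple majority of all runs (an assumption).
    """
    votes = [0] * n_features
    runs = 0
    for select in selectors:
        for r in range(rounds):
            for idx in set(select(r)):
                votes[idx] += 1
            runs += 1
    if min_votes is None:
        min_votes = runs // 2 + 1
    return [i for i, v in enumerate(votes) if v >= min_votes]


# Toy usage: three placeholder "metaheuristics" that always pick fixed subsets.
features = kmer_frequencies("ACGTACGT", k=1)
kept = vote_select(
    [lambda r: {0, 1}, lambda r: {1, 2}, lambda r: {1}],
    n_features=3,
    rounds=2,
)
```

In a real setting each selector would wrap one run of an optimization algorithm that scores feature subsets with a classifier; the voting step then keeps only the features that the runs agree on.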