Deep Learning for Hierarchical Classification of Transposable Elements
Nakano, Felipe Kenji
Transposable Elements (TEs) are DNA sequences that can change their location within a cell's genome. They contribute directly to the genetic variability of species, and their transposition mechanisms can affect the functionality of genes. The correct identification and classification of TEs therefore play a central role in understanding genomes. Generally, TEs are identified and classified using homology-based tools, which compare a query sequence against many sequences from a labeled TE database. Since the literature proposes hierarchical taxonomies that organize TEs into classes and subclasses, this project develops new hierarchical classification methods employing Machine Learning (ML) and Artificial Neural Networks (ANNs) trained using Deep Learning (DL) concepts. Deep neural networks have advanced the state of the art in many fields of study, including bioinformatics. As a first step, DNA sequences labeled as previously identified TEs were collected and mapped onto the hierarchies provided by the literature. Next, the deep learning models Restricted Boltzmann Machines, Auto-encoders, Multilayer Perceptrons, and their stacked versions were tested. Using these datasets, different classification methods were proposed and compared with methods from the literature. As contributions, two new strategies were proposed: nLLCPN (non-Leaf Local Classifier per Parent Node) and LCPNB (Local Classifier per Parent Node and Branch). Both adapt LCPN (Local Classifier per Parent Node) to allow classification at inner nodes of the hierarchy. Additionally, the deep neural networks presented superior or competitive results in most cases.
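To make the baseline LCPN idea concrete, the following is a minimal sketch, not the implementation used in this work. It trains one classifier per parent node and predicts top-down from the root. Several things here are assumptions made for illustration: the nearest-centroid base classifier stands in for the neural networks, the two-level hierarchy and the two-dimensional feature vectors are toy data, and the names `train_lcpn` and `predict_lcpn` are hypothetical. The sketch always descends to a leaf; it does not implement nLLCPN or LCPNB.

```python
from collections import defaultdict

ROOT = "root"  # hypothetical name for the root of the hierarchy


class NearestCentroid:
    """Tiny stand-in base classifier (assumption: any classifier could be used)."""

    def fit(self, X, y):
        sums, counts = {}, defaultdict(int)
        for x, label in zip(X, y):
            if label not in sums:
                sums[label] = [0.0] * len(x)
            sums[label] = [s + v for s, v in zip(sums[label], x)]
            counts[label] += 1
        # One mean vector per label seen at this node.
        self.centroids = {l: [s / counts[l] for s in sums[l]] for l in sums}
        return self

    def predict_one(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.centroids, key=lambda l: sq_dist(self.centroids[l]))


def train_lcpn(X, paths):
    """Train one classifier per parent node.

    Each path is a tuple of node names from the first level down to a leaf,
    e.g. ("ClassI", "LTR"). Node names are assumed to be unique.
    """
    per_parent = defaultdict(lambda: ([], []))
    for x, path in zip(X, paths):
        parent = ROOT
        for child in path:
            # The classifier at `parent` learns to choose among its children.
            per_parent[parent][0].append(x)
            per_parent[parent][1].append(child)
            parent = child
    return {p: NearestCentroid().fit(xs, ys) for p, (xs, ys) in per_parent.items()}


def predict_lcpn(models, x):
    """Descend from the root, letting each parent's classifier pick a child."""
    node, path = ROOT, []
    while node in models:  # leaves have no classifier, so the loop stops there
        node = models[node].predict_one(x)
        path.append(node)
    return tuple(path)


# Toy data: hypothetical 2-D feature vectors with two-level labels.
X = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.9], [0.6, 0.8],
     [0.1, 0.2], [0.2, 0.1], [0.1, 0.9], [0.2, 0.8]]
paths = [("ClassI", "LTR"), ("ClassI", "LTR"),
         ("ClassI", "LINE"), ("ClassI", "LINE"),
         ("ClassII", "TIR"), ("ClassII", "TIR"),
         ("ClassII", "Helitron"), ("ClassII", "Helitron")]

models = train_lcpn(X, paths)
print(predict_lcpn(models, [0.85, 0.15]))
print(predict_lcpn(models, [0.15, 0.85]))
```

In this sketch a classifier exists only at the root and at the two class nodes, so every prediction ends at a leaf. Allowing a prediction to stop at an inner node is the behavior that the two proposed strategies add on top of LCPN.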