GeM-Pro: a tool for genome functional mining and microbial profiling
Date
2019-04
Record in:
Torres Manno, Mariano Alberto; Pizarro, María Dolores; Prunello, Marcos Miguel; Magni, Christian; Daurelio, Lucas Damian; et al.; GeM-Pro: a tool for genome functional mining and microbial profiling; Springer; Applied Microbiology and Biotechnology; 103; 4-2019; 3123-3134
0175-7598
CONICET Digital
CONICET
Authors
Torres Manno, Mariano Alberto
Pizarro, María Dolores
Prunello, Marcos Miguel
Magni, Christian
Daurelio, Lucas Damian
Espariz, Martin
Abstract
GeM-Pro is a new tool for gene mining and functional profiling of bacteria. It first identifies homologous genes using BLAST and then applies three filtering steps to select orthologous gene pairs. The first step uses BLAST score values to identify trivial paralogs. The second step uses the shared identity percentages of the trivial paralogs found as internal witnesses of non-orthology to set orthology cutoff values. The third step uses conditional probabilities of orthology and non-orthology to define new cutoffs and to generate supporting information for the orthology assignments. A subsidiary tool, q-GeM, was also developed to mine traits of interest using logistic regression (LR) or linear discriminant analysis (LDA) classifiers. q-GeM uses computing resources more efficiently than GeM-Pro but needs an initial classified set of homologous genes to train the LR and LDA classifiers. Hence, q-GeM can be used to analyze a new set of strains with available genome sequences without rerunning a complete GeM-Pro analysis. Finally, GeM-Pro and q-GeM perform a synteny analysis to evaluate the integrity and genomic arrangement of specific pathways of interest and so infer their presence. The tools were applied to more than 2 million homologous pairs encoded by Bacillus strains, generating statistically supported predictions of trait content. The different patterns of encoded traits of interest were successfully used to perform a descriptive bacterial profiling.
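The first two filtering steps can be illustrated with a minimal sketch. It is not the published implementation: the abstract does not give the exact rules, so the sketch assumes that, for each query gene and target genome, the top-scoring BLAST hit is the candidate ortholog and any lower-scoring hits are trivial paralogs, and that the orthology cutoff is simply the highest identity observed among those paralogs. The `Hit` record and all function names are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """One BLAST hit (hypothetical record layout, not GeM-Pro's own format)."""
    query: str       # query gene
    genome: str      # target genome
    subject: str     # matched gene in the target genome
    bitscore: float  # BLAST bit score
    identity: float  # percent identity


def split_hits(hits):
    """Assumed filter 1: per (query, genome), the top-scoring hit is the
    candidate ortholog; every lower-scoring hit is a trivial paralog."""
    groups = defaultdict(list)
    for h in hits:
        groups[(h.query, h.genome)].append(h)
    candidates, paralogs = [], []
    for group in groups.values():
        ranked = sorted(group, key=lambda h: h.bitscore, reverse=True)
        candidates.append(ranked[0])
        paralogs.extend(ranked[1:])
    return candidates, paralogs


def identity_cutoff(paralogs):
    """Assumed filter 2: trivial paralogs act as witnesses of non-orthology,
    so the cutoff is the highest identity seen among them (0.0 if none)."""
    return max((h.identity for h in paralogs), default=0.0)


def filter_orthologs(hits):
    """Keep candidates whose identity exceeds the paralog-derived cutoff."""
    candidates, paralogs = split_hits(hits)
    cutoff = identity_cutoff(paralogs)
    kept = [h for h in candidates if h.identity > cutoff]
    return kept, cutoff


# Toy data, invented for illustration only.
hits = [
    Hit("q1", "gA", "a1", 500.0, 92.0),
    Hit("q1", "gA", "a2", 200.0, 45.0),  # lower score, same genome: trivial paralog
    Hit("q2", "gA", "a3", 150.0, 40.0),
    Hit("q2", "gB", "b1", 400.0, 88.0),
]
kept, cutoff = filter_orthologs(hits)
```

With the toy data, the single trivial paralog sets the cutoff, and the candidate pair whose identity falls below it is discarded. The third step (conditional probabilities) and the q-GeM classifiers are not covered by this sketch.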