PepExplorer: A Similarity-driven Tool for Analyzing de Novo Sequencing Results

Leprevost, Felipe da Veiga; Valente, Richard Hemmi; Lima, Diogo Borges; Perales, Jonas; Melani, Rafael; Yates III, John R.; Barbosa, Valmir Carneiro; Junqueira, Magno; Carvalho, Paulo Costa

dc.creator	Leprevost, Felipe da Veiga
dc.creator	Valente, Richard Hemmi
dc.creator	Lima, Diogo Borges
dc.creator	Perales, Jonas
dc.creator	Melani, Rafael
dc.creator	Yates III, John R.
dc.creator	Barbosa, Valmir Carneiro
dc.creator	Junqueira, Magno
dc.creator	Carvalho, Paulo Costa
dc.date	2014-11-24T11:56:13Z
dc.date	2015-09-01T07:30:06Z
dc.date	2014
dc.date.accessioned	2023-09-26T23:17:55Z
dc.date.available	2023-09-26T23:17:55Z
dc.identifier	LEPREVOST, Felipe da Veiga et al. PepExplorer: a similarity-driven tool for analyzing de novo sequencing results. Molecular & Cellular Proteomics, v. 13, p. 2480-2489, 2014.
dc.identifier	1535-9476
dc.identifier	https://www.arca.fiocruz.br/handle/icict/8941
dc.identifier	10.1074/mcp.M113.037002
dc.identifier.uri	https://repositorioslatinoamericanos.uchile.cl/handle/2250/8889018
dc.description	Peptide spectrum matching is the current gold standard for protein identification via mass-spectrometry-based proteomics. Peptide spectrum matching compares experimental mass spectra against theoretical spectra generated from a protein sequence database to perform identification, but protein sequences not present in a database cannot be identified unless their sequences are in part conserved. The alternative approach, de novo sequencing, can make it possible to infer a peptide sequence directly from a mass spectrum, but interpreting long lists of peptide sequences resulting from large-scale experiments is not trivial. With this as motivation, PepExplorer was developed to use rigorous pattern recognition to assemble a list of homologue proteins using de novo sequencing data coupled to sequence alignment to allow biological interpretation of the data. PepExplorer can read the output of various widely adopted de novo sequencing tools and converge to a list of proteins with a global false-discovery rate. To this end, it employs a radial basis function neural network that considers precursor charge states, de novo sequencing scores, peptide lengths, and alignment scores to select similar protein candidates, from a target-decoy database, usually obtained from phylogenetically related species. Alignments are performed using a modified Smith–Waterman algorithm tailored for the task at hand. We verified the effectiveness of our approach using a reference set of identifications generated by ProLuCID when searching for Pyrococcus furiosus mass spectra on the corresponding NCBI RefSeq database. We then modified the sequence database by swapping amino acids until ProLuCID was no longer capable of identifying any proteins. By searching the mass spectra using PepExplorer on the modified database, we were able to recover most of the identifications at a 1% false-discovery rate. 
Finally, we employed PepExplorer to disclose a comprehensive proteomic assessment of the Bothrops jararaca plasma, a known biological source of natural inhibitors of snake toxins. PepExplorer is integrated into the PatternLab for Proteomics environment, which makes available various tools for downstream data analysis, including resources for quantitative and differential proteomics.
dc.description	2015-08-31
dc.format	application/pdf
dc.language	eng
dc.publisher	Molecular & Cellular Proteomics
dc.rights	open access
dc.subject	PepExplorer
dc.subject	Proteomics
dc.subject	Peptide
dc.title	PepExplorer: A Similarity-driven Tool for Analyzing de Novo Sequencing Results
dc.type	Article

This item belongs to the following institution

Instituto de Comunicação e Informação Científica e Tecnológica em Saúde (Brasil)