Leveraging the partition selection bias to achieve a high-quality clustering of mass spectra

Silva, André R.F.; Lima, Diogo B.; Kurt, Louise U.; Dupré, Mathieu; Chamot-Rooke, Julia; Santos, Marlon D.M.; Nicolau, Carolina Alves; Valente, Richard Hemmi; Barbosa, Valmir C.; Carvalho, Paulo C.

dc.creator	Silva, André R.F.
dc.creator	Lima, Diogo B.
dc.creator	Kurt, Louise U.
dc.creator	Dupré, Mathieu
dc.creator	Chamot-Rooke, Julia
dc.creator	Santos, Marlon D.M.
dc.creator	Nicolau, Carolina Alves
dc.creator	Valente, Richard Hemmi
dc.creator	Barbosa, Valmir C.
dc.creator	Carvalho, Paulo C.
dc.date	2022-02-14T20:01:45Z
dc.date	2021
dc.date.accessioned	2023-09-26T20:16:04Z
dc.date.available	2023-09-26T20:16:04Z
dc.identifier	SILVA, André R. F. et al. Leveraging the partition selection bias to achieve a high-quality clustering of mass spectra. Journal of Proteomics, v. 245, 104282, p. 1 - 8, June 2021.
dc.identifier	1874-3919
dc.identifier	https://www.arca.fiocruz.br/handle/icict/51188
dc.identifier	10.1016/j.jprot.2021.104282
dc.identifier.uri	https://repositorioslatinoamericanos.uchile.cl/handle/2250/8852893
dc.description	In proteomics, the identification of peptides from mass spectral data can be described mathematically as the partitioning of mass spectra into clusters (i.e., groups of spectra derived from the same peptide). How partitions are validated is just as important: validation methods have evolved alongside the clustering algorithms themselves and have given rise to many partition assessment measures. An assessment measure is said to have a selection bias if, and only if, the probability that a randomly chosen partition scores a high value depends on the number of clusters in the partition. In the context of clustering mass spectra, such a bias might mislead the validation process into favoring clustering algorithms that generate too many (or too few) spectral clusters, regardless of the underlying peptide sequences. A selection bias toward the number of peptides, however, is desirable in proteomics, as it favors partitions whose number of clusters reflects the number of peptides in a complex protein mixture. Here, we introduce an assessment measure that is purposely biased toward the number of peptide ion species. We also introduce a partition assessment framework for proteomics, called the Partition Assessment Tool, and demonstrate its importance by evaluating the performance of eight clustering algorithms on seven proteomics datasets while discussing the trade-offs involved.
Significance: Clustering algorithms are widely adopted in proteomics for several tasks, such as speeding up search engines, generating consensus mass spectra, and aiding in the classification of proteomic profiles. Choosing the algorithm best suited to the task at hand is not simple, as each algorithm has advantages and disadvantages. Specifying clustering parameters is also a necessary and fundamental step: for example, one must decide whether to generate “pure clusters” or fewer clusters that tolerate some noise. With this motivation, we evaluate the performance of several widely adopted algorithms on proteomic datasets and introduce a theoretical framework for drawing conclusions about which approach is suitable for the task at hand.
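A small simulation can make the selection-bias definition above concrete. The sketch below does not implement the measure introduced in the article. It uses two standard stand-ins:

- the classic Rand index;
- its chance-corrected variant, the adjusted Rand index.

The ground truth is hypothetical: 200 spectra from 10 peptides, 20 spectra each. The sketch scores uniformly random partitions with k clusters against that ground truth. The mean Rand index rises with k, so it is biased toward partitions with many clusters. The adjusted Rand index stays near zero for every k.

```python
import random
from collections import Counter
from itertools import combinations
from math import comb


def rand_index(a, b):
    """Fraction of item pairs on which partitions a and b agree
    (placed together in both, or apart in both)."""
    n = len(a)
    agree = sum(
        (a[i] == a[j]) == (b[i] == b[j])
        for i, j in combinations(range(n), 2)
    )
    return agree / comb(n, 2)


def adjusted_rand_index(a, b):
    """Chance-corrected Rand index: about 0 for random labelings, 1 for identical partitions."""
    n = len(a)
    pairs_ab = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(b).values())
    total = comb(n, 2)
    expected = pairs_a * pairs_b / total
    max_index = (pairs_a + pairs_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are all singletons
    return (pairs_ab - expected) / (max_index - expected)


def mean_score(score, truth, k, trials, rng):
    """Average score of uniformly random k-cluster labelings against truth."""
    total = 0.0
    for _ in range(trials):
        guess = [rng.randrange(k) for _ in truth]
        total += score(truth, guess)
    return total / trials


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical ground truth: 200 spectra from 10 peptides, 20 spectra each.
    truth = [i // 20 for i in range(200)]
    for k in (2, 10, 50, 150):
        ri = mean_score(rand_index, truth, k, 20, rng)
        ari = mean_score(adjusted_rand_index, truth, k, 20, rng)
        print(f"k={k:3d}  mean RI={ri:.3f}  mean ARI={ari:+.3f}")
```

For this ground truth, the expected Rand index of a random labeling works out to roughly 0.50, 0.82, 0.89 and 0.90 for k = 2, 10, 50 and 150. A validation process relying on the raw Rand index would therefore reward over-splitting. The article's own measure instead steers this bias deliberately toward the number of peptide ion species.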
dc.format	application/pdf
dc.language	eng
dc.publisher	Elsevier
dc.rights	open access
dc.subject	Clustering
dc.subject	Tandem mass spectra
dc.subject	Partition assessment tool
dc.title	Leveraging the partition selection bias to achieve a high-quality clustering of mass spectra
dc.type	Article

This item belongs to the following institution

Instituto de Comunicação e Informação Científica e Tecnológica em Saúde (Brasil)