info:eu-repo/semantics/article
Thousands of protein linear motif classes may still be undiscovered
Date
2021-05-03
Registered in:
Bulavka, Denys; Aptekmann, Ariel; Méndez, Nicolás Agustín; Krick, Teresa Elena Genoveva; Sánchez Miguel, Ignacio Enrique; Thousands of protein linear motif classes may still be undiscovered; Public Library of Science; Plos One; 3-5-2021; 1-20
1932-6203
CONICET Digital
CONICET
Author
Bulavka, Denys
Aptekmann, Ariel
Méndez, Nicolás Agustín
Krick, Teresa Elena Genoveva
Sánchez Miguel, Ignacio Enrique
Abstract
Linear motifs are short protein subsequences that mediate protein interactions. Hundreds of motif classes including thousands of motif instances are known. Our theory estimates how many motif classes remain undiscovered. As commonly done, we describe motif classes as regular expressions specifying motif length and the allowed amino acids at each motif position. We measure motif specificity for a pair of motif classes by quantifying how many motif-discriminating positions prevent a protein subsequence from matching the two classes at once. We derive theorems for the maximal number of motif classes that can simultaneously maintain a certain number of motif-discriminating positions between all pairs of classes in the motif universe, for a given amino acid alphabet. We also calculate the fraction of all protein subsequences that would belong to a motif class if all potential motif classes came into existence. Naturally occurring pairs of motif classes present most often a single motif-discriminating position. This mild specificity maximizes the potential number of coexisting motif classes, the expansion of the motif universe due to amino acid modifications and the fraction of amino acid sequences that code for a motif instance. As a result, thousands of linear motif classes may remain undiscovered.
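
The idea of a motif-discriminating position can be illustrated with a minimal Python sketch. It rests on assumptions that are not taken from the article: both motif classes have the same length and are compared without any offset, a position counts as discriminating when the two classes share no allowed amino acid there, and only a small regular expression subset is parsed (single letters, `.`, and bracket classes such as `[ST]` or `[^P]`). The function names and the example patterns are hypothetical and chosen only for illustration.

```python
import re

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def parse_motif(pattern):
    """Turn a simplified, fixed-length motif regex into a list of sets,
    one set of allowed amino acids per motif position.

    Assumption: only single letters, '.', [XYZ] and [^XYZ] are supported.
    """
    positions = []
    for token in re.findall(r"\[\^?[A-Z]+\]|[A-Z.]", pattern):
        if token == ".":
            positions.append(set(AMINO_ACIDS))            # any amino acid
        elif token.startswith("[^"):
            positions.append(AMINO_ACIDS - set(token[2:-1]))  # negated class
        elif token.startswith("["):
            positions.append(set(token[1:-1]))            # explicit class
        else:
            positions.append({token})                     # single residue
    return positions


def discriminating_positions(motif_a, motif_b):
    """Count positions where the two motif classes share no allowed residue.

    Assumption: equal-length motifs compared position by position.
    """
    a, b = parse_motif(motif_a), parse_motif(motif_b)
    if len(a) != len(b):
        raise ValueError("this sketch only compares motifs of equal length")
    return sum(1 for x, y in zip(a, b) if not (x & y))


# Hypothetical example patterns, not taken from any motif database.
print(discriminating_positions("[ST]P.[KR]", "[ST]A.[DE]"))
```

With one or more discriminating positions, no subsequence can match both classes under this simplified comparison; with zero, at least one subsequence matches both.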