dc.creator: Pagnuco, Inti Anabela
dc.creator: Revuelta, María Victoria
dc.creator: Bondino, Hernán Gabriel
dc.creator: Brun, Marcel
dc.creator: Ten Have, Arjen
dc.date.accessioned: 2020-03-26T17:58:17Z
dc.date.accessioned: 2022-10-15T00:18:03Z
dc.date.available: 2020-03-26T17:58:17Z
dc.date.available: 2022-10-15T00:18:03Z
dc.date.created: 2020-03-26T17:58:17Z
dc.date.issued: 2018-03
dc.identifier: Pagnuco, Inti Anabela; Revuelta, María Victoria; Bondino, Hernán Gabriel; Brun, Marcel; Ten Have, Arjen; HMMER cut-off threshold tool (HMMERCTTER): Supervised classification of superfamily protein sequences with a reliable cut-off threshold; Public Library of Science; Plos One; 13; 3; 3-2018; 1-20
dc.identifier: 1932-6203
dc.identifier: http://hdl.handle.net/11336/100949
dc.identifier: CONICET Digital
dc.identifier: CONICET
dc.identifier.uri: https://repositorioslatinoamericanos.uchile.cl/handle/2250/4323879
dc.description.abstract: Background: Protein superfamilies can be divided into subfamilies of proteins with different functional characteristics. Their sequences can be classified hierarchically, which is part of sequence function assignation. Typically, there are no clear subfamily hallmarks that would allow pattern-based function assignation, so this task is mostly performed on the basis of the similarity principle. This is hampered by the lack of a score cut-off that is both sensitive and specific. Results: HMMER Cut-off Threshold Tool (HMMERCTTER) adds a reliable cut-off threshold to the popular HMMER. Using a high-quality superfamily phylogeny, it clusters a set of training sequences such that the cluster-specific HMMER profiles detect cluster or subfamily members with 100% precision and recall (P&R), thereby generating a specific threshold as the inclusion cut-off. Profiles and thresholds are then used as classifiers to screen a target dataset. Iterative inclusion of novel sequences into groups and the corresponding HMMER profiles results in high sensitivity, while specificity is maintained by imposing 100% P&R self-detection. In three presented case studies of protein superfamilies, classification of large datasets with 100% precision was achieved with over 95% recall. Limits and caveats are presented and explained. Conclusions: HMMERCTTER is a promising protein superfamily sequence classifier, provided high-quality training datasets are used. It provides a decision support system that aids in the difficult task of sequence function assignation in the twilight zone of sequence similarity. All relevant data and source code are available from the GitHub repository at the following.
dc.language: eng
dc.publisher: Public Library of Science
dc.relation: info:eu-repo/semantics/altIdentifier/url/https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0193757
dc.relation: info:eu-repo/semantics/altIdentifier/doi/http://dx.doi.org/10.1371/journal.pone.0193757
dc.rights: https://creativecommons.org/licenses/by/2.5/ar/
dc.rights: info:eu-repo/semantics/openAccess
dc.subject: Clustering
dc.subject: Classification
dc.subject: Phylogenomics
dc.subject: Bioinformatics
dc.subject: Function annotation
dc.title: HMMER cut-off threshold tool (HMMERCTTER): Supervised classification of superfamily protein sequences with a reliable cut-off threshold
dc.type: info:eu-repo/semantics/article
dc.type: info:ar-repo/semantics/artículo
dc.type: info:eu-repo/semantics/publishedVersion

