dc.creator | Abi-Haidar, Alaa | |
dc.creator | Kaur, Jasleen | |
dc.creator | Maguitman, Ana Gabriela | |
dc.creator | Radivojac, Predrag | |
dc.creator | Rechtsteiner, Andreas | |
dc.creator | Verspoor, Karin | |
dc.creator | Wang, Zhiping | |
dc.creator | Rocha, Luis | |
dc.date.accessioned | 2019-04-26T16:42:46Z | |
dc.date.accessioned | 2022-10-15T04:10:00Z | |
dc.date.available | 2019-04-26T16:42:46Z | |
dc.date.available | 2022-10-15T04:10:00Z | |
dc.date.created | 2019-04-26T16:42:46Z | |
dc.date.issued | 2008-09-01 | |
dc.identifier | Abi-Haidar, Alaa; Kaur, Jasleen; Maguitman, Ana Gabriela; Radivojac, Predrag; Rechtsteiner, Andreas; et al.; Uncovering protein interaction in abstracts and text using a novel linear model and word proximity networks; BioMed Central; Genome Biology; 9; Suppl. 2; 1-9-2008; S11-S30 | |
dc.identifier | 1474-760X | |
dc.identifier | http://hdl.handle.net/11336/75086 | |
dc.identifier | CONICET Digital | |
dc.identifier | CONICET | |
dc.identifier.uri | https://repositorioslatinoamericanos.uchile.cl/handle/2250/4343655 | |
dc.description.abstract | Background: We participated in three of the protein-protein interaction subtasks of the Second BioCreative Challenge: classification of abstracts relevant for protein-protein interaction (interaction article subtask [IAS]), discovery of protein pairs (interaction pair subtask [IPS]), and identification of text passages characterizing protein interaction (interaction sentences subtask [ISS]) in full-text documents. We approached the abstract classification task with a novel, lightweight linear model inspired by spam detection techniques, as well as an uncertainty-based integration scheme. We also used a support vector machine and singular value decomposition on the same features for comparison purposes. Our approach to the full-text subtasks (protein pair and passage identification) includes a feature expansion method based on word proximity networks. Results: Our approach to the abstract classification task (IAS) was among the top submissions for this task in terms of measures of performance used in the challenge evaluation (accuracy, F-score, and area under the receiver operating characteristic curve). We also report on a web tool that we produced using our approach: the Protein Interaction Abstract Relevance Evaluator (PIARE). Our approach to the full-text tasks resulted in one of the highest recall rates as well as mean reciprocal rank of correct passages. Conclusion: Our approach to abstract classification shows that a simple linear model, using relatively few features, can generalize and uncover the conceptual nature of protein-protein interactions from the bibliome. Because the novel approach is based on a rather lightweight linear model, it can easily be ported and applied to similar problems. In full-text problems, the expansion of word features with word proximity networks is shown to be useful, although the need for some improvements is discussed. | |
dc.language | eng | |
dc.publisher | BioMed Central | |
dc.relation | info:eu-repo/semantics/altIdentifier/url/https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2559982/ | |
dc.relation | info:eu-repo/semantics/altIdentifier/doi/http://dx.doi.org/10.1186/gb-2008-9-S2-S11 | |
dc.relation | info:eu-repo/semantics/altIdentifier/url/https://genomebiology.biomedcentral.com/articles/10.1186/gb-2008-9-s2-s11 | |
dc.rights | https://creativecommons.org/licenses/by/2.5/ar/ | |
dc.rights | info:eu-repo/semantics/openAccess | |
dc.subject | Support Vector Machine | |
dc.subject | Singular Value Decomposition | |
dc.subject | Word Pair | |
dc.subject | Singular Value Decomposition Method | |
dc.subject | Proximity Network | |
dc.title | Uncovering protein interaction in abstracts and text using a novel linear model and word proximity networks | |
dc.type | info:eu-repo/semantics/article | |
dc.type | info:ar-repo/semantics/artículo | |
dc.type | info:eu-repo/semantics/publishedVersion | |