Uncovering protein interaction in abstracts and text using a novel linear model and word proximity networks

Abi-Haidar, Alaa; Kaur, Jasleen; Maguitman, Ana Gabriela; Radivojac, Predrag; Rechtsteiner, Andreas; Verspoor, Karin; Wang, Zhiping; Rocha, Luis

dc.creator	Abi-Haidar, Alaa
dc.creator	Kaur, Jasleen
dc.creator	Maguitman, Ana Gabriela
dc.creator	Radivojac, Predrag
dc.creator	Rechtsteiner, Andreas
dc.creator	Verspoor, Karin
dc.creator	Wang, Zhiping
dc.creator	Rocha, Luis
dc.date.accessioned	2019-04-26T16:42:46Z
dc.date.accessioned	2022-10-15T04:10:00Z
dc.date.available	2019-04-26T16:42:46Z
dc.date.available	2022-10-15T04:10:00Z
dc.date.created	2019-04-26T16:42:46Z
dc.date.issued	2008-09-01
dc.identifier	Abi-Haidar, Alaa; Kaur, Jasleen; Maguitman, Ana Gabriela; Radivojac, Predrag; Rechtsteiner, Andreas; et al.; Uncovering protein interaction in abstracts and text using a novel linear model and word proximity networks; BioMed Central; Genome Biology; 9; Suppl. 2; 1-9-2008; S11-S30
dc.identifier	1474-760X
dc.identifier	http://hdl.handle.net/11336/75086
dc.identifier	CONICET Digital
dc.identifier	CONICET
dc.identifier.uri	https://repositorioslatinoamericanos.uchile.cl/handle/2250/4343655
dc.description.abstract	Background: We participated in three of the protein-protein interaction subtasks of the Second BioCreative Challenge: classification of abstracts relevant for protein-protein interaction (interaction article subtask [IAS]), discovery of protein pairs (interaction pair subtask [IPS]), and identification of text passages characterizing protein interaction (interaction sentences subtask [ISS]) in full-text documents. We approached the abstract classification task with a novel, lightweight linear model inspired by spam detection techniques, as well as an uncertainty-based integration scheme. We also used a support vector machine and singular value decomposition on the same features for comparison purposes. Our approach to the full-text subtasks (protein pair and passage identification) includes a feature expansion method based on word proximity networks. Results: Our approach to the abstract classification task (IAS) was among the top submissions for this task in terms of measures of performance used in the challenge evaluation (accuracy, F-score, and area under the receiver operating characteristic curve). We also report on a web tool that we produced using our approach: the Protein Interaction Abstract Relevance Evaluator (PIARE). Our approach to the full-text tasks resulted in one of the highest recall rates as well as mean reciprocal rank of correct passages. Conclusion: Our approach to abstract classification shows that a simple linear model, using relatively few features, can generalize and uncover the conceptual nature of protein-protein interactions from the bibliome. Because the novel approach is based on a rather lightweight linear model, it can easily be ported and applied to similar problems. In full-text problems, the expansion of word features with word proximity networks is shown to be useful, although the need for some improvements is discussed.
dc.language	eng
dc.publisher	BioMed Central
dc.relation	info:eu-repo/semantics/altIdentifier/url/https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2559982/
dc.relation	info:eu-repo/semantics/altIdentifier/doi/http://dx.doi.org/10.1186/gb-2008-9-S2-S11
dc.relation	info:eu-repo/semantics/altIdentifier/url/https://genomebiology.biomedcentral.com/articles/10.1186/gb-2008-9-s2-s11
dc.rights	https://creativecommons.org/licenses/by/2.5/ar/
dc.rights	info:eu-repo/semantics/openAccess
dc.subject	Support Vector Machine
dc.subject	Singular Value Decomposition
dc.subject	Word Pair
dc.subject	Singular Value Decomposition Method
dc.subject	Proximity Network
dc.title	Uncovering protein interaction in abstracts and text using a novel linear model and word proximity networks
dc.type	info:eu-repo/semantics/article
dc.type	info:ar-repo/semantics/artículo
dc.type	info:eu-repo/semantics/publishedVersion

This item belongs to the following institution

Consejo Nacional de Investigaciones Científicas y Tecnológicas (Argentina)