dc.creator: Cardellino, Cristian
dc.creator: Alonso i Alemany, Laura
dc.date: 2017-09
dc.date: 2017
dc.date: 2018-04-03T16:27:36Z
dc.identifier: http://sedici.unlp.edu.ar/handle/10915/65941
dc.identifier: http://www.clei2017-46jaiio.sadio.org.ar/sites/default/files/Mem/ASAI/asai-05.pdf
dc.identifier: issn:2451-7585
dc.description: This work explores the use of word embeddings (also known as word vectors) trained on Spanish corpora as features for Spanish verb sense disambiguation (VSD). This type of learning technique is called disjoint semi-supervised learning [1]: as a first step, an unsupervised algorithm is trained separately on unlabeled data, and its results (i.e. the word embeddings) are then fed to a supervised classifier. Throughout this paper we test two hypotheses: (i) representations of training instances based on word embeddings improve the performance of supervised models for VSD, compared with more standard feature engineering techniques based on information taken from the training data; (ii) word embeddings trained on a specific domain, in this case the same domain the labeled data is gathered from, have a positive impact on the model's performance compared with general-domain word embeddings. A model's performance is measured not only with standard metrics (e.g. accuracy or precision/recall) but also by analyzing its learning curve to assess the model's tendency to overfit the available data. Measuring this tendency is important because only a small amount of labeled data is available, so we need models that generalize better on the VSD problem. For the task we use SenSem [2], a corpus and lexicon of Spanish and Catalan disambiguated verbs, as our base resource for experimentation.
dc.description: Sociedad Argentina de Informática e Investigación Operativa
dc.format: application/pdf
dc.format: 26-34
dc.language: en
dc.rights: http://creativecommons.org/licenses/by-sa/4.0/
dc.rights: Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
dc.subject: Computer Science
dc.subject: word embeddings
dc.subject: disjoint semisupervised learning
dc.subject: verb sense disambiguation
dc.title: Disjoint Semi-supervised Spanish Verb Sense Disambiguation Using Word Embeddings
dc.type: Conference object
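The disjoint semi-supervised setup described in the abstract can be sketched in a few lines. This is a minimal, hypothetical illustration, not the authors' implementation: a tiny hand-made 2-dimensional table (`TOY_EMBEDDINGS`) stands in for vectors trained on unlabeled Spanish text, each occurrence of the verb is represented by averaging the vectors of known words within two tokens of it (an assumed feature design), and a simple nearest-centroid classifier is swapped in for the supervised step. The sense labels `play`/`touch` for *tocar* are invented for illustration and do not come from SenSem. The `learning_curve` helper reports training and held-out accuracy for growing training sets; a large gap between the two is the kind of overfitting signal the abstract refers to.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Instance = Tuple[List[str], int, str]  # (tokens, index of target verb, sense label)

# Step 1 (unsupervised, trained separately on unlabeled text). This tiny
# hypothetical 2-d table stands in for real Spanish word embeddings.
TOY_EMBEDDINGS: Dict[str, Vector] = {
    "guitarra": [1.0, 0.0], "piano": [0.9, 0.1], "canción": [0.8, 0.2],
    "mano": [0.0, 1.0], "pared": [0.1, 0.9], "dedo": [0.2, 0.8],
}

# Hypothetical labeled instances of the verb "tocar" (not SenSem data).
TRAIN: List[Instance] = [
    (["ella", "toca", "la", "guitarra"], 1, "play"),
    (["no", "toca", "la", "pared"], 1, "touch"),
    (["él", "toca", "el", "piano"], 1, "play"),
    (["me", "toca", "la", "mano"], 1, "touch"),
    (["toca", "una", "canción"], 0, "play"),
    (["se", "toca", "el", "dedo"], 1, "touch"),
]
TEST: List[Instance] = [
    (["quiero", "tocar", "el", "piano"], 1, "play"),
    (["no", "tocar", "la", "mano"], 1, "touch"),
]


def context_vector(tokens: List[str], target: int,
                   emb: Dict[str, Vector], window: int = 2) -> Vector:
    """Average the embeddings of known words within `window` of the verb."""
    dim = len(next(iter(emb.values())))
    lo, hi = max(0, target - window), min(len(tokens), target + window + 1)
    vecs = [emb[tokens[i]] for i in range(lo, hi)
            if i != target and tokens[i] in emb]
    if not vecs:
        return [0.0] * dim  # no known context words: fall back to zeros
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def train_centroids(data: Sequence[Instance],
                    emb: Dict[str, Vector]) -> Dict[str, Vector]:
    """Step 2 (supervised): one mean feature vector per sense label."""
    groups: Dict[str, List[Vector]] = {}
    for tokens, target, label in data:
        groups.setdefault(label, []).append(context_vector(tokens, target, emb))
    return {label: [sum(col) / len(vs) for col in zip(*vs)]
            for label, vs in groups.items()}


def predict(centroids: Dict[str, Vector], tokens: List[str], target: int,
            emb: Dict[str, Vector]) -> str:
    """Return the sense whose centroid is closest (squared Euclidean distance)."""
    x = context_vector(tokens, target, emb)
    return min(centroids, key=lambda lab: sum(
        (a - b) ** 2 for a, b in zip(x, centroids[lab])))


def accuracy(centroids: Dict[str, Vector], data: Sequence[Instance],
             emb: Dict[str, Vector]) -> float:
    hits = sum(predict(centroids, t, i, emb) == y for t, i, y in data)
    return hits / len(data)


def learning_curve(train: Sequence[Instance], test: Sequence[Instance],
                   sizes: Sequence[int], emb: Dict[str, Vector]):
    """(n, train accuracy, held-out accuracy) for the first n training instances."""
    curve = []
    for n in sizes:
        model = train_centroids(train[:n], emb)
        curve.append((n, accuracy(model, train[:n], emb),
                      accuracy(model, test, emb)))
    return curve


if __name__ == "__main__":
    for n, tr, te in learning_curve(TRAIN, TEST, [2, 4, 6], TOY_EMBEDDINGS):
        print(f"n={n}: train={tr:.2f} held-out={te:.2f} gap={tr - te:.2f}")
```

Because the embedding table is fixed before the classifier is trained, comparing domain-specific against general-domain vectors (hypothesis ii) only means swapping `TOY_EMBEDDINGS`; the supervised step stays unchanged.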

