dc.creator: Duran, Magali Sanches
dc.creator: Avanço, Lucas Vinicius
dc.creator: Nunes, Maria das Graças Volpe
dc.date.accessioned: 2016-01-06T18:19:02Z
dc.date.accessioned: 2018-07-04T17:06:20Z
dc.date.available: 2016-01-06T18:19:02Z
dc.date.available: 2018-07-04T17:06:20Z
dc.date.created: 2016-01-06T18:19:02Z
dc.date.issued: 2015-07
dc.identifier: Workshop on Noisy User-generated Text, 2015, Beijing.
dc.identifier: 9781941643693
dc.identifier: http://www.producao.usp.br/handle/BDPI/49416
dc.identifier: http://aclweb.org/anthology/W/W15/W15-4305.pdf
dc.identifier.uri: http://repositorioslatinoamericanos.uchile.cl/handle/2250/1644723
dc.description.abstract: User-generated contents (UGC) represent an important source of information for governments, companies, political candidates and consumers. However, most of the Natural Language Processing tools and techniques are developed from and for texts of standard language, and UGC is a type of text especially full of creativity and idiosyncrasies, which represents noise for NLP purposes. This paper presents UGCNormal, a lexicon-based tool for UGC normalization. It encompasses a tokenizer, a sentence segmentation tool, a phonetic-based speller and some lexicons, which were originated from a deep analysis of a corpus of product reviews in Brazilian Portuguese. The normalizer was evaluated in two different data sets and carried out from 31% to 89% of the appropriate corrections, depending on the type of text noise. The use of UGCNormal was also validated in a task of POS tagging, which improved from 91.35% to 93.15% in accuracy and in a task of opinion classification, which improved the average of F1-score measures (F1-score positive and F1-score negative) from 0.736 to 0.758.
dc.language: eng
dc.publisher: Association for Computational Linguistics - ACL
dc.publisher: Beijing
dc.relation: Workshop on Noisy User-generated Text
dc.rights: Copyright Association for Computational Linguistics
dc.rights: restrictedAccess
dc.title: A normalizer for UGC in Brazilian Portuguese
dc.type: Conference proceedings