dc.creatorAdriana Gabriela Ramírez de la Rosa
dc.creatorManuel Montes y Gómez
dc.creatorTHAMAR IVETTE SOLORIO MARTINEZ
dc.creatorLuis Villaseñor Pineda
dc.date2013
dc.date.accessioned2023-07-25T16:25:31Z
dc.date.available2023-07-25T16:25:31Z
dc.identifierhttp://inaoe.repositorioinstitucional.mx/jspui/handle/1009/2394
dc.identifier.urihttps://repositorioslatinoamericanos.uchile.cl/handle/2250/7807570
dc.descriptionDuring the last decades the Web has become the greatest repository of digital information. In order to organize all this information, several text categorization methods have been developed, achieving accurate results in most cases and in very different domains. Due to the recent usage of the Internet as a communication medium, short texts such as news, tweets, blogs, and product reviews are more common every day. In this context, there are two main challenges. On the one hand, the length of these documents is short, and therefore the word frequencies are not informative enough, making text categorization even more difficult than usual. On the other hand, topics are changing constantly at a fast rate, causing a lack of adequate amounts of training data. In order to deal with these two problems we consider a text classification method that is based on the idea that similar documents may belong to the same category. Mainly, we propose a neighborhood consensus classification method that classifies documents by considering their own information as well as information about the category assigned to other similar documents from the same target collection. In particular, the short texts we used in our evaluation are news titles with an average of 8 words. Experimental results are encouraging; they indicate that leveraging information from similar documents helped to improve classification accuracy and that the proposed method is especially useful when labeled training resources are limited.
dc.formatapplication/pdf
dc.languageeng
dc.publisherSpringer Science+Business Media B.V.
dc.relationcitation:Ramírez, G., et al. (2013). A document is known by the company it keeps: neighborhood consensus for short text categorization, Language Resources & Evaluation, Vol. 47: 127–149
dc.rightsinfo:eu-repo/semantics/openAccess
dc.rightshttp://creativecommons.org/licenses/by-nc-nd/4.0
dc.subjectinfo:eu-repo/classification/Short text categorization/Short text categorization
dc.subjectinfo:eu-repo/classification/Unlabeled information/Unlabeled information
dc.subjectinfo:eu-repo/classification/Prototype-based classification/Prototype-based classification
dc.subjectinfo:eu-repo/classification/News titles/News titles
dc.subjectinfo:eu-repo/classification/cti/1
dc.subjectinfo:eu-repo/classification/cti/12
dc.subjectinfo:eu-repo/classification/cti/1203
dc.titleA document is known by the company it keeps: neighborhood consensus for short text categorization
dc.typeinfo:eu-repo/semantics/article
dc.typeinfo:eu-repo/semantics/acceptedVersion
dc.audiencestudents
dc.audienceresearchers
dc.audiencegeneralPublic

