Using the Web as corpus for self-training text categorization

RAFAEL GUZMAN CABRERA; MANUEL MONTES Y GOMEZ; Paolo ROSSO; LUIS VILLASEÑOR PINEDA

dc.creator	RAFAEL GUZMAN CABRERA
dc.creator	MANUEL MONTES Y GOMEZ
dc.creator	Paolo ROSSO
dc.creator	LUIS VILLASEÑOR PINEDA
dc.date	2009
dc.date.accessioned	2022-10-12T19:48:09Z
dc.date.available	2022-10-12T19:48:09Z
dc.identifier	http://inaoe.repositorioinstitucional.mx/jspui/handle/1009/1190
dc.identifier.uri	https://repositorioslatinoamericanos.uchile.cl/handle/2250/4122307
dc.description	Most current methods for automatic text categorization are based on supervised learning and therefore require a large number of training instances to construct an accurate classifier. To tackle this problem, this paper proposes a new semi-supervised method for text categorization that automatically extracts unlabeled examples from the Web and applies an enriched self-training approach to construct the classifier. Although language independent, the method is most relevant to scenarios where large sets of labeled resources do not exist, as may be the case for several application domains in non-English languages such as Spanish. The method was evaluated experimentally on three different tasks and in two different languages. The results demonstrate the applicability and usefulness of the proposed method.
dc.format	application/pdf
dc.language	eng
dc.publisher	Springer Science+Business Media
dc.relation	citation:Guzmán-Cabrera, R., et al., (2009). Using the Web as corpus for self-training text categorization, Springer Science Inf. Retrieval (12): 400–415
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	http://creativecommons.org/licenses/by-nc-nd/4.0
dc.subject	info:eu-repo/classification/Text categorization/Text categorization
dc.subject	info:eu-repo/classification/Semi-supervised learning/Semi-supervised learning
dc.subject	info:eu-repo/classification/Self-training/Self-training
dc.subject	info:eu-repo/classification/Web as corpus/Web as corpus
dc.subject	info:eu-repo/classification/Authorship attribution/Authorship attribution
dc.subject	info:eu-repo/classification/cti/1
dc.subject	info:eu-repo/classification/cti/12
dc.subject	info:eu-repo/classification/cti/1203
dc.title	Using the Web as corpus for self-training text categorization
dc.type	info:eu-repo/semantics/article
dc.type	info:eu-repo/semantics/acceptedVersion
dc.audience	students
dc.audience	researchers
dc.audience	generalPublic
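
The description above names a self-training approach but this record does not give its details. The following is a minimal sketch of generic self-training, not the authors' enriched variant: a small naive Bayes classifier is trained on the labeled documents, the most confidently classified unlabeled documents are added to the training set, and the process repeats. The function names (`train_nb`, `predict_nb`, `self_train`), the parameters `per_class` and `max_iter`, and the confidence margin are all assumptions made for illustration. The Web extraction step is not shown; the `unlabeled` list stands in for text that would be gathered from the Web.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """Train a multinomial naive Bayes model from (tokens, label) pairs."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, tokens):
    """Return (best_label, margin); margin is the log-score gap to the runner-up."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        # Laplace smoothing: add one to every word count.
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(n / total_docs)
        for t in tokens:
            if t in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][t] + 1) / denom)
        scores[label] = score
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    margin = ranked[0][1] - ranked[1][1] if len(ranked) > 1 else float("inf")
    return ranked[0][0], margin


def self_train(labeled, unlabeled, per_class=1, max_iter=5):
    """Generic self-training loop (hypothetical parameters, for illustration only)."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_iter):
        if not pool:
            break
        model = train_nb(labeled)
        preds = [(predict_nb(model, toks), toks) for toks in pool]
        chosen = []
        for label in model[0]:
            # Keep the per_class most confident predictions for each class.
            cands = sorted(
                (p for p in preds if p[0][0] == label),
                key=lambda p: p[0][1],
                reverse=True,
            )
            chosen.extend(cands[:per_class])
        if not chosen:
            break
        for (label, _), toks in chosen:
            labeled.append((toks, label))  # treat the prediction as a label
            pool.remove(toks)
    return train_nb(labeled)


labeled = [
    (["goal", "match", "team"], "sports"),
    (["election", "vote", "party"], "politics"),
]
# Stand-in for unlabeled text gathered from the Web.
unlabeled = [
    ["team", "goal", "striker"],
    ["vote", "party", "senate"],
    ["striker", "penalty"],
    ["senate", "bill"],
]
model = self_train(labeled, unlabeled)
```

In this toy run, words such as "striker" and "bill" never occur in the two labeled documents, yet the final model can associate them with a class because they co-occur with known words in the unlabeled pool. Selecting a fixed number of examples per class in each round is one simple way to keep the growing training set balanced.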

This item belongs to the following institution

Conacyt (México)

Related items

Compendio de innovaciones socioambientales en la frontera sur de México

Caminar el cafetal: perspectivas socioambientales del café y su gente

Material de empaque para biofiltración con base en poliuretano modificado con almidón, metodos para la manufactura del mismo y sistema de biofiltración
