An automatic approach to generate corpus in Spanish

Puertas E.; Alvarado‑Valencia, Jorge Andres; Moreno-Sandoval L.G.; Pomares-Quimbaya A.

dc.contributor	Serrano C. J.E.
dc.contributor	Martínez-Santos, Juan Carlos
dc.creator	Puertas E.
dc.creator	Alvarado‑Valencia, Jorge Andres
dc.creator	Moreno-Sandoval L.G.
dc.creator	Pomares-Quimbaya A.
dc.date.accessioned	2020-03-26T16:32:36Z
dc.date.available	2020-03-26T16:32:36Z
dc.date.created	2020-03-26T16:32:36Z
dc.date.issued	2018
dc.identifier	Communications in Computer and Information Science; Vol. 885, pp. 150-161
dc.identifier	9783319989976
dc.identifier	18650929
dc.identifier	https://hdl.handle.net/20.500.12585/8916
dc.identifier	10.1007/978-3-319-98998-3_12
dc.identifier	Universidad Tecnológica de Bolívar
dc.identifier	Repositorio UTB
dc.identifier	57202285682
dc.identifier	8738428200
dc.identifier	57194828933
dc.identifier	57203852380
dc.description.abstract	A corpus is an indispensable linguistic resource for any application of natural language processing. Some corpora have been created manually or semi-automatically for a specific domain. In this paper, we present an automatic approach to generating a corpus from digital information sources such as Wikipedia and web pages. Information is extracted from Wikipedia by delimiting the domain: a propagation algorithm determines the categories associated with a domain region, and a set of seeds delimits the search. Information is extracted from web pages efficiently by determining the patterns associated with the structure of each page, with the purpose of defining the quality of the extraction. © Springer Nature Switzerland AG 2018.
dc.language	eng
dc.publisher	Springer Verlag
dc.relation	26 September 2018 through 28 September 2018
dc.rights	http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.rights	info:eu-repo/semantics/restrictedAccess
dc.rights	Attribution-NonCommercial 4.0 International
dc.source	https://www.scopus.com/inward/record.uri?eid=2-s2.0-85054377708&doi=10.1007%2f978-3-319-98998-3_12&partnerID=40&md5=d8689ca7ab863965c5539711ded485c1
dc.source	13th Colombian Conference on Computing, CCC 2018
dc.title	An automatic approach to generate corpus in Spanish

This item belongs to the following institution

Universidad Tecnológica de Bolívar UTB (Colombia)