dc.contributorSerrano C. J.E.
dc.contributorMartínez-Santos, Juan Carlos
dc.creatorPuertas E.
dc.creatorAlvarado-Valencia, Jorge Andres
dc.creatorMoreno-Sandoval L.G.
dc.creatorPomares-Quimbaya A.
dc.date.accessioned2020-03-26T16:32:36Z
dc.date.available2020-03-26T16:32:36Z
dc.date.created2020-03-26T16:32:36Z
dc.date.issued2018
dc.identifierCommunications in Computer and Information Science; Vol. 885, pp. 150-161
dc.identifier9783319989976
dc.identifier18650929
dc.identifierhttps://hdl.handle.net/20.500.12585/8916
dc.identifier10.1007/978-3-319-98998-3_12
dc.identifierUniversidad Tecnológica de Bolívar
dc.identifierRepositorio UTB
dc.identifier57202285682
dc.identifier8738428200
dc.identifier57194828933
dc.identifier57203852380
dc.description.abstractA corpus is an indispensable linguistic resource for any natural language processing application. Some corpora have been created manually or semi-automatically for a specific domain. In this paper, we present an automatic approach to generating a corpus from digital information sources such as Wikipedia and web pages. Information is extracted from Wikipedia by delimiting the domain: a propagation algorithm determines the categories associated with a domain region, and a set of seeds delimits the search. Information is extracted from web pages efficiently by determining the patterns associated with the structure of each page, with the purpose of defining the quality of the extraction. © Springer Nature Switzerland AG 2018.
dc.languageeng
dc.publisherSpringer Verlag
dc.relation26 September 2018 through 28 September 2018
dc.rightshttp://creativecommons.org/licenses/by-nc-nd/4.0/
dc.rightsinfo:eu-repo/semantics/restrictedAccess
dc.rightsAttribution-NonCommercial 4.0 International
dc.sourcehttps://www.scopus.com/inward/record.uri?eid=2-s2.0-85054377708&doi=10.1007%2f978-3-319-98998-3_12&partnerID=40&md5=d8689ca7ab863965c5539711ded485c1
dc.source13th Colombian Conference on Computing, CCC 2018
dc.titleAn automatic approach to generate corpus in Spanish

