Text mining for history: first steps on building a large dataset

Higuchi, Suemi; Freitas, Cláudia; Claro, Bruno Cuconato; Rademaker, Alexandre

dc.contributor	Other units::RPCA
dc.creator	Higuchi, Suemi
dc.creator	Freitas, Cláudia
dc.creator	Claro, Bruno Cuconato
dc.creator	Rademaker, Alexandre
dc.date.accessioned	2020-05-25T21:16:08Z
dc.date.accessioned	2022-11-03T20:18:18Z
dc.date.available	2020-05-25T21:16:08Z
dc.date.available	2022-11-03T20:18:18Z
dc.date.created	2020-05-25T21:16:08Z
dc.date.issued	2018
dc.identifier	https://hdl.handle.net/10438/29143
dc.identifier.uri	https://repositorioslatinoamericanos.uchile.cl/handle/2250/5035719
dc.description.abstract	This paper presents the initial efforts towards the creation of a new corpus in the history domain. Motivated by historians’ need to interrogate a vast body of material - almost 9 million words - in a non-linear way, our approach privileges deep linguistic analysis of encyclopaedic-style data. In this context, the work presented here focuses on the preparation of the corpus, which precedes the mining activity: the morphosyntactic annotation and the definition of semantic types for named entities (NEs) and of named-entity relations relevant to the history domain. Taking advantage of the semantic nature of appositive structures, we manually analysed a sample of 1,049 sentences in order to verify their potential as additional semantic clues. The results show that we are on the right track.
dc.language	en_US
dc.subject	Digital humanities
dc.subject	Text mining
dc.subject	Corpus annotation
dc.subject	Appositives
dc.title	Text mining for history: first steps on building a large dataset
dc.type	Preprint

This item belongs to the following institution

Fundação Getulio Vargas (Brazil)