Construcción de recursos de texto para la identificación automática de información clínica en narrativas no estructuradas

Báez, Pablo; Villena, Fabián; Zúñiga, Karen; Jones, Natalia; Fernández, Gustavo; Durán, Manuel; Dunstan Escudero, Jocelyn Mariel

dc.creator	Báez, Pablo
dc.creator	Villena, Fabián
dc.creator	Zúñiga, Karen
dc.creator	Jones, Natalia
dc.creator	Fernández, Gustavo
dc.creator	Durán, Manuel
dc.creator	Dunstan Escudero, Jocelyn Mariel
dc.date.accessioned	2022-05-03T16:35:55Z
dc.date.accessioned	2022-10-17T16:15:10Z
dc.date.available	2022-05-03T16:35:55Z
dc.date.available	2022-10-17T16:15:10Z
dc.date.created	2022-05-03T16:35:55Z
dc.date.issued	2021
dc.identifier	Rev Med Chile 2021; 149: 1014-1022
dc.identifier	0034-9887
dc.identifier	https://repositorio.uchile.cl/handle/2250/185227
dc.identifier.uri	https://repositorioslatinoamericanos.uchile.cl/handle/2250/4420696
dc.description.abstract	A significant proportion of the clinical record is in free-text format, making it difficult to extract key information and make secondary use of patient data. Automatic detection of information within narratives initially requires humans, following specific protocols and rules, to identify medical entities of interest. Aim: To build a linguistic resource of annotated medical entities on texts produced in Chilean hospitals. Material and Methods: A clinical corpus was constructed using 150 referrals in public hospitals. Three annotators identified six medical entities: clinical findings, diagnoses, body parts, medications, abbreviations, and family members. An annotation scheme was designed, and an iterative approach to train the annotators was applied. The F1-score was used to track agreement among the annotators during their training. Results: An average F1-score of 0.73 was observed at the beginning of the project. After the training period, it increased to 0.87. Annotation of clinical findings and body parts showed significant discrepancy, while abbreviations, medications, and family members showed high agreement. Conclusions: A linguistic resource with annotated medical entities on texts produced in Chilean hospitals was built and made available, in collaboration with annotators from medicine-related fields. The iterative annotation approach allowed us to improve performance metrics. The corpus and annotation protocols will be released to the research community.
dc.language	es
dc.publisher	Soc Medica Santiago
dc.rights	http://creativecommons.org/licenses/by-nc-nd/3.0/us/
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 United States
dc.source	Revista Médica de Chile
dc.subject	Data curation
dc.subject	Data mining
dc.subject	Medical informatics
dc.subject	Natural language processing
dc.subject	Supervised machine learning
dc.title	Construcción de recursos de texto para la identificación automática de información clínica en narrativas no estructuradas
dc.type	Journal articles

This item belongs to the following institution

Universidad de Chile