MorphoMap: mapeamento automático de narrativas clínicas para uma terminologia médica

Pacheco, Edson José

dc.contributor	Nohama, Percy
dc.contributor	Schulz, Stefan
dc.creator	Pacheco, Edson José
dc.date.accessioned	2010-10-14T18:49:57Z
dc.date.accessioned	2022-12-06T14:17:48Z
dc.date.available	2010-10-14T18:49:57Z
dc.date.available	2022-12-06T14:17:48Z
dc.date.created	2010-10-14T18:49:57Z
dc.date.issued	2009
dc.identifier	PACHECO, Edson José. MorphoMap: mapeamento automático de narrativas clínicas para uma terminologia médica. 2009. 155 f. Tese (Doutorado em Engenharia Elétrica e Informática Industrial) – Universidade Tecnológica Federal do Paraná, Curitiba, 2009.
dc.identifier	http://repositorio.utfpr.edu.br/jspui/handle/1/124
dc.identifier.uri	https://repositorioslatinoamericanos.uchile.cl/handle/2250/5245815
dc.description.abstract	Clinical documentation requires fine-grained descriptions of patients' history, evolution, and treatment. These descriptions are recorded in findings reports, medical orders, and evolution and discharge summaries. In most clinical environments, natural language is the main carrier of documentation. Written clinical jargon is commonly characterized by idiosyncratic terminology and a high frequency of ambiguous, highly context-dependent expressions (especially acronyms and abbreviations). Violations of spelling and grammar rules are common. The purpose of this work is to map free text from clinical narratives to a domain ontology (SNOMED CT). To this end, natural language processing (NLP) tools are combined with a semantic mapping heuristic. The study uses discharge summaries from the Hospital das Clínicas de Porto Alegre, RS, Brazil. Parts of these texts are used to create a training corpus through manual annotation supported by active learning technology. This corpus is used to train the NLP tools that identify parts of speech and cleanse "dirty" text passages. This made it possible to obtain relatively well-formed and unambiguous noun phrases. A heuristic was implemented for the semantic mapping between these noun phrases (in Portuguese) and the terms describing the SNOMED CT concepts (in English and Spanish). It uses morphosemantic indexing based on a multilingual subword thesaurus provided by the MorphoSaurus system, the resolution of acronyms, and the identification of named entities (e.g. numbers). In this study, 80% of the summaries were analyzed and manually annotated, resulting in a domain corpus that supports the specialization of the OpenNLP system, mainly following the paradigm of statistical natural language processing (the tagger obtained reached an accuracy of 93.67%).
In parallel, several techniques were used to validate and improve the subword thesaurus. To this end, the semantic representations of comparable test corpora from the medical domain in English, Spanish, and Portuguese were compared with regard to the relative frequency of semantic identifiers, improving the corpus coverage (2% for Portuguese and 50% for Spanish). The result was used as input by a team of lexicon curators, who continuously fix errors and fill gaps in the trilingual thesaurus underlying the MorphoSaurus system. The progress of this work could be measured objectively using OHSUMED, a standard medical information retrieval benchmark. The mapping of text-encoded clinical information to a domain ontology is an area of high scientific and practical interest, because analysis requires structured data, whereas clinical information is routinely recorded in a largely unstructured way. In this work the ontology used was SNOMED CT. The evaluation of the mapping methodology indicates an accuracy of 83.9%.
dc.publisher	Universidade Tecnológica Federal do Paraná
dc.publisher	Curitiba
dc.publisher	Programa de Pós-Graduação em Engenharia Elétrica e Informática Industrial
dc.rights	openAccess
dc.subject	Ontologia
dc.subject	Sistemas de processamento da fala
dc.subject	Linguística - Processamento de dados
dc.subject	Sistemas de recuperação da informação
dc.subject	Informática médica
dc.subject	Medicina - Processamento de dados
dc.subject	Registros médicos
dc.subject	Engenharia elétrica
dc.subject	Ontology
dc.subject	Speech processing systems
dc.subject	Computational linguistics
dc.subject	Information storage and retrieval systems
dc.subject	Medical informatics
dc.subject	Medicine - Data processing
dc.subject	Medical records
dc.subject	Electrical engineering
dc.title	MorphoMap: mapeamento automático de narrativas clínicas para uma terminologia médica
dc.type	doctoralThesis

This item belongs to the following institution

Universidade Tecnológica Federal do Paraná (Brazil)