Extração automática de relações semânticas a partir de textos escritos em português do Brasil (Automatic extraction of semantic relations from texts written in Brazilian Portuguese)

Taba, Leonardo Sameshima

dc.contributor	Caseli, Helena de Medeiros
dc.contributor	http://lattes.cnpq.br/6608582057810385
dc.contributor	http://lattes.cnpq.br/2945193976624030
dc.creator	Taba, Leonardo Sameshima
dc.date.accessioned	2013-09-27
dc.date.accessioned	2016-06-02T19:06:08Z
dc.date.available	2013-09-27
dc.date.available	2016-06-02T19:06:08Z
dc.date.created	2013-09-27
dc.date.created	2016-06-02T19:06:08Z
dc.date.issued	2013-07-11
dc.identifier	TABA, Leonardo Sameshima. Extração automática de relações semânticas a partir de textos escritos em português do Brasil. 2013. 98 leaves. Master's dissertation (Exact and Earth Sciences) - Universidade Federal de São Carlos, São Carlos, 2013.
dc.identifier	https://repositorio.ufscar.br/handle/ufscar/543
dc.description.abstract	Information extraction (IE) is one of the many applications of Natural Language Processing (NLP); it focuses on processing texts in order to retrieve specific information about a given entity or concept. One of its subtasks is the automatic extraction of semantic relations between terms, which is very useful for building and improving linguistic resources such as ontologies and lexical databases. Moreover, there is a rising demand for semantic knowledge, as many NLP systems need that information during processing. Applications such as information retrieval from web documents and machine translation could benefit from that kind of knowledge. However, there are not enough human resources to produce that knowledge at the rate at which it is demanded. Aiming to address this scarcity of semantic data, this work investigates how binary semantic relations can be automatically extracted from Brazilian Portuguese texts. These relations are based on Minsky's (1986) theory and are used to represent common-sense knowledge in the Open Mind Common Sense no Brasil (OMCS-Br) project, developed at LIA (Laboratório de Interação Avançada), a partner of LaLiC (Laboratório de Linguística Computacional), where this research was conducted; both laboratories are part of the Universidade Federal de São Carlos (UFSCar). The first strategies for this task were based on searching texts for textual patterns, in which a certain textual expression indicates that a specific relation holds between two terms in a sentence. This approach has high precision but low recall, which led to research on methods that use machine learning as their main model, encompassing techniques such as probabilistic and statistical classifiers as well as kernel methods, which currently figure among the state of the art.
Therefore, this work investigates, implements and evaluates some of these techniques in order to determine how, and to what extent, they can be applied to the automatic extraction of binary semantic relations from Portuguese texts. In this way, this work is an important step toward advancing the state of the art in information extraction for Portuguese, which still lacks resources in the semantic area, and it also advances Portuguese-language NLP as a whole.
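The pattern-based strategy and its precision/recall trade-off described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption for illustration: the three Portuguese lexical patterns, the OMCS-style relation labels (IsA, PartOf, UsedFor), the function names and the toy sentences are hypothetical and are not the pattern set, relation inventory or data used in the dissertation.

```python
import re

# Hypothetical Portuguese lexical patterns mapped to OMCS-style relation names.
# Illustrative assumptions only, not the pattern set used in the dissertation.
PATTERNS = [
    (re.compile(r"(?P<x>\w+) é um tipo de (?P<y>\w+)"), "IsA"),
    (re.compile(r"(?P<x>\w+) é parte d[eoa] (?P<y>\w+)"), "PartOf"),
    (re.compile(r"(?P<x>\w+) é usad[oa] para (?P<y>\w+)"), "UsedFor"),
]


def extract_relations(sentence: str) -> set[tuple[str, str, str]]:
    """Return every (relation, term1, term2) triple signalled by a pattern."""
    text = sentence.lower()
    triples = set()
    for pattern, relation in PATTERNS:
        for match in pattern.finditer(text):
            triples.add((relation, match.group("x"), match.group("y")))
    return triples


def precision_recall(predicted: set, gold: set) -> tuple[float, float]:
    """Precision = correct / extracted; recall = correct / expected."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Toy sentences and gold-standard relations, invented for illustration.
sentences = [
    "O gato é um tipo de animal.",
    "A roda é parte do carro.",
    "A faca é usada para cortar.",
    "Cachorros, assim como gatos, são animais.",  # no pattern fires here
]
gold = {
    ("IsA", "gato", "animal"),
    ("PartOf", "roda", "carro"),
    ("UsedFor", "faca", "cortar"),
    ("IsA", "cachorro", "animal"),
}

predicted = set().union(*(extract_relations(s) for s in sentences))
precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=1.00 recall=0.75
```

Every extracted triple is correct, so precision is 1.00. The last sentence does express an IsA relation, but no pattern covers that phrasing, so recall drops to 0.75. This is the limitation that, according to the abstract, motivated the move to machine-learning classifiers and kernel methods.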
dc.publisher	Universidade Federal de São Carlos
dc.publisher	BR
dc.publisher	UFSCar
dc.publisher	Programa de Pós-Graduação em Ciência da Computação - PPGCC
dc.rights	Open Access
dc.subject	Artificial intelligence
dc.subject	Natural language processing (Computing)
dc.subject	Information extraction
dc.subject	Semantic relation extraction
dc.title	Extração automática de relações semânticas a partir de textos escritos em português do Brasil
dc.type	Thesis

This item belongs to the following institution:

Universidade Federal de São Carlos (Brasil)