Aplicação de conhecimento léxico-conceitual na Sumarização Automática Multidocumento

Luca, Rejeane Cassia de

dc.contributor	Di Felippo, Ariani
dc.contributor	http://lattes.cnpq.br/8648412103197455
dc.contributor	http://lattes.cnpq.br/1599276853975000
dc.creator	Luca, Rejeane Cassia de
dc.date.accessioned	2019-03-28T17:44:54Z
dc.date.available	2019-03-28T17:44:54Z
dc.date.created	2019-03-28T17:44:54Z
dc.date.issued	2019-02-28
dc.identifier	LUCA, Rejeane Cassia de. Aplicação de conhecimento léxico-conceitual na Sumarização Automática Multidocumento. 2019. Dissertação (Mestrado em Linguística) – Universidade Federal de São Carlos, São Carlos, 2019. Disponível em: https://repositorio.ufscar.br/handle/ufscar/11163.
dc.identifier	https://repositorio.ufscar.br/handle/ufscar/11163
dc.description.abstract	Automatic Multi-document Summarization (MDS) aims to automatically create a single summary from a collection of texts on the same topic, providing an alternative way to deal with the massive amount of information on the web. Since such a summary is often an extract (i.e., a summary composed of unchanged excerpts taken from the source texts that convey the main idea of the collection), the most important sentences of the collection must be selected. For sentence selection, there are superficial (linguistic or statistical), deep linguistic, and hybrid methods. Despite being less robust and more expensive, deep methods produce extracts that are not only more informative but also of higher linguistic quality. Considering the promising results of lexical-conceptual methods in incipient MDS research and in multilingual MDS studies, we investigated four methods for monolingual MDS in Portuguese, all of which select content based on the frequency of lexical concepts in the cluster. We used CSTNews, a reference multi-document corpus in Portuguese, whose verbs and 10% of the most frequent nouns are annotated with their corresponding synsets from Princeton WordNet. Specifically, we selected 5 of the 50 clusters in CSTNews and extended the conceptual annotation to all nouns. We then applied the four methods to the 5 clusters: (i) LCFSummN, based on the simple frequency of nominal concepts in the cluster; (ii) a method based on the simple frequency of nominal and verbal concepts in the cluster; (iii) a method based on the weighted-average frequency of nominal concepts; and (iv) a method based on the weighted-average frequency of nominal and verbal concepts. We intrinsically evaluated the extracts generated by each method with respect to linguistic quality and informativeness. Compared with a deep state-of-the-art MDS method for Portuguese, the lexical-conceptual methods performed well.
dc.language	por
dc.publisher	Universidade Federal de São Carlos
dc.publisher	UFSCar
dc.publisher	Programa de Pós-Graduação em Linguística - PPGL
dc.publisher	Câmpus São Carlos
dc.rights	Open access
dc.subject	Automatic Multi-document Summarization
dc.subject	Lexical-conceptual knowledge
dc.subject	Natural Language Processing
dc.title	Aplicação de conhecimento léxico-conceitual na Sumarização Automática Multidocumento
dc.type	Thesis

This item belongs to the following institution

Universidade Federal de São Carlos (Brazil)
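
The abstract describes content selection driven by how often lexical concepts occur in a cluster. The sketch below illustrates that general idea only; it is not the dissertation's implementation. The concept IDs (`c_plane`, etc.), the toy cluster, the function names, and the word budget are all hypothetical. The scoring is also an assumption: the "simple" variant sums the cluster frequencies of a sentence's concepts, and `average=True` uses a plain mean as a stand-in, since the record does not give the weighted-average formula.

```python
from collections import Counter


def concept_frequencies(sentences):
    """Count how often each concept ID occurs across the whole cluster."""
    return Counter(c for s in sentences for c in s["concepts"])


def score_sentence(sentence, freq, average=False):
    """Score a sentence by the cluster frequency of its concepts.

    Assumed scoring, not the dissertation's formula: the sum of the
    frequencies, or their plain mean when average=True.
    """
    concepts = sentence["concepts"]
    if not concepts:
        return 0.0
    total = sum(freq[c] for c in concepts)
    return total / len(concepts) if average else float(total)


def build_extract(sentences, max_words, average=False):
    """Greedily pick the best-scored sentences that fit the word budget."""
    freq = concept_frequencies(sentences)
    ranked = sorted(
        sentences,
        key=lambda s: score_sentence(s, freq, average),
        reverse=True,
    )
    extract, used = [], 0
    for s in ranked:
        n_words = len(s["text"].split())
        if used + n_words <= max_words:
            extract.append(s["text"])
            used += n_words
    return extract


# Toy cluster with hypothetical concept IDs standing in for WordNet synsets.
cluster = [
    {"text": "The plane crashed near the airport.",
     "concepts": ["c_plane", "c_crash", "c_airport"]},
    {"text": "Officials said the plane lost power.",
     "concepts": ["c_official", "c_plane", "c_power"]},
    {"text": "The weather was clear.",
     "concepts": ["c_weather"]},
    {"text": "Rescue teams reached the crash site at the airport.",
     "concepts": ["c_team", "c_crash", "c_airport"]},
]

if __name__ == "__main__":
    for line in build_extract(cluster, max_words=15):
        print(line)
```

Restricting `concepts` to nominal IDs, or including verbal ones as well, would mirror the nominal versus nominal-and-verbal distinction among the four methods; how the actual methods weight concepts is not stated in this record.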