Investigation of content selection strategies based on UNL (Universal Networking Language)

Chaud, Matheus Rigobelo

dc.contributor	Di Felippo, Ariani
dc.contributor	http://lattes.cnpq.br/8648412103197455
dc.contributor	http://lattes.cnpq.br/4655951844884252
dc.creator	Chaud, Matheus Rigobelo
dc.date.accessioned	2015-04-13
dc.date.accessioned	2016-06-02T20:25:24Z
dc.date.available	2015-04-13
dc.date.available	2016-06-02T20:25:24Z
dc.date.created	2015-04-13
dc.date.created	2016-06-02T20:25:24Z
dc.date.issued	2015-03-03
dc.identifier	CHAUD, Matheus Rigobelo. Investigação de estratégias de seleção de conteúdo baseadas na UNL (Universal Networking Language). 2015. 171 f. Dissertação (Mestrado em Ciências Humanas) - Universidade Federal de São Carlos, São Carlos, 2015.
dc.identifier	https://repositorio.ufscar.br/handle/ufscar/5799
dc.description.abstract	The field of Natural Language Processing (NLP) has witnessed increased attention to Multilingual Multidocument Summarization (MMS), whose goal is to process a cluster of source documents in more than one language and generate a summary of this collection in one of the target languages. In MMS, the selection of sentences from source texts for summary generation may be based on either shallow or deep linguistic features. The purpose of this research was to investigate whether deep knowledge, obtained from a conceptual representation of the source texts, could be useful for content selection in texts of the newspaper genre. In this study, we used a formal representation system, the UNL (Universal Networking Language). To investigate content selection strategies based on this interlingua, three clusters of texts were represented in UNL, each consisting of one text in Portuguese, one text in English, and one human-written reference summary. Additionally, in each cluster, the sentences of the source texts were aligned with the sentences of their respective human summaries in order to identify total or partial content overlap between them. The data collected allowed a comparison between content selection strategies based on conceptual information and a traditional selection method based on a superficial feature: the position of the sentence in the source text. According to the results, content selection based on sentence position correlated more closely with the selection made by the human summarizer than the conceptual methods investigated did. Furthermore, the sentences at the beginning of the source texts, which in newspaper articles usually convey the most relevant information, did not necessarily contain the most frequent concepts in the text collection; on several occasions, the sentences with the most frequent concepts were in the middle or at the end of the text.
These results indicate that, at least in the clusters analyzed, other criteria besides concept frequency help determine the relevance of a sentence. In other words, content selection in human multidocument summarization may not be limited to the selection of the sentences with the most frequent concepts. In fact, it seems to be a much more complex process.
dc.publisher	Universidade Federal de São Carlos
dc.publisher	BR
dc.publisher	UFSCar
dc.publisher	Programa de Pós-Graduação em Linguística - PPGL
dc.rights	Open Access
dc.subject	Linguística aplicada
dc.subject	Sumarização automática
dc.subject	Estratégias de seleção de conteúdo
dc.subject	Interlíngua UNL (Universal Networking Language)
dc.subject	Processamento automático de línguas naturais
dc.subject	Sistemas de representação de conhecimento
dc.subject	Automatic summarization
dc.subject	Multilingual multidocument summarization
dc.subject	Natural language processing
dc.subject	Knowledge representation systems
dc.subject	Universal networking language
dc.subject	Content selection
dc.title	Investigation of content selection strategies based on UNL (Universal Networking Language)
dc.type	Thesis

This item belongs to the following institution

Universidade Federal de São Carlos (Brazil)