dc.contributorDi Felippo, Ariani
dc.contributorhttp://lattes.cnpq.br/8648412103197455
dc.contributorhttp://lattes.cnpq.br/4655951844884252
dc.creatorChaud, Matheus Rigobelo
dc.date.accessioned2015-04-13
dc.date.accessioned2016-06-02T20:25:24Z
dc.date.available2015-04-13
dc.date.available2016-06-02T20:25:24Z
dc.date.created2015-04-13
dc.date.created2016-06-02T20:25:24Z
dc.date.issued2015-03-03
dc.identifierCHAUD, Matheus Rigobelo. Investigação de estratégias de seleção de conteúdo baseadas na UNL (Universal Networking Language). 2015. 171 f. Dissertação (Mestrado em Ciências Humanas) - Universidade Federal de São Carlos, São Carlos, 2015.
dc.identifierhttps://repositorio.ufscar.br/handle/ufscar/5799
dc.description.abstractThe field of Natural Language Processing (NLP) has witnessed increased attention to Multilingual Multidocument Summarization (MMS), whose goal is to process a cluster of source documents in more than one language and generate a summary of this collection in one of the target languages. In MMS, the selection of sentences from source texts for summary generation may be based on either shallow or deep linguistic features. The purpose of this research was to investigate whether the use of deep knowledge, obtained from a conceptual representation of the source texts, could be useful for content selection in texts within the newspaper genre. In this study, we used a formal representation system, the UNL (Universal Networking Language). In order to investigate content selection strategies based on this interlingua, 3 clusters of texts were represented in UNL, each consisting of 1 text in Portuguese, 1 text in English and 1 human-written reference summary. Additionally, in each cluster, the sentences of the source texts were aligned to the sentences of their respective human summaries, in order to identify total or partial content overlap between these sentences. The data collected allowed a comparison between content selection strategies based on conceptual information and a traditional selection method based on a superficial feature: the position of the sentence in the source text. According to the results, content selection based on sentence position was more closely correlated with the selection made by the human summarizer than were the conceptual methods investigated. Furthermore, the sentences at the beginning of the source texts, which, in newspaper articles, usually convey the most relevant information, did not necessarily contain the most frequent concepts in the text collection; on several occasions, the sentences with the most frequent concepts were in the middle or at the end of the text.
These results indicate that, at least in the clusters analyzed, other criteria besides concept frequency help determine the relevance of a sentence. In other words, content selection in human multidocument summarization may not be limited to the selection of the sentences with the most frequent concepts. In fact, it seems to be a much more complex process.
dc.publisherUniversidade Federal de São Carlos
dc.publisherBR
dc.publisherUFSCar
dc.publisherPrograma de Pós-Graduação em Linguística - PPGL
dc.rightsOpen Access
dc.subjectLinguística aplicada
dc.subjectSumarização automática
dc.subjectEstratégias de seleção de conteúdo
dc.subjectInterlíngua UNL (Universal Networking Language)
dc.subjectProcessamento automático de línguas naturais
dc.subjectSistemas de representação de conhecimento
dc.subjectAutomatic summarization
dc.subjectMultilingual multidocument summarization
dc.subjectNatural language processing
dc.subjectKnowledge representation systems
dc.subjectUniversal networking language
dc.subjectContent selection
dc.titleInvestigação de estratégias de seleção de conteúdo baseadas na UNL (Universal Networking Language)
dc.typeThesis