Thesis
Descrição linguística da complementaridade para a sumarização automática multidocumento
Date
2015-11-11
Author
Souza, Jackson Wilke da Cruz
Abstract
Automatic Multidocument Summarization (AMS) is a computational alternative
for processing the large quantity of information available online. In AMS, we aim to
automatically generate a single coherent and cohesive summary from a set of
documents that share the same subject, each originating from a
different source. To do so, some AMS methods select the most important
information from the collection to compose the summary. Selecting the main
content sometimes requires identifying redundancy, complementarity, and
contradiction, which are known as the multidocument phenomena. The
identification of complementarity, in particular, is relevant because some
information may be selected for the summary as a complement to other
information that has already been selected, making the summary more coherent and more
informative. Some AMS methods condense the content of the documents based
on the identification of relations from the Cross-document Structure Theory
(CST), which are established between sentences of different documents. These
relations (for example, Historical background) capture the phenomenon of
complementarity. Automatic detection of these relations is often based
on lexical similarity between a pair of sentences, since AMS research lacks
studies that characterize the phenomenon and point to other relevant
linguistic strategies for automatically detecting complementarity. In this work, we
present a corpus-based linguistic description of complementarity. In addition,
we formalize the characteristics of this phenomenon as attributes that support its
automatic identification. As a result, we obtained sets of rules that show the
most relevant attributes for the complementary CST relations (Historical background,
Follow-up, and Elaboration) and for the types of complementarity (temporal and
timeless). With this, we hope to contribute both to Descriptive
Linguistics, with a corpus-based survey of the linguistic characteristics of this
phenomenon, and to Natural Language Processing, by means of
rules that can support the automatic identification of CST relations and types
of complementarity.
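
The abstract notes that automatic detection of CST relations is often based on lexical similarity between a pair of sentences. As a rough illustration only, the sketch below computes one common lexical-similarity measure, Jaccard word overlap. The choice of Jaccard, the function names, and the example sentences are all assumptions made here for illustration; they are not taken from the thesis and do not represent its rules or attributes.

```python
import re


def tokenize(sentence: str) -> set[str]:
    """Lowercase the sentence and return its set of word tokens."""
    return set(re.findall(r"\w+", sentence.lower()))


def lexical_similarity(sentence_a: str, sentence_b: str) -> float:
    """Jaccard overlap between the word sets of two sentences.

    Hypothetical helper: one simple stand-in for "lexical similarity",
    not the measure used in the thesis.
    """
    tokens_a = tokenize(sentence_a)
    tokens_b = tokenize(sentence_b)
    if not tokens_a or not tokens_b:
        return 0.0  # no words to compare
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Invented sentence pair, imagined as coming from two different documents.
s1 = "The storm hit the coast on Monday"
s2 = "The storm hit the coast last year too"
score = lexical_similarity(s1, s2)  # 4 shared words out of 9 distinct words
```

A score like this shows only that two sentences share vocabulary. It says nothing about whether one sentence adds temporal or timeless complementary information to the other, which is the gap the linguistic description in this work is meant to address.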