dc.creator: Maziero, Erick Galani
dc.creator: Jorge, María Lucía Del Rosario Castro
dc.creator: Pardo, Thiago Alexandre Salgueiro
dc.date.accessioned: 2014-05-07T19:15:48Z
dc.date.accessioned: 2018-07-04T16:47:44Z
dc.date.available: 2014-05-07T19:15:48Z
dc.date.available: 2018-07-04T16:47:44Z
dc.date.created: 2014-05-07T19:15:48Z
dc.date.issued: 2014-03
dc.identifier: Information Processing and Management, Oxford, v. 50, n. 2, p. 297-314, 2014
dc.identifier: http://www.producao.usp.br/handle/BDPI/44759
dc.identifier: 10.1016/j.ipm.2013.12.003
dc.identifier: http://dx.doi.org/10.1016/j.ipm.2013.12.003
dc.identifier.uri: http://repositorioslatinoamericanos.uchile.cl/handle/2250/1640470
dc.description.abstract: Multi-document discourse parsing aims to automatically identify the relations among textual spans from different texts on the same topic. With the growing amount of information and the emergence of new technologies that deal with many sources of information, more precise and efficient parsing techniques are required. The most relevant theory of multi-document relationships, Cross-document Structure Theory (CST), has been used for parsing before, though the results have not been satisfactory. CST has been widely criticized for its subjectivity, which may lead to low annotation agreement and, consequently, to poor parsing performance. In this work, we propose a refinement of the original CST, which consists of (i) formalizing the relationship definitions, (ii) pruning and combining some relations based on their meaning, and (iii) organizing the relations in a hierarchical structure. The hypothesis behind this refinement is that it will lead to better annotation agreement and, consequently, to better parsing results. To test this hypothesis, an annotated corpus was built according to the refinement, and an improvement in annotation agreement was observed. Based on this corpus, a parser was developed using machine learning techniques and hand-crafted rules. Specifically, hierarchical techniques were used to capture the hierarchical organization of the relations according to the proposed refinement of CST. These two approaches were used to identify the relations among text spans and to generate the multi-document annotation structure. The results outperformed those of other CST parsers, showing the adequacy of the proposed refinement of the theory.
dc.language: eng
dc.publisher: Elsevier
dc.publisher: Oxford
dc.relation: Information Processing and Management
dc.rights: Copyright Elsevier Ltd.
dc.rights: closedAccess
dc.subject: Discourse parsing
dc.subject: Multi-document processing
dc.subject: Cross-document Structure Theory
dc.subject: Machine learning
dc.title: Revisiting Cross-document Structure Theory for multi-document discourse parsing
dc.type: Journal articles

