dc.creator: Amancio, Diego Raphael
dc.creator: Altmann, Eduardo G.
dc.creator: Oliveira Junior, Osvaldo Novais de
dc.creator: Costa, Luciano da Fontoura
dc.date.accessioned: 2015-12-23T17:42:16Z
dc.date.accessioned: 2018-07-04T16:53:54Z
dc.date.available: 2015-12-23T17:42:16Z
dc.date.available: 2018-07-04T16:53:54Z
dc.date.created: 2015-12-23T17:42:16Z
dc.date.issued: 2011-12
dc.identifier: New Journal of Physics, Bristol : Institute of Physics - IOP, v. 13, 123024-1-123024-17, Dec. 2011
dc.identifier: 1367-2630
dc.identifier: http://www.producao.usp.br/handle/BDPI/49400
dc.identifier: 10.1088/1367-2630/13/12/123024
dc.identifier.uri: http://repositorioslatinoamericanos.uchile.cl/handle/2250/1641872
dc.description.abstract: Many features of texts and languages can now be inferred from statistical analyses using concepts from complex networks and dynamical systems. In this paper, we quantify how topological properties of word co-occurrence networks and intermittency (or burstiness) in word distribution depend on the style of authors. Our database contains 40 books by eight authors who lived in the nineteenth and twentieth centuries, for which the following network measurements were obtained: the clustering coefficient, average shortest path lengths and betweenness. We found that the two factors with stronger dependence on authors were skewness in the distribution of word intermittency and the average shortest paths. Other factors such as betweenness and Zipf's law exponent show only weak dependence on authorship. Also assessed was the contribution from each measurement to authorship recognition using three machine learning methods. The best performance was about 65% accuracy upon combining complex networks and intermittency features with the nearest-neighbor algorithm of automatic authorship. From a detailed analysis of the interdependence of the various metrics, it is concluded that the methods used here are complementary for providing short- and long-scale perspectives on texts, which are useful for applications such as the identification of topical words and information retrieval.
dc.language: eng
dc.publisher: Institute of Physics - IOP
dc.publisher: Bristol
dc.relation: New Journal of Physics
dc.rights: Copyright IOP Publishing Ltd and Deutsche Physikalische Gesellschaft
dc.rights: openAccess
dc.title: Comparing intermittency and network measurements of words and their dependence on authorship
dc.type: Journal articles

