Conference proceedings
RSC: mining and modeling temporal activity in social media
Date
2015-08
Record in:
ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 21th, 2015, Sydney.
9781450336642
Author
Costa, Alceu Ferraz
Yamaguchi, Yuto
Traina, Agma Juci Machado
Traina Junior, Caetano
Faloutsos, Christos
Institution
Abstract
Can we identify patterns of temporal activities caused by human communications in social media? Is it possible to model these patterns and tell if a user is a human or a bot based only on the timing of their postings? Social media services allow users to make postings, generating large datasets of human activity time-stamps. In this paper we analyze time-stamp data from social media services and find that the distribution of posting inter-arrival times (IATs) is characterized by four patterns: (i) positive correlation between consecutive IATs, (ii) heavy tails, (iii) periodic spikes and (iv) a bimodal distribution. Based on our findings, we propose Rest-Sleep-and-Comment (RSC), a generative model that is able to match all four discovered patterns. We demonstrate the utility of RSC by showing that it can accurately fit real time-stamp data from Reddit and Twitter. We also show that RSC can be used to spot outliers and detect users with non-human behavior, such as bots. We validate RSC using real data consisting of over 35 million postings from Twitter and Reddit. RSC consistently provides a better fit to real data and clearly outperforms existing models for human dynamics. RSC was also able to detect bots with a precision higher than 94%.
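As an illustration of the quantities the abstract refers to, the following minimal sketch derives inter-arrival times from a list of posting time-stamps and measures the correlation between consecutive IATs (pattern (i)). It is not the RSC model and is not taken from the paper: the function names, the use of a plain Pearson coefficient on raw IAT values, and the sample time-stamps are all assumptions made for this example.

```python
import math


def inter_arrival_times(timestamps):
    """Return the gaps between consecutive postings, after sorting the time-stamps."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def consecutive_iat_correlation(iats):
    """Correlate each IAT with the one that follows it (assumed measure for pattern (i))."""
    return pearson(iats[:-1], iats[1:])


# Hypothetical posting time-stamps in seconds, for illustration only.
posts = [0, 10, 25, 45, 70, 100]
iats = inter_arrival_times(posts)
corr = consecutive_iat_correlation(iats)
print(iats, corr)
```

A positive value of `corr` would be consistent with pattern (i). Checking the heavy-tail, periodic-spike and bimodality patterns would require a histogram of the IATs over far more data than this toy list.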