Journal articles
Information Approach to Co-occurrence of Words in Written Language
Date
2015-06
Record in:
Hernández Lahme, Damián Gabriel; Information Approach to Co-occurrence of Words in Written Language; Complex Systems Publications; Complex systems; 24; 2; 6-2015; 1-21
0891-2513
CONICET Digital
CONICET
Author
Hernández Lahme, Damián Gabriel
Abstract
In this paper we study the distribution of words across the different parts of a book using tools from information theory. In particular, the mutual information between words in the text and parts of the text is compared with the mutual information of a shuffled version of the book. This analysis allows us to extract not only relevant words of the text but also relationships between the different words, such as co-occurrence and repulsion between them. With the connections due to co-occurrence of words, we show how to construct a network that reflects the semantic organization of the book. This method can be applied to other types of sequences, measuring the relations between the different symbols that compose such sequences.
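The comparison described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the equal-length split into parts, the per-word decomposition of the mutual information, the number of shuffles, and the function names `word_part_information` and `excess_information` are all assumptions made here for illustration.

```python
import math
import random
from collections import Counter


def word_part_information(tokens, n_parts):
    """Contribution of each word to I(word; part), in bits.

    Assumption: the text is cut into n_parts consecutive parts of
    (nearly) equal length. Summing the returned values gives the
    mutual information between words and parts.
    """
    n = len(tokens)
    part_of = [i * n_parts // n for i in range(n)]  # part index of each token
    part_sizes = Counter(part_of)
    word_counts = Counter(tokens)
    joint = Counter(zip(tokens, part_of))

    info = Counter()
    for (word, part), count in joint.items():
        p_joint = count / n
        p_word = word_counts[word] / n
        p_part = part_sizes[part] / n
        info[word] += p_joint * math.log2(p_joint / (p_word * p_part))
    return dict(info)


def excess_information(tokens, n_parts, n_shuffles=20, seed=0):
    """Per-word information minus its average over shuffled copies.

    Shuffling destroys any structure across parts, so the shuffled
    value estimates the finite-size bias for each word (assumed
    baseline; n_shuffles and seed are illustrative parameters).
    """
    rng = random.Random(seed)
    real = word_part_information(tokens, n_parts)
    baseline = Counter()
    shuffled = list(tokens)
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        for word, value in word_part_information(shuffled, n_parts).items():
            baseline[word] += value / n_shuffles
    return {word: real[word] - baseline[word] for word in real}
```

Under these assumptions, words concentrated in a few parts of the text receive a large positive excess, while words spread evenly across parts score close to zero. Applying the same comparison to pairs of words rather than single words would be one way to probe the co-occurrence and repulsion mentioned in the abstract.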