Artículo de revista
Space-efficient construction of LZ-index
Fecha
2005Registro en:
ALGORITHMS AND COMPUTATION LECTURE NOTES IN COMPUTER SCIENCE 3827: 1143-1152 2005
0302-9743
Autor
Arroyuelo Billiardi, Diego
Navarro, Gonzalo
Institución
Resumen
A compressed full-text self-index is a data structure that replaces a text and in addition gives indexed access to it, while taking space proportional to the compressed text size. The LZ-index, in particular, requires 4uH(k)(1 + o(1)) bits of space, where u is the text length in characters and H-k is its k-th order empirical entropy, Although in practice the LZ-index needs 1.0-1.5 times the text size, its construction requires Much more main memory (around 5 times the text size), which limits its applicability to large texts. In this paper we present a practical space-efficient algorithm to construct LZ-index, requiring (4 + is an element of)uH(k) + o(u) bits of space, for any constant 0 < epsilon < 1, and O(sigma u) time, being sigma the alphabet size. Our experimental results show that our method is efficient in practice, needing an amount of memory close to that of the final index.