Artículo de revista
Smaller and Faster Lempel-Ziv Indices
Autor
Arroyuelo, Diego
Navarro, Gonzalo
Institución
Resumen
Given a text T[1..u] over an alphabet of size σ = O(polylog(u))
and with k-th order empirical entropy Hk(T), we propose a new compressed
full-text self-index based on the Lempel-Ziv (LZ) compression
algorithm, which replaces T with a representation requiring about three
times the size of the compressed text, i.e (3+ǫ)uHk(T)+o(u log σ) bits,
for any ǫ > 0 and k = o(logσ u), and in addition gives indexed access
to T: it is able to locate the occ occurrences of a pattern P[1..m] in the
text in O((m + occ) log u) time. Our index is smaller than the existing
indices that achieve this locating time complexity, and locates the occurrences
faster than the smaller indices. Furthermore, our index is able to
count the pattern occurrences in O(m) time, and it can extract any text
substring of length ℓ in optimal O(ℓ/ logσ u) time. Overall, our indices
appear as a very attractive alternative for space-efficient indexed text
searching.