dc.creatorArroyuelo, Diego
dc.creatorNavarro, Gonzalo
dc.creatorSadakane, Kunihiko
dc.date.accessioned2012-05-22T21:16:34Z
dc.date.available2012-05-22T21:16:34Z
dc.date.created2012-05-22T21:16:34Z
dc.date.issued2012
dc.identifierAlgorithmica (2012) 62:54–101
dc.identifierDOI 10.1007/s00453-010-9443-8
dc.identifierhttps://repositorio.uchile.cl/handle/2250/125597
dc.description.abstractGiven a text T [1..u] over an alphabet of size σ, the full-text search problem consists in finding the occ occurrences of a given pattern P[1..m] in T . In indexed text searching we build an index on T to improve the search time, yet increasing the space requirement. The current trend in indexed text searching is that of compressed full-text self-indices, which replace the text with a more space-efficient representation of it, at the same time providing indexed access to the text. Thus, we can provide efficient access within compressed space. The Lempel-Ziv index (LZ-index) of Navarro is a compressed full-text self-index able to represent T using 4uHk(T ) + o(ulogσ) bits of space, where Hk(T ) denotes the k-th order empirical entropy of T , for any k = o(logσ u). This space is about four times the compressed text size. The index can locate all the occ occurrences of a pattern P in T in O(m3 log σ + (m + occ) log u) worst-case time. Although this index has proven very competitive in practice, the O(m3 logσ) term can be excessive for long patterns. Also, the factor 4 in its space complexity makes it larger than other state-of-the-art alternatives. In this paper we present stronger Lempel-Ziv based indices (LZ-indices), improving the overall performance of the original LZ-index. We achieve indices requiring (2 + )uHk(T ) + o(ulogσ) bits of space, for any constant > 0, which makes them the smallest existing LZ-indices. We simultaneously improve the search time to O(m2 + (m + occ) log u), which makes our indices very competitive with stateof- the-art alternatives. Our indices support displaying any text substring of length in optimal O( / logσ u) time. In addition, we show how the space can be squeezed to (1+ )uHk(T )+o(ulogσ) to obtain a structure with O(m2) average search time for m 2 logσ u. Alternatively, the search time of LZ-indices can be improved to O((m + occ) log u) with (3 + )uHk(T ) + o(ulogσ) bits of space, which is much less than the space needed by other Lempel-Ziv-based indices achieving the same search time. Overall our indices stand out as a very attractive alternative for spaceefficient indexed text searching.
dc.languageen
dc.publisherSpringer
dc.subjectText compression
dc.titleStronger Lempel-Ziv Based Compressed Text Indexing
dc.typeArtículo de revista


Este ítem pertenece a la siguiente institución