dc.creatorArroyuelo, Diego
dc.creatorNavarro, Gonzalo
dc.date.accessioned2013-12-20T15:21:35Z
dc.date.available2013-12-20T15:21:35Z
dc.date.created2013-12-20T15:21:35Z
dc.date.issued2007
dc.identifierhttps://repositorio.uchile.cl/handle/2250/125817
dc.description.abstractFull-text searching consists in locating the occurrences of a given pattern P[1::m] in a text T[1::u], both sequences over an alpha- bet of size ¾. In this paper we de¯ne a new index for full-text searching on secondary storage, based on the Lempel-Ziv compression algorithm and requiring 8uHk +o(u log ¾) bits of space, where Hk denotes the k-th order empirical entropy of T, for any k = o(log¾ u). Our experimental re- sults show that our index is signi¯cantly smaller than any other practical secondary-memory data structure: 1.4{2.3 times the text size including the text, which means 39%{65% the size of traditional indexes like String B-trees [Ferragina and Grossi, JACM 1999]. In exchange, our index re- quires more disk access to locate the pattern occurrences. Our index is able to report up to 600 occurrences per disk access, for a disk page of 32 kilobytes. If we only need to count pattern occurrences, the space can be reduced to about 1.04{1.68 times the text size, requiring about 20{60 disk accesses, depending on the pattern length.
dc.languageen
dc.rightshttp://creativecommons.org/licenses/by-nc-nd/3.0/cl/
dc.rightsAttribution-NonCommercial-NoDerivs 3.0 Chile
dc.subjectA Lempel-Ziv Text Index
dc.titleA Lempel-Ziv Text Index on Secondary Storage
dc.typeArtículo de revista


Este ítem pertenece a la siguiente institución