Artículo de revista
Compressed Full-Text Indexes
Fecha
2007-04Registro en:
ACM COMPUTING SURVEYS, V.: 39, issue: 1, Article N°: 2, APR 2007.
0360-0300
Autor
Navarro, Gonzalo
Makinen, Veli
Institución
Resumen
Full-text indexes provide fast substring search over large text collections. A serious problem of these indexes
has traditionally been their space consumption. A recent trend is to develop indexes that exploit the compressibility
of the text, so that their size is a function of the compressed text length. This concept has evolved
into self-indexes, which in addition contain enough information to reproduce any text portion, so they replace
the text. The exciting possibility of an index that takes space close to that of the compressed text, replaces
it, and in addition provides fast search over it, has triggered a wealth of activity and produced surprising
results in a very short time, which radically changed the status of this area in less than 5 years. The most
successful indexes nowadays are able to obtain almost optimal space and search time simultaneously.
In this article we present the main concepts underlying (compressed) self-indexes.We explain the relationship
between text entropy and regularities that show up in index structures and permit compressing them.
Then we cover the most relevant self-indexes, focusing on how they exploit text compressibility to achieve
compact structures that can efficiently solve various search problems. Our aim is to give the background to
understand and follow the developments in this area.