info:eu-repo/semantics/article
Compression of Very Sparse Column Oriented Data
Registered in:
10.5902/2448190422772
Authors
Garcia, Vinicius Fulber
Mergen, Sergio Luis Sardi
Institution
Abstract
Column-oriented databases store each column contiguously on disk. Placing values from the same domain next to each other reduces information entropy, so compression algorithms achieve better results. Columns whose values have high cardinality are usually compressed with variations of the LZ method. In this paper, we consider simpler methods based on run-length encoding and symbol probability for scenarios where datasets are very sparse. Our experiments show in which cases the simple methods evaluated provide promising results.
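As a rough illustration of why very sparse columns suit run-length methods, the following minimal Python sketch encodes a column as (value, run length) pairs and estimates its empirical zero-order entropy. It is not the authors' implementation; the function names, the use of `None` for a missing value, and the sample column are assumptions made only for this example.

```python
from collections import Counter
from itertools import groupby
from math import log2


def run_length_encode(column):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


def run_length_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    column = []
    for value, count in pairs:
        column.extend([value] * count)
    return column


def shannon_entropy(column):
    """Empirical zero-order entropy of a non-empty column, in bits per value."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * log2(c / total) for c in counts.values())


# Hypothetical sparse column: None marks a missing value (an assumption).
sparse_column = [None] * 6 + ["a"] + [None] * 4 + ["b"] + [None] * 3

encoded = run_length_encode(sparse_column)
entropy = shannon_entropy(sparse_column)
```

In this sample, 15 stored values collapse into 5 runs, and the dominance of the missing-value symbol keeps the entropy below one bit per value. That skew is the property that methods based on symbol probability exploit.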