Conference proceedings
Physical data warehouse design on NoSQL databases: OLAP query processing over HBase
Date
2016-04
Registered in:
International Conference on Enterprise Information Systems, XVIII, 2016, Rome.
9789897581878
Author
Scabora, Lucas C.
Brito, Jaqueline J.
Ciferri, Ricardo Rodrigues
Ciferri, Cristina Dutra de Aguiar
Institution
Abstract
Data warehousing and online analytical processing (OLAP) are core technologies in business intelligence and have therefore drawn much interest from researchers over the last decade. However, these technologies have been developed mainly for relational database systems in centralized environments; they were not designed for scalable systems such as NoSQL databases. Adapting a data warehousing environment to NoSQL databases offers several advantages, such as scalability and flexibility. This paper investigates three physical data warehouse designs that adapt the Star Schema Benchmark for use in NoSQL databases. In particular, our main investigation concerns OLAP query processing over column-oriented databases using the MapReduce framework. We analyze how distributing attributes among column families in HBase affects OLAP query performance. Our experiments show how the processing time of OLAP queries is affected by the physical data warehouse design with respect to the number of dimensions accessed and the data volume. We conclude that using distinct distributions of attributes among column families can improve OLAP query performance in HBase and consequently make the benchmark more suitable for OLAP over NoSQL databases.