Relative Scalability of NoSQL Databases for Genotype Data Manipulation

dc.contributorCoord. de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)pt-BR
dc.contributorCons. Nac. de Des. Científico e Tecnológico (CNPq)pt-BR
dc.contributorFund. de Amparo à Pesq. do Estado de Minas Gerais (FAPEMIG)pt-BR
dc.contributorUniv. Fed. de Juiz de Fora (UFJF)pt-BR
dc.contributorEmp. Bras. de Pesq. Agropecuária (Embrapa)pt-BR
dc.contributorCoord. de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)en-US
dc.contributorCons. Nac. de Des. Científico e Tecnológico (CNPq)en-US
dc.contributorFund. de Amparo à Pesq. do Estado de Minas Gerais (FAPEMIG)en-US
dc.contributorUniv. Fed. de Juiz de Fora (UFJF)en-US
dc.contributorEmp. Bras. de Pesq. Agropecuária (Embrapa)en-US
dc.creatorAlmeida, Arthur Lorenzi
dc.creatorSchettino, Vinícius Junqueira
dc.creatorBarbosa, Thiago Jesus Rodrigues
dc.creatorFreitas, Pedro Fernandes
dc.creatorGuimarães, Pedro Gabriel Silva
dc.creatorArbex, Wagner
dc.date2018-07-17
dc.date.accessioned2018-11-07T21:09:54Z
dc.date.available2018-11-07T21:09:54Z
dc.identifierhttps://seer.ufrgs.br/rita/article/view/RITA-VOL-25-NR2-93
dc.identifier10.22456/2175-2745.79334
dc.identifier.urihttp://repositorioslatinoamericanos.uchile.cl/handle/2250/2187579
dc.descriptionGenotype data manipulation is one of the greatest challenges in research fields such as population genetics, bioinformatics and genomics, mainly because of high dimensionality and unbalanced characteristics. These peculiarities explain why relational database management systems (RDBMSs), the de facto standard storage solution, have not been presented as the best tools for this kind of data. However, the advent of Big Data has been driving the development of modern database systems that might be able to overcome the deficiencies of RDBMSs. In this context, we extended our previous work on evaluating the relative performance of NoSQL engines from different families, adapting the schema design based on that work's conclusions to achieve better performance and thus store more SNP markers for each individual. Using the Yahoo! Cloud Serving Benchmark (YCSB) framework, we assessed each database system over hypothetical genotype data (SNP markers). Results indicate that Tarantool is approximately 21.8% more efficient than MongoDB when storing 770,000 SNP markers, but MongoDB is less impacted by the increase in SNP markers per individual.pt-BR
dc.descriptionGenotype data manipulation is one of the greatest challenges in bioinformatics and genomics, mainly because of high dimensionality and unbalanced characteristics. These peculiarities explain why Relational Database Management Systems (RDBMSs), the de facto standard storage solution, have not been presented as the best tools for this kind of data. However, Big Data has been driving the development of modern database systems that might be able to overcome the deficiencies of RDBMSs. In this context, we extended our previous work on evaluating the relative performance of NoSQL engines from different families, adapting the schema design based on that work's conclusions to achieve better performance and thus store more SNP markers for each individual. Using the Yahoo! Cloud Serving Benchmark (YCSB) framework, we assessed each database system over hypothetical SNP sequences. Results indicate that although Tarantool has the best overall throughput, MongoDB is less impacted by the increase in SNP markers per individual.en-US
dc.formatapplication/pdf
dc.languageeng
dc.publisherInstituto de Informática - Universidade Federal do Rio Grande do Sulen-US
dc.relationhttps://seer.ufrgs.br/rita/article/view/RITA-VOL-25-NR2-93/pdf
dc.rightsCopyright 2018 Arthur Lorenzi Almeida, Vinícius Junqueira Schettino, Thiago Jesus Rodrigues Barbosa, Pedro Fernandes Freitas, Maurício Henrique Laier, Pedro Gabriel Silva Guimarães, Wagner Arbexpt-BR
dc.rightshttp://creativecommons.org/licenses/by-nc-nd/4.0pt-BR
dc.sourceRevista de Informática Teórica e Aplicada; v. 25, n. 2 (2018); 93-100en-US
dc.sourceRevista de Informática Teórica e Aplicada; v. 25, n. 2 (2018); 93-100pt-BR
dc.source21752745
dc.source01034308
dc.subjectComputer Science; Bioinformaticspt-BR
dc.subjectDatabase; NoSQL; Bioinformatics; Data Science; SNP; Genotypept-BR
dc.titleRelative Scalability of NoSQL Databases for Genotype Data Manipulationpt-BR
dc.titleRelative Scalability of NoSQL Databases for Genotype Data Manipulationen-US
dc.typeJournal articles

