SparkBLAST : utilização da ferramenta Apache Spark para a execução do BLAST em ambiente distribuído e escalável

Castro, Marcelo Rodrigo de

dc.contributor	Senger, Hermes
dc.contributor	http://lattes.cnpq.br/3691742159298316
dc.contributor	http://lattes.cnpq.br/8688712033943534
dc.creator	Castro, Marcelo Rodrigo de
dc.date.accessioned	2017-09-25T17:05:03Z
dc.date.available	2017-09-25T17:05:03Z
dc.date.created	2017-09-25T17:05:03Z
dc.date.issued	2017-02-13
dc.identifier	CASTRO, Marcelo Rodrigo de. SparkBLAST : utilização da ferramenta Apache Spark para a execução do BLAST em ambiente distribuído e escalável. 2017. Dissertação (Mestrado em Ciência da Computação) – Universidade Federal de São Carlos, São Carlos, 2017. Disponível em: https://repositorio.ufscar.br/handle/ufscar/9114.
dc.identifier	https://repositorio.ufscar.br/handle/ufscar/9114
dc.description.abstract	With the evolution of next-generation sequencing devices, the cost of obtaining genomic data has fallen significantly. With cheaper sequencing, the amount of genomic data to be processed has increased exponentially. This data growth outpaces the rate at which computing power increases year after year through hardware and software evolution. Bioinformatics therefore needs more efficient and scalable techniques based on parallel and distributed processing, on platforms such as clusters and cloud computing. BLAST (Basic Local Alignment Search Tool) is a widely used tool for genomic sequence alignment with native support for multicore parallel processing. However, its scalability is limited to a single machine. Cloud computing, on the other hand, has emerged as an important technology for rapid and elastic provisioning of large amounts of resources. Current frameworks such as Apache Hadoop and Apache Spark support the execution of distributed applications and provide mechanisms for embedding external applications to compose large distributed jobs that can be executed on clusters and cloud platforms. In this work, we used Spark to support a highly scalable and efficient parallelization of BLAST that executes on dozens to hundreds of processing cores on a cloud platform. As a result, our prototype demonstrated better performance and scalability than CloudBLAST, a Hadoop-based parallelization of BLAST.
dc.language	por
dc.publisher	Universidade Federal de São Carlos
dc.publisher	UFSCar
dc.publisher	Programa de Pós-Graduação em Ciência da Computação - PPGCC
dc.publisher	Câmpus São Carlos
dc.rights	Open access
dc.subject	BLAST
dc.subject	Apache Spark
dc.subject	Nuvens computacionais
dc.subject	Sequenciamento genético
dc.subject	Cloud computing
dc.subject	Genetic sequencing
dc.subject	Hadoop
dc.title	SparkBLAST : utilização da ferramenta Apache Spark para a execução do BLAST em ambiente distribuído e escalável
dc.type	Thesis

This item belongs to the following institution

Universidade Federal de São Carlos (Brazil)