Conference object
TreeSpark: A Distributed Tool for Progeny Analysis based on Spark
Registered in:
isbn:978-987-633-574-4
Authors
López, Paula
Hasperué, Waldo
Quiroga, Facundo Manuel
Ronchetti, Franco
Institution
Abstract
Progeny analyses are useful in the biological sciences for various purposes, such as improving individuals in new generations or carrying out molecular analyses of the transmission of genetic characteristics. Analyzing these data by comparing the individuals of a generation with their offspring is not a trivial task, and it grows in complexity as more generations are incorporated. In this article, we present TreeSpark, an open-source tool for progeny analysis that provides simple access to information about individuals and their relations, both as progenitors and as descendants. The tool is developed as a Python module that inherits the distributed processing features of Spark, allowing it to process large volumes of progeny information. TreeSpark is compared with other similar tools and is found to be much simpler to use.
Workshop: WBDMD - Base de Datos y Minería de Datos
Red de Universidades con Carreras en Informática