Undergraduate Thesis
Mineração de dados distribuída e escalável usando Apache Mahout
Date
2010-12-06
Author
Pereira, Adriano
Institution
Abstract
Computing tools generate huge data sets, and these data may contain implicit
patterns. Data mining is concerned with finding relationships, especially in large
data sets, enabling the extraction of useful new information. Distributed computing allows
the data to be decentralized and speeds up the data mining process. Apache Mahout is a
distributed data mining tool that uses the MapReduce programming model, promising scalability
by splitting the workload into mutually independent tasks. The objective of this work
is to evaluate Apache Mahout's performance by selecting algorithms that the tool implements,
preparing data sets, and mining these data in different distributed environments,
analyzing the tool's scalability, understood as the performance improvement obtained by
adding nodes or cores to the processing.
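
The idea of splitting a workload into mutually independent tasks, as in the MapReduce model, can be sketched in plain Python. This is only an illustrative sketch: it does not use Mahout's or Hadoop's API, and it is not one of the algorithms evaluated in this work. The function names (`map_task`, `reduce_task`, `split`) and the toy records are assumptions invented for the example.

```python
from collections import Counter
from functools import reduce


def split(records, n_tasks):
    """Split the records into n_tasks roughly equal chunks."""
    return [records[i::n_tasks] for i in range(n_tasks)]


def map_task(chunk):
    """Count item occurrences in one chunk, using no data from other chunks."""
    return Counter(item for record in chunk for item in record)


def reduce_task(left, right):
    """Merge two partial counts into a single count."""
    return left + right


# Hypothetical toy transactions, invented for illustration only.
records = [
    ["milk", "bread"],
    ["bread", "eggs"],
    ["milk", "eggs", "bread"],
    ["eggs"],
]

# Each map_task call is independent, so in a real cluster every chunk
# could be processed on a different node or core.
partials = [map_task(chunk) for chunk in split(records, 2)]
totals = reduce(reduce_task, partials, Counter())
print(dict(totals))
```

Because the map tasks share no state, the merged result does not depend on how many chunks the data is split into; only the amount of work per task changes, which is the property that makes adding nodes or cores useful.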