Undergraduate Final Project
Análise do desempenho da busca por similaridade utilizando o paradigma MapReduce (Performance analysis of similarity search using the MapReduce paradigm)
Date
2016-12-16
Author
Cardoso, Paulo Vinicius Mendonça
Institution
Abstract
Information Retrieval (IR) is a research area concerned with creating solutions for
searching databases. The aim of IR is to satisfy a user's information need. A common data
structure used to assist the search process is the inverted index, composed of entries that each
lead to a list of related objects. In this context, an object search can be performed by
equivalence or by similarity. Similarity search is a powerful method, since it can retrieve the
objects most similar to the request. However, the cost of computing similarity can harm
performance, making it necessary to resort to alternative processing techniques. Distributed
computing helps in finding solutions for this type of problem through tools, paradigms,
and distributed architectures. The MapReduce paradigm is an example of a distributed model
designed to process large amounts of data in cluster environments. This model fits
the inverted index search context because of its key-value architecture. Thus, the aim of this
work is to analyze distributed processing tools that implement the MapReduce concept on
a similarity search problem that relies on an inverted index. The results show how different
frameworks behave under several test scenarios.
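
The fit between the key-value model and an inverted index search can be illustrated with a minimal, single-process sketch. It is not the implementation evaluated in this work: the function names, the whitespace tokenization, and the choice of Jaccard similarity are assumptions made for illustration only, and the map and reduce phases are simulated in plain Python rather than run on a cluster framework.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def map_phase(query_terms, index):
    """Emit a (doc_id, 1) pair for every query term found in a document."""
    for term in query_terms:
        for doc_id in index.get(term, ()):
            yield doc_id, 1


def reduce_phase(pairs):
    """Sum the emitted values per key, giving shared-term counts per document."""
    counts = defaultdict(int)
    for doc_id, value in pairs:
        counts[doc_id] += value
    return counts


def similarity_search(query, docs, index, top_k=3):
    """Rank candidate documents by Jaccard similarity (an assumed measure)."""
    query_terms = set(query.lower().split())
    shared = reduce_phase(map_phase(query_terms, index))
    scores = {}
    for doc_id, overlap in shared.items():
        doc_terms = set(docs[doc_id].lower().split())
        scores[doc_id] = overlap / len(query_terms | doc_terms)
    # Highest score first; ties are broken by document id.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
```

In this sketch the document id plays the role of the key, so only documents that share at least one term with the query are ever scored. A real deployment would partition the index entries across workers and let the framework group the emitted pairs by key.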