Artículo de revista
A simple, efficient, parallelizable algorithm for approximated nearest neighbors
Fecha
2018Registro en:
CEUR Workshop Proceedings, Volumen 2100, 2018
16130073
Autor
Ferrada, Sebastián
Bustos, Benjamin
Reyes, Nora
Institución
Resumen
The use of the join operator in metric spaces leads to what is
known as a similarity join, where objects of two datasets are paired if they
are somehow similar. We propose an heuristic that solves the 1-NN selfsimilarity join, that is, a similarity join of a dataset with itself, that brings
together each element with its nearest neighbor within the same dataset.
Solving the problem using a simple brute-force algorithm requires O(n
2
)
distance calculations, since it requires to compare every element against all
others. We propose a simple divide-and-conquer algorithm that gives an
approximated solution for the self-similarity join that computes only O(n
3
2 )
distances. We show how the algorithm can be easily modified in order to
improve the precision up to 31% (i.e., the percentage of correctly found
1-NNs) and such that 79% of the results are within the 10-NN, with no
significant extra distance computations. We present how the algorithm can
be executed in parallel and prove that using Θ(
√
n) processors, the total
execution takes linear time. We end discussing ways in which the algorithm
can be improved in the future.