Actas de congresos
Compact distance histogram: a novel structure to boost k-nearest neighbor queries
Fecha
2015-06Registro en:
International Conference on Scientific and Statistical Database Management, 27th, 2015, La Jolla.
9781450337090
Autor
Bêdo, Marcos Vinícius Naves
Kaster, Daniel S.
Traina, Agma Juci Machado
Traina Junior, Caetano
Institución
Resumen
The k-Nearest Neighbor query (k-NNq) is one of the most useful similarity queries. Elaborated k-NNq algorithms depend on an initial radius to prune regions of the search space that cannot contribute to the answer. Therefore, estimating a suitable starting radius is of major importance to accelerate k-NNq execution. This paper presents a new technique to estimate a tight initial radius. Our approach, named CDH-kNN, relies on Compact Distance Histograms (CDHs), which are pivot-based histograms defined as piecewise linear functions. Such structures approximate the distance distribution and are compressed according to a given constraint, which can be a desired number of buckets and/or a maximum allowed error. The covering radius of a k-NNq is estimated based on the relationship between the query element and the CDHs' joint frequencies. The paper presents a complete specification of CDH-kNN, including CDH's construction and radii estimation. Extensive experiments on both real and synthetic datasets highlighted the efficiency of our approach, showing that it was up to 72% faster than existing algorithms, outperforming every competitor in all the setups evaluated. In fact, the experiments showed that our proposal was just 20% slower than the theoretical lower bound.