Artículo de revista
Near neighbor searching with K nearest references
Fecha
2015Registro en:
Information Systems51(2015)43–61
0306-4379
DOI: 10.1016/j.is.2015.02.001
Autor
Chávez, E.
Graff, M.
Navarro, Gonzalo
Téllez, E. S.
Institución
Resumen
Proximity searching is the problem of retrieving,from a given database, those objects
closest to a query.To avoid exhaustive searching,data structures called indexes are builton
the database prior to serving queries.The curse of dimensionality is a well-known problem
for indexes:in spaces with sufficiently concentrated distance histograms,no index
out performs an exhaustives can of the database.
In recent years,a number of index es for approximate proximity searching have been
proposed. These are able to cope with the curse of dimensionality inexchangefor
returning an answer that might beslightly different from the correctone.
In this paper we show that many of those recent indexes can be understood as variants
of a simple general model basedon K-nearest reference signatures.A set o freferences is
chosen from the database,and the signature of each object consists of the K references
nearest to the object. At query time,the signature of the query is computed and the search
examines only the objects whose signature is close enough to that of the query.
Many known and novelindexes are obtained by considering different ways to
determine how much detail the signature records(e.g.,justthesetofnearestreferences,
or also their proximity order to the object, oral so their distances to the object,and soon),
how the similarity between signatures is defined,and how the parameters are tuned.In
addition,we introduce a space-efficient representation for those families of indexes,
making it possible to search very large databases in main memory.Small indexes are
cache friendly,inducing faster queries.
We perform exhaustive experiments comparing several known and new indexes that
derive from our framework,evaluatingtheir time performance,memory usage,and
quality of approximation.The best indexes out perform the state of the art,offering an
attractive balance between all these aspects,and turn out to be excellent choices in many
scenarios.Our framework gives high flexibility to design new indexes.