Actas de congresos
Large-scale Distributed Locality-sensitive Hashing For General Metric Data
Registro en:
9783319119878
Lecture Notes In Computer Science (including Subseries Lecture Notes In Artificial Intelligence And Lecture Notes In Bioinformatics). Springer Verlag, v. 8821, n. , p. 82 - 93, 2014.
3029743
10.1007/978-3-319-11988-5_8
2-s2.0-84911084388
Autor
Silva E.
Teixeira T.
Teodoro G.
Valle E.
Institución
Resumen
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) Locality-Sensitive Hashing (LSH) is extremely competitive for similarity search, but works under the assumption of uniform access cost to the data, and for just a handful of dissimilarities for which locality-sensitive families are available. In this work we propose Parallel Voronoi LSH, an approach that addresses those two limitations of LSH: it makes LSH efficient for distributedmemory architectures, and it works for very general dissimilarities (in particular, it works for all metric dissimilarities). Each hash table of Voronoi LSH works by selecting a sample of the dataset to be used as seeds of a Voronoi diagram. The Voronoi cells are then used to hash the data. Because Voronoi diagrams depend only on the distance, the technique is very general. Implementing LSH in distributed-memory systems is very challenging because it lacks referential locality in its access to the data: if care is not taken, excessive message-passing ruins the index performance. Therefore, another important contribution of this work is the parallel design needed to allow the scalability of the index, which we evaluate in a dataset of a thousand million multimedia features. 8821
82 93 CAPES; São Paulo Research Foundation; CNPq; São Paulo Research Foundation; 2010/05647-4; FAPESP; São Paulo Research Foundation Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) Chávez, E., Navarro, G., Baeza-Yates, R., Marroquín, J.L., (2001) Searching in Metric Spaces, 33 (3), pp. 273-321. , September Akune, F., Valle, E., Torres, R., MONORAIL: A Disk-Friendly Index for Huge Descriptor Databases (2010) 20Th Int. Conf. On Pattern Recognition, pp. 4145-4148. , IEEE (August Indyk, P., Motwani, R., Approximate nearest neighbors: Towards removing the curse of dimensionality (1998) Proc. Of 13Th Ann. ACM Symp. On Theory Of Comp, pp. 604-613 Gionis, A., Indyk, P., Motwani, R., Similarity search in high dimensions via hashing (1999) Proc. Of the 25Th Int. Conf. On Very Large Data Bases, pp. 518-529 Datar, M., Immorlica, N., Indyk, P., Mirrokni, V.S., Locality-sensitive hashing scheme based on p-stable distributions (2004) Proc. Of the 20Th Ann. Symp. On Computational Geometry, 253p Paulevé, L., Jégou, H., Amsaleg, L., (2010) Locality Sensitive Hashing: A Comparison of Hash Function Types and Querying Mechanisms, 31 (11), pp. 1348-1358. , August Kang, B., Jung, K., Robust and Efficient Locality Sensitive Hashing for Nearest Neighbor Search in Large Data Sets (2012) NIPS Workshop on Big Learning (Biglearn, pp. 1-8. , Lake Tahoe, Nevada Tellez, E.S., Chavez, E., On locality sensitive hashing in metric spaces (2010) Proc. Of the Third Int. Conf. On Similarity Search and Applications, pp. 67-74. , SISAP 2010, ACM, New York Zezula, P., Amato, G., Dohnal, V., Batko, M., Similarity Search: The Metric Space Approach (2006) Advances in Database Systems, 32. , Springer Lv, Q., Josephson, W., Wang, Z., Charikar, M., Li, K., Multi-probe LSH: Efficient indexing for high-dimensional similarity search (2007) Proc. Of the 33Rd Int. Conf. On Very Large Data Bases. VLDB 2007, pp. 950-961. , VLDB Endowment Joly, A., Buisson, O., A posteriori multi-probe locality sensitive hashing (2008) Proc. Of the 16Th ACM Int. Conf. On Multimedia, MM 2008, pp. 209-218. , ACM, New York Novak, D., Batko, M., Metric Index: An Efficient and Scalable Solution for Similarity Search (2009) Second Int. Workshop on Similarity Search and Applications, pp. 65-73. , IEEE Computer Society (August Novak, D., Kyselak, M., Zezula, P., On locality-sensitive indexing in generic metric spaces (2010) Proc. Of the Third Int. Conf. On Similarity Search and Applications, SISAP 2010, pp. 59-66. , ACM Press, New York Ostrovsky, R., Rabani, Y., Schulman, L., Swamy, C., The Effectiveness of Lloyd-Type Methods for the k-Means Problem (2006) Focs, pp. 165-176. , IEEE (December Arthur, D., Vassilvitskii, S., K-means++: The advantages of careful seeding (2007) Proc. Of the 18Th Annual ACM-SIAM Symp. On Discrete Algorithms, pp. 1027-1035. , SODA 2007, Philadelphia, PA, USA Kaufman, L., Rousseeuw, P.J., (1990) Finding Groups in Data: An Introduction to Cluster Analysis, , 9th edn. Wiley-Interscience, New York Paterlini, A.A., Nascimento, M.A., Junior, C.T., (2011) Using Pivots to Speed-Up K-Medoids Clustering, 2 (2), pp. 221-236. , June Park, H.S., Jun, C.H., (2009) A Simple and Fast Algorithm for K-Medoids Clustering, 36 (2), pp. 3336-3341 Figueroa, K., Navarro, G., Chávez, E., (2007) Metric Spaces Library, , http://www.sisap.org/Metric_Space:Library.html Jegou, H., Tavenard, R., Douze, M., Amsaleg, L., Searching in one billion vectors: Re-rank with source coding (2011) ICASSP, pp. 861-864. , IEEE