Conference proceedings
Hcube: A Server-centric Data Center Structure For Similarity Search
Registered in:
9780769549538
Proceedings - International Conference on Advanced Information Networking and Applications, AINA, p. 82-89, 2013.
1550445X
10.1109/AINA.2013.139
2-s2.0-84881070332
Authors
Da Silva Villaca R.
Pasquini R.
De Paula L.B.
Magalhaes M.F.
Institution
Abstract
The information society is facing a sharp increase in the amount of information, driven by the plethora of new applications that sprout all the time. The amount of data circulating on the Internet is now measured in zettabytes (ZB), a scenario defined in the literature as Big Data. To handle such a challenging scenario, deployed solutions rely not only on the massive storage, memory and processing capacity installed in Data Centers (DC) maintained by big players all over the globe, but also on shrewd computational techniques, such as BigTable, MapReduce and Dynamo. In this context, this work presents a DC structure designed to support similarity search. The proposed solution aims at concentrating similar data on servers that are physically close within a DC. This accelerates the retrieval of all data related to queries performed using a primitive get(k, sim), in which k represents the query identifier, i.e., the data used as reference, and sim is a similarity level. © 2013 IEEE.
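As an illustration of the get(k, sim) primitive only, the sketch below assumes that identifiers are fixed-length bit strings and that similarity is the fraction of matching bits (a Hamming-style measure). Both choices, along with the names ToyStore, put and hamming_similarity, are hypothetical and are not taken from the paper. The sketch does not model the placement of data on physically close servers; it keeps everything in one in-memory table.

```python
def hamming_similarity(a: int, b: int, bits: int) -> float:
    """Fraction of the `bits` low-order positions in which a and b agree."""
    mask = (1 << bits) - 1
    differing = bin((a ^ b) & mask).count("1")
    return 1.0 - differing / bits


class ToyStore:
    """Hypothetical single-node stand-in for a similarity-aware store."""

    def __init__(self, bits: int = 8) -> None:
        self.bits = bits      # identifier length in bits (assumed fixed)
        self.items = {}       # identifier -> stored value

    def put(self, k: int, value: str) -> None:
        self.items[k] = value

    def get(self, k: int, sim: float) -> list:
        """Return (identifier, value) pairs whose similarity to k is >= sim."""
        return sorted(
            (key, value)
            for key, value in self.items.items()
            if hamming_similarity(k, key, self.bits) >= sim
        )


store = ToyStore(bits=8)
store.put(0b10110010, "a")   # the reference item itself
store.put(0b10110011, "b")   # differs from "a" in one bit
store.put(0b01001101, "c")   # differs from "a" in all eight bits

result = store.get(0b10110010, 0.75)
print(result)
```

With a threshold of 0.75, the query returns the items stored under the first two identifiers and skips the third. A linear scan is used here purely for brevity; the abstract's point is that placing similar data on nearby servers avoids searching the whole DC.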
IEEE Technical Committee on Distributed Processing (TCDP), Technical University of Catalonia, Fukuoka Institute of Technology