Compiler Support for Selective Page Migration in NUMA Architectures

Piccoli G.; Santos H.N.; Rodrigues R.E.; Pousa C.; Borin E.; Quintao Pereira F.M.

dc.creator	Piccoli G.
dc.creator	Santos H.N.
dc.creator	Rodrigues R.E.
dc.creator	Pousa C.
dc.creator	Borin E.
dc.creator	Quintao Pereira F.M.
dc.date	2014
dc.date	2015-06-25T17:56:47Z
dc.date	2015-11-26T14:46:19Z
dc.date.accessioned	2018-03-28T21:55:58Z
dc.date.available	2018-03-28T21:55:58Z
dc.identifier	9781450328098
dc.identifier	Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT. Institute of Electrical and Electronics Engineers Inc., p. 369-380, 2014.
dc.identifier	1089795X
dc.identifier	10.1145/2628071.2628077
dc.identifier	http://www.scopus.com/inward/record.url?eid=2-s2.0-84907065294&partnerID=40&md5=9f83e9837ff673804f3024648fe23416
dc.identifier	http://www.repositorio.unicamp.br/handle/REPOSIP/87117
dc.identifier	http://repositorio.unicamp.br/jspui/handle/REPOSIP/87117
dc.identifier	2-s2.0-84907065294
dc.identifier.uri	http://repositorioslatinoamericanos.uchile.cl/handle/2250/1252794
dc.description	Current high-performance multicore processors expose a non-uniform memory access (NUMA) model to users. These systems perform better when threads access data in memory banks close to the core on which they run. However, ensuring data locality is difficult. In this paper, we propose compiler analyses and code generation methods to support a lightweight runtime system that dynamically migrates memory pages to improve data locality. Our technique combines static and dynamic analyses and can identify the most promising pages to migrate. Statically, we infer the size of arrays and the amount of reuse of each memory access instruction in a program. These estimates rely on a simple yet accurate trip count predictor of our own design. This knowledge lets us build templates of dynamic checks, to be filled with values known only at runtime. These checks determine when it is profitable to migrate data closer to the processors that use it. Our static analyses are quadratic in the number of variables in a program, and the dynamic checks are O(1) in practice. Our technique requires no user intervention, no third-party middleware, and no modifications to the operating system's kernel. We have applied our technique to several parallel algorithms that are completely oblivious to the asymmetric memory topology and have observed speedups of up to 4x compared to static heuristics. We also compare our approach against Minas, a middleware that supports NUMA-aware data allocation, and show that we can outperform it by up to 50% in some cases. © 2014 ACM.
dc.description	369
dc.description	380
dc.description	ACM SIGARCH, IEEE Computer Society, IFIP
dc.language	en
dc.publisher	Institute of Electrical and Electronics Engineers Inc.
dc.relation	Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT
dc.rights	closed
dc.source	Scopus
dc.title	Compiler Support for Selective Page Migration in NUMA Architectures
dc.type	Conference proceedings

This item belongs to the following institution

Universidade Estadual de Campinas (Brazil)
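
The abstract describes O(1) dynamic checks, built from static estimates of array size and reuse, that decide when migrating pages is profitable. The sketch below illustrates the general shape such a check might take; it is not the authors' implementation. The function names, the 4096-byte page size, the accesses-per-page metric, and the threshold parameter are all assumptions introduced here for illustration.

```python
# Hypothetical sketch of a constant-time migration check.
# Nothing here is taken from the paper's actual code: the names, the page
# size, and the "accesses per page" profitability metric are assumptions.

PAGE_SIZE = 4096  # bytes; assumed page size


def pages_spanned(array_bytes: int, page_size: int = PAGE_SIZE) -> int:
    """Number of pages an array occupies (ceiling division, at least 1)."""
    return max(1, -(-array_bytes // page_size))


def should_migrate(trip_count: int, accesses_per_iteration: int,
                   array_bytes: int, reuse_threshold: float) -> bool:
    """Return True when the estimated accesses per page exceed the threshold.

    trip_count and array_bytes stand in for the values "known only at
    runtime" that fill the statically generated template; the arithmetic
    itself takes constant time.
    """
    estimated_accesses = trip_count * accesses_per_iteration
    reuse_per_page = estimated_accesses / pages_spanned(array_bytes)
    return reuse_per_page > reuse_threshold
```

Under these assumptions, a loop that touches a two-page array two million times passes the check, while a loop that touches a 100-page array ten times does not. A real runtime would follow a positive result with an operating system call that moves the pages, which this sketch leaves out.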