On the Energy Efficiency and Performance of Irregular Application Executions on Multicore, NUMA and Manycore Platforms

Francesquini E.; Castro M.; Penna P.H.; Dupros F.; Freitas H.C.; Navaux P.O.A.; Mehaut J.-F.

dc.creator	Francesquini E.
dc.creator	Castro M.
dc.creator	Penna P.H.
dc.creator	Dupros F.
dc.creator	Freitas H.C.
dc.creator	Navaux P.O.A.
dc.creator	Mehaut J.-F.
dc.date	2015
dc.date	2015-06-25T12:51:41Z
dc.date	2015-11-26T14:24:35Z
dc.date.accessioned	2018-03-28T21:26:52Z
dc.date.available	2018-03-28T21:26:52Z
dc.identifier	Journal of Parallel and Distributed Computing. Academic Press Inc., v. 76, p. 32-48, 2015.
dc.identifier	0743-7315
dc.identifier	10.1016/j.jpdc.2014.11.002
dc.identifier	http://www.scopus.com/inward/record.url?eid=2-s2.0-84924246803&partnerID=40&md5=cb4acee9b0a265843504577eb9e22468
dc.identifier	http://www.repositorio.unicamp.br/handle/REPOSIP/85280
dc.identifier	http://repositorio.unicamp.br/jspui/handle/REPOSIP/85280
dc.identifier	2-s2.0-84924246803
dc.identifier.uri	http://repositorioslatinoamericanos.uchile.cl/handle/2250/1245463
dc.description	Until the last decade, the performance of HPC architectures was quantified almost exclusively by their processing power. Recently, however, energy efficiency has come to be considered as important as raw performance, and it has become a critical aspect of the development of scalable systems. These strict energy constraints guided the development of a new class of so-called light-weight manycore processors. This study evaluates the computing and energy performance of two well-known irregular NP-hard problems (the Traveling-Salesman Problem (TSP) and K-Means clustering) and of a numerical seismic wave propagation simulation kernel (Ondes3D) on multicore, NUMA, and manycore platforms. First, we concentrate on the nontrivial task of adapting these applications to a manycore processor, specifically the novel MPPA-256. Then, we analyze their performance and energy consumption on these different machines. Our results show that applications able to fully use the resources of a manycore can achieve better performance and may consume from 3.8× to 13× less energy than low-power and general-purpose multicore processors, respectively.
dc.language	en
dc.publisher	Academic Press Inc.
dc.relation	Journal of Parallel and Distributed Computing
dc.rights	closed (restricted access)
dc.source	Scopus
dc.title	On the Energy Efficiency and Performance of Irregular Application Executions on Multicore, NUMA and Manycore Platforms
dc.type	Journal articles

This item belongs to the following institution:

Universidade Estadual de Campinas (Brasil)