Accelerating Engineering Software On Modern Multi-core Processors

Borin, Edson; Devloo, Philippe R. B.; Vieira, Gilvan S.; Shauer, Nathan

dc.creator	Borin, Edson
dc.creator	Devloo, Philippe R. B.
dc.creator	Vieira, Gilvan S.
dc.creator	Shauer, Nathan
dc.date	2015-JUN
dc.date	2016-06-07T13:16:02Z
dc.date.accessioned	2018-03-29T01:36:39Z
dc.date.available	2018-03-29T01:36:39Z
dc.identifier	Accelerating Engineering Software On Modern Multi-core Processors. Elsevier Sci Ltd, v. 84, p. 77-84 JUN-2015.
dc.identifier	0965-9978
dc.identifier	WOS:000353008100009
dc.identifier	10.1016/j.advengsoft.2014.12.003
dc.identifier	http://www.sciencedirect.com/science/article/pii/S0965997814002038
dc.identifier	http://repositorio.unicamp.br/jspui/handle/REPOSIP/241983
dc.identifier.uri	http://repositorioslatinoamericanos.uchile.cl/handle/2250/1305681
dc.description	Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
dc.description	Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
dc.description	Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
dc.description	Recent multi-core designs have migrated from Symmetric Multi-Processing to cache-coherent Non-Uniform Memory Access (ccNUMA) architectures. In this paper we discuss performance issues that arise when designing parallel Finite Element programs for a 64-core ccNUMA computer and explore solutions to these issues. We first present an overview of the computer architecture and show that highly parallel code that does not take the system's memory organization into account scales poorly, achieving only a 2.8x speedup when running with 64 threads. We then discuss how we identified the sources of overhead and evaluate three possible solutions. The first solution does not require the application's code to be modified, but the speedup achieved is only 10.6x. The second solution enables performance to scale up to 30.9x, but it requires the programmer to manually schedule threads and allocate related data on local CPUs and memory banks, and it relies on ccNUMA-aware libraries that are not portable across operating systems. We also propose and evaluate "copy-on-thread", an alternative solution that enables performance to scale up to 25.5x without relying on specialized libraries or requiring specific data allocation and thread scheduling. Finally, we argue that the issues reported only happen for large data sets and conclude the paper with recommendations to help programmers design algorithms and programs that perform well on this kind of machine. (C) 2014 Civil-Comp Ltd. and Elsevier Ltd. All rights reserved.
dc.description	Volume 84, pages 77-84
dc.description	ANP/Petrobras
dc.language	en
dc.publisher	ELSEVIER SCI LTD
dc.publisher	OXFORD
dc.relation	ADVANCES IN ENGINEERING SOFTWARE
dc.rights	embargo
dc.source	WOS
dc.subject	Computer Science, Interdisciplinary Applications
dc.subject	Computer Science, Software Engineering
dc.subject	Engineering, Multidisciplinary
dc.title	Accelerating Engineering Software On Modern Multi-core Processors
dc.type	Journal articles

This item belongs to the following institution

Universidade Estadual de Campinas (Brasil)