dc.creatorBorin, Edson
dc.creatorDevloo, Philippe R. B.
dc.creatorVieira, Gilvan S.
dc.creatorShauer, Nathan
dc.date2015-JUN
dc.date2016-06-07T13:16:02Z
dc.date.accessioned2018-03-29T01:36:39Z
dc.date.available2018-03-29T01:36:39Z
dc.identifierAccelerating Engineering Software On Modern Multi-core Processors. Elsevier Sci Ltd, v. 84, p. 77-84 JUN-2015.
dc.identifier0965-9978
dc.identifierWOS:000353008100009
dc.identifier10.1016/j.advengsoft.2014.12.003
dc.identifierhttp://www.sciencedirect.com/science/article/pii/S0965997814002038
dc.identifierhttp://repositorio.unicamp.br/jspui/handle/REPOSIP/241983
dc.identifier.urihttp://repositorioslatinoamericanos.uchile.cl/handle/2250/1305681
dc.descriptionConselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
dc.descriptionCoordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
dc.descriptionFundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
dc.descriptionRecent multi-core designs have migrated from Symmetric Multi-Processing to cache-coherent Non-Uniform Memory Access (ccNUMA) architectures. In this paper we discuss performance issues that arise when designing parallel Finite Element programs for a 64-core ccNUMA computer and explore solutions to these issues. We first present an overview of the computer architecture and show that highly parallel code that does not take the system's memory organization into account scales poorly, achieving only a 2.8x speedup when running with 64 threads. We then discuss how we identified the sources of overhead and evaluate three possible solutions to the problem. The first solution does not require the application's code to be modified, but the speedup achieved is only 10.6x. The second solution enables the performance to scale up to 30.9x, but it requires the programmer to manually schedule threads and allocate related data on local CPUs and memory banks, and it relies on ccNUMA-aware libraries that are not portable across operating systems. We also propose and evaluate "copy-on-thread", an alternative solution that enables the performance to scale up to 25.5x without relying on specialized libraries or requiring specific data allocation and thread scheduling. Finally, we argue that the issues reported only occur for large data sets and conclude the paper with recommendations to help programmers design algorithms and programs that perform well on this kind of machine. (C) 2014 Civil-Comp Ltd. and Elsevier Ltd. All rights reserved.
dc.descriptionVolume 84, pages 77-84
dc.descriptionANP/Petrobras
dc.languageen
dc.publisherELSEVIER SCI LTD
dc.publisherOXFORD
dc.relationADVANCES IN ENGINEERING SOFTWARE
dc.rightsembargo
dc.sourceWOS
dc.subjectComputer Science, Interdisciplinary Applications
dc.subjectComputer Science, Software Engineering
dc.subjectEngineering, Multidisciplinary
dc.titleAccelerating Engineering Software On Modern Multi-core Processors
dc.typeJournal articles

