TOAST: Automatic Tiling for Iterative Stencil Computations on GPUs

Rocha, Rodrigo C. O.; Pereira, Alyson D.; Ramos, Luiz; Goes, Luis F. W.

dc.creator	Rocha, Rodrigo C. O.
dc.creator	Pereira, Alyson D.
dc.creator	Ramos, Luiz
dc.creator	Goes, Luis F. W.
dc.date	2017
dc.date	April
dc.date	2017-11-13T13:22:24Z
dc.date.accessioned	2018-03-29T05:55:10Z
dc.date.available	2018-03-29T05:55:10Z
dc.identifier	Concurrency and Computation-Practice & Experience. Wiley-Blackwell, v. 29, p. , 2017.
dc.identifier	1532-0626
dc.identifier	1532-0634
dc.identifier	WOS:000398717400011
dc.identifier	10.1002/cpe.4053
dc.identifier	http://onlinelibrary.wiley.com/doi/10.1002/cpe.4053/full
dc.identifier	http://repositorio.unicamp.br/jspui/handle/REPOSIP/327877
dc.identifier.uri	http://repositorioslatinoamericanos.uchile.cl/handle/2250/1364902
dc.description	The stencil pattern is important in many scientific and engineering domains, spurring great interest from researchers and industry. In recent years, various optimizations have been proposed for parallel stencil applications running on graphics processing units (GPUs). In particular, tiling is a technique that can significantly enhance application performance by improving data locality and by reducing the volume of communication between host memory and the GPU. In addition, tiling enables stencil applications to process inputs that are larger than the physical GPU memory. However, implementing tiling efficiently is complex, time-consuming, and error-prone. In this paper, we propose transparently optimized automatic stencil tiling (TOAST), an automatic tiling mechanism for iterative stencil computations running on GPUs. TOAST has three main benefits: (1) it incorporates an optimization model that seeks to maximize data reuse within tiles while respecting the amount of dynamically available GPU memory; (2) it offers a virtualized GPU memory for stencil computations, allowing for large input data; and (3) it performs optimal tiling transparently to the developer of the parallel stencil application. The current implementation of TOAST augments the PSkel framework with an internal solver based on genetic algorithms. Our experimental results show that TOAST improves the performance of iterative stencil applications by up to 13× compared with their optimized multithreaded versions running on the central processing unit (CPU), and by up to 48× compared with a naive tiling approach on the GPU. TOAST automatically keeps the overhead of data management low relative to the actual stencil computation.
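The abstract describes splitting an iterative stencil into tiles that, together with their halos, fit in limited device memory, and choosing the tile shape with an optimization model. Below is a minimal, hypothetical Python sketch of that idea under simplifying assumptions that do not come from the paper: a 1D three-point averaging stencil with periodic boundaries, a Python list standing in for host memory, and a per-tile list standing in for a GPU buffer. Because the stencil radius is 1, a tile advanced `depth` time steps per pass needs a halo of `depth` cells on each side. The names `reference`, `tiled`, and `choose_tiling`, and the reuse formula, are illustrative assumptions. TOAST itself solves its model with a genetic algorithm inside PSkel; this sketch swaps in a plain exhaustive search over a toy model.

```python
"""Toy sketch of temporal tiling for an iterative 1D stencil (not TOAST's code)."""


def step(v):
    """One 3-point averaging step; returns only the len(v) - 2 interior cells."""
    return [(v[i - 1] + v[i] + v[i + 1]) / 3.0 for i in range(1, len(v) - 1)]


def reference(u, steps):
    """Untiled baseline: whole array, periodic boundaries, one step at a time."""
    n = len(u)
    for _ in range(steps):
        u = [(u[(i - 1) % n] + u[i] + u[(i + 1) % n]) / 3.0 for i in range(n)]
    return list(u)


def tiled(u, steps, tile, depth):
    """Advance `steps` time steps, up to `depth` steps per pass, one tile at a time.

    Each tile is copied out with a halo of d cells per side (radius-1 stencil),
    advanced d steps locally, and only its interior is written back.
    """
    if tile < 1 or depth < 1:
        raise ValueError("tile and depth must be positive")
    n = len(u)
    u = list(u)
    done = 0
    while done < steps:
        d = min(depth, steps - done)
        out = [0.0] * n
        for start in range(0, n, tile):
            stop = min(start + tile, n)
            # Stand-in for a host-to-device copy: tile plus periodic halo.
            v = [u[j % n] for j in range(start - d, stop + d)]
            for _ in range(d):
                v = step(v)  # valid region shrinks by one cell per side
            # Stand-in for a device-to-host copy of the interior only.
            out[start:stop] = v
        u = out
        done += d
    return u


def choose_tiling(n, steps, mem_cells):
    """Exhaustively pick (tile, depth) maximizing a toy reuse metric.

    Hypothetical model: a double-buffered tile with halos occupies
    2 * (tile + 2 * depth) cells and must fit in mem_cells; reuse is the
    number of useful cell updates per cell copied in.
    """
    best = None
    for depth in range(1, steps + 1):
        for tile in range(1, n + 1):
            if 2 * (tile + 2 * depth) > mem_cells:
                continue
            reuse = tile * depth / (tile + 2 * depth)
            if best is None or reuse > best[0]:
                best = (reuse, tile, depth)
    if best is None:
        raise ValueError("no tiling fits in the given memory budget")
    return best[1], best[2]


if __name__ == "__main__":
    import random

    random.seed(0)
    u = [random.random() for _ in range(50)]
    tile, depth = choose_tiling(len(u), 7, 40)
    err = max(abs(a - b) for a, b in zip(tiled(u, 7, tile, depth), reference(u, 7)))
    print(tile, depth, err)
```

In this sketch, `tiled(u, steps, tile, depth)` reproduces `reference(u, steps)` because each tile carries enough halo to compute its interior independently; neighboring tiles recompute the halo cells redundantly, which is the price of that independence. The toy objective counts only useful updates per transferred cell and ignores that redundant halo work.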
dc.description	29
dc.description	8
dc.language	English
dc.publisher	Wiley-Blackwell
dc.publisher	Hoboken
dc.relation	Concurrency and Computation-Practice & Experience
dc.rights	closed access
dc.source	WOS
dc.subject	Autotuning
dc.subject	GPU
dc.subject	Optimization Model
dc.subject	Parallel Skeletons
dc.subject	Stencil Computation
dc.subject	Tiling
dc.title	TOAST: Automatic Tiling for Iterative Stencil Computations on GPUs
dc.type	Journal articles

This item belongs to the following institution

Universidade Estadual de Campinas (Brazil)