Cache-based Cross-iteration Coherence For Speculative Parallelization

Maximal utilization of cores in multicore architectures is key to realizing the potential performance available from higher-density devices. To achieve scalable performance, parallelization techniques rely on carefully tuning speculative architecture support, the run-time environment, and software-based transformations. Hardware and software mechanisms have already been proposed to address this problem, but they either require deep (and risky) changes to existing hardware and cache coherence protocols or exhibit poor performance scalability for a range of applications. The addition of cache tags as an enabler for data versioning, recently announced by industry (e.g., IBM BlueGene/Q), could allow better exploitation of parallelism at the microarchitecture level. In this paper, we present an execution model that supports both DOPIPE-based speculation and traditional speculative parallelization techniques. It is based on a simple cache tagging approach for data versioning that integrates smoothly with typical cache coherence protocols and requires no changes to them. Experimental results using SPEC and PARSEC benchmarks reveal substantial speedups on a simulated 24-core CMP and demonstrate improved scalability compared to a software-only approach. © 2013 IEEE.

pp. 216-225

IEEE Computer Society's Technical Committee on Parallel Processing (TCPP), ACM, Shell India, et al.
