Tutorial: Accelerator Programming with OpenACC and CUDA
Abstract: Tutorial scope:
- Full day
- Beginning through intermediate material
- Jon Urbanic, PSC: Introduction to OpenACC
- Lars Koesterke, TACC: Intermediate CUDA
This full-day tutorial will address the two main programming models in use today for adapting HPC codes to make effective use of GPU accelerators. The two half-day sessions will share some common techniques for achieving the best performance with an accelerator. The included accelerator-tutorial.pdf file contains draft material that will be reworked before the final tutorial date.
Introduction to OpenACC:
The Introduction to OpenACC session will cover the newest programming model for accelerators, based on OpenMP-like directives. There are no formal prerequisites, but reading or skimming one of the many introductory CUDA tutorials would be helpful, and a working knowledge of C or Fortran is required. Students may participate in hands-on programming sessions hosted on a cluster at PSC. Topics and hands-on sessions will include:
- Welcome/Introduction to the Environment (10 mins)
- Parallel Computing Overview (10 mins)
- Introduction to OpenACC (2 hrs)
- OpenACC with CUDA Libraries (30 mins)
Intermediate CUDA:
This portion of the tutorial will cover intermediate programming techniques and performance tools/tips for CUDA programmers. There are many introductory materials for the beginning CUDA programmer, and familiarity with one or more of these is considered a prerequisite for this portion of the tutorial, e.g.:
- http://developer.nvidia.com/cuda-education-training
The intermediate CUDA material will be presented in lecture-discussion style using well-developed example codes and walkthroughs. Students will receive working sample code and routines they can incorporate into their own HPC projects. Topics and examples will include:
- CUDA Fortran
- CUDA C
- Optimizing shared memory use on the accelerator
- Using streams to overlap communication and computation on the accelerator
- MPI and accelerators with CUDA
- Driving multiple GPUs per process
- Optimizing CPU core to GPU device affinity for maximum memory bandwidth
  - http://www.ncsa.illinois.edu/UserInfo/Resources/Hardware/DellNVIDIACluster/Doc/Architecture.html#Affinity