All computing systems, from mobile devices to supercomputers, are becoming heterogeneous, massively parallel computers for higher power efficiency and computation throughput. While the computing community is racing to build tools and libraries to ease the use of these systems, effective and confident use of them will always require knowledge of their low-level programming. This course is designed for students to learn the essence of low-level programming interfaces and how to use these interfaces to achieve application goals. CUDA C, with its good balance between user control and verbosity, will serve as the teaching vehicle for the first half of the course. Students will then extend their learning into closely related programming interfaces such as OpenCL, OpenACC, and C++AMP.
The course is unique in that it is application oriented and introduces only the underlying computer science and computer engineering knowledge needed to understand the material. It covers the concepts of data-parallel execution models, memory models for managing locality, tiling techniques for reducing bandwidth consumption, parallel algorithm patterns, overlapping computation with communication, and a variety of heterogeneous parallel programming interfaces. The concepts learned in this course form a strong foundation for learning other types of parallel programming systems.
Syllabus
- Week One: Introduction to Heterogeneous Computing, Overview of CUDA C, and Kernel-Based Parallel Programming, with lab tour and programming assignment of vector addition in CUDA C.
- Week Two: Memory Model for Locality, Tiling for Conserving Memory Bandwidth, Handling Boundary Conditions, and Performance Considerations, with programming assignment of simple matrix-matrix multiplication in CUDA C.
- Week Three: Parallel Convolution Pattern, with programming assignment of tiled matrix-matrix multiplication in CUDA C.
- Week Four: Parallel Scan Pattern, with programming assignment of parallel convolution in CUDA C.
- Week Five: Parallel Histogram Pattern and Atomic Operations, with programming assignment of parallel scan in CUDA C.
- Week Six: Data Transfer and Task Parallelism, with programming assignment of parallel histogram in CUDA C.
- Week Seven: Introduction to OpenCL, Introduction to C++AMP, Introduction to OpenACC, with programming assignment of vector addition using streams in CUDA C.
- Week Eight: Course Summary, Other Related Programming Models (Thrust, Bolt, and CUDA FORTRAN), with programming assignment of simple matrix-matrix multiplication in choice of OpenCL, C++AMP, or OpenACC.
- Week Nine: Complete any remaining lab assignments, with optional bonus programming assignments in choice of OpenCL, C++AMP, or OpenACC.