GPU Architectures and Programming

MS, Summer Semester, 2024

Module General Information

The module consists of lectures and a semester-long problem-based project.

  • Level: MS
  • Credit points: 6
  • SWS: 4
  • Instructor: Sohan Lal
  • Time and Location: Lecture on Thursday 11:15 - 12:45 in H - 0.02, Project/Lab Meeting on Monday 10:00 - 12:00 in CIP/E - 2.027P3d

Desirable Previous Knowledge

An introductory module on basic computer architecture, and good programming skills in C/C++.

Lecture Outline

  • Review of computer architecture basics - measuring performance, benchmarks, five-stage RISC pipeline, caches
  • GPU basics - evolution of GPU computing, a high-level overview of a GPU architecture
  • GPU programming with CUDA - program structure, CUDA thread organization, warp/thread-block scheduling
  • GPU (micro) architecture - streaming multiprocessors, single instruction multiple threads (SIMT) core design, tensor/RT cores, mixed-precision support
  • GPU memory hierarchy - banked register file and operand collectors, shared memory, GPU caches (differences w.r.t. CPU caches), global memory
  • Branch and memory divergence - branch handling, stack-based reconvergence, memory coalescing, coalescer design
  • Barriers and synchronization
  • Temporal and spatial locality exploitation challenges in GPU caches
  • Global memory - high throughput requirements, GDDR/HBM, memory bandwidth optimization techniques
  • GPU research issues - performance bottlenecks, GPU power modeling, high power consumption/energy efficiency, GPU security
  • Application case study - deep learning
  • Cycle-accurate simulators for GPUs

Semester-long Problem-based Project (Lab Assignments)

The lectures are complemented by a semester-long problem-based project.

A brief outline of the project plan:

  • Several topics related to GPU architecture will be proposed
  • Initial discussion and group formation
  • Topic finalization
  • Weekly meetings to discuss problems and progress
  • Group work is possible (2 – 4 students per group, depending on the total number of students)

Course Evaluation

  • Project + 30-minute oral exam

Course Registration and Further Information

Please register for the course on TUNE. Registration is mandatory. We will also use Stud.IP to share course material, such as slides, and further information.

Technical Infrastructure



  • David B. Kirk and Wen-mei W. Hwu, Programming Massively Parallel Processors: A Hands-on Approach, 2nd Edition (Book)
  • John L. Hennessy and David A. Patterson, Computer Architecture: A Quantitative Approach, 5th Edition (Book)