
Introduction to CUDA Programming

Name: Introduction to CUDA Programming
Price: 220000 KRW

GPGPU is no longer an unfamiliar technology. It has long been used in fields such as scientific computing, simulation, and graphics processing, and today it is the core foundation that determines the performance of AI. In this context, GPU programming skills are a powerful tool that takes a developer's capabilities to the next level. Moving beyond CPU-centric development to handling large-scale parallel computation directly means acquiring a new way of solving problems and broader development possibilities. This course systematically covers CUDA programming, the de facto standard for GPGPU, from the basics to practical application. The curriculum focuses on content you can apply immediately in practice, including GPU architecture, parallel programming models, memory optimization, and kernel writing. The goal is that, after completing the course, you can design and implement GPU-based programs on your own.

13 learners are taking this course

Level: Intermediate

Course period: Unlimited

megayuchi

C++

CUDA

gpgpu

What you will gain after the course

CUDA parallel programming skills - You will understand GPU thread hierarchies, memory hierarchies, and the kernel execution model, and be able to write CUDA kernels yourself.
Code that runs tens to hundreds of times faster than on the CPU - You will write programs that use the GPU to accelerate real operations such as vector arithmetic and matrix multiplication, and verify the performance difference yourself.

Expanding Development Capabilities with CUDA, the Start of GPU Programming
An Introductory GPGPU Course for C/C++ Developers

GPU programming is no longer exclusive to specialized fields. Today, GPUs play a core role in almost every domain, including AI, simulation, image processing, and scientific computing, and the ability to utilize them has become a powerful weapon that significantly expands a developer's competitive edge. This course is designed for developers who have experience with C/C++ but have hesitated to start because GPU programming felt unfamiliar. We cover everything from basic CUDA concepts to understanding GPU architecture, parallel programming models, memory optimization, kernel writing, stream utilization, and image processing with a practical focus. After completing this course, you will be able to design and implement GPU-based programs on your own.

What you will learn

3. CUDA Programming Basics

This section explains the basic flow of a CUDA program. It covers initializing and shutting down the CUDA environment and walks step by step through the overall execution structure: copying data from host memory to device memory, launching the kernel, and copying the results from device memory back to host memory. It also summarizes the essential concepts that the later hands-on exercises build on, such as how CUDA kernels are launched and how the core CUDA APIs are used.
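A minimal sketch of that host/device round trip, using vector addition as the workload. The names `vecAdd`, `addOnGpu`, and the `CUDA_CHECK` macro are illustrative assumptions, not code from the course; the CUDA runtime creates its context implicitly on first use, so no explicit setup call is shown.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s failed: %s\n", #call,            \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Each thread adds one pair of elements.
__global__ void vecAdd(const float* a, const float* b, float* c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                 // the last block may be only partially used
        c[i] = a[i] + b[i];
}

// Assumes a and b are non-empty and have the same length.
std::vector<float> addOnGpu(const std::vector<float>& a, const std::vector<float>& b)
{
    int n = static_cast<int>(a.size());
    size_t bytes = n * sizeof(float);
    std::vector<float> c(n);

    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    CUDA_CHECK(cudaMalloc(&dA, bytes));
    CUDA_CHECK(cudaMalloc(&dB, bytes));
    CUDA_CHECK(cudaMalloc(&dC, bytes));

    // 1. Host -> device
    CUDA_CHECK(cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice));

    // 2. Kernel launch: enough 256-thread blocks to cover all n elements
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(dA, dB, dC, n);
    CUDA_CHECK(cudaGetLastError());   // reports launch-configuration errors

    // 3. Device -> host (this blocking copy waits for the kernel to finish)
    CUDA_CHECK(cudaMemcpy(c.data(), dC, bytes, cudaMemcpyDeviceToHost));

    CUDA_CHECK(cudaFree(dA));
    CUDA_CHECK(cudaFree(dB));
    CUDA_CHECK(cudaFree(dC));
    return c;
}
```

Calling `addOnGpu({1, 2, 3}, {10, 20, 30})` should return `{11, 22, 33}`. The `(n + threads - 1) / threads` rounding combined with the `i < n` guard is the usual way to handle sizes that are not a multiple of the block size.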

4. Global Memory Coalescing

This section covers global memory coalescing, a key element of GPU performance optimization. It explains how the hardware merges (coalesces) the requests issued when the threads of a warp access global memory, and compares optimal and worst-case access patterns through real-world scenarios. It also outlines data layout strategies and thread configurations that maximize memory access performance, which are essential techniques for writing efficient CUDA kernels.
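A minimal sketch for observing the effect, assuming a hypothetical helper `timeStridedCopy`: every thread copies one `float`, and only the distance between neighboring threads' addresses changes. With `stride == 1` a warp reads 32 consecutive floats (128 bytes, four 32-byte sectors); with `stride >= 8` each thread's float lies in a different sector, so the warp fetches 32 sectors, roughly 8x the traffic for the same useful data. Actual timings depend on the GPU and its caches.

```cuda
#include <cuda_runtime.h>

// Each thread copies one float. stride == 1 gives fully coalesced access;
// larger strides spread a warp's accesses over many memory sectors.
__global__ void stridedCopy(const float* in, float* out, int count, int stride)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count) {
        size_t j = static_cast<size_t>(i) * stride;
        out[j] = in[j];
    }
}

// Hypothetical helper: returns the kernel time in milliseconds for `count`
// threads. Error checking is omitted for brevity.
float timeStridedCopy(int count, int stride)
{
    size_t bytes = static_cast<size_t>(count) * stride * sizeof(float);
    float *in = nullptr, *out = nullptr;
    cudaMalloc(&in, bytes);
    cudaMalloc(&out, bytes);
    cudaMemset(in, 0, bytes);

    int threads = 256;
    int blocks = (count + threads - 1) / threads;

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    stridedCopy<<<blocks, threads>>>(in, out, count, stride);   // warm-up launch
    cudaEventRecord(start);
    stridedCopy<<<blocks, threads>>>(in, out, count, stride);   // timed launch
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);   // wait until the timed kernel has finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(in);
    cudaFree(out);
    return ms;
}
```

Comparing `timeStridedCopy(1 << 18, 1)` with `timeStridedCopy(1 << 18, 32)` on the same GPU would typically show the strided version taking noticeably longer, even though both move the same number of useful bytes.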

5. Thread Co-op within a Block

This section covers how the threads of a block can cooperate to achieve higher performance. It first explains how to share data efficiently at the block level using Shared Memory, then introduces warp-level intrinsics for cooperation among the threads of a single warp. It discusses strategies for writing more optimized CUDA kernels by combining these two methods and, as a practical example, implements a minimum-value search using warp-level and block-level reductions.
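A minimal sketch of such a two-level reduction, applied to finding the minimum of an `int` array. It assumes `blockDim.x` is a multiple of 32; the function names are hypothetical, and merging the per-block results with `atomicMin` is one option among several (a second kernel pass is another), not necessarily what the course does.

```cuda
#include <climits>
#include <cuda_runtime.h>

// Warp-level step: shuffles with offsets 16, 8, 4, 2, 1 fold the values
// together; afterwards lane 0 holds the minimum of all 32 lanes.
__device__ int warpMin(int v)
{
    for (int offset = warpSize / 2; offset > 0; offset /= 2)
        v = min(v, __shfl_down_sync(0xffffffffu, v, offset));
    return v;
}

// Block-level step: each warp's minimum is staged in shared memory and the
// first warp reduces those values. Assumes blockDim.x is a multiple of 32.
__global__ void minKernel(const int* in, int n, int* out)
{
    __shared__ int warpMins[32];            // one slot per warp (max 1024 threads)
    int lane = threadIdx.x % warpSize;
    int warp = threadIdx.x / warpSize;

    // Grid-stride loop: each thread first folds its own elements serially.
    int v = INT_MAX;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += gridDim.x * blockDim.x)
        v = min(v, in[i]);

    v = warpMin(v);
    if (lane == 0) warpMins[warp] = v;
    __syncthreads();

    if (warp == 0) {                        // the whole first warp takes this branch
        int numWarps = blockDim.x / warpSize;
        v = (lane < numWarps) ? warpMins[lane] : INT_MAX;
        v = warpMin(v);
        if (lane == 0) atomicMin(out, v);   // merge the per-block results
    }
}

// Hypothetical host wrapper; error checking omitted for brevity.
int minOnGpu(const int* hostData, int n)
{
    int *dIn = nullptr, *dOut = nullptr;
    int init = INT_MAX, result = 0;
    cudaMalloc(&dIn, n * sizeof(int));
    cudaMalloc(&dOut, sizeof(int));
    cudaMemcpy(dIn, hostData, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dOut, &init, sizeof(int), cudaMemcpyHostToDevice);

    minKernel<<<64, 256>>>(dIn, n, dOut);

    cudaMemcpy(&result, dOut, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return result;
}
```

Shuffles exchange values through registers, so the only shared-memory traffic is one write per warp and one read per lane of the first warp.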

6. Shared Memory - MatrixTranspose

Learn the core concepts of using Shared Memory by transposing a matrix in CUDA. We examine the inefficient global memory access patterns that commonly arise in a transpose and the resulting performance degradation, then explain how to optimize memory access with Shared Memory. We also cover techniques for resolving the bank conflicts that can occur in Shared Memory, giving practical strategies for using Shared Memory effectively through the matrix transposition example.
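A minimal sketch of a tiled transpose, assuming a row-major `rows x cols` input and hypothetical names `transposeShared` / `transposeOnGpu`. Each 32x32 tile is read from global memory row by row (coalesced), staged in shared memory, and written out row by row as well; a 32x8 thread block handles one tile, four elements per thread. The extra padding column makes a warp's column read of the tile hit 32 different banks instead of a single one.

```cuda
#include <cuda_runtime.h>

constexpr int TILE = 32;
constexpr int BLOCK_ROWS = 8;

// in: rows x cols (row-major), out: cols x rows (row-major)
__global__ void transposeShared(const float* in, float* out, int rows, int cols)
{
    // +1 padding: element [r][c] sits in bank (r * 33 + c) % 32, so reading
    // a column spreads across all 32 banks (without padding: a 32-way conflict).
    __shared__ float tile[TILE][TILE + 1];

    int x = blockIdx.x * TILE + threadIdx.x;   // column in `in`
    int y = blockIdx.y * TILE + threadIdx.y;   // row in `in`
    for (int j = 0; j < TILE; j += BLOCK_ROWS)
        if (x < cols && y + j < rows)           // coalesced read of a tile row
            tile[threadIdx.y + j][threadIdx.x] = in[(size_t)(y + j) * cols + x];

    __syncthreads();

    // Swap the block indices so the write side is row-contiguous too.
    x = blockIdx.y * TILE + threadIdx.x;       // column in `out` (a row of `in`)
    y = blockIdx.x * TILE + threadIdx.y;       // row in `out` (a column of `in`)
    for (int j = 0; j < TILE; j += BLOCK_ROWS)
        if (x < rows && y + j < cols)           // coalesced write of a tile row
            out[(size_t)(y + j) * rows + x] = tile[threadIdx.x][threadIdx.y + j];
}

// Hypothetical host wrapper; error checking omitted for brevity.
void transposeOnGpu(const float* hIn, float* hOut, int rows, int cols)
{
    size_t bytes = (size_t)rows * cols * sizeof(float);
    float *dIn = nullptr, *dOut = nullptr;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, hIn, bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE, BLOCK_ROWS);
    dim3 grid((cols + TILE - 1) / TILE, (rows + TILE - 1) / TILE);
    transposeShared<<<grid, block>>>(dIn, dOut, rows, cols);

    cudaMemcpy(hOut, dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
}
```

The bounds checks let the kernel handle sizes that are not multiples of 32; the tile entries skipped at the edges are never read back, because the write side applies the matching conditions.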

7. Shared Memory - MatrixMultiply

Following the Matrix Transpose example, this section shows how to use Shared Memory even more effectively through matrix multiplication. It explains the basic structure for processing large matrix multiplications in CUDA and introduces tiling: dividing large matrices into smaller sub-matrices (tiles) and computing tile by tile. It also compares the memory access patterns of matrix multiplication with those of transposition, which are similar yet different, and discusses how to use Shared Memory to reduce memory access bottlenecks and maximize performance.
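A minimal sketch of tiled matrix multiplication, assuming row-major storage and 16x16 tiles; `matMulTiled` and `matMulOnGpu` are hypothetical names. Each block computes one 16x16 tile of C. For every step along K, it loads one tile of A and one of B into shared memory, so each global element is fetched once per block instead of once per thread that needs it, cutting global-memory reads by roughly a factor of `TILE`.

```cuda
#include <cuda_runtime.h>

constexpr int TILE = 16;

// C (M x N) = A (M x K) * B (K x N), all row-major.
__global__ void matMulTiled(const float* A, const float* B, float* C, int M, int N, int K)
{
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < (K + TILE - 1) / TILE; ++t) {
        int aCol = t * TILE + threadIdx.x;
        int bRow = t * TILE + threadIdx.y;
        // Out-of-range elements are loaded as 0 so they add nothing to acc.
        As[threadIdx.y][threadIdx.x] = (row < M && aCol < K) ? A[(size_t)row * K + aCol] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (bRow < K && col < N) ? B[(size_t)bRow * N + col] : 0.0f;
        __syncthreads();   // both tiles fully loaded before anyone reads them

        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();   // everyone done before the next iteration overwrites the tiles
    }

    if (row < M && col < N)
        C[(size_t)row * N + col] = acc;
}

// Hypothetical host wrapper; error checking omitted for brevity.
void matMulOnGpu(const float* hA, const float* hB, float* hC, int M, int N, int K)
{
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, (size_t)M * K * sizeof(float));
    cudaMalloc(&dB, (size_t)K * N * sizeof(float));
    cudaMalloc(&dC, (size_t)M * N * sizeof(float));
    cudaMemcpy(dA, hA, (size_t)M * K * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, (size_t)K * N * sizeof(float), cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((N + TILE - 1) / TILE, (M + TILE - 1) / TILE);
    matMulTiled<<<grid, block>>>(dA, dB, dC, M, N, K);

    cudaMemcpy(hC, dC, (size_t)M * N * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
}
```

Unlike the transpose, where each element is touched once, every element of A and B here is reused by a whole row or column of output tiles, which is why staging tiles in shared memory pays off so strongly.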

11. Image Histogram

We implement a histogram, a tool for analyzing the distribution of pixel values in an image, in CUDA, and cover how to accumulate data in a parallel environment along with the performance issues this raises. We examine the basic structure of histogram computation in CUDA and explain how the atomic operations it depends on work and why they degrade performance. We then cover optimizations that reduce the atomic-operation bottleneck using Shared Memory and warp intrinsics, resulting in more efficient histogram kernels.
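A minimal sketch of the shared-memory approach for an 8-bit grayscale image (256 bins), with hypothetical names. Each block first accumulates into a private histogram in shared memory, where atomics are cheaper and contention is limited to the block, then merges it into the global histogram with at most one global `atomicAdd` per bin. The warp-intrinsic optimization the section also mentions is not shown here.

```cuda
#include <cuda_runtime.h>

constexpr int BINS = 256;   // one bin per 8-bit intensity value

__global__ void histogramShared(const unsigned char* pixels, int n, unsigned int* hist)
{
    __shared__ unsigned int local[BINS];

    // Zero the block-private histogram.
    for (int b = threadIdx.x; b < BINS; b += blockDim.x) local[b] = 0;
    __syncthreads();

    // Grid-stride loop: atomics go to fast on-chip shared memory.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += gridDim.x * blockDim.x)
        atomicAdd(&local[pixels[i]], 1u);
    __syncthreads();

    // Merge: at most one global atomic per bin per block.
    for (int b = threadIdx.x; b < BINS; b += blockDim.x)
        if (local[b] != 0) atomicAdd(&hist[b], local[b]);
}

// Hypothetical host wrapper; error checking omitted for brevity.
void histogramOnGpu(const unsigned char* hPixels, int n, unsigned int* hHist)
{
    unsigned char* dPixels = nullptr;
    unsigned int* dHist = nullptr;
    cudaMalloc(&dPixels, n);
    cudaMalloc(&dHist, BINS * sizeof(unsigned int));
    cudaMemcpy(dPixels, hPixels, n, cudaMemcpyHostToDevice);
    cudaMemset(dHist, 0, BINS * sizeof(unsigned int));   // global bins start at zero

    histogramShared<<<64, 256>>>(dPixels, n, dHist);

    cudaMemcpy(hHist, dHist, BINS * sizeof(unsigned int), cudaMemcpyDeviceToHost);
    cudaFree(dPixels);
    cudaFree(dHist);
}
```

With a fixed grid of 64 blocks, the global histogram receives at most 64 x 256 atomic updates, no matter how many pixels the image has.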

12. CUDA-D3D12 interop

This section covers how to combine the Direct3D 12 rendering pipeline with CUDA so that GPU graphics and GPGPU computation can be used together. It explains how to map the Render Target and Depth Buffer of a simple D3D12 game framework as CUDA resources and how to synchronize the D3D12 timeline with the CUDA timeline.
The example code takes the textures mapped as CUDA resources as input, applies image processing techniques such as Gaussian blur, edge detection, normal map rendering, and depth visualization, and outputs the result to the final screen.

Prerequisite Knowledge

Required
- C/C++
- Basic Windows Programming
Recommended (The following courses may be helpful.)
- Windows System Programming (https://inf.run/VciKC)
- Windows debugging tips (https://inf.run/KH5J6)

Precautions

A GeForce GTX 16xx series graphics card or newer is required.
The examples also run on GTX 10xx series cards, but the project settings must be modified slightly; how to do so is covered in the 'Installation and Development Environment Setup' chapter.
You can also use the latest CUDA Toolkit (version 13.3 or later). This likewise requires slight changes to the project settings, also covered in 'Installation and Development Environment Setup'.
This course does not cover AI. Matrix multiplication and kernel-based image filters are related to AI workloads, but AI itself is not covered directly.

Recommended for these people

Who is this course right for?

Programmers who want to use parallel computing but have been intimidated by GPU programming because they lack graphics experience
Developers who want to accelerate AI, simulation, and scientific computing workloads themselves

Need to know before starting?

C/C++
Basic Windows Programming using Visual Studio

Hello, this is megayuchi

Inflearn Verified
Career Verified

Learners: 3,313
Rating: 5.0

Programmer

C++,x86/x64 ASM, DirectX9/11/12, Metal, OpenGL, CUDA, win32, winsock/bsd socket

Inflearn courses

D3D12 Programming - Basics - https://inf.run/7gJhS

D3D12 Programming Basics Plus - https://inf.run/itHDW

DirectX Raytracing Programming - https://inf.run/cQqx7

Windows System Programming - https://inf.run/AwfCv

Windows Debugging Tips - https://inf.run/zL7E4

Blog : https://megayuchi.com

Youtube : https://youtube.com/megayuchi

LinkedIn : https://www.linkedin.com/in/megayuchi/

Curriculum

13 lectures ∙ (16hr 23min)

Course materials: Lecture resources

Section 1

1. Course Introduction (10:25)
2. CUDA Programming - CUDA Overview (01:17:08)
3. CUDA Programming - Installation and Environment Setup (53:17)
4. CUDA Programming - Programming Basics (01:33:53)
5. CUDA Programming - Global Memory Coalescing (01:12:40)
6. CUDA Programming - Thread Co-op within a Block (01:36:52)
7. CUDA Programming - Shared Memory - MatrixTranspose (45:00)
8. CUDA Programming - Shared Memory - MatrixMultiply (01:13:43)
9. CUDA Programming - Occupancy (01:59:00)
10. CUDA Programming - CUDA Stream (01:49:56)
11. CUDA Programming - Image Filter (01:45:58)
12. CUDA Programming - Image Histogram (49:07)
13. CUDA Programming - CUDA-D3D12 Interop (01:16:10)

Published: 06/11/2026

Last updated: 06/11/2026

megayuchi's other courses

Windows System Programming

megayuchi

We'll teach you essential Windows System programming skills for developing games and applications for Windows.

Basic

windows-programming, C++, microsoft-visual-c++

D3D12 Programming - Basics

megayuchi

It seems the mainstream graphics API has shifted from D3D11 to D3D12. D3D12 supports attractive features but has a steep learning curve. However, I believe that with gradual learning, individuals can also create games using the D3D12 API. Therefore, based on my experience building a game directly with D3D12, I aim to provide the knowledge base needed to challenge D3D12 game programming.

Intermediate

DirectX, d3d, directx-12

D3D12 Programming Basics Plus

megayuchi

This course continues from D3D12 Programming Basics. After implementing basic rendering features, it explains the features and approach needed for actual engine development using them.

Intermediate

DirectX12, DirectX, directx-12

Developing your own game engine

megayuchi

Introduces the knowledge and approaches needed to develop your own engine.

Intermediate

C++, DirectX, Architecture

Introduction to D3D12 Mesh Shader

megayuchi

Introduces the purpose and programming method of Mesh Shader, a new feature added in D3D12.

Intermediate

DirectX, d3d, GPU

Introduction to D3D12 Programming

megayuchi

This course provides the basic knowledge needed for those with experience in D3D9/10/11 or OpenGL to adapt to D3D12 programming.

Intermediate

DirectX, d3d

Texture Streaming with D3D Tiled Resources

megayuchi

This tutorial introduces how to use Tiled Resources in D3D11/12.

Intermediate

DirectX, d3d, DirectX12

Socket programming for online game development

megayuchi

This course covers the core aspects of TCP/IP network programming essential for online game development with a practical focus. You'll learn step by step from the basic principles of sockets to client-server architecture design, and implementation of game frameworks using custom-built network libraries.
Topics covered:
- Basic network concepts and TCP/IP operation principles
- TCP programming using the Socket API
- Packet protocol design and transmission structure implementation
- Client/server-based game framework development
- Optimization and practical tips needed for actual online game development
Important notes:
- The course is conducted on Windows using Visual Studio.
- While it uses the standard BSD socket API, there are slight differences from Unix-based operating systems.
- IOCP and Overlapped I/O are not covered.

Intermediate

winsock, game-programming, bsd-sockets

DirectX Raytracing Programming

megayuchi

DirectX 12 supports real-time Raytracing, allowing high-quality graphics to be implemented with concise code. However, due to the high barrier to entry, programmers who directly utilize it are rare. Based on experience applying Raytracing to actual games, this course will help you develop your own Raytracing engine.

Intermediate

DirectX, raytracing, computer-graphics

Windows Debugging Tips

megayuchi

Let's learn Windows debugging techniques that no one tells you about.

Basic

debugging, debugger, windows-programming

Similar courses

[Beginner] Practical Qt/QML Programming for Stepping Up to Intermediate Level

qtdev

Take the leap from a beginner to an intermediate developer by learning Qt/QML programming skills and the latest technologies through real-world projects.

Basic

Qt, QML, C++

C++ Principles Known Only to the 1%: How to Create a Gap at the Introductory Level

tipsware

Simply learning C++ is not enough. You can truly see C++ only when you understand "why these syntax rules were created." This course is not just a list of simple syntax. A Microsoft MVP with 30 years of experience and the author of "Do it! Introduction to C Language" explains the structure and philosophy of C++ in depth from the perspective of a language designer. As a result, you will experience a shift in your developer mindset that goes beyond just "friendly explanations."

Basic

C++, oop, polymorphism

CUDA Programming (1) - C/C++/GPU Parallel Computing - CUDA Kernel kernel

onemoresipofcoffee

✅ (1) Creating an actual CUDA kernel, out of the complete series from (1) to (6)
✅ Explaining NVIDIA GPU + CUDA programming step-by-step from the basics.
✅ Processing arrays, matrices, image processing, statistical processing, and sorting very quickly using parallel computing with C++/C.

Intermediate

CUDA, GPU, Parallel Processing

[MMORPG Game Development with C++ and Unreal Series] Part 1: Introduction to C++ Programming

Rookiss

Learn the basic C++ syntax for smooth learning of the series. We will cover essential content from assembly language to basic C++ syntax, STL, and C++11 in a compressed manner.

Beginner

C++

Windows System Programming

megayuchi

We'll teach you essential Windows System programming skills for developing games and applications for Windows.

Basic

windows-programming, C++, microsoft-visual-c++

Windows Debugging Tips

megayuchi

Let's learn Windows debugging techniques that no one tells you about.

Basic

debugging, debugger, windows-programming

[IT Master Class] Let's Code the C++ Way

contents

C++ programming methods required in the field! It's time to move beyond C-style coding and start coding like a true C++ developer. This course is designed to help you build fundamental programming skills by learning the basic syntax of C++ step-by-step, growing to a level where you can apply them immediately in real-world practice. You can create practical projects using Arduino or develop the programming skills necessary for game development using Unreal Engine. Through this course, you will naturally boost your programming confidence and become comfortable utilizing C++ in practical applications.

Beginner

C++

[All-in-One Introduction to Game Programming] C++ & Data Structures/Algorithms & STL & Game Mathematics & Windows API & Game Server

Rookiss

This is an all-in-one curriculum for game programming beginners who are unsure of where to start. It is a comprehensive curriculum that covers the basics of game programming, including C++, data structures/algorithms, STL, game mathematics, Windows API, and an introduction to game servers.

Beginner

C++, UE Blueprint, game-math

Triangles in action! CMake beginner

triangle

Are you having trouble using CMake? After taking this course, you too will be a CMake expert.

Basic

cmake, vcpkg, C++

Advanced Algorithms that Program Themselves (C++)

eazuooz

This lecture is for those who tried to study advanced algorithms but feel lost on how to approach problems with just the basic data structures and algorithms found in books.

Intermediate

data-structure, Algorithm, C++

Coding Test Basics in 2 Hours (Python, C++, Java, Javascript)

ally

This course is for beginners who have started coding tests but are confused, wondering, “Am I just not good at this, or is there still more I need to learn?” Through easy problems, you will master input processing, strings, data structures, and basic mathematical concepts, while learning the right way to approach problems from the very first step.

Beginner

JavaScript, Python, Java

Practical C++ Programming for Experts (Mastering File Handling, Exception Handling, STL, and Lambda Expressions)

kimw24072

This course goes beyond simple grammar explanations and focuses on developing C++ programming skills that can be applied immediately in the field. Throughout my teaching, I have directly addressed common challenges faced by many learners—such as struggling to understand the STL or feeling confused by file handling and exception handling concepts. Based on this experience, I provide step-by-step explanations to ensure even complex concepts are understood easily and clearly. Furthermore, this course is not just about delivering theory; it is structured around: 👉 "Why this concept is necessary" 👉 "How it is used in actual code" 👉 "How it is applied in professional practice"

Intermediate

C++, Algorithm, data-structures

Triangles in action! OpenAI Triton beginner

triangle

In this course, you will learn how to program kernels and develop PyTorch modules. You can use the knowledge you have gained to develop models faster.

Basic

Deep Learning(DL), Python, gpgpu

Making Your First Robot for Absolute Beginners: An Introduction to Robotics Starting with OTTO DIY

happyloper

This course goes beyond simple kit assembly; it is a process of building the open-source project Otto DIY from scratch. We have included everything from the secrets of component selection and hardware operating principles to Arduino coding (functions, variables, loops), and even the troubleshooting know-how for voltage drops, which beginners find most challenging. Don't just stop at theory—develop the skills of a 'true robot maker' by completing your own smartphone-controlled robot!

Beginner

C++, Arduino, Embedded

Creating a custom engine using C++ (Unity Engine clone coding)

eazuooz

This is the process of creating a game engine using C++, similar to the Unity game engine. It was created by inferring the internal code, and through this, we can think about the internal principles of the Unity engine.

Intermediate

windows-api, C++, game-programming

Become a Coding Test Passer - C++

dremdeveloper

C++ lectures for passing coding tests, No need for books! Community where you can communicate directly with the author!

Beginner

C++, data-structure, Algorithm
