CUDA Programming (0) - C/C++/GPU Parallel Computing - Open Sample Lecture
✅ This is an introductory lecture (0) that introduces the entire series of lectures (1) to (6).
✅ NVIDIA GPU + CUDA programming is explained step by step from the basics.
✅ Accelerates arrays, matrices, image processing, statistical processing, sorting, and more with parallel computing in C++/C.
Full Series - Massively Parallel Computing with CUDA on GPUs
This lecture is - Part (0) - Introduction to Massive Parallel Computing and CUDA
Update - June 2023, "Remastered" 🍀 (some audio, intro video)
✅ Bundle Discount Coupon✳️ provided in the roadmap "CUDA Programming"
Speed is everything in a program! Make it fast with massive parallel processing techniques 🚀
They say massive parallel computing is important 🧐
✅ CUDA = The most widely used GPU parallel computing technology ✅ Step by step + abundant examples + detailed explanations = This is the course!
Large-scale parallel computing based on GPUs and graphics cards is actively used in AI, deep learning, big data processing, and image/video/audio processing. Currently, the most widely applied technology in GPU parallel computing is NVIDIA's CUDA architecture.
Among parallel computing technologies, large-scale parallel computing and CUDA are considered crucial. However, it is hard to find a course that teaches this field systematically, which makes it difficult even to begin learning. This course teaches CUDA programming step by step. CUDA and parallel computing require a theoretical background and can be challenging, but this course's rich examples and background explanations, along with a thorough grounding in the fundamentals, will give you the tools you need! The course is produced as a series, ensuring ample lecture time.
This lecture explains how C++/C programmers can use the CUDA library and C++/C functions to accelerate a wide range of problems with massively parallel processing techniques. This approach can be used to speed up existing C++/C programs, or to dramatically accelerate new algorithms and programs developed entirely with parallel computing.
📢 Please check before taking the class!
For this course, please ensure you have a hardware environment that supports NVIDIA CUDA. You will need a PC or laptop equipped with an NVIDIA GeForce graphics card.
NVIDIA GeForce graphics cards are also available in some cloud environments, but cloud configurations change frequently and often require a fee. If you use a cloud environment, be sure to secure one that supports graphics cards.
You can check the lecture practice environment in detail in the curriculum's <00. Preparation before lecture> lecture.
Lecture Features ✨
#1. Rich examples and explanations
CUDA and large-scale parallel computing require extensive examples and explanations. This series of lectures covers parts (0) through (6), totaling over 24 hours.
#2. Practice is essential!
Since it is a computer programming subject, it emphasizes abundant practical training and provides actual working source code so that you can follow along step by step.
#3. Focus on the important parts!
To keep lecture time focused, we avoid repeating explanations of source code that has already been covered, so you can concentrate on only the changed parts or the parts that need emphasis.
I recommend this to these people 🙋‍♀️
Programmers who want to dramatically improve existing programs
Researchers who want to know how various applications are accelerated
College students who want to add new technologies to their portfolio before getting a job
Anyone who wants to learn the theory and practice of parallel processing used in AI, deep learning, and matrix computation
Preview lecture review 🏃
*The reviews below are from an external lecture given by the instructor on the same topic.
"I knew nothing about parallel algorithms or parallel computing. After taking the course, I feel much more confident in parallel computing."
"There were many algorithms that could not be solved with existing C++ programs. Through this lecture, I was able to improve my ability to process in real time!"
"After attending the lecture, when I mentioned in an interview that I had experience with parallel computing, the interviewers were very surprised. I heard that it's not easy to find CUDA or parallel computing courses at the college level."
CUDA Programming Conquest Roadmap 🛩️
The CUDA programming course is designed for focused study of each topic, with seven parts totaling over 24 hours of lectures.
A roadmap lecture titled "CUDA Programming" is also available. Be sure to check it out. ✅
Each lecture is divided into six or more sections, each covering a separate topic. (The current lecture, Part 0, consists of only two sections, the Introduction.)
Slides used in the lecture are provided as PDF files, and the source code of the programs used is provided in the sections explaining the practical examples.
Part 0 (1-hour free lecture) - Current lecture
Introduction to MPC and CUDA - Provides an overall introduction to massively parallel computing (MPC) and CUDA.
Part 1(3 hours 40 minutes)
CUDA Kernel Concepts - Learn the concepts of CUDA Kernels, the starting point of CUDA programming, and see parallel computing in action.
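To give a taste of what Part 1 covers, a minimal CUDA kernel launch might look like the following sketch (illustrative only; not taken from the course materials):

```cuda
#include <cstdio>

// A minimal CUDA kernel: each thread prints its global index.
__global__ void hello_kernel() {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    printf("Hello from thread %d\n", tid);
}

int main() {
    // Launch 2 blocks of 4 threads each: 8 threads run in parallel.
    hello_kernel<<<2, 4>>>();
    cudaDeviceSynchronize();  // wait for the GPU to finish
    return 0;
}
```

Compiled with `nvcc`, the `<<<blocks, threads>>>` launch configuration is what turns one function into thousands of parallel instances.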
Part 2 (4 hours 15 minutes)
Vector addition - Presents various examples of operations on vectors (one-dimensional arrays) and implements the AXPY routine in CUDA.
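As a rough preview of the AXPY pattern from Part 2 (a simplified sketch, not the course's actual implementation), each GPU thread handles one element of y = a·x + y:

```cuda
#include <cstdio>

// AXPY: y = a * x + y, one element per thread.
__global__ void axpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];  // guard against surplus threads
}

int main() {
    const int n = 1024;
    float *x, *y;
    cudaMallocManaged(&x, n * sizeof(float));  // unified memory, for brevity
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;  // enough blocks to cover n
    axpy<<<blocks, threads>>>(n, 3.0f, x, y);
    cudaDeviceSynchronize();

    printf("y[0] = %.1f\n", y[0]);  // 3*1 + 2 = 5.0
    cudaFree(x); cudaFree(y);
    return 0;
}
```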
Part 3 (4 hours 5 minutes)
Memory Hierarchy - Learn about the memory structure at the heart of CUDA programming. Implement examples such as matrix addition and adjacent difference.
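Part 3's adjacent-difference example revolves around shared memory, the fast on-chip tier of the CUDA memory hierarchy. A simplified sketch of the idea (illustrative, not the course's implementation) stages each block's tile in shared memory before computing out[i] = in[i] - in[i-1]:

```cuda
#include <cstdio>

#define BLOCK 256

// Adjacent difference using shared memory: each block loads its tile
// once, then threads read neighbors from fast on-chip memory.
__global__ void adj_diff(const int* in, int* out, int n) {
    __shared__ int tile[BLOCK];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) tile[threadIdx.x] = in[i];
    __syncthreads();  // all loads finish before any neighbor is read

    if (i >= n) return;
    if (threadIdx.x > 0)
        out[i] = tile[threadIdx.x] - tile[threadIdx.x - 1];
    else if (i > 0)
        out[i] = in[i] - in[i - 1];  // block boundary: neighbor not in tile
    else
        out[0] = in[0];
}

int main() {
    const int n = 1024;
    int *in, *out;
    cudaMallocManaged(&in, n * sizeof(int));
    cudaMallocManaged(&out, n * sizeof(int));
    for (int i = 0; i < n; ++i) in[i] = i * i;

    adj_diff<<<(n + BLOCK - 1) / BLOCK, BLOCK>>>(in, out, n);
    cudaDeviceSynchronize();
    printf("out[10] = %d\n", out[10]);  // 100 - 81 = 19
    cudaFree(in); cudaFree(out);
    return 0;
}
```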
Part 4 (3 hours 45 minutes)
Matrix transpose & multiply - Provides various examples of operations on matrices (two-dimensional arrays) and implements the GEMM routine in CUDA.
Part 5 (3 hours 55 minutes)
Atomic Operation & Reduction - Learn CUDA control flow, from problem definition to solution, including atomic operations and reductions. You'll also implement GEMV routines in CUDA.
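The atomic-operation idea from Part 5 can be previewed with a naive array-sum reduction (a simplified sketch under the assumption of single-precision data; the course develops faster variants):

```cuda
#include <cstdio>

// Sum an array with atomicAdd: correct but heavily contended on one
// counter; shared-memory reductions trade this for per-block partial sums.
__global__ void sum_atomic(const float* x, int n, float* result) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAdd(result, x[i]);
}

int main() {
    const int n = 1 << 20;
    float *x, *result;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&result, sizeof(float));
    for (int i = 0; i < n; ++i) x[i] = 1.0f;
    *result = 0.0f;

    sum_atomic<<<(n + 255) / 256, 256>>>(x, n, result);
    cudaDeviceSynchronize();

    printf("sum = %.0f\n", *result);  // 1048576
    cudaFree(x); cudaFree(result);
    return 0;
}
```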
Part 6 (3 hours 45 minutes)
Search & Sort - Learn examples of how to effectively implement search-all problems, even-odd sort, bitonic sort, and counting merge sort using the CUDA architecture.
Conquer CUDA programming and massive parallel computing!
Q&A 💬
Q. What are the reviews of the paid lectures?
Paid courses are being released sequentially, from (1) to (6), so reviews are still scattered across the series. The paid courses currently have reviews such as the following:
It was very helpful that you explained in detail the process of maximizing performance by applying various techniques in one example.
It was much easier to understand because you explained the memory structure and logic visually.
While studying vague AI, it's good to be able to add in-depth content about devices.
The software installation was well explained and the source code was provided, making it easy to practice.
Q. Is this a course that non-majors can also take?
Some experience with C++ programming is required. At the very least, some C programming experience is expected. While all examples are written in a simple, straightforward manner, they are provided in C++/C code, and the functionality provided by functions like malloc and memcpy is not specifically explained.
If you have an understanding of computer architecture (registers, cache memory, etc.), operating systems (time sharing, etc.), and compilers (code generation, code optimization), you will be able to understand the lecture content more deeply.
This course was originally designed for advanced study by seniors in computer science at four-year universities.
Q. Is there anything I need to prepare before attending the lecture? Are there any notes regarding the course (necessary environment, other considerations, etc.)?
You must first secure a hardware environment that supports NVIDIA CUDA for practical training. A PC/laptop equipped with an NVIDIA GeForce graphics card or a cloud environment is required.
Some cloud environments also provide NVIDIA GeForce graphics cards, but cloud configurations change frequently and often require a fee, so please choose an environment in which you can actually use the graphics card.
Q. What level of content is covered in the class?
As you progress from Part (0) through Parts (1) to (6), deeper theory and greater understanding are required.
We strongly recommend that you take the courses in order, from Part (0) to Part (6).
The counting merge sort covered in the final part of Part (6) is a challenging topic, even for expert researchers. However, many students who followed along step by step found it easy to understand, building on their previous learning.
Q. Is there a reason for setting a course deadline?
The deadline is set because, given the nature of the computer field, the course content is likely to become outdated after that amount of time.
By then, I'll be back with a new lecture. 😄
Q. Are there subtitles in the video?
Yes, all videos have subtitles!
If an updated video is temporarily uploaded without subtitles, subtitles will be added later; for now, all videos have subtitles.
Recommended for these people
Who is this course right for?
Those who want to accelerate arrays, matrices, image processing, statistical processing, sorting, etc. with C++/C-based parallel computing/parallel processing
Those who want to accelerate programs they have developed with parallel computing/CUDA
Those who want to study NVIDIA CUDA programming/CUDA computing from the basics
Those who want to study the theory and practice of GPU parallel processing/parallel computing
Need to know before starting?
C++ or C programming experience
Knowledge of computer architecture, registers, caches, time sharing, etc. would be helpful.