CUDA Programming (0) - C/C++/GPU Parallel Computing - Public Sample Lecture
✅ This is the (0) intro lecture that introduces the entire series from (1) to (6).
✅ It explains NVIDIA GPU + CUDA programming step-by-step from the basics.
✅ Dramatically accelerate array and matrix operations, image processing, statistical processing, and sorting in C++/C through parallel computing.
"Thank you for making this course in Korean. I will conquer the lectures one by one."

⭐ 5.0 · 김성은 (100% completed)
"I am an office worker studying programming step by step. It is not my field of work, but I took the course to learn about the new paradigm led by computers. Thanks to the detailed explanations and well-paced lectures, I finished it in one sitting. (This is the first lecture I have taken on Inflearn~~^^) I will keep up with the next lectures diligently."

⭐ 5.0 · 몽크in도시 (7% completed)
"As someone else wrote in their course review... I'm really grateful that you made this course in Korean."
What you will gain after the course
Full Series - Massively Parallel Computing with CUDA using GPUs
This lecture is - Part (0) - Introduction to Massively Parallel Computing and CUDA
Update (June 2023): "Remastering" 🍀 (some audio tracks and the intro video)
✅Bundle Discount Coupon✳️ provided for the "CUDA Programming" roadmap
Speed is the lifeblood of a program! Make it fast with massive parallel processing techniques 🚀
I heard large-scale parallel computing is important 🧐
✅ CUDA = The most widely used GPU parallel computing technology ✅ Step-by-step + Rich examples + Detailed explanations = This is the course!
GPU/graphics card-based massively parallel computing is being very actively used in fields such as AI, deep learning, big data processing, and image/video/audio processing. Currently, the most widely applied technology in GPU parallel computing is NVIDIA's CUDA architecture.
While technologies like massively parallel computing and CUDA are considered important within the field of parallel computing, it is often difficult to even start learning because it's hard to find courses that teach this subject systematically. Through this course, you can learn CUDA programming step-by-step. CUDA and parallel computing require a theoretical background and can be challenging. However, if you follow along from the basics with this course's abundant examples and background explanations, you can definitely do it! This course is planned as a series, ensuring sufficient lecture time is provided.
In this course, we aim to explain how C++/C programmers can combine CUDA libraries and C++/C functions to accelerate problems in various fields using large-scale parallel processing techniques. Through this method, you can accelerate existing C++/C programs or develop new algorithms and programs entirely with parallel computing to make them dramatically faster.
📢 Please check before taking the course!
Please secure a hardware environment where NVIDIA CUDA works in advance for the practice sessions. A PC/laptop equipped with an NVIDIA GeForce graphics card is absolutely necessary.
While NVIDIA GeForce graphics cards can be used in some cloud environments, cloud settings change frequently and often involve costs. If you are using a cloud environment, please ensure you have secured an environment where the graphics card is accessible.
You can find detailed information about the lecture practice environment in the <00. Pre-lecture Preparation> lesson of the curriculum.
Course Features ✨
#1. Abundant examples and explanations
CUDA and massively parallel computing require abundant examples and explanations. This lecture series provides a total of over 24 hours of content, spanning from Part (0) to Part (6).
#2. Hands-on practice is a must!
Since this is a computer programming course, we emphasize extensive hands-on practice and provide actual working source code so that you can follow along step-by-step.
#3. Focus on the important parts!
During the lecture, redundant explanations for source code that has already been covered are minimized as much as possible, allowing you to focus your learning on the modified parts or areas that need emphasis.
Recommended for these people 🙋‍♀️
Programmers who want to drastically improve existing programs
Researchers who want to know how applications in their own field have been accelerated
University students who want to add a new-technology project to their portfolio before job hunting
Those who want to understand the theory and practice of parallel processing in AI, deep learning, and matrix calculations.
A sneak peek at course reviews 🏃
*The reviews below are for an external lecture conducted by the instructor on the same topic.
"I didn't know anything about parallel algorithms or parallel computing, but after taking the course, I gained confidence in parallel computing."
"There were many algorithms that I couldn't solve with existing C++ programs, but through this lecture, I was able to improve them to enable real-time processing!"
"When I mentioned I had experience in parallel computing during an interview after taking this course, the interviewers were very surprised. They said it's not easy to find CUDA or parallel computing courses at the undergraduate level."
CUDA Programming Mastery Roadmap 🛩️
The CUDA programming course was designed as a 7-part series with over 24 hours of total content to enhance focus on the subject matter.
The roadmap course "CUDA Programming" is also available. Be sure to check it out. ✅
Each lecture consists of 6 or more sections, and each section covers an independent topic. (The current lecture, Part 0, consists of 2 sections and provides only the Introduction.)
The slides used in the lecture are provided as PDF files, and the program source code used in sections where practical examples are explained is also provided.
Part 0 (1-hour free lecture) · Current Lecture
Introduction to MPC and CUDA - An overall overview of massively parallel computing (MPC) and CUDA.
Part 1 (3 hours 40 minutes)
CUDA kernel concepts - Learn the concept of the CUDA kernel, which is the starting point of CUDA programming, and see how parallel computing works in action.
Part 2 (4 hours 15 minutes)
vector addition - Presents operations between vectors (1D arrays) through various examples and implements the AXPY routine using CUDA.
Part 3 (4 hours 5 minutes)
memory hierarchy - Learn the memory structure, which is the core of CUDA programming. Implement matrix addition, adjacent difference, etc., as examples.
Part 4 (3 hours 45 minutes)
matrix transpose & multiply - Presents operations between matrices in 2D array format through various examples and implements the GEMM routine using CUDA.
Part 5 (3 hours 55 minutes)
atomic operation & reduction - Along with an understanding of CUDA control flow, learn everything from problem definitions to solutions for atomic operations and reduction. Also, implement the GEMV routine using CUDA.
Part 6 (3 hours 45 minutes)
search & sort - Learn examples of effectively implementing search-all problems, even-odd sort, bitonic sort, and counting merge sort using the CUDA architecture.
CUDA programming and large-scale parallel computing mastery complete!
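As a small taste of what the series builds toward, here is a minimal sketch of the AXPY routine from Part 2 written as a CUDA kernel. This is an illustrative example, not the course's actual source code; it assumes an NVIDIA GPU and the CUDA toolkit (compile with nvcc), and the name `saxpy` and the launch parameters are just conventional choices.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// y = a*x + y, computed with one GPU thread per element
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) y[i] = a * x[i] + y[i];              // guard against overrun
}

int main() {
    const int n = 1 << 20;                 // 1M elements
    const size_t bytes = n * sizeof(float);

    // host buffers
    float *hx = (float *)malloc(bytes), *hy = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { hx[i] = 1.0f; hy[i] = 2.0f; }

    // device buffers and host-to-device copies
    float *dx, *dy;
    cudaMalloc(&dx, bytes);
    cudaMalloc(&dy, bytes);
    cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, bytes, cudaMemcpyHostToDevice);

    // launch enough 256-thread blocks to cover all n elements
    int threads = 256, blocks = (n + threads - 1) / threads;
    saxpy<<<blocks, threads>>>(n, 3.0f, dx, dy);

    // copy the result back and spot-check one element (3*1 + 2 = 5)
    cudaMemcpy(hy, dy, bytes, cudaMemcpyDeviceToHost);
    printf("y[0] = %.1f\n", hy[0]);

    cudaFree(dx); cudaFree(dy); free(hx); free(hy);
    return 0;
}
```

The course covers each piece of this pattern in order: the kernel and thread indexing in Part 1, the AXPY example itself in Part 2, and the memory transfers and their performance implications in Part 3.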
Q&A 💬
Q. What are the reviews for the paid courses like?
Since the paid lectures from (1) to (6) are being released sequentially, their reviews are scattered across the parts and currently set to private. So far, the paid lectures have received reviews like the following:
It was very helpful because you explained in detail the process of maximizing performance by applying various techniques to a single example.
It was much easier to understand because the memory structures and logic were explained through visualization.
I had only been studying AI vaguely, so it was great to add in-depth knowledge about the underlying hardware.
The software installation was well-explained and the source code was provided, making it easy to practice.
Q. Is this a lecture that non-majors can take?
Some C++ programming experience is required; at the very least, you should have experience with C programming. Although all examples are written as simply as possible, they are provided as C++/C code, and functions such as malloc and memcpy are not explained separately.
If you have an understanding of computer architecture (registers, cache memory, etc.), operating systems (time-sharing, etc.), and compilers (code generation, code optimization), you will be able to understand the course content more deeply.
This course was originally designed as an advanced study for senior computer science majors at four-year universities.
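For reference, this is roughly the level of C memory handling the course assumes as a prerequisite, shown next to the CUDA counterparts that the lectures do explain. A minimal sketch for orientation only, not the course's actual source:

```cuda
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 100 * sizeof(float);

    // host-side C idioms assumed as prerequisites (not taught in the course)
    float *src = (float *)malloc(bytes);
    float *dst = (float *)malloc(bytes);
    for (int i = 0; i < 100; ++i) src[i] = (float)i;
    memcpy(dst, src, bytes);             // ordinary host-to-host copy

    // CUDA counterparts, which the course does cover in depth
    float *dev = nullptr;
    cudaMalloc(&dev, bytes);                               // allocate GPU memory
    cudaMemcpy(dev, src, bytes, cudaMemcpyHostToDevice);   // host -> device copy

    cudaFree(dev);
    free(src);
    free(dst);
    return 0;
}
```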
Q. Is there anything I need to prepare before taking the course? Are there any reference notes regarding the course (required environment, other precautions, etc.)?
You must secure a hardware environment where NVIDIA CUDA works for the practice sessions in advance. A PC/laptop or cloud environment equipped with an NVIDIA GeForce graphics card is absolutely necessary.
Although NVIDIA GeForce graphics cards are available in some cloud environments, cloud settings change frequently and often involve costs, so please choose an environment where you can use a graphics card.
Q. To what level does the course cover the content?
As you progress from Part (0) through Part (1) to Part (6), progressively deeper theory and a higher level of understanding are required.
I strongly recommend that you watch the courses in order, starting from Part (0) through Part (6).
The counting merge sort covered at the end of Part (6) is a problem difficult enough that even professional researchers may find it hard to follow immediately. However, many students who followed the course step-by-step were able to understand it without much trouble, building on their previous learning.
Q. Is there a reason for setting a course enrollment period?
The enrollment period is set because, given how quickly the computer science field changes, the content of this lecture is likely to be outdated by the time the period ends.
By then, I will see you again in a new course. 😄
Q. Are there subtitles in the videos?
Yes. All videos include subtitles!
When updates occur, videos without subtitles may be added, but as of now, subtitles are provided for all videos.
Recommended for these people
Who is this course right for?
Those who want to accelerate array/matrix/image processing, statistical processing, sorting, etc., using C++-based parallel computing/parallel processing.
Those who want to accelerate their own developed programs using parallel computing/CUDA.
Those who wish to study NVIDIA CUDA programming/CUDA computing from the basics.
Those who want to study both the theory and practice of GPU parallel processing/parallel computing in a balanced way.
Need to know before starting?
C++ or C programming experience
It is even better if you have knowledge of computer architecture, registers, caches, time-sharing, etc.