✅ Of the full series (1)~(6), this lecture (6) implements parallel search and parallel sort.
✅ Explains NVIDIA GPU + CUDA programming step by step, from the basics.
✅ Accelerates arrays, matrices, image processing, statistical processing, sorting, and more with parallel computing in C++/C.
Full Series - Massively Parallel Computing with CUDA on GPUs
This lecture - Part (6) - Parallel search and parallel sort implementation
✅ A bundle discount coupon✳️ is provided in the "CUDA Programming" roadmap
Speed is everything in a program! Make it fast with massive parallel processing techniques 🚀
They say massive parallel computing is important 🧐
Large-scale parallel computing based on GPUs and graphics cards is actively used in AI, deep learning, big data processing, and image/video/audio processing. Currently, the most widely applied technology in GPU parallel computing is NVIDIA's CUDA architecture.
Among parallel computing technologies, large-scale parallel computing and CUDA are considered crucial. However, courses that teach this field systematically are hard to find, so it is hard even to get started. This course lets you learn CUDA programming step by step. CUDA and parallel computing require a theoretical background and can be challenging, but this course's rich examples and background explanations, together with a thorough grounding in the fundamentals, will give you the tools you need! The course is produced as a series to ensure ample lecture time.
This lecture explains how C++/C programmers can use the CUDA library together with C++/C functions to accelerate a wide range of problems with massively parallel processing techniques. You can apply this approach to speed up existing C++/C programs, or develop new algorithms and programs entirely with parallel computing for dramatic gains.
📢 Please check before taking the class!
For the hands-on practice, please ensure you have a hardware environment that supports NVIDIA CUDA. You will need a PC or laptop equipped with an NVIDIA GeForce graphics card.
NVIDIA GeForce graphics cards can also be used in some cloud environments, but cloud settings change frequently and are often paid; students who use the cloud are responsible for arranging their own graphics card access.
You can find details of the practice environment in the <00. Preparation before lecture> section of the curriculum.
Lecture Features ✨
#1. Rich examples and explanations
CUDA and large-scale parallel computing require extensive examples and explanations. This series of lectures provides over 24 hours of hands-on learning time.
#2. Practice is essential!
Since this is a computer programming subject, the course emphasizes abundant hands-on practice and provides actual working source code so that you can follow along step by step.
#3. Focus on the important parts!
During lectures, we avoid re-explaining source code that has already been covered, so that you can focus only on the parts that changed or that need emphasis.
I recommend this to these people 🙋♀️
College students who want to add new technologies to their portfolio before getting a job.
Programmers who want to dramatically improve existing programs
Researchers who want to know how various applications are accelerated
Anyone who wants to learn the theory and practice of the parallel processing used in AI, deep learning, and matrix calculations.
Preview of lecture reviews 🏃
*The reviews below are from an external lecture on the same topic given by the knowledge sharer.
"I knew nothing about parallel algorithms or parallel computing, After taking the course, I feel more confident in parallel computing."
"There were many algorithms that could not be solved with existing C++ programs. Through this lecture, I was able to improve my ability to process in real time!"
"After attending the lecture, when I was interviewed and said that I had experience with parallel computing, the interviewers were very surprised. "I heard that it's not easy to find CUDA or parallel computing courses at the college level."
CUDA Programming Conquest Roadmap 🛩️
The CUDA programming course consists of 7 parts totaling over 24 hours of lectures, so that each part can stay focused on its own topic.
Each lecture is divided into six or more sections, each covering a separate topic. (Part 0 consists of only two sections, the Introduction.)
Slides used in the lecture are provided as PDF files, and the source code of the programs used is provided in the sections explaining the practical examples.
Part 0 (1-hour free lecture)
Introduction to MPC and CUDA - An overall introduction to massively parallel computing (MPC) and CUDA.
Part 1 (3 hours 40 minutes)
CUDA Kernel Concepts - Learn the concepts of CUDA Kernels, the starting point of CUDA programming, and see parallel computing in action.
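To give a quick taste of Part 1, here is a minimal sketch of a CUDA kernel and its launch. It is illustrative only (the kernel name, block size, and array size are my own choices, not the lecture's code): each thread computes its global index and writes one element of an array.

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Each thread computes its own global index and writes it into the output array.
__global__ void fillIndexKernel(int* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = i;          // one thread handles one element
    }
}

int main() {
    const int n = 1024;
    int* d_out = nullptr;
    cudaMalloc(&d_out, n * sizeof(int));

    // Launch enough 256-thread blocks to cover all n elements.
    int threads = 256;
    int blocks  = (n + threads - 1) / threads;
    fillIndexKernel<<<blocks, threads>>>(d_out, n);
    cudaDeviceSynchronize();

    int h_out[4];
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    printf("%d %d %d %d\n", h_out[0], h_out[1], h_out[2], h_out[3]);  // prints 0 1 2 3

    cudaFree(d_out);
    return 0;
}
```

The `<<<blocks, threads>>>` launch syntax and the blockIdx/threadIdx built-ins are the core ideas that Part 1 develops in detail.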
Part 2 (4 hours 15 minutes)
Vector addition - Presents various examples of operations between vectors (one-dimensional arrays) and implements the AXPY routine in CUDA.
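For reference, an AXPY-style kernel can be sketched as below. This is a single-precision (SAXPY) illustration under my own assumptions, not the lecture's implementation; it assumes d_x and d_y already reside in device memory.

```cpp
#include <cuda_runtime.h>

// SAXPY: y[i] = a * x[i] + y[i], one thread per element.
__global__ void saxpyKernel(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        y[i] = a * x[i] + y[i];
    }
}

// Host-side helper: assumes d_x and d_y are device pointers of length n.
void saxpy(int n, float a, const float* d_x, float* d_y) {
    int threads = 256;
    int blocks  = (n + threads - 1) / threads;
    saxpyKernel<<<blocks, threads>>>(n, a, d_x, d_y);
}
```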
Part 3 (4 hours 5 minutes)
Memory Hierarchy - Learn about the memory structure at the heart of CUDA programming. Implement examples such as matrix addition and adjacent difference.
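As one illustration of why the memory hierarchy matters, here is a hedged sketch of an adjacent-difference kernel that stages each block's input in shared memory, so neighboring elements are read from global memory only once per block. The block size and the boundary convention (out[0] = in[0]) are my assumptions, not the lecture's code.

```cpp
#include <cuda_runtime.h>

#define BLOCK_SIZE 256

// out[i] = in[i] - in[i-1]; a shared-memory tile avoids reading each input twice.
__global__ void adjacentDiff(float* out, const float* in, int n) {
    __shared__ float tile[BLOCK_SIZE + 1];       // one extra slot for the left neighbor

    int i = blockIdx.x * blockDim.x + threadIdx.x;

    if (i < n) {
        tile[threadIdx.x + 1] = in[i];           // each thread stages its own element
        if (threadIdx.x == 0) {
            // the first thread also stages the element just left of the block
            tile[0] = (i == 0) ? 0.0f : in[i - 1];
        }
    }
    __syncthreads();                             // every thread reaches the barrier

    if (i < n) {
        out[i] = tile[threadIdx.x + 1] - tile[threadIdx.x];
    }
}
```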
Part 4 (3 hours 45 minutes)
Matrix transpose & multiply - Presents various examples of operations on matrices (two-dimensional arrays) and implements the GEMM routine in CUDA.
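As a taste of Part 4, the sketch below shows a tiled matrix transpose that uses a shared-memory tile so that both the reads and the writes to global memory stay coalesced. The tile size, the square-matrix assumption, and the padding trick are illustrative choices, not the lecture's exact code.

```cpp
#include <cuda_runtime.h>

#define TILE 32

// Transpose an n x n row-major matrix: out = in^T.
__global__ void transposeTiled(float* out, const float* in, int n) {
    __shared__ float tile[TILE][TILE + 1];       // +1 column of padding avoids bank conflicts

    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    if (x < n && y < n) {
        tile[threadIdx.y][threadIdx.x] = in[y * n + x];   // coalesced read
    }
    __syncthreads();

    // Swap the block coordinates; keep the thread coordinates so writes stay coalesced.
    x = blockIdx.y * TILE + threadIdx.x;
    y = blockIdx.x * TILE + threadIdx.y;
    if (x < n && y < n) {
        out[y * n + x] = tile[threadIdx.x][threadIdx.y];  // coalesced write
    }
}

// Launch example: dim3 block(TILE, TILE);
//                 dim3 grid((n + TILE - 1) / TILE, (n + TILE - 1) / TILE);
//                 transposeTiled<<<grid, block>>>(d_out, d_in, n);
```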
Part 5 (3 hours 55 minutes)
Atomic Operation & Reduction - Learn CUDA control flow, from problem definition to solution, including atomic operations and reductions. You'll also implement GEMV routines in CUDA.
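For illustration, the hedged sketch below combines the two ideas named above: each block reduces its slice of the input in shared memory, and one thread per block folds the partial sum into the global result with atomicAdd. The block size (a power of two), the names, and the assumption that *result is zeroed before the launch are mine, not the lecture's.

```cpp
#include <cuda_runtime.h>

#define BLOCK_SIZE 256   // must be a power of two for the tree reduction below

// Sum of x[0..n-1]; *result must be initialized to 0.0f before the launch.
__global__ void reduceSum(const float* x, float* result, int n) {
    __shared__ float partial[BLOCK_SIZE];

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    partial[threadIdx.x] = (i < n) ? x[i] : 0.0f;   // out-of-range threads contribute 0
    __syncthreads();

    // Tree reduction within the block: halve the number of active threads each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride) {
            partial[threadIdx.x] += partial[threadIdx.x + stride];
        }
        __syncthreads();
    }

    if (threadIdx.x == 0) {
        atomicAdd(result, partial[0]);              // combine per-block partial sums
    }
}
```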
Part 6 (3 hours 45 minutes) Current lecture
Search & Sort - Learn examples of how to effectively implement search-all problems, even-odd sort, bitonic sort, and counting merge sort using the CUDA architecture.
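To give a flavor of the sorting material, here is a didactic sketch of even-odd (odd-even transposition) sort for an array small enough for a single block to hold in shared memory. The size N, the single-block restriction, and the int keys are simplifying assumptions on my part; the lecture's implementations, and the bitonic and counting merge sorts, go well beyond this.

```cpp
#include <cuda_runtime.h>

#define N 1024   // illustrative size; must fit in one block's shared memory

// Odd-even transposition sort of N ints by a single block of N/2 threads.
__global__ void oddEvenSort(int* data) {
    __shared__ int keys[N];
    int tid = threadIdx.x;

    // Each of the N/2 threads stages two elements into shared memory.
    keys[2 * tid]     = data[2 * tid];
    keys[2 * tid + 1] = data[2 * tid + 1];
    __syncthreads();

    // N phases: even phases compare pairs (0,1),(2,3),...; odd phases (1,2),(3,4),...
    for (int phase = 0; phase < N; ++phase) {
        int i = 2 * tid + (phase & 1);            // left index of this thread's pair
        if (i + 1 < N && keys[i] > keys[i + 1]) {
            int tmp = keys[i];                    // compare-exchange an out-of-order pair
            keys[i] = keys[i + 1];
            keys[i + 1] = tmp;
        }
        __syncthreads();                          // all swaps of this phase must finish
    }

    data[2 * tid]     = keys[2 * tid];
    data[2 * tid + 1] = keys[2 * tid + 1];
}

// Launch example: oddEvenSort<<<1, N / 2>>>(d_data);
```

Even this toy version shows the basic pattern behind parallel sorting: repeated compare-exchange phases separated by barriers.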
Conquer CUDA programming and massively parallel computing!
Q&A 💬
Q. What are the reviews of the paid lectures?
The paid courses are being released sequentially, from (1) to (6), so their reviews are still scattered across the parts and not many have been made public yet. The paid courses currently have the following reviews:
It was very helpful that you explained in detail the process of maximizing performance by applying various techniques in one example.
It was much easier to understand because you explained the memory structure and logic visually.
While studying AI only vaguely, it was good to be able to add in-depth content about the underlying devices.
The software installation was well explained and the source code was provided, making it easy to practice.
Q. Is this a course that non-majors can also take?
Some experience with C++ programming is required. At the very least, some C programming experience is expected. While all examples are written in a simple, straightforward manner, they are provided in C++/C code, and the functionality provided by functions like malloc and memcpy is not specifically explained.
However, if you have an understanding of computer architecture (registers, cache memory, etc.), operating systems (time sharing, etc.), and compilers (code generation, code optimization), you will be able to understand the lecture content more deeply.
This course was originally designed for advanced study by seniors in computer science at four-year universities.
Q. Is there anything I need to prepare before attending the lecture? Are there any notes regarding the course (necessary environment, other considerations, etc.)?
You must first secure a hardware environment that supports NVIDIA CUDA for practical training. A PC/laptop equipped with an NVIDIA GeForce graphics card is required.
NVIDIA GeForce graphics cards can be used in some cloud environments, but cloud environment settings change frequently and are often paid, so students who use the cloud must arrange their own graphics card access.
Q. What level of content is covered in the class?
Starting from Part 0 and moving up through Part 1 to Part 6, progressively deeper theory and greater understanding are required.
We strongly recommend that you take the course in order, from Part 0 to Part 6.
The counting merge sort covered in the final part of Part 6 is a challenging topic, even for expert researchers. However, offline students who followed along step by step often found it easy to understand, building on the material learned in the earlier sections.
Q. Is there a reason for setting a course deadline?
The deadline is set because, given the nature of the computing field, the course content is likely to become outdated after that amount of time.
By then, I'll be back with a new lecture. 😄
Q. Are there subtitles in the video?
Yes, all videos now have subtitles.
However, some videos added in the future may not have subtitles.
Information about fonts used in lecture materials ✔️
In the video and PDF files, only free fonts from Google/Adobe were used.
The Korean font used was Noto Sans KR ("Bon Gothic"), and the English fonts used were Source Sans Pro and Source Serif Pro.
You can download them all for free from the links below. After downloading, unzip them and right-click to install them on your PC or laptop.
Those who want to accelerate arrays/matrices/image processing/statistical processing/sorting, etc. with C++/C-based parallel computing/parallel processing
Those who want to accelerate their own programs with parallel computing/CUDA
Those who want to study NVIDIA CUDA programming/CUDA computing from the basics
Those who want to study the theory and practice of GPU parallel processing/parallel computing
What do you need to know before starting?
C++ or C programming experience
Knowledge of computer architecture, registers, caches, time sharing, etc. would be helpful.
I learned that sorting can be surprisingly difficult in CUDA, and yet it is much faster than on the CPU. Thank you for the great lecture. I feel like I have learned CUDA properly.