CUDA Programming (6) - C/C++/GPU Parallel Computing - Search & Sort

✅ Of the full series (1) ~ (6), this part (6) implements parallel search and parallel sort. ✅ Explains NVIDIA GPU + CUDA programming step by step from the basics. ✅ Speeds up array, matrix, image, and statistical processing, sorting, and more with parallel computing in C++/C.

(5.0) 6 reviews

148 learners

  • onemoresipofcoffee
Tags: GPU, kernel, NVIDIA, CUDA, C++, C

What you will learn!

  • Full Series - Massively Parallel Computing with CUDA on GPUs

  • This lecture - Part (6) - Implementing parallel search and parallel sort

  • ✅ A bundle discount coupon ✳️ is provided in the "CUDA Programming" roadmap

Speed is everything in a program!
Make it fast with massive parallel processing techniques 🚀

They say massive parallel computing is important 🧐

Large-scale parallel computing based on GPUs and graphics cards is actively used in AI, deep learning, big data processing, and image/video/audio processing. Currently, the most widely applied technology in GPU parallel computing is NVIDIA's CUDA architecture.

Among parallel computing technologies, large-scale parallel computing and CUDA are considered crucial. However, courses that teach this field systematically are hard to find, which makes it hard even to get started. This course teaches CUDA programming step by step. CUDA and parallel computing require a theoretical background and can be challenging, so the course combines plenty of examples and background explanations with a thorough grounding in the fundamentals. It is produced as a series to ensure ample lecture time.

This lecture explains how C++/C programmers can use the CUDA library and C++/C functions to accelerate a wide range of problems with massively parallel processing techniques. The approach can be used to speed up existing C++/C programs, or to dramatically accelerate new algorithms and programs by developing them with parallel computing from the start.

📢 Please check before taking the class!

  • For this course, please make sure you have a hardware environment that supports NVIDIA CUDA. You will need a PC or laptop equipped with an NVIDIA GeForce graphics card.
  • NVIDIA GeForce graphics cards are available in some cloud environments, but cloud settings change frequently and are often paid. Students who use a cloud environment are responsible for setting up graphics card access themselves.
  • The practice environment is described in detail in the <00. Preparation before lecture> lecture in the curriculum.

Lecture Features ✨

#1.
Rich examples and explanations

CUDA and large-scale parallel computing require extensive examples and explanations. This series of lectures provides over 24 hours of hands-on learning time.

#2.
Practice is essential!

Because this is a computer programming subject, the course emphasizes hands-on practice and provides working source code so that you can follow along step by step.

#3.
Focus on the important parts!

During the lectures, source code that has already been explained is not explained again, so that you can focus on the parts that changed or that need emphasis.


Recommended for these people 🙋‍♀️

College students who want to add new technologies to their portfolio before getting a job.

Programmers who want to dramatically improve existing programs

Researchers who want to know how various applications are accelerated

Anyone who wants to learn about the theory and practice of parallel processing such as AI, deep learning, and matrix calculations.

Preview lecture reviews 🏃

*The reviews below are from an external lecture given by the instructor on the same topic.

"I knew nothing about parallel algorithms or parallel computing, but after taking the course I feel more confident in parallel computing."

"There were many algorithms that existing C++ programs could not handle. Through this lecture, I improved my ability to process them in real time!"

"After attending the lecture, I mentioned in an interview that I had experience with parallel computing, and the interviewers were very surprised. I heard that it's not easy to find CUDA or parallel computing courses at the college level."


CUDA Programming Conquest Roadmap 🛩️

  • The CUDA programming course is split into a 7-part series, totaling over 24 hours of lectures, so that each part can concentrate on one topic.
  • Each part is divided into six or more sections, each covering a separate topic. (Part 0 consists of only two sections, because it is the introduction.)
  • Slides used in the lecture are provided as PDF files, and the source code of the programs used is provided in the sections explaining the practical examples.

Part 0 (1-hour free lecture)

  • Introduction to MPC and CUDA - This part provides an overall introduction to MPC (massively parallel computing) and CUDA.

Part 1 (3 hours 40 minutes)

  • CUDA Kernel Concepts - Learn the concepts of CUDA Kernels, the starting point of CUDA programming, and see parallel computing in action.

Part 2 (4 hours 15 minutes)

  • Vector addition - Presents various examples of operations on vectors, which are one-dimensional arrays, and implements the AXPY routine in CUDA.
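To make the AXPY routine named above concrete, here is a minimal sequential C++ sketch of the operation y = a·x + y. It is not the course's CUDA code: the function name `axpy` and its signature are assumptions chosen for illustration. In a CUDA version, each loop iteration would typically be handled by its own thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the course): computes y[i] = a * x[i] + y[i].
// Each iteration is independent of the others, which is what makes AXPY
// a natural fit for one-thread-per-element parallelism on a GPU.
void axpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());  // both vectors must have the same length
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Because no element depends on any other, the loop can be split across threads without synchronization.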

Part 3 (4 hours 5 minutes)

  • Memory Hierarchy - Learn about the memory structure at the heart of CUDA programming. Implement examples such as matrix addition and adjacent difference.

Part 4 (3 hours 45 minutes)

  • Matrix transpose & multiply - Presents various examples of operations on matrices, which are two-dimensional arrays, and implements the GEMM routine in CUDA.

Part 5 (3 hours 55 minutes)

  • Atomic Operation & Reduction - Learn CUDA control flow, from problem definition to solution, including atomic operations and reductions. You'll also implement GEMV routines in CUDA.

Part 6 (3 hours 45 minutes) Current lecture

  • Search & Sort - Learn examples of how to effectively implement search-all problems, even-odd sort, bitonic sort, and counting merge sort using the CUDA architecture.
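As an illustration of the even-odd sort listed above, here is a minimal sequential C++ sketch of odd-even transposition sort. It is a CPU reference under stated assumptions, not the course's CUDA implementation: the name `odd_even_sort` is hypothetical. The inner loop touches disjoint pairs, so on a GPU each pair in a phase could be compared and swapped by a separate thread, with a synchronization between phases.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not from the course): odd-even transposition sort.
// Even phases compare pairs (0,1), (2,3), ...; odd phases compare (1,2), (3,4), ...
// Running n phases is enough to sort n elements in ascending order.
void odd_even_sort(std::vector<int>& data) {
    const std::size_t n = data.size();
    for (std::size_t phase = 0; phase < n; ++phase) {
        // The pairs within one phase do not overlap, so they are independent.
        for (std::size_t i = phase % 2; i + 1 < n; i += 2) {
            if (data[i] > data[i + 1]) {
                std::swap(data[i], data[i + 1]);
            }
        }
    }
    assert(std::is_sorted(data.begin(), data.end()));  // sanity check on the result
}
```

The algorithm does O(n²) comparisons in total, but only n phases, which is why it is usually presented as a parallel-friendly sort rather than a fast sequential one.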

Conquer CUDA programming and massively parallel computing!


Q&A 💬

Q. What are the reviews of the paid lectures?

The paid courses are being released one at a time, from (1) to (6), so reviews are spread across the courses and not all of them are public yet. So far, the paid courses have received the following reviews:

  • It was very helpful that you explained in detail the process of maximizing performance by applying various techniques in one example.
  • It was much easier to understand because you explained the memory structure and logic visually.
  • After studying AI only at a vague level, it was good to add in-depth knowledge about the device.
  • The software installation was well explained and the source code was provided, making it easy to practice.

Q. Is this a course that non-majors can also take?

  • Some experience with C++ programming is required. At the very least, some C programming experience is expected. While all examples are written in a simple, straightforward manner, they are provided in C++/C code, and the functionality provided by functions like malloc and memcpy is not specifically explained.
  • However, if you have an understanding of computer architecture (registers, cache memory, etc.), operating systems (time sharing, etc.), and compilers (code generation, code optimization), you will be able to understand the lecture content more deeply.
  • This course was originally designed for advanced study by seniors in computer science at four-year universities.

Q. Is there anything I need to prepare before attending the lecture? Are there any notes regarding the course (necessary environment, other considerations, etc.)?

  • You must first secure a hardware environment that supports NVIDIA CUDA for practical training. A PC/laptop equipped with an NVIDIA GeForce graphics card is required.
  • NVIDIA GeForce graphics cards are available in some cloud environments, but cloud settings change frequently and are often paid, so students who use a cloud environment must work out how to access the graphics card themselves.

Q. What level of content is covered in the class?

  • From Part 0 through Part 6, each part requires progressively deeper theory and understanding.
  • We strongly recommend that you take the course in order, from Part 0 to Part 6.
  • The counting merge sort covered in the final part of Part 6 is a challenging topic, even for expert researchers. However, offline students who followed along step by step often found it easy to understand, building on the material learned in the earlier sections.

Q. Is there a reason for setting a course deadline?

  • The reason for setting a deadline for the course is that, given the nature of the computer field, the course content is likely to become outdated after that amount of time.
  • By then, I'll be back with a new lecture. 😄

Q. Are there subtitles in the video?

  • Yes, all videos now have subtitles.
  • However, some videos added in the future may not have video subtitles.

Who is this course right for?

  • Those who want to accelerate array, matrix, image, and statistical processing, sorting, and more with C++/C-based parallel computing

  • Those who want to accelerate programs they developed themselves with parallel computing/CUDA

  • Those who want to study NVIDIA CUDA programming/CUDA computing from the basics

  • Those who want to study the theory and practice of GPU parallel processing/parallel computing

Need to know before starting?

  • C++ or C programming experience

  • Knowledge of computer architecture, registers, caches, time sharing, etc. would be helpful.

Hello
This is

9,063 Learners ∙ 216 Reviews ∙ 63 Answers ∙ 4.9 Rating ∙ 30 Courses

One more cup of drip coffee for the road

Curriculum

39 lectures ∙ (3hr 42min)

Course Materials:

Lecture resources

Reviews

6 reviews ∙ 5.0

  • hrham4324 (Reviews 2, Average Rating 5.0)
    Rating 5 ∙ 31% enrolled

  • francis (Reviews 2, Average Rating 5.0)
    Rating 5 ∙ 31% enrolled

  • min4849 (Reviews 5, Average Rating 5.0)
    Rating 5 ∙ 31% enrolled

    • onemoresipofcoffee (Instructor): Hello. 🌞 Thank you for your good review. 🍀 I hope you always have a happy time.

  • 8909k8961 (Reviews 1, Average Rating 5.0)
    Rating 5 ∙ 100% enrolled

    The lectures are well organized.

    • Hello. 🌞 Thank you for your good review. 🍀 I hope you always have a happy time.

  • wayfarecru0581 (Reviews 25, Average Rating 5.0)
    Rating 5 ∙ 5% enrolled

    I learned that sort can be surprisingly difficult in CUDA, and yet it is much faster than CPU. Thank you for the great lecture. I feel like I have learned CUDA properly.

    • Hello. 🌞 Thank you for your good review. 🍀 I hope you always have a happy time.

$38.50
