CUDA Programming (2) - C/C++/GPU Parallel Computing - Vector Addition

✅ Part (2) of the series (1) to (6): adding vectors (1D arrays) in parallel. ✅ Explains NVIDIA GPU + CUDA programming step by step from the basics. ✅ Shows how to process arrays, matrices, images, statistics, sorting, and more very quickly with parallel computing in C++/C.

(5.0) 7 reviews

194 learners

  • onemoresipofcoffee
gpu
kernel
nvidia
CUDA
GPU
Parallel Processing
C++
C

What you will learn!

  • Full Series - Massively Parallel Computing with CUDA on GPUs

  • This lecture - Part (2) - Adding Vectors (1D Arrays) in Parallel

  • Update - July 2023, "Remastered" 🍀 (some audio/video)

  • ✅ Bundle Discount Coupon✳️ provided in the roadmap "CUDA Programming"

Speed is everything in a program!
Make it fast with massive parallel processing techniques 🚀

They say massive parallel computing is important 🧐

Large-scale parallel computing based on GPUs and graphics cards is actively used in AI, deep learning, big data processing, and image/video/audio processing. Currently, the most widely applied technology in GPU parallel computing is NVIDIA's CUDA architecture.

Among parallel computing technologies, massively parallel computing and CUDA are considered crucial. However, courses that teach this field systematically are hard to find, which makes it difficult even to get started. This course teaches CUDA programming step by step. CUDA and parallel computing require theoretical background and can be challenging, but the rich examples, background explanations, and thorough grounding in the fundamentals in this course will give you the tools you need. The course is produced as a series to ensure ample lecture time.

This lecture explains how C++/C programmers can use the CUDA library and C++/C functions to accelerate a wide range of problems with massively parallel processing techniques. The approach can be used to speed up existing C++/C programs, or to dramatically accelerate new algorithms and programs by developing them entirely with parallel computing.

📢 Please check before taking the class!

  • For hands-on practice, make sure you have a hardware environment that supports NVIDIA CUDA. You will need a PC or laptop equipped with an NVIDIA GeForce graphics card.
  • NVIDIA GeForce graphics cards are available in some cloud environments, but cloud settings change frequently and are often paid. Students who use a cloud environment must work out how to access a graphics card there themselves.
  • You can check the lecture practice environment in detail in the curriculum's <00. Preparation before lecture> lecture.

Lecture Features ✨

#1.
Rich examples and explanations

CUDA and large-scale parallel computing require extensive examples and explanations. This series of lectures provides over 24 hours of hands-on learning time.

#2.
Practice is essential!

Because this is a computer programming subject, the course emphasizes plenty of hands-on practice and provides working source code so that you can follow along step by step.

#3.
Focus on the important parts!

During the lectures, source code that has already been explained is not explained again, so that you can focus on the parts that changed or that need emphasis.


Recommended for these people 🙋‍♀️

College students who want to add new technologies to their portfolio before getting a job.

Programmers who want to dramatically improve existing programs

Researchers who want to know how various applications are accelerated

Anyone who wants to learn about the theory and practice of parallel processing such as AI, deep learning, and matrix calculations.

Reviews from an earlier lecture 🏃

*The reviews below are from an external lecture that the instructor gave on the same topic.

"I knew nothing about parallel algorithms or parallel computing.
After taking the course, I feel more confident in parallel computing."

"There were many algorithms that could not be solved with existing C++ programs.
Through this lecture, I was able to improve my ability to process in real time!"

"After attending the lecture, when I was interviewed and said that I had experience with parallel computing, the interviewers were very surprised.
I heard that it's not easy to find CUDA or parallel computing courses at the college level."


CUDA Programming Conquest Roadmap 🛩️

  • To keep each lecture focused on its topic, the CUDA programming course is split into 7 lectures (Part 0 to Part 6) totaling over 24 hours.
  • A roadmap titled "CUDA Programming" is also available. Be sure to check it out.
  • Each lecture is divided into six or more sections, each covering a separate topic. (Part 0, the free introductory lecture, consists of only two sections.)
  • Slides used in the lecture are provided as PDF files, and the source code of the programs used is provided in the sections explaining the practical examples.

Part 0 (1-hour free lecture)

  • Introduction to MPC and CUDA - An overall introduction to massively parallel computing (MPC) and CUDA.

Part 1 (3 hours 40 minutes)

  • CUDA Kernel Concepts - Learn the concepts of CUDA Kernels, the starting point of CUDA programming, and see parallel computing in action.

Part 2 (4 hours 15 minutes) Current lecture

  • Vector addition - Work through various examples of operations on vectors (one-dimensional arrays) and implement the AXPY routine in CUDA.
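
The AXPY routine named above computes y = a * x + y element by element, which maps naturally onto one GPU thread per element. The following is only a minimal sketch under stated assumptions, not the course's own code: it assumes single-precision data and a block size of 256 threads, and the names `saxpy_kernel`, `blocks_for`, `check`, and `saxpy_gpu` are hypothetical helpers chosen for illustration.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Illustrative kernel (hypothetical name): each thread updates one element,
// y[i] = a * x[i] + y[i].
__global__ void saxpy_kernel(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) {                                    // the last block may overshoot n
        y[i] = a * x[i] + y[i];
    }
}

// Ceiling division: number of blocks so that blocks * threads_per_block >= n.
int blocks_for(int n, int threads_per_block) {
    return (n + threads_per_block - 1) / threads_per_block;
}

// Abort with a message if a CUDA runtime call failed.
void check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Host wrapper: copy x and y to the device, run the kernel, copy y back.
void saxpy_gpu(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const int n = static_cast<int>(x.size());
    if (n == 0) return;  // a launch with zero blocks would be an error
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);

    float* d_x = nullptr;
    float* d_y = nullptr;
    check(cudaMalloc(&d_x, bytes), "cudaMalloc d_x");
    check(cudaMalloc(&d_y, bytes), "cudaMalloc d_y");
    check(cudaMemcpy(d_x, x.data(), bytes, cudaMemcpyHostToDevice), "copy x");
    check(cudaMemcpy(d_y, y.data(), bytes, cudaMemcpyHostToDevice), "copy y");

    const int threads = 256;  // assumed block size, not tuned
    saxpy_kernel<<<blocks_for(n, threads), threads>>>(n, a, d_x, d_y);
    check(cudaGetLastError(), "kernel launch");

    // This blocking copy also waits for the kernel to finish.
    check(cudaMemcpy(y.data(), d_y, bytes, cudaMemcpyDeviceToHost), "copy result");
    check(cudaFree(d_x), "cudaFree d_x");
    check(cudaFree(d_y), "cudaFree d_y");
}

int main() {
    const int n = 1 << 20;
    std::vector<float> x(n, 1.0f), y(n, 2.0f);
    saxpy_gpu(3.0f, x, y);  // every element should become 3 * 1 + 2 = 5

    int mismatches = 0;
    for (float v : y) {
        if (v != 5.0f) ++mismatches;
    }
    std::printf("mismatches: %d\n", mismatches);
    return mismatches == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

Saved under a file name such as `saxpy.cu` (an arbitrary choice), the sketch should build with `nvcc saxpy.cu -o saxpy` on a machine with a CUDA-capable GPU. The `i < n` guard matters because the grid is rounded up to a whole number of blocks, so some threads in the last block may have no element to process.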

Part 3 (4 hours 5 minutes)

  • Memory Hierarchy - Learn about the memory structure at the heart of CUDA programming. Implement examples such as matrix addition and adjacent difference.

Part 4 (3 hours 45 minutes)

  • Matrix transpose & multiply - Work through various examples of operations on matrices (two-dimensional arrays) and implement the GEMM routine in CUDA.

Part 5 (3 hours 55 minutes)

  • Atomic Operation & Reduction - Learn CUDA control flow, from problem definition to solution, including atomic operations and reductions. You'll also implement GEMV routines in CUDA.

Part 6 (3 hours 45 minutes)

  • Search & Sort - Learn examples of how to effectively implement search-all problems, even-odd sort, bitonic sort, and counting merge sort using the CUDA architecture.

Conquer CUDA programming and
massively parallel computing!


Q&A 💬

Q. What are the reviews of the paid lectures?

Paid courses are being released sequentially, from (1) to (6), so course reviews are scattered and not yet public. The paid courses currently have the following reviews:

  • It was very helpful that you explained in detail the process of maximizing performance by applying various techniques in one example.
  • It was much easier to understand because you explained the memory structure and logic visually.
  • My AI studies felt vague, so it was good to be able to add in-depth knowledge about the hardware.
  • The software installation was well explained and the source code was provided, making it easy to practice.

Q. Is this a course that non-majors can also take?

  • Some experience with C++ programming is required. At the very least, some C programming experience is expected. While all examples are written in a simple, straightforward manner, they are provided in C++/C code, and the functionality provided by functions like malloc and memcpy is not specifically explained.
  • However, if you have an understanding of computer architecture (registers, cache memory, etc.), operating systems (time sharing, etc.), and compilers (code generation, code optimization), you will be able to understand the lecture content more deeply.
  • This course was originally designed for advanced study by seniors in computer science at four-year universities.

Q. Is there anything I need to prepare before attending the lecture? Are there any notes regarding the course (necessary environment, other considerations, etc.)?

  • For hands-on practice, you must first secure a hardware environment that supports NVIDIA CUDA. A PC or laptop equipped with an NVIDIA GeForce graphics card is required.
  • NVIDIA GeForce graphics cards are available in some cloud environments, but cloud settings change frequently and are often paid, so students who use a cloud environment must work out how to access a graphics card there themselves.

Q. What level of content is covered in the class?

  • The series starts at Part 0 and proceeds through Part 1 up to Part 6. Each later part requires deeper theory and greater understanding.
  • We strongly recommend that you take the course in order, from Part 0 to Part 6.
  • The counting merge sort covered in the final part of Part 6 is a challenging topic, even for expert researchers. However, offline students who followed along step by step often found it easy to understand, building on the material learned in the earlier sections.

Q. Is there a reason for setting a course deadline?

  • The reason for setting a deadline for the course is that, given the nature of the computer field, the course content is likely to become outdated after that amount of time.
  • By then, I'll be back with a new lecture. 😄

Q. Are there subtitles in the video?

  • Yes, all videos now have subtitles.
  • However, some videos added in the future may not have subtitles.

Who is this course right for?

  • Those who want to accelerate array, matrix, image, statistical, and sorting workloads with C++/C-based parallel computing

  • Those who want to accelerate programs they have developed with parallel computing and CUDA

  • Those who want to study NVIDIA CUDA programming and CUDA computing from the basics

  • Those who want to study the theory and practice of GPU parallel processing/parallel computing

Need to know before starting?

  • C++ or C programming experience

  • Knowledge of computer architecture, registers, caches, time sharing, etc. would be helpful.

Hello, this is One more cup of drip coffee for the road

Learners: 9,108 · Reviews: 221 · Answers: 64 · Rating: 4.9 · Courses: 30

Curriculum

All

50 lectures ∙ (4hr 19min)

Reviews

5.0 (7 reviews)

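  • 김준한 (Reviews 3, Average Rating 5.0) · Rating: 5 · 100% enrolled

  • hooha1207 (Reviews 8, Average Rating 5.0) · Rating: 5 · 100% enrolled
    The table of contents is organized by how each feature is computed in CUDA, so learning goes very quickly. I find the feature computation I want in the table of contents, watch the opening video to check whether it fits, and if it does, watch the following videos to pick up the CUDA implementation method and ideas. The opening videos were a really big help in judging whether a feature was one I truly needed. Because each one covers only the feature itself, I could judge quickly and accurately whether it was the content I wanted. Really great... so please, about the enrollment period limit... ㅠ

  • 김지원 (Reviews 5, Average Rating 5.0) · Rating: 5 · 60% enrolled
    Reply from 드립커피+한모금더 (Instructor): Hello. 🌞 Thank you for the kind rating. 🍀 Wishing you happy times always.

  • wikimfw (Reviews 12, Average Rating 4.8) · Rating: 5 · 100% enrolled

  • 하지 (Reviews 5, Average Rating 5.0) · Rating: 5 · 100% enrolled
    The hands-on practice is somewhat lacking, but the explanations are detailed, so it is easy to understand.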

          $38.50