Understanding LLM Architecture and GPU Utilization Strategies for AI Beginners

Understand Transformer-based LLM architecture and GPU utilization strategies, and experience direct serving through vLLM. This course covers the entire process of building an AI system pipeline, monitoring, and multi-GPU utilization, allowing you to learn intuitively through illustrations and hands-on practice without complex formulas or coding processes.

(5.0) 6 reviews

154 learners

Level Basic

Course period Unlimited

GPU
GPU
attention-model
attention-model
AI
AI
transformer
transformer
LLM
LLM
GPU
GPU
attention-model
attention-model
AI
AI
transformer
transformer
LLM
LLM

Reviews from Early Learners

5.0

5.0

WonJune Lee

43% enrolled

I am not in the deep learning industry, but I work in the field of computer vision (rule-based). Since my company requires LLM and vision-related deep learning technologies, I have been studying these topics. I have only completed about 40% of the course, but I felt compelled to leave a review now. I have taken many deep learning courses, including those by famous and highly-rated instructors, but I haven't found any course as clean and clear as this one. The best part is the quality of the lecture materials. The instructor recorded every single matrix calculation in Excel, which is incredibly helpful when reviewing. The Python code is also well-commented in many places. The quality of the lectures themselves is excellent; the instructor reminds you of parts you might have forgotten, ensuring you don't miss anything. While most other courses show calculations once or twice and then move on, this course goes through the calculations together until the end, which provides great clarity. It seems like the Q&A is monitored frequently, as I received immediate answers to my questions. The lectures seem to have been filmed this year, so it's great that they include a lot of the latest trends. It feels like this course hasn't gained much word-of-mouth yet, but I highly recommend it to anyone who needs to study these topics.

5.0

Jang Jaehoon

7% enrolled

Thank you for the great lecture!

5.0

nova7tr

11% enrolled

It's my first time taking the course, and it's very helpful. ^-^

What you will gain after the course

  • What is a Transformer model? Understanding the Transformer model's encoder and decoder

  • The foundation of Transformer models: A complete understanding of the evolution of attention mechanisms, including MHA, MQA, GQA, and MLA.

  • Mastering the utilization of the vLLM engine, the current de facto standard

  • vLLM Serving and Monitoring TTFT and TPOT Performance Metrics

  • Design and implementation of multi-GPU architecture utilizing Tensor/Pipeline/Data Parallelism

  • Understanding the Principles of Tool Calling: The Core of Agent AI

  • Transferring industry know-how, building AI system pipelines and performance monitoring

  • Latest trends understood through DeepSeek papers (MLA, MTP, N-gram, etc.)

What is needed now that we have become one of the top three AI powerhouses

For understanding LLM and practical application

LLM Master Class

As we enter the era of autonomous agents, many agent tools such as Open Canvas, Claude Code, and Codex are being used, but the
threat of data leakage and the issue of uncontrolled token costs
cannot be resolved.


The answer is a Hybrid AI architecture.



But are you asking if public APIs are always better?
That is not the case.

Recently, many LLMs comparable to public APIs (ChatGPT, Claude, Sonnet, etc.)
are being developed both domestically and internationally.



3 models selected as a result of the 1st evaluation of domestic Sovereign AI


However, knowing and using LLMs well is not easy.
There is a significant difference between
understanding and using an LLM versus using it without understanding,
especially after purchasing expensive GPUs.


Therefore, it is now time to learn the architecture for serving LLMs directly.


🌟 From LLM Architecture to Serving


In the era of agents, we have moved from the age of training to the age of inference. While using public APIs effectively is necessary, many companies prefer building local serving environments for various reasons such as security, governance, and cost. Learn everything from understanding LLM architecture for building local serving environments to architectural configuration and the latest LLM development trends.


Lecture Core Structure

Core 1. Understanding Hugging Face Models


You must know how to use the numerous LLMs released on Hugging Face.
However, the config.json file, which provides the specifications of an LLM model, is no different from a secret code to beginners. This is because you need to understand the Transformer model to be able to read it.

But don't worry. After taking this course, you will become an expert who can look at and understand the key specifications.

Learn how to decode the config.json file through this lecture.

(This is the content for Chapter 3-5. Make sure to take away all the remaining key parameters.)


Core 2. Mastering Attention

Attention is the beginning and end of the Transformer model, which serves as the foundation for current LLM models.

The attention-model emerged in 2017, but
it has reigned as the strongest algorithm for nearly a decade.
While many efforts are being made to move beyond the Transformer structure,
no architecture has yet emerged that completely replaces the Transformer's attention.

⚠️ You must never have just a vague understanding of attention.


Gain a perfect understanding of the principles of attention and learn about its evolutionary flow.

(This is the content of Chapter 5-4. The evolution of Attention is synonymous with the evolution of LLMs.)


Core 3. Mastering Multi-GPU Architectures

Multi-GPU configuration is essential for running large-scale LLMs and achieving fast inference.
However, did you know that there are several different ways to configure multi-GPU setups?


We will pass on essential GPU utilization strategies, a necessary gateway to becoming a core AI engineer.




😄 Recommended for these people

AI Beginners

Those who gave up on the formulas while studying attention to learn about Transformers

AI Beginner

Those who have only used ChatGPT or public APIs, but want to learn the principles of how LLM models operate.

AI Engineer

AI engineers who need the capability to understand the architectural characteristics of LLM models and to operate and manage them in a GPU environment

💡 What you will learn in this course

Step 1. Foundation

  • Understanding Transformer Models

  • Tokenizer & Embedding

  • Encoder vs Decoder

  • View model source code

Step 2. Attention

  • Mastering the Decoder Model

  • Mastering Attention

  • Masked Attention

  • KV Cache

Step 3. Serving

  • vLLM Serving

  • Paged Attention

  • OpenAI Compatible

  • SSE Protocol

Step 4. Tool Call

  • Understanding Tool Calls

  • Tool Response Architecture

  • Chat Template

  • Tool call parser

Step 5. Optimization

  • Performance Testing

  • vLLM Monitoring

  • Multi-GPU & Parallelism

  • vLLM Additional Features

Step 6. Advanced

  • Multi Token Prediction

  • mHC

  • Engram

  • Efforts to overcome limitations

💡 Key Lecture Points

Point 1

Core principles of attention learned without formulas


Learn various attention techniques intuitively through Excel without complex formulas (MHA → MQA → GQA, Sliding Window Attention)

Point 2

Implementation of a 3-Tier AI Architecture


Understand the basic structure of the 3-tier architecture connecting OpenWebUI, FastAPI, and vLLM, and learn the fundamental flow of tool integration.

Point 3

Measuring Concurrent Users and Tips for vLLM Operation

Using jMeter, perform a load test from FastAPI to vLLM to check metrics such as TTFT and TPOT according to the number of concurrent users.

Point 4

Monitoring vLLM Services

Build a Prometheus & Grafana dashboard pipeline to master the basic principles of vLLM service operations.

Point 5

Single GPU / Multi-GPU Testing

Through hands-on practice with the three basic multi-GPU methods (Pipeline Parallel, Tensor Parallel, and Data Parallel), you will see firsthand why multi-GPU setups are necessary.

Point 6

Mastering LLM Development Trends

We introduce the latest LLM development trends aimed at inference efficiency, including DeepSeek's MTP, Shared MoE, MLA, and Engram techniques.

✅ Tools used in the lecture




✅ Server Practice Environment Guide

The vLLM system construction will be carried out using Runpod. Additionally, hands-on sessions utilizing the T4 GPU in Google Colab will be conducted in parallel. Since the T4 GPU provides 15GB of GPU memory, any exercises that can be performed in Colab will be done there.

Runpod

We will configure a practice environment based on the OpenWebUI → FastAPI → Runpod flow. We will conduct various exercises by deploying vLLM on GPU servers in the Runpod cloud.

Hands-on practice will incur a cost of approximately $10 to $20.


Google Colab

Google Colab, which is like the standard environment for AI practice, is used for simple exercises that do not require a Runpod environment. We will use the standard free tier, not Pro, and utilize the T4 GPU.

✅ Guide to Local Practice Environment

The vLLM service will be hosted on Runpod, but
OpenwebUI and FastAPI will also be running on your local computer.
Therefore, please check if the following environment requirements are met!



Runpod and Colab are used as the primary practice environments, but
You will be practicing by running OpenWebUI and FastAPI within your local environment.

⚠️ This course will be updated as vLLM is updated.

vLLM's update speed is very fast. However, the major version is still in the 0.x range.
Nevertheless, many companies are using vLLM as their inference engine as a de facto standard.
vLLM supports not only the Transformer models that currently form the backbone of LLMs but also alternative architectures like Mamba , and it is updated every time new features are added to models, such as Multi Token Prediction, to support them.
This course will also be updated as new vLLM features or new model types are released.

Don't miss out on the latest LLM trends.


Recommended for
these people

Who is this course right for?

  • A beginner aiming to become an AI engineer who wants to systematically learn LLM serving technologies.

  • Developers who want to understand the principles of Transformers and Attention from a practical perspective without complex formulas.

  • Backend/Infrastructure engineers looking to build AI systems in GPU-optimized and multi-GPU environments

Need to know before starting?

  • Understanding of basic Python syntax (variables, functions, conditional statements, etc.)

  • Basic usage of git

Hello
This is hyunjinkim

1,561

Learners

102

Reviews

236

Answers

4.9

Rating

3

Courses

Hello.

I am a 17-year veteran currently working in the Data & AI field at a large corporation.

Since obtaining my Professional Engineer Information Management certification, I have been creating content to share the knowledge I've gained with as many people as possible.

Nice to meet you. :)

 

Contact: hjkim_sun@naver.com

More

Curriculum

All

54 lectures ∙ (14hr 30min)

Course Materials:

Lecture resources
Published: 
Last updated: 

Reviews

All

6 reviews

5.0

6 reviews

  • nova7tr1173님의 프로필 이미지
    nova7tr1173

    Reviews 8

    Average Rating 4.9

    5

    11% enrolled

    It's my first time taking the course, and it's very helpful. ^-^

    • hyunjinkim
      Instructor

      Hello nova7tr, Thank you for your review. I am glad to hear that it was helpful. I hope you enjoy the rest of the course :)

  • kjunekjune0812님의 프로필 이미지
    kjunekjune0812

    Reviews 2

    Average Rating 5.0

    Edited

    5

    43% enrolled

    I am not in the deep learning industry, but I work in the field of computer vision (rule-based). Since my company requires LLM and vision-related deep learning technologies, I have been studying these topics. I have only completed about 40% of the course, but I felt compelled to leave a review now. I have taken many deep learning courses, including those by famous and highly-rated instructors, but I haven't found any course as clean and clear as this one. The best part is the quality of the lecture materials. The instructor recorded every single matrix calculation in Excel, which is incredibly helpful when reviewing. The Python code is also well-commented in many places. The quality of the lectures themselves is excellent; the instructor reminds you of parts you might have forgotten, ensuring you don't miss anything. While most other courses show calculations once or twice and then move on, this course goes through the calculations together until the end, which provides great clarity. It seems like the Q&A is monitored frequently, as I received immediate answers to my questions. The lectures seem to have been filmed this year, so it's great that they include a lot of the latest trends. It feels like this course hasn't gained much word-of-mouth yet, but I highly recommend it to anyone who needs to study these topics.

    • hyunjinkim
      Instructor

      Hello Wonjune Lee, Thank you for your thoughtful review! I put a lot of thought into improving the quality of the lecture materials so that students could receive meaningful resources and review them effectively even later on. I also spent a lot of time thinking about how to effectively convey operations like Attention. The conclusion I reached was that it shouldn't be explained through formulas alone, nor through simple metaphors, nor just through torch code. Believing that it can only be understood by following the flow visually, I tried my best to explain it using Excel, and I'm glad to hear that it was conveyed well :) I hope you gain great insights from the remaining parts of the course. Keep it up!

  • jjhgwx님의 프로필 이미지
    jjhgwx

    Reviews 872

    Average Rating 4.9

    5

    7% enrolled

    Thank you for the great lecture!

    • hyunjinkim
      Instructor

      Hello Jang jaehoon, Thank you for the course review 👍 I see you've completed 7%. I hope you enjoy the rest of the lessons and find them very helpful. You can do it!

  • 6tank1004님의 프로필 이미지
    6tank1004

    Reviews 14

    Average Rating 5.0

    5

    7% enrolled

    • hyunjinkim
      Instructor

      Thank you.

  • chongin12님의 프로필 이미지
    chongin12

    Reviews 2

    Average Rating 5.0

    5

    61% enrolled

    Similar courses

    Explore other courses in the same field!

    Limited time deal ends in 6 days

    $12,586.00

    30%

    $110.00