
Understanding LLM Architecture and GPU Utilization Strategies for AI Beginners

Understand Transformer-based LLM architecture and GPU utilization strategies, and get hands-on serving experience with vLLM. This course covers the entire process of building an AI system pipeline, monitoring it, and utilizing multiple GPUs, and is designed so you can learn intuitively through illustrations and hands-on practice, without complex formulas or heavy coding.

11 learners are taking this course

Level Basic

Course period Unlimited

GPU
attention-model
AI
transformer
LLM

What you will gain after the course

  • What is a Transformer model? Understanding its encoder and decoder structure

  • The foundation of Transformer models: a complete understanding of the evolution of attention mechanisms, including MHA, MQA, GQA, and MLA

  • Mastering the utilization of the vLLM engine, the current de facto standard

  • vLLM Serving and Monitoring TTFT and TPOT Performance Metrics

  • Design and implementation of multi-GPU architecture utilizing Tensor/Pipeline/Data Parallelism

  • Understanding the Principles of Tool Calling: The Core of Agent AI

  • Industry know-how for building AI system pipelines and performance monitoring

  • Latest trends understood through DeepSeek papers (MLA, MTP, N-gram, etc.)

What is needed now that we have become one of the top three AI powerhouses

For understanding LLM and practical application

LLM Master Class

As we enter the era of autonomous agents, we rely on many agent tools such as Open Canvas, Claude Code, and Codex, yet the threat of data leakage and the problem of uncontrolled token costs remain unresolved.


The answer is a Hybrid AI architecture.



But does that mean public APIs are always better?
Not necessarily.

Nowadays, many LLMs comparable to the public APIs (ChatGPT, Claude, etc.) are being developed both domestically and internationally.



The 3 models selected based on the results of the 1st evaluation of domestic Sovereign AI


However, knowing LLMs well and using them well is not easy.
There is a world of difference between using an LLM with a real understanding of it
and using one without that understanding,
especially once you have purchased expensive GPUs.


Therefore, it is now time to learn the architecture for serving LLMs directly.


🌟 From LLM Architecture to Serving


In the era of agents, we have moved from the age of training to the age of inference. While using public APIs effectively is necessary, many companies prefer building local serving environments for various reasons such as security, governance, and cost. Learn everything from understanding LLM architecture for building local serving environments to architectural configuration and the latest trends in LLM development.


Lecture Core Composition

Core 1. Understanding Hugging Face Models


You must know how to use the numerous LLMs released on Hugging Face.
However, the config.json file, which spells out an LLM's specifications, reads like a secret code to beginners, because you need to understand the Transformer model to make sense of it.

But don't worry. After taking this course, you will become an expert who can look at and understand the key specifications.

Learn how to decode the config.json file through this lecture.

(This is the content of Chapter 3-5. Be sure to pick up the remaining key parameters as well.)
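As a taste of what decoding config.json looks like, here is a minimal sketch. The snippet below is a hand-written, illustrative config in the style of a Hugging Face decoder model (Llama-style field names); the numbers are made up and do not describe any specific model from the course.

```python
import json

# Hypothetical config.json snippet in the common Llama-style layout.
# Field names follow Hugging Face conventions; values are illustrative.
raw = """
{
  "hidden_size": 4096,
  "num_hidden_layers": 32,
  "num_attention_heads": 32,
  "num_key_value_heads": 8,
  "vocab_size": 128256,
  "max_position_embeddings": 8192
}
"""
cfg = json.loads(raw)

# Two specs you can derive once you can "read the code":
head_dim = cfg["hidden_size"] // cfg["num_attention_heads"]            # size of each attention head
gqa_groups = cfg["num_attention_heads"] // cfg["num_key_value_heads"]  # query heads per KV head (GQA)

print(f"head_dim = {head_dim}")                              # 128
print(f"GQA: {gqa_groups} query heads share each KV head")   # 4
```

Once `num_key_value_heads` is smaller than `num_attention_heads`, you know the model uses grouped-query attention rather than plain MHA.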


Core 2. Mastering Attention

Attention is the beginning and end of the Transformer model, which serves as the foundation for current LLM models.

Attention emerged in 2017, yet it has remained the most powerful algorithm for nearly a decade. While many efforts are being made to move beyond the Transformer structure, no architecture has yet emerged that completely replaces the Transformer's attention.

⚠️ Don't settle for only a vague understanding of attention.


Gain a complete understanding of the principles of attention and learn about its evolutionary trends.

(This is the content of Chapter 5-4. The evolution of Attention is synonymous with the evolution of LLMs)
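One way to see why attention evolved from MHA toward MQA and GQA is to compare KV-cache memory. The following is back-of-the-envelope arithmetic with illustrative numbers (fp16, 32 layers, head dim 128, a 4096-token context), not an exact vLLM measurement:

```python
# Rough KV-cache size comparison for MHA vs GQA vs MQA.
# All numbers are illustrative assumptions, not a real model's specs.
bytes_per_elem = 2   # fp16
layers = 32
head_dim = 128
seq_len = 4096

def kv_cache_bytes(num_kv_heads: int) -> int:
    # Two cached tensors (K and V) per layer,
    # each of shape [seq_len, num_kv_heads, head_dim].
    return 2 * layers * seq_len * num_kv_heads * head_dim * bytes_per_elem

mha = kv_cache_bytes(32)  # MHA: every query head has its own K/V head
gqa = kv_cache_bytes(8)   # GQA: 4 query heads share one K/V head
mqa = kv_cache_bytes(1)   # MQA: all query heads share a single K/V head

for name, size in [("MHA", mha), ("GQA", gqa), ("MQA", mqa)]:
    print(f"{name}: {size / 2**30:.2f} GiB per sequence")
```

Shrinking the number of KV heads cuts the per-sequence cache by the same factor (here 4x for GQA, 32x for MQA), which is exactly the memory pressure that drives this evolution.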


Core 3. Mastering Multi-GPU Architecture

Multi-GPU configuration is essential for running large-scale LLMs and achieving fast inference.
However, did you know that there are several different ways to configure multi-GPU setups?


We will pass on GPU utilization strategies, an essential gateway to becoming a core AI engineer.
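To see why multi-GPU setups are unavoidable, a quick feasibility check helps. The numbers below are illustrative assumptions (a roughly 14 GiB fp16 7B-class model on 16 GiB GPUs); real deployments also need memory for activations and CUDA overhead, so treat this as a sketch only:

```python
# Back-of-the-envelope check: does the model fit on one GPU,
# or do we need tensor parallelism? Numbers are illustrative.
gpu_mem_gib = 16            # per-GPU memory
weights_gib = 14.0          # fp16 weights of a 7B-class model (assumed)
kv_cache_budget_gib = 4.0   # room we want to reserve for the KV cache

def fits(tensor_parallel: int) -> bool:
    # Tensor parallelism shards both the weights and the KV cache
    # across GPUs, so the per-GPU footprint shrinks with the TP degree.
    per_gpu = (weights_gib + kv_cache_budget_gib) / tensor_parallel
    return per_gpu <= gpu_mem_gib

print("TP=1 fits:", fits(1))  # False: 18 GiB needed on a 16 GiB GPU
print("TP=2 fits:", fits(2))  # True: 9 GiB per GPU
```

The same logic extends to pipeline parallelism (sharding by layer) and data parallelism (replicating the model to serve more concurrent requests), which the course covers hands-on.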




😄 Recommended for these people

AI Beginners

Those who gave up on the formulas while researching "Attention" to study Transformers.

AI Beginners

Those who have only used ChatGPT or public APIs, but want to learn the principles of how LLM models operate.

AI Engineer

AI engineers who need the capability to understand the characteristics of LLM model architectures and to run and manage them in GPU environments

💡 What you will learn in this course

Step 1. Foundation

  • Understanding Transformer Models

  • Tokenizer & Embedding

  • Encoder vs Decoder

  • View model source code

Step 2. Attention

  • Mastering the Decoder Model

  • Mastering Attention

  • Masked Attention

  • KV Cache

Step 3. Serving

  • vLLM Serving

  • Paged Attention

  • OpenAI Compatible

  • SSE Protocol

Step 4. Tool Call

  • Understanding Tool Calls

  • Tool Response Architecture

  • Chat Template

  • Tool call parser
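To preview what a tool call parser deals with, here is a hedged sketch of parsing an OpenAI-style tool call from an assistant message. The payload is hand-written for illustration; field names follow the OpenAI tool-calling schema:

```python
import json

# Hand-written example of an assistant message containing a tool call,
# in the OpenAI-compatible format that vLLM can emit.
response = {
    "role": "assistant",
    "tool_calls": [
        {
            "id": "call_0",
            "type": "function",
            "function": {
                "name": "get_weather",
                "arguments": "{\"city\": \"Seoul\"}",
            },
        }
    ],
}

call = response["tool_calls"][0]["function"]
args = json.loads(call["arguments"])  # arguments arrive as a JSON string
print(call["name"], args["city"])     # get_weather Seoul
```

Note that `arguments` is a JSON string, not an object: the model writes it as text, and your parser (or vLLM's tool call parser) must decode it before dispatching to the actual function.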

Step 5. Optimization

  • Performance Testing

  • vLLM Monitoring

  • Multi-GPU & Parallelism

  • Additional vLLM Features

Step 6. Advanced

  • Multi Token Prediction

  • mHC

  • Engram

  • Efforts to overcome limitations

💡 Key Lecture Points

Point 1

Core principles of attention learned without formulas


Learn various attention techniques intuitively through Excel without complex formulas (MHA → MQA → GQA, Sliding Window Attention)

Point 2

Implementation of 3-Tier AI Architecture


Understand the basic structure of a 3-tier architecture connecting OpenWebUI, FastAPI, and vLLM, and learn the fundamental flow of tool integration.

Point 3

Measuring Concurrent Users and Tips for vLLM Operation

Using Apache JMeter, we will run load tests from FastAPI to vLLM and check metrics such as TTFT and TPOT as the number of concurrent users increases.
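The two metrics themselves are simple to compute from the timestamps at which streamed tokens arrive. The timestamps below are made up for illustration:

```python
# Computing TTFT and TPOT from token arrival times (illustrative values).
request_sent = 0.00                            # seconds
token_times = [0.35, 0.40, 0.45, 0.50, 0.55]   # arrival time of each output token

# Time To First Token: latency until the stream starts.
ttft = token_times[0] - request_sent

# Time Per Output Token: average gap between subsequent tokens.
tpot = (token_times[-1] - token_times[0]) / (len(token_times) - 1)

print(f"TTFT = {ttft * 1000:.0f} ms, TPOT = {tpot * 1000:.0f} ms/token")
```

Under load, TTFT tends to grow with queueing delay while TPOT reflects decode throughput, which is why the course tracks both as concurrency rises.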

Point 4

Monitoring vLLM Services

Build a Prometheus & Grafana dashboard pipeline to master the basic principles of vLLM service operations.

Point 5

Single GPU / Multi-GPU Testing

Through hands-on practice with the three basic multi-GPU methods (Pipeline Parallel, Tensor Parallel, and Data Parallel), you will see firsthand why multi-GPU setups are necessary.

Point 6

Mastering LLM Development Trends

We introduce the latest LLM development trends aimed at inference efficiency, including DeepSeek's MTP, Shared MoE, MLA, and Engram techniques.

✅ Tools used in the lecture




✅ Server Practice Environment Guide

The vLLM system construction will be carried out using Runpod. In addition, hands-on sessions utilizing the T4 GPU in Google Colab will be conducted in parallel. Since the T4 GPU provides 15GB of GPU memory, any exercises that can be performed in Colab will be done there.

Runpod

We will configure a practice environment based on the OpenWebUI → FastAPI → Runpod flow. We will conduct various exercises by deploying vLLM on GPU servers in the Runpod cloud.

A practice fee of approximately $10 to $20 will be incurred for the hands-on sessions.


Google Colab

Google Colab, which is like the standard environment for AI practice, is used for simple exercises that do not require a Runpod environment. We will use the standard free tier, not Pro, and utilize the T4 GPU.

✅ Local Practice Environment Guide

The vLLM service will be hosted on Runpod, but OpenWebUI and FastAPI will run on your local computer, so please check that the following environment requirements are met!



Runpod and Colab are used as the primary practice environments, but you will also practice by running OpenWebUI and FastAPI in your local environment.

⚠️ This course will be updated as vLLM is updated.

vLLM evolves very quickly, even though its major version is still in the 0.x range. Nevertheless, many companies already use vLLM as the de facto standard inference engine. vLLM supports not only the Transformer models that form the backbone of current LLMs but also alternative architectures like Mamba, and it is updated whenever new model features such as Multi Token Prediction appear. This course will likewise be updated as new vLLM features or new model types are released.

Don't miss out on the latest LLM trends.


Recommended for these people

Who is this course right for?

  • A beginner aiming to become an AI engineer who wants to systematically learn LLM serving technologies.

  • Developers who want to understand the principles of Transformers and Attention from a practical perspective without complex formulas.

  • Backend/Infrastructure engineers looking to build AI systems in GPU-optimized and multi-GPU environments

Need to know before starting?

  • Understanding of basic Python syntax (variables, functions, conditional statements, etc.)

  • Basic usage of git

Hello, this is hyunjinkim.

1,391 Learners ∙ 93 Reviews ∙ 233 Answers ∙ 4.9 Rating ∙ 3 Courses

Hello.

I am a 17-year veteran currently working in the Data & AI field at a large corporation.

Since obtaining my Professional Engineer Information Management certification, I have been creating content to share the knowledge I've gained with as many people as possible.

Nice to meet you. :)

 

Contact: hjkim_sun@naver.com


Curriculum

54 lectures ∙ (13hr 33min)

Course Materials:

Lecture resources

Reviews

Not enough reviews.
