
jasonkang
Are you anxious every time you deploy an AI agent? Based on experience with major domestic corporations and global big tech companies, we will show you how to systematically measure and improve agent quality using LangSmith.
153 learners
Level Intermediate
Course period Unlimited
AI Agent-Specific Evaluation Methodologies and Practical Know-How
Establishing a "data"-driven decision-making system rather than one based on "intuition"
Dramatic reduction in development and testing costs
Error resolution and debugging techniques for real-world service operations
I only changed one prompt, but the feature that used to work fine is suddenly lagging.
I upgraded to the latest model because they said it was smarter, but it feels like the performance has actually dropped compared to before.
I've improved the features, but I don't know how much testing is needed to feel confident about deploying.
I feel lost on how to explain the agent's performance to my team leader, who is asking about it right before deployment.
There is only one reason why we hesitate: when we change prompts, models, or logic, we lack confidence that overall performance will truly improve.
AI agents have characteristics that are different from general software.
Because the results vary every time even with the same prompt, there is no guarantee that it will always be good just because it was good once.
Most problems handled by agents do not have a single correct answer. Therefore, quality cannot be captured by Pass/Fail alone.
Because agents constantly change due to prompt modifications, model updates, and changes in user input/patterns, continuous quality verification is necessary.
Ultimately,
That's why we're sharing
We cover the entire process in a form you can apply directly to practical work: building datasets, evaluating agents, and comparing performance across evaluation runs.
Learn three methods for creating domain-specific evaluation data using AI.
Automatically generate question-answer (QA) datasets
Generate domain-specific data with custom prompts and tools
Expand small-scale data into large-scale datasets
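As a rough illustration of the third method, the sketch below expands a small seed set into a larger dataset while keeping each reference answer attached to its question. It is not the course's code: the `Example` class, the seed rows, and the rephrasing templates are all hypothetical, and fixed templates stand in for the LLM call that would normally generate the variations, so the sketch stays deterministic and runs offline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """One golden-dataset row: an input question and its reference answer."""
    question: str
    answer: str


# Hypothetical seed examples, written by hand for illustration only.
SEED = [
    Example("What is the refund window?", "30 days"),
    Example("Which plan includes SSO?", "Enterprise"),
]

# Hypothetical rephrasing templates. In practice an LLM would produce the
# variations; fixed templates are an assumption made to keep this offline.
TEMPLATES = [
    "{q}",
    "Quick question: {q}",
    "Could you tell me: {q}",
]


def expand(seed: list[Example], templates: list[str]) -> list[Example]:
    """Rephrase every seed question with every template, keeping its answer."""
    expanded: list[Example] = []
    seen: set[str] = set()
    for example in seed:
        for template in templates:
            question = template.format(q=example.question)
            if question in seen:  # drop exact duplicate questions
                continue
            seen.add(question)
            expanded.append(Example(question, example.answer))
    return expanded


dataset = expand(SEED, TEMPLATES)
print(len(dataset), "examples generated from", len(SEED), "seeds")
```

Keeping the reference answer fixed while only the question wording changes means the expanded rows can be scored against the same ground truth as the seeds.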
We will show you how to verify where and why an agent failed, using methods adopted by Anthropic, Google, and Amazon.
E2E is an evaluation method that determines the success or failure of the final result. However, for complex practical agents involving 10 to 20 steps, Component evaluation must be used alongside it. By verifying each step, you can pinpoint exactly whether the issue lies in the search or the tool selection, allowing for efficient debugging.
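The difference between the two evaluation styles can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the course's implementation: the `Trace` and `Expected` records, the exact-match checks, and the sample data are hypothetical, and a real agent would log many more steps.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    """Hypothetical record of one agent run."""
    retrieved_doc_ids: list[str]
    tool_calls: list[str]
    final_answer: str


@dataclass
class Expected:
    """Hypothetical reference values for the same task."""
    relevant_doc_ids: set[str]
    tool_calls: list[str]
    final_answer: str


def eval_e2e(trace: Trace, expected: Expected) -> bool:
    """E2E check: only the final answer is compared (case-insensitive)."""
    return trace.final_answer.strip().lower() == expected.final_answer.strip().lower()


def eval_retrieval(trace: Trace, expected: Expected) -> bool:
    """Component check: every relevant document was retrieved."""
    return expected.relevant_doc_ids <= set(trace.retrieved_doc_ids)


def eval_tool_selection(trace: Trace, expected: Expected) -> bool:
    """Component check: the agent called the expected tools in order."""
    return trace.tool_calls == expected.tool_calls


def evaluate(trace: Trace, expected: Expected) -> dict[str, bool]:
    """Run the E2E check alongside the per-component checks."""
    return {
        "e2e": eval_e2e(trace, expected),
        "retrieval": eval_retrieval(trace, expected),
        "tool_selection": eval_tool_selection(trace, expected),
    }


def failing_components(report: dict[str, bool]) -> list[str]:
    """List the component checks that failed, ignoring the E2E result."""
    return [name for name, ok in report.items() if name != "e2e" and not ok]


trace = Trace(["doc-1", "doc-7"], ["web_search"], "The fee is 20 USD")
expected = Expected({"doc-1"}, ["fee_lookup"], "The fee is 15 USD")
report = evaluate(trace, expected)
print(report, failing_components(report))
```

In this made-up run the E2E check fails, and the component report narrows the cause to tool selection, since retrieval passed.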
We introduce two methods to objectively compare and evaluate an agent's maximum performance and consistency.
Pass@k: a metric to check the maximum performance an agent can achieve
Pass^k: a metric to verify how consistently the agent operates
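The page does not give formulas for these two metrics, which the curriculum names Pass@k and Pass^k. The sketch below uses one common formulation and assumes the course follows it: run the same task n times, count c successes, then estimate the chance that at least one of k sampled runs succeeds (Pass@k) and the chance that all k succeed (Pass^k). The function names are hypothetical.

```python
from math import comb


def _check(n: int, c: int, k: int) -> None:
    """Validate that c successes out of n runs and a sample size k make sense."""
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate P(at least one of k runs succeeds), given c successes in n runs."""
    _check(n, c, k)
    # comb(n - c, k) counts the k-subsets made up only of failed runs.
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_hat_k(n: int, c: int, k: int) -> float:
    """Estimate P(all k runs succeed), given c successes in n runs."""
    _check(n, c, k)
    # comb(c, k) counts the k-subsets made up only of successful runs.
    return comb(c, k) / comb(n, k)


# Hypothetical result: the agent solved a task in 5 of 10 runs.
for k in (1, 2, 4):
    print(k, round(pass_at_k(10, 5, k), 3), round(pass_hat_k(10, 5, k), 3))
```

With a fixed success rate, Pass@k rises as k grows while Pass^k falls, which is why the two together describe both the ceiling and the consistency of an agent.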
Section 1
Explains the definition of AI agent evaluation and why it is essential. Explores ways to improve the quality of AI services and reduce development and testing costs by establishing a data-driven decision-making system.
Section 2
This covers how to create a Golden Dataset. It includes hands-on practice in building datasets using LangSmith settings, custom agents, and various document types.
Section 3
Learn how to design evaluation metrics to measure the performance of AI agents. Analyze accuracy, document retrieval, and tool usage efficiency through end-to-end and component-specific evaluation methods.
Section 4
You will learn how to numerically analyze the maximum performance and reliability of agents using advanced metrics such as Pass@k and Pass^k. Through this, you will conduct an in-depth evaluation of the agent's potential and stability.
Those who feel anxious that existing functions might unexpectedly malfunction whenever they modify prompts to improve model performance
Those who are worried that overall service stability might decline during model updates, and those who struggle with making decisions based on intuition without clear evaluation metrics
Those who want to communicate based on specific data and metrics rather than "intuition" when conveying performance improvement requirements for AI agents to their team
Hands-on Environment
Python 3.13 or higher must be installed.
Prerequisite Knowledge and Important Notes
You must be familiar with the basic syntax of Python programming.
This is suitable for those who have experience in developing agents using LangChain and LangGraph.
If you are not familiar with LangChain syntax, please take "Mastering LangChain Basics in One Hour" first.
If you are not familiar with LangGraph syntax, please take "AI Agent Development using LangGraph" first.
Learning Materials
Lecture materials are provided via the Notion page.
Practice code and example datasets are provided via GitHub.
Who is this course right for?
Developers who feel anxious that changing a single line of a prompt might break another feature
Planners who want to make decisions based on data and metrics rather than "feelings" when communicating with the development team
Developers who want to go beyond the basics and build AI agents at a professional, practical level
Need to know before starting?
Python required
LangGraph required
Inflearn Verified
Career Verified
Learners 19,040 ∙ Reviews 1,499 ∙ Answers 528 ∙ Rating 4.9 ∙ Courses 10
FAANG Senior Software Engineer
(Former) GS Group AI Agent platform development/operations
(Former) GS Group DX BootCamp Mentor/Coach
(Former) Tech Lead at a Series C AI Startup
Stanford University Code in Place Python Instructor
Naver Boostcamp Web/Mobile Mentor
Naver Cloud YouTube Channel presenter
Author of Building Autonomous AI Agents with LangChain & LangGraph

Wanted Pre-onboarding Frontend/Backend Challenge Instructor (6,000+ cumulative participants)
Hanghae AI Plus Course 1st Generation Coach
18 lectures ∙ (3hr 16min)
6 reviews ∙ Average Rating 5.0
5
Jason's courses are ones I always trust and sign up for. I have taken all of the instructor's LangChain-related courses, and thanks to them, I am currently working as a junior AI Engineer. I had been worrying a lot about evaluation in my actual work, and since this course was released at the perfect time, I am planning to learn and apply it quickly. Thank you for always providing high-quality lectures. Additionally, this is a separate question, but I just found out that you recently published a book. I haven't purchased it yet, but I'd like to ask if it's worth studying with the book even though I've already taken all the courses. Your lectures feel like having a great mentor because you always explain and share things from the student's perspective. Once again, thank you for the great lectures as always. :)
Hello Seonggyu! Thank you for the great feedback. I'm so proud to hear that taking this course helped you in your career as an AI engineer, as it feels like the effectiveness of the course has been proven. Thank you for sharing. The book does cover a slightly wider variety of evaluation strategies and methods than the course. However, since the course covers evaluation theory sufficiently, I don't think you necessarily need to purchase the book if you've already completed the lectures (I probably shouldn't be saying this as someone selling the book 😅). I look forward to seeing you again with another great course!
Ah. Honestly, I'm so grateful and it makes me trust you even more because you were so straightforward..!! :) I'll continue to sign up for the early bird courses first thing in the future. I look forward to working with you!