Join this course if you want to be recognized as a machine learning expert in large-scale data environments. It takes you from the core framework of Spark Machine Learning, through SQL-based data processing on challenging practical problems, to implementing optimized machine learning models through business domain analysis.
I first got to know Professor Kwon Chul-min through the Complete Guide to Python Machine Learning. I am a non-major and had been thinking of giving up on this field, but thanks to that course I did not.
I now work in this field and keep studying steadily through Inflearn lectures. I wanted to thank the teacher, so I first did so in the Q&A section, and he encouraged me, saying that if I kept studying I would achieve what I was working toward.
I plan to keep taking the teacher's lectures in the future. ^^ haha He really teaches so well.
Professor Kwon Chul-min, I would like to take this opportunity to sincerely thank you.
5.0
egs41
10% of course completed
The instructor's clear diction and voice made it easy to focus, and the content was solid. Please keep making good lectures. Thank you.
5.0
밑바닥개발자
54% of course completed
I have been following Kwon Chul-min's lecture series! Thank you for continuing to provide such high-quality lectures! I have seen several Spark lectures in Scala and Java, but this is the first one I have seen that teaches Spark in Python, which made it even better for me! I have not finished the course yet, but I like how he explains even simple syntax as clearly as possible, and how he provides a variety of practice materials to encourage mastery through repetition! I look forward to more lectures in the future!
What you will gain after the course
Implementing Machine Learning Models in Spark
A detailed understanding of DataFrame, the foundation of Spark's data processing
Understanding various technical elements that constitute the Spark Machine Learning Framework
Mastering Spark's Machine Learning Pipelines
SQL proficiency for data analysis
SQL-based Feature Engineering Techniques
Implementing models with XGBoost and LightGBM in Spark
Model hyperparameter tuning based on Bayesian optimization
Improving data analysis and ML model implementation skills together through challenging practical problems
Data analysis methods based on analysis domains
Various data visualization techniques
[Notice] Databricks Community Edition, which was provided for free as the practice environment for this course, is no longer accepting new sign-ups. Therefore, please be advised that the practice environment will be changed to a local Spark and Jupyter environment as of December 5, 2025.
The move to a local environment changes only parts of the practice code. Most lecture videos in Sections 1 through 10 therefore keep the existing Databricks Community recordings, with new local Spark videos added only where the changes are significant. From Section 11 onward, many lectures have been replaced with practice videos recorded in the local Spark environment.
Before enrolling, please note that the course currently mixes existing videos recorded on Databricks Community with new videos recorded on local Spark.
Data analysis + feature engineering + ML implementation: master all three skills at once.
The encounter between Apache Spark and Machine Learning.
Apache Spark, the leader in open-source large-scale distributed processing solutions, has met Machine Learning.
Many large corporations and financial institutions in Korea use Apache Spark to analyze massive amounts of data and build machine learning models. Because Spark is a distributed data processing framework, it can process large-scale data and train ML models while scaling out across anywhere from a few to dozens of servers. This lets you overcome the limitation of scikit-learn, which can only build machine learning models on a single server.
We will help you grow into a machine learning expert who is also proficient in data processing and analysis.
The 'Spark Machine Learning Complete Guide - Part 1' course goes beyond learning how to implement machine learning models in Spark and will help you grow into a machine learning expert who is also proficient in data processing and analysis.
To grow into a true machine learning expert, it is crucial to possess not only the ability to implement ML models but also the skill to process and combine business data to create those models. To this end, you will learn how to process data using SQL, which is most commonly used for large-scale data processing in practice, and data analysis techniques based on business domain analysis through hands-on exercises.
The curriculum is designed to help you build data processing/analysis and ML implementation skills through detailed theoretical explanations and hands-on practice.
We will solve the problems you will face.
Implementing machine learning models on Spark is not easy. You run into many problems that data scientists and machine learning experts have often not faced before, such as machine learning APIs and frameworks shaped by the specifics of the Spark architecture, and data processing based on SQL.
Through this course, the Spark Machine Learning Complete Guide, I will help you develop the ability to solve the problems you encounter.
The first half of the 'Spark Machine Learning Complete Guide - Part 1' course
The first half of the lecture consists of detailed theoretical explanations and extensive hands-on practice regarding various elements that make up the Spark Machine Learning Framework, such as DataFrame, SQL, Estimator, Transformer, Pipeline, and Evaluator. Through this, you will be able to easily and quickly implement ML models in Spark.
Additionally, I will provide detailed explanations on how to use LightGBM in Spark and how to tune hyperparameters using HyperOpt based on Bayesian optimization.
The latter half of the 'Spark Machine Learning Complete Guide - Part 1' course
The latter half of the lecture consists of a hands-on practice of Kaggle's Instacart Market Basket Analysis competition.
Through the model implementation of Kaggle's Instacart Market Basket Analysis competition, a highly challenging contest, we will simultaneously improve your practical data processing/analysis skills and machine learning model implementation capabilities.
Through this dataset, you will learn in detail how to process and analyze business data and perform feature engineering based on SQL, how to derive analysis domains from business operations, and how to create models based on these derived features.
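To give a feel for what SQL-based feature engineering on order data can look like, here is a small sketch. It is an illustration only, not the course's solution: the tables are a simplified, hypothetical cut of an Instacart-style schema (`orders`, `order_products`), the rows are made up, and Python's built-in `sqlite3` stands in for Spark so the example runs without a cluster. In Spark, the same SELECT statement could be passed to `spark.sql` after registering the DataFrames as temporary views.

```python
import sqlite3

# In-memory database with a simplified, hypothetical Instacart-style schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (order_id INTEGER, user_id INTEGER, order_number INTEGER);
    CREATE TABLE order_products (order_id INTEGER, product_id INTEGER, reordered INTEGER);

    INSERT INTO orders VALUES (1, 10, 1), (2, 10, 2), (3, 10, 3), (4, 20, 1);
    INSERT INTO order_products VALUES
        (1, 100, 0), (1, 200, 0),
        (2, 100, 1),
        (3, 100, 1), (3, 200, 1),
        (4, 100, 0);
    """
)

# One row per (user, product) pair: how often the user bought the product,
# how many of those purchases were reorders, and the latest order that contained it.
feature_sql = """
SELECT o.user_id,
       op.product_id,
       COUNT(*)            AS times_ordered,
       SUM(op.reordered)   AS times_reordered,
       MAX(o.order_number) AS last_order_number
FROM orders AS o
JOIN order_products AS op
  ON o.order_id = op.order_id
GROUP BY o.user_id, op.product_id
ORDER BY o.user_id, op.product_id
"""

features = conn.execute(feature_sql).fetchall()
for row in features:
    print(row)

conn.close()
```

Each output row is a candidate feature vector for one user and product pair, which is the kind of table a model can then be trained on.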
💻 Please check before taking the course!
All practice code in this course is written in Python. Please note before enrolling that Scala is not covered.
Please check the practice environment.
This course uses Docker to set up a practice environment based on local Spark and Jupyter. The practice environment is configured by installing Docker Desktop on your local PC, and the course is designed so that you will have no problem setting up the environment even if you are not familiar with Docker.
The practice code and lecture materials can be downloaded from 'Download practice code and explanatory materials' ('실습코드와 설명자료 다운로드 받기').
Prior knowledge is required for this course.
This course assumes that students know the material in Chapter 5 (Regression) of the Complete Guide to Python Machine Learning, or have equivalent knowledge, and have a very basic understanding of SQL. Please keep this in mind when choosing the course.
It is helpful if you know the basics of Spark, but you should have no trouble following the lecture even if you don't.
Please check the prerequisite courses!
Complete Guide to Python Machine Learning
No more theory-only machine learning lectures: learn everything from core machine learning concepts to practical skills, easily and accurately.
I am even more moved that you left such a touching review. I should be the one thanking you: a review like this instantly rewards the hard work that went into creating the lecture. If you keep working hard like this, you will surely achieve everything you want. Thank you.