Spark Machine Learning Complete Guide - Part 1

Join this course if you want to be recognized as a machine learning expert in large-scale data environments. It covers the core framework of Spark Machine Learning, SQL-based data processing practiced on challenging real-world problems, and the ability to build optimized machine learning models through business domain analysis.

(4.9) 29 reviews

948 learners

Level Intermediate

Course period Unlimited

Apache Spark
Machine Learning(ML)
Big Data
Data Engineering

Reviews from Early Learners

4.9

5.0

freedom07

93% enrolled

I first got to know Professor Kwon Chul-min through the Python Machine Learning Perfect Guide. Thanks to that lecture, I, a non-major, did not give up on this field even though I had been thinking of quitting. I am currently working in this field and studying steadily by taking Inflearn lectures. I wanted to thank the teacher, so I first thanked him in the Q&A section, and he encouraged me that if I kept studying, I would be able to achieve what I had worked for. I plan to keep taking his lectures in the future. ^^ He really teaches so well. Professor Kwon Chul-min, I would like to take this opportunity to sincerely thank you.

5.0

egs41

10% enrolled

It was good to focus on the instructor's diction and voice, and the content was solid. Please continue to make good lectures. Thank you.

5.0

밑바닥개발자

54% enrolled

I am a student who has been attending Kwon Chul-min's lecture series! Thank you for continuing to provide high-quality lectures! And I have seen several Spark lectures in Scala and Java, but this is the first time I have seen a lecture that teaches Spark in Python, so I think it was even better! Although I have not completed the course yet, I still like how he tries to teach simple grammar as easily as possible! And I also like how he provides various practice materials to encourage repeated mastery! I look forward to other lectures in the future!

What you will gain after the course

  • Implementing Machine Learning Models in Spark

  • A detailed understanding of DataFrame, the foundation of Spark's data processing

  • Understanding various technical elements that constitute the Spark Machine Learning Framework

  • Mastering Spark's Machine Learning Pipelines

  • SQL proficiency for data analysis

  • SQL-based Feature Engineering Techniques

  • Implementing models with XGBoost and LightGBM in Spark

  • Hyperparameter tuning based on Bayesian optimization

  • Improving data analysis and ML model implementation skills together through challenging practical problems

  • Data analysis method based on analysis domains

  • Various data visualization techniques

[Notice] Databricks Community Edition, which was provided for free as the practice environment for this course, is no longer accepting new sign-ups. Therefore, please be advised that the practice environment will be changed to a local Spark and Jupyter environment as of December 5, 2025.

Since the changes to the practice code due to the transition to a local environment are limited to certain parts, most lecture videos from Section 1 to Section 10 will continue to use the existing recordings from Databricks Community, while new lecture videos in the local Spark environment have been added only for major changes. From Section 11 onwards, many lectures have been replaced with practice videos in the local Spark environment.

Please note before enrolling that the course currently mixes existing recordings based on Databricks Community with new videos based on local Spark.

Data analysis + feature engineering + ML implementation,
master all three skills at once.

The encounter between Apache Spark and
Machine Learning.

Apache Spark, the leader in open-source large-scale distributed processing solutions, has met Machine Learning.

Many large corporations and financial institutions in Korea use Apache Spark to analyze massive amounts of data and build machine learning models. Because Spark is a distributed data processing framework, it can process large-scale data and train ML models while scaling out from a few servers to dozens. This lets you move past the limitation of Scikit-learn, which can only build machine learning models on a single server.


We will help you grow into a machine learning expert
who is also proficient in
data processing and analysis.

The 'Spark Machine Learning Complete Guide - Part 1' course goes beyond learning how to implement machine learning models in Spark and will help you grow into a machine learning expert who is also proficient in data processing and analysis.

To grow into a true machine learning expert, it is crucial to possess not only the ability to implement ML models but also the skill to process and combine business data to create those models. To this end, you will learn how to process data using SQL, which is most commonly used for large-scale data processing in practice, and data analysis techniques based on business domain analysis through hands-on exercises.

The curriculum is designed to help you build data processing/analysis and ML implementation skills through detailed theoretical explanations and hands-on practice.


We will solve the problems
you will face.

Implementing machine learning models on Spark is not easy. You run into problems that many data scientists and machine learning experts have not faced before, such as the distinctive machine learning APIs and framework that follow from the Spark architecture, and data processing based on SQL.

Through this course, Spark Machine Learning Complete Guide, I will help you develop the ability to solve the problems you encounter.

First half of the 'Spark Machine Learning Complete Guide - Part 1' course

The first half of the lecture consists of detailed theoretical explanations and extensive hands-on practice regarding various elements that make up the Spark Machine Learning Framework, such as DataFrame, SQL, Estimator, Transformer, Pipeline, and Evaluator. Through this, you will be able to easily and quickly implement ML models in Spark.
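To make the roles named above a little more concrete, here is a toy, pure-Python sketch of the fit/transform contract. It is not Spark code: the class names `MeanCenterer`, `CenteringModel`, `ToyPipeline`, and `ToyPipelineModel` are hypothetical and exist only for this illustration, and rows are plain dictionaries rather than a DataFrame. In PySpark, the corresponding abstractions are `Estimator`, `Transformer`, `Pipeline`, and `PipelineModel` in `pyspark.ml`. The Evaluator role is left out to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

Row = Dict[str, float]


@dataclass
class CenteringModel:
    """Transformer: carries learned state and appends a derived column."""
    column: str
    mean: float

    def transform(self, rows: List[Row]) -> List[Row]:
        out_col = f"{self.column}_centered"
        return [{**row, out_col: row[self.column] - self.mean} for row in rows]


@dataclass
class MeanCenterer:
    """Estimator: fit() learns from the data and returns a Transformer."""
    column: str

    def fit(self, rows: List[Row]) -> CenteringModel:
        mean = sum(row[self.column] for row in rows) / len(rows)
        return CenteringModel(self.column, mean)


class ToyPipelineModel:
    """Fitted pipeline: applies every fitted stage in order."""

    def __init__(self, stages: Sequence) -> None:
        self.stages = list(stages)

    def transform(self, rows: List[Row]) -> List[Row]:
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class ToyPipeline:
    """Pipeline: fits each Estimator on the output of the previous stages."""

    def __init__(self, stages: Sequence) -> None:
        self.stages = list(stages)

    def fit(self, rows: List[Row]) -> ToyPipelineModel:
        fitted = []
        for stage in self.stages:
            # Estimators are fitted first; plain Transformers are used as-is.
            model = stage.fit(rows) if hasattr(stage, "fit") else stage
            rows = model.transform(rows)
            fitted.append(model)
        return ToyPipelineModel(fitted)


# Fit on training rows, then reuse the learned mean on new rows.
train = [{"x": 1.0}, {"x": 3.0}]
pipeline_model = ToyPipeline([MeanCenterer("x")]).fit(train)
scored = pipeline_model.transform([{"x": 5.0}])
```

The point of the design is that whatever is learned during `fit` is frozen inside the returned model, so the same transformation can be replayed on unseen data.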

Additionally, I will provide detailed explanations on how to use LightGBM in Spark and how to tune hyperparameters using HyperOpt based on Bayesian optimization.

Latter half of the 'Spark Machine Learning Complete Guide - Part 1' course

The latter half of the lecture consists of a hands-on practice of Kaggle's Instacart Market Basket Analysis competition.

Through the model implementation of Kaggle's Instacart Market Basket Analysis competition, a highly challenging contest, we will simultaneously improve your practical data processing/analysis skills and machine learning model implementation capabilities.

Through this dataset, you will learn in detail how to process and analyze business data and perform feature engineering based on SQL, how to derive analysis domains from business operations, and how to create models based on these derived features.
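As a rough idea of what SQL-based feature engineering looks like, the sketch below aggregates per-user features from a tiny order table. It uses Python's built-in `sqlite3` as a stand-in so that it runs without a Spark installation; the table name `order_items`, its columns, and the sample rows are hypothetical and only loosely modeled on a market-basket dataset. In Spark, a query of this shape could be run with `spark.sql(...)` after registering a DataFrame with `createOrReplaceTempView`.

```python
import sqlite3

# In-memory database with a toy order-item table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_items ("
    "user_id INTEGER, order_id INTEGER, product_id INTEGER, reordered INTEGER)"
)
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?, ?, ?)",
    [
        (1, 10, 100, 0),
        (1, 10, 101, 0),
        (1, 11, 100, 1),
        (2, 20, 100, 0),
    ],
)

# One row per user: number of orders, number of items, share of reordered items.
FEATURE_SQL = """
SELECT user_id,
       COUNT(DISTINCT order_id) AS order_cnt,
       COUNT(*)                 AS item_cnt,
       AVG(reordered)           AS reorder_ratio
FROM order_items
GROUP BY user_id
ORDER BY user_id
"""

user_features = conn.execute(FEATURE_SQL).fetchall()
for row in user_features:
    print(row)
```

Each output row is a candidate feature vector for one user; in practice such aggregates would be joined back to the training rows before model fitting.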

💻 Please check before taking the course!

  • All practice code in this course is written in Python. Scala is not covered, so please keep this in mind before enrolling.

Please check the
practice environment.

This course uses Docker to set up a practice environment based on local Spark and Jupyter. The practice environment is configured by installing Docker Desktop on your local PC, and the course is designed so that you will have no problem setting up the environment even if you are not familiar with Docker.

The practice code and lecture materials can be downloaded from the lecture titled 'Download practice code and explanatory materials'.


Prior knowledge is
required for this course.

This course assumes that you know the material up to Chapter 5 (Regression) of the Python Machine Learning Perfect Guide, or have equivalent knowledge, and that you have a very basic understanding of SQL. Please keep this in mind when choosing the course.

It is helpful if you know the basics of Spark, but you should have no trouble following the lecture even if you don't.

Please check the prerequisite courses!

Python Machine Learning Guide

Move beyond theory-only machine learning lectures and
learn everything from core machine learning concepts to practical skills easily and accurately.

Who is this course right for?

  • Those who wish to implement machine learning using Spark

  • Those who wish to implement machine learning based on large-scale data

  • Those who wish to improve data processing techniques for machine learning using SQL

  • Those who want to master the entire practical process of shaping data into the required format and building ML models on it

  • Those who want to improve their data analysis, feature engineering, and ML implementation skills at the same time

What do you need to know before starting?

  • Understanding up to Chapter 5 (Regression) of "Python Machine Learning Perfect Guide" or equivalent prerequisite knowledge.

  • Basic Understanding of SQL

Hello
This is dooleyz3525

27,852

Learners

1,498

Reviews

4,067

Answers

4.9

Rating

15

Courses

(Former) Encore Consulting | (Former) Oracle Korea | Author of "Python Machine Learning Perfect Guide"

AI Freelance Consultant

Curriculum

132 lectures ∙ (25hr 1min)

Course Materials:

Lecture resources

Reviews

29 reviews

4.9

  • egs41

    Reviews 54

    Average Rating 5.0

    5

    10% enrolled

    It was good to focus on the instructor's diction and voice, and the content was solid. Please continue to make good lectures. Thank you.

  • indizz4933

    Reviews 1

    Average Rating 5.0

    5

    100% enrolled

    Thank you for explaining it step by step.

  • iamcodingcat

    Reviews 13

    Average Rating 5.0

    5

    54% enrolled

    I am a student who has been attending Kwon Chul-min's lecture series! Thank you for continuing to provide high-quality lectures! And I have seen several Spark lectures in Scala and Java, but this is the first time I have seen a lecture that teaches Spark in Python, so I think it was even better! Although I have not completed the course yet, I still like how he tries to teach simple grammar as easily as possible! And I also like how he provides various practice materials to encourage repeated mastery! I look forward to other lectures in the future!

  • gomjong

    Reviews 8

    Average Rating 4.9

    5

    100% enrolled

    Thanks to you, I learned about Spark and gained confidence in Kaggle challenges. Thank you!

  • freedom07

    Reviews 7

    Average Rating 5.0

    5

    93% enrolled

    I first got to know Professor Kwon Chul-min through the Python Machine Learning Perfect Guide. Thanks to that lecture, I, a non-major, did not give up on this field even though I had been thinking of quitting. I am currently working in this field and studying steadily by taking Inflearn lectures. I wanted to thank the teacher, so I first thanked him in the Q&A section, and he encouraged me that if I kept studying, I would be able to achieve what I had worked for. I plan to keep taking his lectures in the future. ^^ He really teaches so well. Professor Kwon Chul-min, I would like to take this opportunity to sincerely thank you.

    • dooleyz3525
      Instructor

      I am all the more moved that you left such a touching review. I should be the one thanking you: your words instantly reward the hard work that went into creating the lecture. If you keep working hard like this, you will surely achieve everything you want. Thank you.

$77.00