강의

멘토링

커뮤니티

BEST
Data Science

/

Data Engineering

The Complete Guide to Spark Machine Learning - Part 1

If you want to be recognized as a machine learning expert based on large-scale data, from understanding the core framework of Spark machine learning, to SQL-based data processing through difficult practical problems, to data analysis through business domain analysis, and to the ability to implement optimized machine learning models, please join this course.

(4.9) 28 reviews

938 learners

  • dooleyz3525
한국에 이런 강의가?
압도적 분량
Apache Spark
Machine Learning(ML)
Big Data
Data Engineering

Reviews from Early Learners

What you will gain after the course

  • Implementing Machine Learning Models in Spark

  • Detailed understanding of DataFrame, the foundation of Spark's data processing

  • Understand the various technical elements that make up the Spark Machine Learning Framework

  • Mastering Spark's Machine Learning Pipeline

  • Ability to use SQL for data analysis

  • SQL-based Feature Engineering Techniques

  • Implementing models with XGBoost and LightGBM in Spark

  • Model hyperparameter tuning method based on Bayesian optimization

  • Improve your data analysis and ML model implementation skills simultaneously through challenging real-world problems.

  • Data analysis method based on analysis domain

  • Various data visualization techniques

[Notice] The Databricks Community Edition, which was freely provided as a cloud-based practice environment for this course, is no longer accepting new registrations. Therefore, please be informed that the practice environment has been changed to a local Spark and Jupyter environment as of December 5, 2025.

Since the changes to the local environment are limited to specific parts of the practice code, most of the lecture videos from Section 1 to Section 10 will continue to use the existing recorded videos from Databricks Community, and the course has been newly structured with practice videos on local Spark only for the major changes. Additionally, from Section 11 onwards, all practice videos will be on local Spark, and the course will be newly structured by January 15, 2026, so please keep this in mind when selecting the course.

Data Analysis + Feature Engineering + ML Implementation,
Master all three skills at once.

Apache Spark Meets
Machine Learning.

The strongest open-source large-scale distributed processing solution, Apache Spark, meets Machine Learning.

Many large domestic corporations and financial institutions are using Apache Spark to analyze large-scale data and build machine learning models. Since Spark is based on a distributed data processing framework, it can process large volumes of data and create ML models while scaling capacity across anywhere from a few to dozens of servers. This allows you to overcome the limitations of scikit-learn, which can only implement machine learning models on a single server.


We will help you grow into
a machine learning expert
skilled in data processing and analysis.

The 'Complete Guide to Spark Machine Learning - Part 1' course will help you grow into a machine learning expert proficient in data processing and analysis, going beyond just learning how to implement machine learning models in Spark.

To grow into a true machine learning expert, it's crucial not only to have ML implementation skills but also to develop the ability to process and combine business data to create ML models. To achieve this, you will learn through hands-on practice how to process data using SQL, which is most commonly used for handling large-scale data in real-world settings, and data analysis techniques based on business domain analysis.

This course is designed to help you develop data processing/analysis and ML implementation skills through detailed theoretical explanations and hands-on practice.


We solve the problems
you face.

Implementing machine learning models on a Spark-based platform is not easy. This is because you encounter many problems that existing data scientists and machine learning experts have not experienced before, such as unique machine learning APIs and frameworks based on Spark's architectural specificity, and SQL-based data processing.

Through this course, Spark Machine Learning Complete Guide, I will help you develop the ability to solve the problems you face.

The first half of the 'Spark Machine Learning Complete Guide - Part 1' course is

The first half of the course consists of detailed theoretical explanations and abundant hands-on practice covering various components that make up the Spark Machine Learning Framework, including DataFrame, SQL, Estimator, Transformer, Pipeline, and Evaluator. Through this, you will be able to implement ML models in Spark easily and quickly.

Additionally, I will explain in detail how to use XGBoost and LightGBM in Spark, and how to tune hyperparameters using HyperOpt based on Bayesian optimization.

The latter part of the 'Spark Machine Learning Complete Guide - Part 1' course is

Currently, the latter part of the course consists of a hands-on practice using Kaggle's Instacart Market Basket Analysis competition, but as the Instacart Market Basket Analysis competition has been removed from Kaggle, it will be changed to a hands-on practice using Kaggle's Home Credit Default Risk competition (scheduled to be completed by January 15, 2026)

Through implementing models for Kaggle's Home Credit Default Risk competition, a highly challenging competition, we will simultaneously improve your practical data processing/analysis skills and machine learning model implementation abilities.

Through this dataset, you will learn in detail how to process and analyze business data in SQL, how to perform Feature Engineering, how to derive analytical domains in business contexts, and how to build models based on these derived features.

💻 Please check before enrolling!

  • All practical code in this course is based on Python. Scala is not covered, so please take note of this before selecting the course.

Please check your
practice environment.

This course uses Docker to set up a hands-on environment based on local Spark and Jupyter. The practice environment is configured by installing Docker Desktop on your local PC, and the course is structured so that you can set up the practice environment without any problems even if you don't know Docker.

The practice code and lecture materials can be downloaded from 'Download Practice Code and Materials'.


This course requires
prerequisite knowledge.

This course is designed assuming that students have knowledge equivalent to Chapter 5 (Regression) of the Python Machine Learning Perfect Guide, and also have a very basic understanding of SQL. Please refer to these requirements when selecting this course.

Spark is good to know the basics of, but even if you don't, you should have no problem following the course.

Check out the prerequisite courses!

Python Machine Learning Perfect Guide

Stop theory-heavy machine learning lectures,
From core machine learning concepts to practical skills, easy and accurate.

Curious about the instructor's interview? (Click)

Recommended for
these people

Who is this course right for?

  • Anyone who wants to implement machine learning using Spark

  • Those who want to implement machine learning based on large-scale data

  • Anyone who wants to improve their data processing techniques for machine learning using SQL

  • Anyone who wants to learn the entire process of processing data into the desired format and creating an ML model based on it in practice

  • Anyone who wants to improve data analysis, feature engineering capabilities, and ML implementation

Need to know before starting?

  • Understanding up to Chapter 5 (Regression) of the Complete Guide to Python Machine Learning or equivalent prior knowledge

  • Understanding SQL Basics

Hello
This is

26,975

Learners

1,378

Reviews

4,011

Answers

4.9

Rating

14

Courses

(전) 엔코아 컨설팅

(전) 한국 오라클

AI 프리랜서 컨설턴트

파이썬 머신러닝 완벽 가이드 저자

Curriculum

All

122 lectures ∙ (24hr 23min)

Course Materials:

Lecture resources
Published: 
Last updated: 

Reviews

All

28 reviews

4.9

28 reviews

  • freedom07님의 프로필 이미지
    freedom07

    Reviews 7

    Average Rating 5.0

    5

    93% enrolled

    I first got to know Professor Kwon Chul-min through the Complete Guide to Python Machine Learning. Thanks to that lecture, I, a non-major, was able to not give up on this field that I had been thinking of giving up on. I am currently working in this field and studying steadily by taking Infraon lectures. I wanted to thank the teacher, so I first thanked the teacher in the Q&A session, and the teacher encouraged me that if I continued to study, I would be able to achieve what I had worked for. I plan to continue to listen to the teacher's lectures in the future. ^^ㅎㅎ He really teaches so well. Professor Kwon Chul-min, I would like to take this opportunity to sincerely thank you.

    • dooleyz3525
      Instructor

      I am even more impressed that you left such a touching review. I think I should be the one to thank you for the writing that instantly rewards the hard work you put into creating the lecture. If you continue to work hard like this, you will definitely achieve everything you want. Thank you.

  • egs41님의 프로필 이미지
    egs41

    Reviews 54

    Average Rating 5.0

    5

    10% enrolled

    It was good to focus on the instructor's diction and voice, and the content was solid. Please continue to make good lectures. Thank you.

    • iamcodingcat님의 프로필 이미지
      iamcodingcat

      Reviews 13

      Average Rating 5.0

      5

      54% enrolled

      I am a student who has been attending Kwon Chul-min's lecture series! Thank you for continuing to provide high-quality lectures! And I have seen several Spark lectures in Scala and Java, but this is the first time I have seen a lecture that teaches Spark in Python, so I think it was even better! Although I have not completed the course yet, I still like how he tries to teach simple grammar as easily as possible! And I also like how he provides various practice materials to encourage repeated mastery! I look forward to other lectures in the future!

      • gomjong님의 프로필 이미지
        gomjong

        Reviews 8

        Average Rating 4.9

        5

        100% enrolled

        Thanks to you, I learned about Spark and gained confidence in Kaggle challenges. Thank you!

        • indizz4933님의 프로필 이미지
          indizz4933

          Reviews 1

          Average Rating 5.0

          5

          100% enrolled

          Thank you for explaining it step by step.

          $77.00

          dooleyz3525's other courses

          Check out other courses by the instructor!

          Similar courses

          Explore other courses in the same field!