The Complete Guide to Spark Machine Learning - Part 1

Join this course if you want to be recognized as a machine learning expert for large-scale data. It takes you from understanding the core framework of Spark machine learning, through SQL-based data processing on difficult practical problems and data analysis grounded in the business domain, to implementing optimized machine learning models.

(4.9) 28 reviews

937 learners

  • dooleyz3525
A course like this in Korea?
Overwhelming volume
Apache Spark
Machine Learning(ML)
Big Data
Data Engineering


What you will gain after the course

  • Implementing Machine Learning Models in Spark

  • Detailed understanding of DataFrame, the foundation of Spark's data processing

  • Understand the various technical elements that make up the Spark Machine Learning Framework

  • Mastering Spark's Machine Learning Pipeline

  • Ability to use SQL for data analysis

  • SQL-based Feature Engineering Techniques

  • Implementing models with XGBoost and LightGBM in Spark

  • Model hyperparameter tuning method based on Bayesian optimization

  • Improve your data analysis and ML model implementation skills simultaneously through challenging real-world problems.

  • Data analysis method based on analysis domain

  • Various data visualization techniques

[Notice] The Databricks Community Edition, which has been provided free of charge as the hands-on environment for this course, has suddenly stopped accepting new registrations. Therefore, we announce that the hands-on environment will be changed to the paid Databricks Free Edition starting November 2025. Please consider this when selecting the course.

Data analysis + feature engineering + ML implementation,
master all three skills at once.

The meeting of Apache Spark and
machine learning.

The ultimate open source large-scale distributed processing solution Apache Spark meets Machine Learning.

Many large corporations and financial institutions in Korea use Apache Spark to analyze big data and build machine learning models. Because Spark is a distributed data processing framework, it can process large volumes of data and train ML models by scaling out across anywhere from a few to dozens of servers. This overcomes the limitation of scikit-learn, which trains machine learning models on a single server.


Skilled in data processing/analysis as well,
we will help you grow into
a machine learning expert.

The 'Spark Machine Learning Complete Guide - Part 1' course will help you grow into a machine learning expert skilled in data processing and analysis, going beyond just learning how to implement machine learning models in Spark.

To grow into a true machine learning expert, it's crucial to develop not only ML implementation skills but also the ability to process and combine business data to create ML models. For this purpose, you will learn through hands-on practice how to process data using SQL, which is most widely used for handling large-scale data in real-world applications, and data analysis techniques based on business domain analysis.

The course is structured to help you develop data processing/analysis and ML implementation skills through detailed theoretical explanations and hands-on practice.


We solve the problems
you face.

Implementing machine learning models on Spark is not easy. You run into many problems that data scientists and machine learning practitioners may never have faced before, such as the distinctive machine learning APIs and framework that follow from Spark's architecture, and SQL-based data processing.

Through this course, Spark Machine Learning Complete Guide, I will help you develop the ability to solve the problems you encounter.

The first half of the 'Spark Machine Learning Complete Guide - Part 1' course

The first half of the course consists of detailed theoretical explanations and abundant hands-on practice covering various components that make up the Spark Machine Learning Framework, including DataFrame, SQL, Estimator, Transformer, Pipeline, Evaluator, and more. Through this, you will be able to implement ML models in Spark easily and quickly.
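As a rough illustration of how the Estimator, Transformer, and Pipeline pieces named above fit together, here is a plain-Python toy. It is not the pyspark.ml API: every class name below is a hypothetical stand-in, and the data is a list of dicts rather than a Spark DataFrame. The only idea it tries to show is the contract: an Estimator's fit() returns a fitted Transformer, and a Pipeline fits its stages in order while passing transformed data downstream.

```python
# Plain-Python toy of the Estimator / Transformer / Pipeline contract.
# It does NOT use pyspark.ml; all class names are hypothetical stand-ins.


class Transformer:
    """Turns one dataset (a list of row dicts) into another."""

    def transform(self, rows):
        raise NotImplementedError


class Estimator:
    """Learns from data; fit() returns a fitted Transformer (a 'model')."""

    def fit(self, rows):
        raise NotImplementedError


class MeanCentererModel(Transformer):
    """Fitted result of MeanCenterer: remembers the mean it learned."""

    def __init__(self, col, mean):
        self.col = col
        self.mean = mean

    def transform(self, rows):
        # Add a centered copy of the column; the input rows are not mutated.
        return [{**r, self.col + "_centered": r[self.col] - self.mean} for r in rows]


class MeanCenterer(Estimator):
    def __init__(self, col):
        self.col = col

    def fit(self, rows):
        mean = sum(r[self.col] for r in rows) / len(rows)
        return MeanCentererModel(self.col, mean)


class PositiveFlagger(Transformer):
    """Stateless transformer: it needs no fitting."""

    def __init__(self, col, out_col):
        self.col = col
        self.out_col = out_col

    def transform(self, rows):
        return [{**r, self.out_col: int(r[self.col] > 0)} for r in rows]


class PipelineModel(Transformer):
    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class Pipeline(Estimator):
    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            # Estimators are fitted first; plain Transformers are used as-is.
            model = stage.fit(rows) if isinstance(stage, Estimator) else stage
            rows = model.transform(rows)  # feed transformed data to later stages
            fitted.append(model)
        return PipelineModel(fitted)


train = [{"amount": 10.0}, {"amount": 20.0}, {"amount": 30.0}]
pipeline = Pipeline([
    MeanCenterer("amount"),
    PositiveFlagger("amount_centered", "above_mean"),
])
model = pipeline.fit(train)
scored = model.transform([{"amount": 25.0}, {"amount": 5.0}])
print(scored)
```

The fitted pipeline reuses the mean learned from the training rows when it scores new rows, which is the reason for separating fit() from transform().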

Additionally, I will provide detailed explanations on how to use XGBoost and LightGBM in Spark, and how to tune hyperparameters using HyperOpt based on Bayesian optimization.

The latter part of the 'Spark Machine Learning Complete Guide - Part 1' course

The latter part of the course strengthens both your practical data processing/analysis skills and your machine learning model implementation skills through hands-on practice with Kaggle's Instacart Market Basket Analysis competition. The Kaggle Instacart competition is a difficult one, and its dataset consists of e-commerce order processing tables (products, orders, order products).

Through this dataset, you will learn in detail how to process and analyze business data based on SQL and perform Feature Engineering, how to derive analysis domains in business contexts, and how to create models based on the features derived in this way.
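To give a feel for what SQL-based feature engineering on order tables can look like, here is a minimal sketch. It uses SQLite from Python's standard library as a stand-in for Spark SQL so that it runs anywhere. The table and column names are simplified assumptions loosely modeled on orders and order-products tables, and the sample rows are made up for illustration; this is not the course's actual code or feature set.

```python
import sqlite3

# In-memory SQLite database standing in for Spark SQL tables.
# Schema and rows are hypothetical, simplified for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, user_id INTEGER, order_number INTEGER);
CREATE TABLE order_products (order_id INTEGER, product_id INTEGER, reordered INTEGER);

INSERT INTO orders VALUES (1, 100, 1), (2, 100, 2), (3, 200, 1);
INSERT INTO order_products VALUES
    (1, 10, 0), (1, 11, 0),
    (2, 10, 1), (2, 12, 0),
    (3, 10, 0);
""")

# One row of aggregate features per user: how many orders they placed,
# how many items they bought, and what share of those items were reorders.
feature_sql = """
SELECT o.user_id,
       COUNT(DISTINCT o.order_id) AS total_orders,
       COUNT(*)                   AS total_items,
       AVG(op.reordered)          AS reorder_ratio
FROM orders o
JOIN order_products op
  ON o.order_id = op.order_id
GROUP BY o.user_id
ORDER BY o.user_id
"""

user_features = conn.execute(feature_sql).fetchall()
for row in user_features:
    print(row)
```

In a Spark notebook, a query of this shape could be run with spark.sql() once the DataFrames are registered as temporary views, and the resulting per-user rows would then be joined back as model features.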

The 'Complete Guide to Spark Machine Learning' course being released this time is Part 1. The Part 2 course is scheduled to be released later and will cover text analysis, recommendation, and time series analysis.

💻 Please check before taking the course!

  • All practice code in this course is written in Python. Scala is not covered, so please keep this in mind before selecting the course.

Please check your
practice environment.

The hands-on practice uses Databricks. Databricks provides a notebook environment where you can create Spark-based applications in the cloud without installing Spark.

The Databricks Community Edition, which has been provided free of charge as the practice environment for this course, no longer accepts new registrations. Therefore, we would like to inform you that the practice environment will be changed to the paid Databricks Free Edition starting November 2025.

After testing through the end of November 2025, we will provide detailed information about the cost of doing the hands-on practice on Databricks Free Edition.

You can download the lecture practice code and lecture explanation materials from 'Download Practice Code and Explanation Materials'.


Prerequisites are
required for this course.

This course assumes that you know the material up to Chapter 5 (Regression) of the Python Machine Learning Complete Guide, or have equivalent knowledge, and that you understand the very basics of SQL. Please keep these requirements in mind when selecting this course.

Knowing the basics of Spark helps, but you should have no trouble following the course without it.

Please check the prerequisite courses!

Python Machine Learning Complete Guide

No more theory-heavy machine learning lectures:
learn everything from core machine learning concepts to practical skills, easily and accurately.


Recommended for
these people

Who is this course right for?

  • Anyone who wants to implement machine learning using Spark

  • Those who want to implement machine learning based on large-scale data

  • Anyone who wants to improve their data processing techniques for machine learning using SQL

  • Anyone who wants to learn the entire process of processing data into the desired format and creating an ML model based on it in practice

  • Anyone who wants to improve data analysis, feature engineering capabilities, and ML implementation

Need to know before starting?

  • Knowledge up to Chapter 5 (Regression) of the Python Machine Learning Complete Guide, or equivalent prior knowledge

  • Understanding SQL Basics

Hello
This is

  • Learners: 26,923
  • Reviews: 1,367
  • Answers: 4,010
  • Rating: 4.9
  • Courses: 14

(Former) Encore Consulting (엔코아 컨설팅)

(Former) Oracle Korea

Freelance AI consultant

Author of the Python Machine Learning Complete Guide

Curriculum

122 lectures ∙ (24hr 53min)


Reviews

28 reviews ∙ Average rating 4.9

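  • freedom07

    Reviews 7

    Average Rating 5.0

    5

    93% enrolled

    I first came to know Kwon Chul-min (권철민) through the Python Machine Learning Complete Guide. Thanks to that course, I, a non-major, was able to stay in this field that I had been about to give up on. I now work in this field and keep studying steadily by taking Inflearn courses like this one. I wanted to thank him, so I first expressed my gratitude in the Q&A section, and he encouraged me, saying that if I keep at it I will achieve what I have worked for. I plan to keep taking his courses. ^^ That is how well he teaches. Mr. Kwon, I would like to take this opportunity to sincerely thank you.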

    • 권 철민
      Instructor

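      I am the one who is moved that you left such a heartwarming review. It repays all the effort of making the course in a single moment, so I should be the one thanking you. If you keep working this steadily, you will surely achieve everything you want. Thank you.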

  • egs41

    Reviews 54

    Average Rating 5.0

    5

    10% enrolled

    The instructor's diction and voice made it easy to concentrate, and the content is solid as well. Please keep making great courses. Thank you.

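  • 밑바닥개발자

    Reviews 13

    Average Rating 5.0

    5

    54% enrolled

    I have been following Kwon Chul-min's course series all along! Thank you for continuing to provide high-quality courses! I have seen several Spark courses built around Scala or Java, but this is the first one I have seen that teaches Spark with Python, which made it even better! I have not finished the course yet, but what I like most is that he still tries to explain even simple syntax as easily as possible! I also like that he provides a variety of practice materials to encourage repetition and mastery! I am looking forward to his other courses too!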

  • kjo19990606

    Reviews 8

    Average Rating 4.9

    5

    100% enrolled

    Thanks to this course I learned about Spark and gained the confidence to take on Kaggle challenges. Thank you!

  • 인디즈

    Reviews 1

    Average Rating 5.0

    5

    100% enrolled

    Thank you for explaining everything step by step.
