Inflearn brand logo image

Learning by doing: Practical Spark Part 1

By the end of this course, you will be able to implement Apache Spark projects in your organization.

17 learners are taking this course

  • nexthumans
Hands-on focus
Commands
Data Engineer
Data Processing
Apache Spark
Big Data
Machine Learning(ML)
data-transformation

What you will learn!

  • Using Spark Core Commands

  • Spark-based Data Science

Spark Part 1: Follow along

Introduction to the course

"Practical Spark Part 1" is a hands-on course for everyone from learners who are new to data science to practitioners preparing for real-world Spark projects. It is structured so that you can work systematically from the basic concepts of Spark to practical applications, with a particular focus on the commands and data processing methods that Spark projects depend on.

@Apache Spark, @Big Data, @Machine Learning, @Data Engineering, @Data Transformation

Course Objectives

  • Spark basics and environment setup: Learn how Spark works and how to configure it so you can use it efficiently in local and Docker environments.

  • Distributed data processing and optimization: Learn the concepts of distributed processing in Spark, including data partitioning, shuffling, and cluster resource configuration, to lay the foundation for large-scale data processing.

  • Hands-on data processing skills: Practice loading, transforming, filtering, and combining data with a range of Spark commands, building up to more advanced processing techniques.

  • Data analysis and visualization skills: Analyze data and visualize the results using Spark DataFrames and SQL commands.


Curriculum Structure

  1. Orientation

    • Introduces what Spark is, how it is applied in practice, and how to approach the rest of the course.

  2. Spark environment configuration

    • Learn how to install and configure Spark using a local environment and Docker to create a hands-on environment.

  3. Distributed processing concepts

    • Learn how Spark processes large-scale data and the fundamentals of distributed processing.

  4. Understanding Spark in Action

    • See core operating principles such as lazy evaluation, partitions, and shuffles in action through Jupyter Notebook and the Spark UI.

  5. Spark Essential Commands in Practice

    • Learn the commands used most often in practice: loading data, filtering by date, joins, aggregation, UDFs, and saving data.

    • Also covers how to use SQL commands efficiently.

  6. Advanced Data Processing

    • Learn advanced techniques for handling common problems in practice, such as string data processing, null value handling, JSON data handling, and partition optimization.
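
The lazy evaluation, partition, and shuffle ideas named in the curriculum can be illustrated with a small toy model. The sketch below is plain Python, not Spark's API: the names LazyDataset and shuffle_by_key are hypothetical and exist only for this illustration. It assumes data is held as a list of in-memory partitions, records transformations without running them, executes them only when an action (collect) is called, and redistributes key/value pairs by hashing the key.

```python
class LazyDataset:
    """Toy stand-in for a partitioned dataset (hypothetical, not a Spark class)."""

    def __init__(self, partitions, steps=()):
        self.partitions = partitions  # list of lists, one inner list per partition
        self.steps = tuple(steps)     # pending transformations (the "plan")

    def map(self, fn):
        # Transformation: only extends the plan, no data is touched yet.
        return LazyDataset(self.partitions, self.steps + (("map", fn),))

    def filter(self, pred):
        # Transformation: also lazy.
        return LazyDataset(self.partitions, self.steps + (("filter", pred),))

    def collect(self):
        # Action: runs the recorded plan on each partition independently.
        out = []
        for part in self.partitions:
            rows = part
            for kind, fn in self.steps:
                if kind == "map":
                    rows = [fn(r) for r in rows]
                else:
                    rows = [r for r in rows if fn(r)]
            out.extend(rows)
        return out


def shuffle_by_key(partitions, num_partitions):
    """Move (key, value) pairs so that equal keys end up in the same partition."""
    shuffled = [[] for _ in range(num_partitions)]
    for part in partitions:
        for key, value in part:
            shuffled[hash(key) % num_partitions].append((key, value))
    return shuffled


calls = []

def double(x):
    calls.append(x)  # record when the function actually runs
    return x * 2

ds = LazyDataset([[1, 2], [3, 4]]).map(double).filter(lambda x: x > 4)
print(calls)         # [] because nothing has executed yet
print(ds.collect())  # [6, 8]

pairs = [[("a", 1), ("b", 2)], [("a", 3), ("c", 4)]]
print(shuffle_by_key(pairs, 2))  # both "a" pairs share one partition
```

In this toy model, building the chain costs nothing until collect runs, and the shuffle is the only step that moves rows across partitions. That is the intuition behind why shuffle-heavy operations such as joins and aggregations tend to be the expensive part of a distributed job.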


Who is this course for?

  • Beginners who want to learn Spark from the basics through to practical use

  • Data engineers who want to learn data analysis and engineering techniques using Spark

  • Working professionals who are involved in corporate Spark projects or are building scalable data pipelines


What you will gain from this course

  • You will build data processing and analysis skills with Spark and be able to carry out Spark projects in your company.

  • You will gain practical know-how for loading, transforming, and storing data, and for processing large volumes of data efficiently.

  • You will have a solid foundation for Spark projects in cloud environments, which Part 2 covers.


If you are new to Spark or want to learn practical skills for data processing, "Practical Spark Part 1" is the perfect starting point. Let's move forward into the world of data science together! 🎓

Recommended for these people

Who is this course right for?

  • People who are new to Spark

  • People who want to do a Spark corporate project

Need to know before starting?

  • Python basics (a very basic level is enough)

Hello
This is

  • Learners: 102

  • Reviews: 9

  • Answers: 16

  • Rating: 4.9

  • Courses: 3

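I currently lead development and provide consulting on projects like the ones listed below, mainly for large enterprises. I am still active in the field. ^^

I also serve as an adjunct professor in artificial intelligence at the Korea University graduate school.

My goal is to teach programming skills with a real-world feel that you can put to use right away on the job. I look forward to building enjoyable classes together with many of you.

  • Enterprise AI architecture and service design

  • Machine learning service implementation

  • Backend service development

  • Database construction and service development in various cloud environments, including cloud (Azure) Databricks, ETL, and Fabric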

Curriculum

48 lectures ∙ (9hr 57min)



$77.00
