Big Data Cluster Construction Package: A Roadmap to Success

This is a code-lab-oriented course where you will directly build big data or distributed processing system clusters (HDFS, Zookeeper, Spark, Zeppelin) that guarantee High Availability.

(4.8) 21 reviews

119 learners

Level Basic

Course period Unlimited

Big Data
Apache Spark
Hadoop
Data Engineering
cluster

Reviews from Early Learners

4.8

5.0

귤껍데기

44% enrolled

I think this is a great course with a lot of content and is a good place to start. Thank you for preparing this course.

5.0

one831

19% enrolled

I am a computer science student who is about to graduate and is aspiring to become a data engineer. While creating a portfolio related to employment, I had many concerns about how to configure pipelines and architectures for processing big data, and how to set up an AWS environment to use it efficiently at the lowest cost possible. Through this lecture, I gained tremendous insight and know-how. In particular, I am glad that I gained a lot of knowledge about various frameworks that handle big data, and that I was inspired to delve into which direction I can go in the future. It was like a shower after a drought. I recommend this course to students who are aspiring to this field like me.

5.0

권영미

100% enrolled

thank you!

What you will gain after the course

  • Big Data Cluster Setup

  • Distributed File or Processing System

  • High Availability

  • Hadoop

  • HDFS

  • Apache Spark

  • Apache Zeppelin

  • Apache Zookeeper

  • AWS (EC2, AMI, Security Group)

Big Data Cluster Construction Package: build a distributed big data cluster yourself through code labs 👨🏻‍🎓

Hello, this is J.PHIL 🍏

With a semester behind me and a great opportunity at hand, this season I am offering the 'Big Data Cluster Construction Package' course, in which you will build a distributed big data cluster yourself 📚

Encouraged by your support for the previous 'Big Data Pipeline Master' class, I thought long and hard about a lecture that would be somewhat challenging yet meaningful in a different way, and carefully crafted this course.

Keywords: Big Data Cluster, Distributed System, High Availability, Hadoop, HDFS, Apache Spark, Zookeeper, Zeppelin, AWS EC2 & AMI

 

Why should we take this course? 🙇🏻

Over the past 10 years or so, radical technological advancements have led to the rapid emergence of various platforms and services. As the vast amounts of data derived from our daily lives through these platforms are utilized and analyzed, we are enjoying a high quality of life.

As shown in Figure 1 below, not only major domestic conglomerates but also global giants are publicly emphasizing the importance of Big Data Storage and Big Data Processing, and are requiring many engineers to possess a similar analysis and architecture skillset.

 
001.png

<F1. Countless companies worldwide are focusing on data processing>

 
002.png

<F2. Building Big Data Clusters in Numerous Fields>

However, until you actually enter the industry, it is not easy to gain hands-on experience building or operating a big data cluster. As a result, when a valuable opportunity does arise, a lack of experience in this area can lead to disappointing results.

Back when I was a researcher writing a paper for a top-tier data conference, I had to build a 50-node big data cluster myself. I spent fifteen days working day and night on nothing but that cluster, under the pressure of leading my team by example and immense stress over the costs.

Those experiences taught me a lot and became great nourishment for moving forward, but I do not want you to spend your time so inefficiently. I created this course in the hope that, instead of spending about 200 hours of your valuable time just building a cluster, you can focus on running experiments or analyzing customer data on top of it. 📝

Above all, I hope the cluster-building experience you gain here will be a great help when you build big data clusters yourself in industry or graduate school. Please note that this course comes with unlimited access 💓

 

What will we be learning? 📚

📝

The experience of sweating over a paper for a top-tier data conference

👨🏻‍💼

Valuable big data system construction and analysis experience gained in the field

🧑🏻‍🏫

Long experience nurturing great students at a university

Based on these valuable experiences, I will help you build a ⚔️ powerful weapon in this field.

1. The distributed file system HDFS, configured to guarantee high availability (refer to the daemon examples below)

2. On top of it, Apache Spark, the masterpiece of big data systems, and the big data notebook Zeppelin

We will build this cluster package ourselves through a combination of theory and solid hands-on code labs.
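To give a feel for what "high availability" means at the configuration level, here is a minimal sketch in Python that generates the kind of `hdfs-site.xml` properties an HDFS NameNode HA pair with the Quorum Journal Manager typically uses. This is not the course's actual configuration: the nameservice name `mycluster`, the NameNode IDs, the hostnames, and the ports 8020 and 8485 are all assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET


def ha_hdfs_properties(nameservice, namenodes, journalnodes):
    """Build hdfs-site.xml properties for NameNode HA (Quorum Journal Manager).

    nameservice  -- logical name of the HA cluster (hypothetical, e.g. "mycluster")
    namenodes    -- dict of NameNode ID -> hostname (hypothetical hosts)
    journalnodes -- list of JournalNode hostnames (hypothetical hosts)
    """
    # Assumption: JournalNodes listen on 8485 and NameNode RPC on 8020.
    journal_uri = (
        "qjournal://"
        + ";".join(f"{host}:8485" for host in journalnodes)
        + f"/{nameservice}"
    )
    props = {
        "dfs.nameservices": nameservice,
        f"dfs.ha.namenodes.{nameservice}": ",".join(namenodes),
        "dfs.namenode.shared.edits.dir": journal_uri,
        f"dfs.client.failover.proxy.provider.{nameservice}":
            "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
        "dfs.ha.automatic-failover.enabled": "true",
    }
    for nn_id, host in namenodes.items():
        props[f"dfs.namenode.rpc-address.{nameservice}.{nn_id}"] = f"{host}:8020"
    return props


def to_hadoop_xml(props):
    """Render a property dict in Hadoop's <configuration> XML layout."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Example with hypothetical hostnames: two NameNodes, three JournalNodes.
example = ha_hdfs_properties(
    "mycluster",
    {"nn1": "master1", "nn2": "master2"},
    ["master1", "master2", "worker1"],
)
print(to_hadoop_xml(example))
```

A real HA setup needs more than this sketch covers. For instance, automatic failover also relies on `ha.zookeeper.quorum` in `core-site.xml` and on a fencing method set through `dfs.ha.fencing.methods`.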

 

image.png

Do the high-availability file system daemon configurations above look a bit difficult? It's only natural for architecture and system diagrams to feel overwhelming when you see them for the first time.

However, drawing on valuable feedback from excellent students over the past 6 years and on my experience launching two previous Inflearn courses, I have made the content as easy and accessible as possible and tailored it to the students' level, so you can follow along comfortably.

Special thanks to my lovely students 👨🏻‍🎓

 

Please tell me about the curriculum 🧑🏻‍🏫

Instead of jumping straight into the code lab, we will start with the theory required to build a high-availability cluster. Then, for students who are not familiar with AWS or Linux environments, we will go through guide videos and background knowledge before starting the in-depth code lab in earnest 😎

curri-1.jpg

 

Anyone interested in big data or distributed processing can take this course 🧑🏻‍🎓

 

 

What is the practice environment like? 💻

A modest environment like the one below is enough to follow the class.

  • OS: Ubuntu 20.04 LTS

  • Editor: Vim (up to your preference)

  • Machine Specifications

    • AWS EC2 c5.large (2 cores, 4 GB RAM), 4 or 5 instances

Please refer to the Course Curriculum for more details 😊

 

J.PHIL Introduction 👨‍👨‍👧‍👦

image.png

 

 

Who is this course right for?

  • Students who want hands-on experience building a big data processing system cluster

  • Students who are interested in and seeking a career in data analysis and systems

  • Developers who want hands-on experience with high-availability clusters

  • Job seekers who want to build strengths in big data analysis and infrastructure

What do you need to know before starting?

  • Basic Python Coding

  • Basic knowledge of Linux commands

  • Basic Database Knowledge

Hello
This is jphil

Learners: 467
Reviews: 42
Answers: 50
Rating: 4.9
Courses: 2

Hello, I'm J.PHIL 🧑🏻‍🎓

As my first course, I have opened "Mastering Big Data Processing: Tools and Techniques for Success" for [ beginners interested in big data system construction and analysis ].

Please refer to the class details page for more information regarding the 'Class and Profile' 🙏🏻

Curriculum

36 lectures ∙ (4hr 51min)

Course Materials:

Lecture resources

Reviews

21 reviews

4.8

  • youngmikwon

    Reviews 3

    Average Rating 5.0

    5

    100% enrolled

    thank you!

    • jphil
      Instructor

      Hello, Kwon Young-mi. Thank you for your valuable course review! Keep it up!

  • jasonking

    Reviews 2

    Average Rating 5.0

    5

    36% enrolled

    I'm listening to this lecture after listening to the previous pipeline lecture, and I like it because it's easy to understand~ Thank you for the compact and practical lecture~ I think I'll listen to this lecture quickly, but I'm looking forward to other lectures.

    • It took 2 days. Since it was in lab format, it progressed quickly, but I struggled because the NameNode would not start (I think I made a mistake somewhere). Later, I saw that the troubleshooting guide section lays out the startup procedure script and how to view the logs. If I had seen this earlier, I would have recovered from the mistake a bit faster ㅜㅜ For those about to start, it would be better to read through it once first rather than just following along~ Thank you, instructor, for the great lecture every time~

    • jphil
      Instructor

      Hello Jason.King, thank you for taking my lecture so diligently :) Running into bugs and troubleshooting them yourself, then thinking them over and reviewing them, is sometimes helpful, so I think this experience will be of great help in the future. Once you have built a cluster for one large open-source project yourself, you will be able to build the next one well in a short time. Keep it up!

  • upgleman8112423674

    Reviews 4

    Average Rating 5.0

    5

    31% enrolled

    This is a lecture that I highly recommend to beginners, from theory to code lab!! I highly recommend taking this as a mandatory lecture on building a big data cluster!!

    • jphil
      Instructor

      Hello Yeonwoo Jung, Thank you for your valuable review. I hope you will invest a day or two when you have the chance to practice with AWS and achieve good results. Happy New Year :)

  • won831

    Reviews 1

    Average Rating 5.0

    5

    19% enrolled

    I am a computer science student who is about to graduate and is aspiring to become a data engineer. While creating a portfolio related to employment, I had many concerns about how to configure pipelines and architectures for processing big data, and how to set up an AWS environment to use it efficiently at the lowest cost possible. Through this lecture, I gained tremendous insight and know-how. In particular, I am glad that I gained a lot of knowledge about various frameworks that handle big data, and that I was inspired to delve into which direction I can go in the future. It was like a shower after a drought. I recommend this course to students who are aspiring to this field like me.

    • jphil
      Instructor

      Hello one831, thank you for your valuable review. I hope you have good results in the future. Keep it up!

  • 귤껍데기

    Reviews 3

    Average Rating 4.3

    5

    44% enrolled

    I think this is a great course with a lot of content and is a good place to start. Thank you for preparing this course.


    $77.00