Big Data Cluster Build Package; Roadmap to Success

This is a code-lab-based class in which you build a highly available big data cluster yourself: a distributed storage and processing system made up of HDFS, Zookeeper, Spark, and Zeppelin.

(4.8) 20 reviews

114 learners

  • jphil
cluster
hands-on
Big Data
Apache Spark
Hadoop
Data Engineering

What you will learn!

  • Big Data Cluster Setup

  • Distributed File and Processing Systems

  • High Availability

  • Hadoop

  • HDFS

  • Apache Spark

  • Apache Zeppelin

  • Apache Zookeeper

  • AWS (EC2, AMI, Security Group)

Build a distributed big data cluster through code labs.
Big Data Cluster Build Package
👨🏻‍🎓

Hello, this is J.PHIL 🍏

Now that a semester has passed and a good opportunity has come up, this season I am offering a course titled 'Big Data Cluster Build Package', in which you will build a distributed big data cluster yourself 📚

Thanks to your support for the previous "Big Data Pipeline Master" class, I asked myself whether there was a more challenging yet meaningful course I could offer. After much deliberation, I put this course together.

Keywords: Big Data Cluster, Distributed System, High Availability, Hadoop, HDFS, Apache Spark, Zookeeper, Zeppelin, AWS EC2 & AMI

Why take this course? 🙇🏻

Over the past decade, rapid technological advances have led to a proliferation of platforms and services. We can now use and analyze the vast amounts of data generated in our daily lives, and that has raised our quality of life.

As shown in Figure 1 below, large domestic corporations and global giants alike openly emphasize the importance of big data storage and big data processing, and they expect many engineers to have the corresponding analysis and cluster-building skills.

001.png

002.png

However, before entering the industry, it's difficult to gain hands-on experience building or managing a big data cluster. As a result, when a meaningful opportunity does arise, a lack of experience can lead to disappointing results.

When I was a researcher, I had to build a big data cluster for 50 people myself while writing a paper for a top-tier data conference. I felt the pressure of having to set an example for the team and the stress of bearing the cost, and I worked day and night for two weeks, focused solely on building the cluster.

Of course, I learned a lot from that experience, and it became a valuable foundation for my later work. However, I don't want you to spend your time as inefficiently as I did. I created this course in the hope that, instead of spending 200 precious hours just building a cluster, you can spend them running experiments or analyzing customer data on top of it. 📝

Above all, I hope that taking this course and gaining experience in building a cluster will be a great help when you build a big data cluster in the field or in graduate school, as I did. Access to the lectures is unlimited, so please refer back to them whenever you need. 💓

What will we learn? 📚

📝

Experience of writing a paper for a top-tier data conference

👨🏻‍💼

Valuable experience in building and analyzing big data systems gained from the field

🧑🏻‍🏫

Many years of experience teaching and mentoring students at university

With this valuable experience, we hope to help you create a ⚔️ powerful weapon in your field.

1. HDFS, a distributed file system that guarantees high availability (see the daemon example below)

2. On top of it, Apache Spark, a flagship big data system, and Zeppelin, a notebook dedicated to big data

We will build this cluster package ourselves through theory and solid code labs.

image.png

Do the high-availability file system daemon configurations above seem a bit daunting? Seeing architecture and system configuration diagrams for the first time can be overwhelming.

However, drawing on valuable feedback from excellent students over the past six years and on the experience of launching the last two Inflearn courses, we have organized the material step by step, at the students' level, so that it is as easy to follow as possible. Feel free to follow along.
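
For a concrete picture of what such a daemon layout involves, here is a minimal Python sketch, not the course's actual configuration. The host names (`node1` to `node5`) and the placement of daemons are assumptions made for illustration; the daemon names (NameNode, DFSZKFailoverController, JournalNode, ZooKeeper, DataNode) are the usual ones in an HDFS high-availability setup that uses a quorum journal and ZooKeeper-based automatic failover. The helper functions are hypothetical and only check a few common rules of thumb.

```python
from collections import Counter

# Hypothetical host names and daemon placement (an assumption for illustration;
# the layout used in the course may differ).
LAYOUT = {
    "node1": ["NameNode", "DFSZKFailoverController", "JournalNode", "ZooKeeper"],
    "node2": ["NameNode", "DFSZKFailoverController", "JournalNode", "ZooKeeper"],
    "node3": ["JournalNode", "ZooKeeper", "DataNode"],
    "node4": ["DataNode"],
    "node5": ["DataNode"],
}


def count_daemons(layout):
    """Count how many hosts run each daemon type."""
    return Counter(daemon for daemons in layout.values() for daemon in daemons)


def tolerated_failures(members):
    """A majority quorum of n members survives floor((n - 1) / 2) failures."""
    return (members - 1) // 2


def check_layout(layout):
    """Return a list of problems; an empty list means the basic HA rules hold."""
    counts = count_daemons(layout)
    problems = []
    # High availability needs an active NameNode plus at least one standby.
    if counts["NameNode"] < 2:
        problems.append("HA needs at least two NameNodes (active and standby)")
    # Quorum-based services are normally run on an odd number of hosts, 3 or more.
    for daemon in ("JournalNode", "ZooKeeper"):
        if counts[daemon] < 3 or counts[daemon] % 2 == 0:
            problems.append(f"{daemon} should run on an odd number of hosts, at least 3")
    # Automatic failover expects a failover controller next to every NameNode.
    for host, daemons in layout.items():
        if "NameNode" in daemons and "DFSZKFailoverController" not in daemons:
            problems.append(f"{host} runs a NameNode without a failover controller")
    return problems


if __name__ == "__main__":
    print(check_layout(LAYOUT))
    print(tolerated_failures(count_daemons(LAYOUT)["ZooKeeper"]))
```

With the layout above, `check_layout(LAYOUT)` returns an empty list, and the three-member ZooKeeper ensemble can lose one member and still keep a majority.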

Special thanks to my lovely students 👨🏻‍🎓

What does the curriculum look like? 🧑🏻‍🏫

Rather than starting directly with the code labs, we'll begin by learning the theory behind building a high-availability cluster. Students unfamiliar with AWS or Linux environments can watch video tutorials and study the background knowledge before moving on to the in-depth code labs.

curri-1.jpg

Anyone interested in big data or distributed processing can take this course 🧑🏻‍🎓

What is the training environment like? 💻

The environment below is enough to follow the class comfortably.

  • OS: Ubuntu 20.04 LTS

  • Editor: Vim (or whichever editor you prefer)

  • Machine specifications

    • AWS EC2 / c5.large (2 vCPUs, 4 GB RAM), 4 or 5 instances

Please see the course curriculum for more details 😊

Introducing J.PHIL 👨‍👨‍👧‍👦

image.png

Who is this course right for?

  • Students who want to experience building a big data processing system cluster

  • Students interested in data analysis and systems and who wish to pursue a career in this field

  • Developers who want to experience high availability cluster practice firsthand

  • Job seekers who want to build strengths in the field of big data analysis and construction

Need to know before starting?

  • Basic Python coding

  • Basic knowledge of Linux commands

  • Database Basics

  • Learners: 452
  • Reviews: 40
  • Answers: 50
  • Rating: 4.9
  • Courses: 2

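Hello, this is J.PHIL 🧑🏻‍🎓

As my first course, I opened "Mastering Big Data Processing: Tools and Techniques for Success" for [beginners interested in building and analyzing big data systems].

Details about the class and my profile are written up on the course detail page, so please refer to it 🙏🏻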

Curriculum

36 lectures ∙ (4hr 51min)

Course Materials:

Lecture resources

Reviews

4.8 (20 reviews)

  • 귤껍데기

    Reviews 3

    Average Rating 4.3

    5

    44% enrolled

    I think this is a great course with a lot of content and is a good place to start. Thank you for preparing this course.

    • won831

      Reviews 1

      Average Rating 5.0

      5

      19% enrolled

      I am a computer science student who is about to graduate and is aspiring to become a data engineer. While creating a portfolio related to employment, I had many concerns about how to configure pipelines and architectures for processing big data, and how to set up an AWS environment to use it efficiently at the lowest cost possible. Through this lecture, I gained tremendous insight and know-how. In particular, I am glad that I gained a lot of knowledge about various frameworks that handle big data, and that I was inspired to delve into which direction I can go in the future. It was like a shower after a drought. I recommend this course to students who are aspiring to this field like me.

      • jphil
        Instructor

        Hello won831, thank you for your valuable review. I hope you get good results in the future. Best of luck!

    • youngmikwon

      Reviews 3

      Average Rating 5.0

      5

      100% enrolled

      thank you!

      • jphil
        Instructor

        Hello Kwon Young-mi, thank you for your valuable course review! Best of luck!

    • jasonking

      Reviews 2

      Average Rating 5.0

      5

      36% enrolled

      I'm listening to this lecture after listening to the previous pipeline lecture, and I like it because it's easy to understand~ Thank you for the compact and practical lecture~ I think I'll listen to this lecture quickly, but I'm looking forward to other lectures.

      • It took 2 days. Since it was in lab format, it progressed quickly, but I struggled because the NameNode would not start (I think I made a mistake somewhere). Later, I saw that the troubleshooting guide section covers the startup procedure scripts and how to view the logs. If I had seen it earlier, I would have recovered from the mistake a bit faster ㅜㅜ For anyone about to take this, it is better to read through it once before following along~ Thank you, instructor, for a great lecture every time~

      • jphil
        Instructor

        Hello Jason.King, thank you for taking my lecture so diligently :) Running into bugs, troubleshooting them yourself, and reviewing what happened is helpful in its own right, so I think this experience will serve you well. Once you have built a cluster for one large open-source system yourself, you will be able to set up other open-source systems quickly as well. Best of luck going forward.

    • upgleman8112423674

      Reviews 4

      Average Rating 5.0

      5

      31% enrolled

      This is a lecture that I highly recommend to beginners, from theory to code lab!! I highly recommend taking this as a mandatory lecture on building a big data cluster!!

      • jphil
        Instructor

        Hello Yeonwoo Jung, Thank you for your valuable review. I hope you will invest a day or two when you have the chance to practice with AWS and achieve good results. Happy New Year :)

    $77.00
