Inflearn brand logo image
Data Engineering Course (1): Installing Big Data Hadoop Directly

Students who want to learn Hadoop and big data will experience the world of big data and its amazing advancements through this course!

(4.6) 36 reviews

562 learners

  • hadoop3bigdata ∙ Hadoop ∙ Big Data ∙ Big Data in Practice ∙ MapReduce ∙ Data Engineering ∙ Java


What you will learn!

  • How big data technology appears in everyday life

  • Handling big data with Hadoop

  • Distributed processing techniques for handling big data with Hadoop

  • Processing big data in Hadoop using the Java language

  • Techniques for overcoming the limitations of relational data processing with Hadoop

  • Hadoop's various projects and interfaces

It's the era of big data! 👨‍💻
Become an expert with Hadoop.

Hadoop, at the center of data science,
is the trend!

Many IT giants, social media services, and others are competing to use Apache Hadoop for big data analysis and processing. Hadoop is a Java-based framework designed to process massive amounts of data at low cost, enabling distributed storage and processing of large datasets. So what if you could reach the level of a big data expert through Hadoop?

Through data analysis, companies can pioneer new markets, create unique value, and give consumers the thrill of real-time access to essential information. Big data is a crucial skill even for small and medium-sized businesses, which is welcome news for anyone seeking employment or a career change in big data.

Big Data with Hadoop

Hadoop is used by Google, Yahoo, Facebook, IBM, Instagram, Twitter,
and many other companies for data analysis.
Let's build a big data distributed-system infrastructure
with Hadoop, the leading big data solution.

This course begins with an understanding of big data terminology, then gives you a taste of handling big data with the open-source software Hadoop. Through this course, students will experience the world of big data technology and the Fourth Industrial Revolution at the same time.

What is Hadoop?

  • Hadoop is open source software that anyone can use for free.
    In this course, we will work with big data using Hadoop version 3.2.1.

Understanding big data
and how to use Hadoop,
all at once.

  • An essential understanding of big data terminology
  • An introduction to the concepts and uses of Hadoop
  • Tutorials on big data processing with Hadoop

I recommend this to these people!

Of course, those who don't fit these categories are also welcome. (Beginners are doubly welcome ✌)

  • Aspiring data scientists considering a future in IT for employment or a career change
  • Those who want to handle big data through Java/Python
  • Anyone who wants to experience big data out of interest and curiosity
  • Office workers who want to try out a Hadoop 3.x data environment

Before taking the class, please check your knowledge!

  • Prerequisites are the basics of the Java programming language, familiarity with big data and virtual machine/dataset terminology, and a basic understanding of Linux (Ubuntu).

Here is the content
you'll be learning.

1. Taking on virtualization technology and understanding guest operating systems

We'll learn about virtualization technology, which is advantageous for server consolidation, and how OS-level virtualization can isolate multiple servers on a single OS. Using Ubuntu, an open-source solution that supports Linux virtualization, anyone can take on creating and operating a large number of servers. We'll also build up knowledge of guest operating systems and gain broad technical experience distributing big data across multiple servers. With server virtualization, you'll see multiple operating systems running as highly efficient virtual machines on a single physical server.

  • Learn about the definition of Big Data and its practical applications.
  • Let's understand the terminology related to Hadoop, the data processing software preferred by businesses.
Data Sizes
The Landscape: Big Data

2. How to install Hadoop on Ubuntu 20.04 LTS and use commands

We'll cover the basics of the Linux CLI (Command Line Interface) tools that developers naturally encounter when building applications, then transition seamlessly to the Linux terminal for Hadoop. We'll also get comfortable using Ubuntu in a GUI environment outside Windows, moving from a basic understanding of Linux systems, such as shell configuration files, toward an intermediate level.

  • Let's install and set up Linux (Ubuntu 20.04 LTS) as a virtual machine on a Windows 10-based laptop.
  • Install Hadoop version 3.2.1 on a Linux virtual machine.
Hadoop 2.x Architecture
Hadoop 2.x vs. 3.x
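As a rough sketch of what this chapter's installation looks like, here is a minimal single-node setup. The package list, download mirror, and paths below are illustrative assumptions; the lectures walk through the exact values.

```shell
# Illustrative sketch: single-node Hadoop 3.2.1 on Ubuntu 20.04 LTS.
# Paths and the mirror URL are assumptions; follow the lectures for exact values.

sudo apt-get update && sudo apt-get install -y openjdk-8-jdk ssh  # Hadoop needs Java and sshd

# Download and unpack Hadoop 3.2.1
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
tar -xzf hadoop-3.2.1.tar.gz
sudo mv hadoop-3.2.1 /usr/local/hadoop

# Point the shell at Java and Hadoop (append to ~/.bashrc, then `source ~/.bashrc`)
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

hadoop version  # sanity check: should report Hadoop 3.2.1
```

This is an environment-setup fragment, not a complete cluster configuration; HDFS and YARN config files (`core-site.xml`, `hdfs-site.xml`, and so on) are covered in the lectures.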

3. A guide to the latest Hadoop 3.2.1 & understanding the core architecture

The starting point for processing unstructured big data is understanding the Hadoop Distributed File System (HDFS), modeled on the Google File System, along with MapReduce and YARN for cluster scaling and resource management. We will examine the architecture of Hadoop versions 1, 2, and 3 one by one, giving students a visual picture of the history of Hadoop technology.

  • Understand and integrate with the Hadoop Distributed File System (HDFS).
  • Understand the principles of the Map/Reduce framework and analyze data based on it.
HDFS Architecture
YARN Core Components
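To make the storage side concrete: HDFS splits each file into fixed-size blocks (128 MB by default in Hadoop 2.x/3.x) and replicates every block across DataNodes (replication factor 3 by default). A back-of-the-envelope sketch for a single file (the 600 MB figure is just an example):

```shell
# How HDFS lays out one 600 MB file with default settings.
file_mb=600
block_mb=128    # default dfs.blocksize (128 MB)
replicas=3      # default dfs.replication

blocks=$(( (file_mb + block_mb - 1) / block_mb ))  # ceiling division
stored_mb=$(( file_mb * replicas ))

echo "$blocks blocks, $stored_mb MB of raw storage across the cluster"
# -> 5 blocks, 1800 MB of raw storage across the cluster
```

The last block is usually smaller than 128 MB; HDFS stores it at its actual size, so only the replication factor, not the block size, inflates raw storage.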

4. HDFS Shell Operation Guide and Building MapReduce Applications with Java/Python

While data manipulation techniques vary, the foundation of big data analysis lies in building MapReduce applications. From a basic word-count MapReduce application in Python to a COVID-19 application built with Eclipse-based Java, building a variety of MapReduce applications is no longer optional; it is a necessary step forward.

  • Let's connect Hadoop with Java and implement an application.
  • Let's connect Hadoop with Python and implement an application.
Python Map/Reduce WordCount Application
Java Map/Reduce WordCount Application
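The word-count flow above can be dry-run locally with ordinary shell tools, no cluster required: `tr` plays the mapper (emit one word per line), `sort` plays the shuffle (bring equal keys together), and `uniq -c` plays the reducer (count each group). This only illustrates the data flow; in the course the same steps run as real MapReduce jobs on Hadoop.

```shell
# Simulate the map -> shuffle -> reduce stages of word count with plain shell tools:
#   tr      = map: emit one (word) record per line
#   sort    = shuffle: equal keys become adjacent
#   uniq -c = reduce: count each run of equal keys
echo "big data big hadoop" | tr ' ' '\n' | sort | uniq -c
# the output contains the counts "2 big", "1 data", "1 hadoop"
```

The same mapper/reducer split is what a Hadoop Streaming job formalizes, with Python scripts reading stdin and writing stdout in place of `tr` and `uniq -c`.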

Expected Questions Q&A!

Q. What is big data? Is its definition necessary when using Hadoop?

Yes. When working with Hadoop, a brief definition and understanding of big data is required. This doesn't call for a complete, in-depth understanding, but it does require the level of understanding essential to working with Hadoop.

Big data means handling extremely large datasets with tools such as Hadoop. These datasets serve as the foundation for analyzing patterns and trends across countless businesses, and they are closely linked to human social behavior, patterns, and the value created through interactions.


Q. What is Hadoop? What are its components? What is the Hadoop stack?

Hadoop helps with the mission of handling data from large-scale social sites, ranging from terabytes and petabytes up to zettabytes. The Hadoop stack refers to an open-source framework, and the ecosystem around it, for handling big data.

Simply put, Hadoop together with its surrounding ecosystem is called the "Hadoop stack." Hadoop and its stack help you build clusters from inexpensive, common commodity hardware and handle large-scale processing within these massive clusters of servers. Often summarized as "simple batch processing," Hadoop is a Java-based distributed computing platform: it lets you batch-process as much data as you want, periodically, distributing the work and producing results in the desired format.

Q. Is programming knowledge required?

Even if you have no programming knowledge or coding experience, that's okay. I teach Java and Python thoroughly, as if you were encountering them for the first time. While the lecture materials are in English, I teach in Korean so you can follow along without difficulty. I occasionally give explanations in English, but anyone with high-school-level English should be able to follow. (Just as I achieved my dream despite my limited English skills.)

Q. How relevant is big data to Hadoop?

This course naturally covers Hadoop. Beyond simple RDBMSs like Oracle, MS SQL Server, or MySQL, it aims to address essential business requirements, starting with large-scale processing, data processing speed, and cost-effectiveness. Specifically, Hadoop handles not only structured data (the relational data managed by row- and column-based RDBMSs) but also unstructured data, such as images, audio, and word-processing files.

When we talk about semi-structured data, we mean data tied to communication and data exchange with web servers, such as email, CSV, XML, and JSON. HTML, websites, and NoSQL databases are also included. Of course, the accumulated datasets used for computer-to-computer transfer of business documents, known as EDI, fall into this category as well.


Q. What level of content is covered?

This course guides you through installing Hadoop 3.2.1 on Ubuntu 20.04 LTS. Even with no prior Unix or Linux experience, you'll naturally pick up installation techniques and the Linux operating system. Beyond the basics of Hadoop's CLI and user commands, the course will also familiarize you with DFS and MapReduce, technologies that originated at Google. Your understanding of YARN will be limited to basic theory; we anticipate a more in-depth study of YARN when you install a cluster in the Hadoop 3.3.0 intermediate course.

Q. Is there a reason you are using Ubuntu 20.04 LTS as a practice environment?

Ubuntu is free to use, and its LTS (Long-Term Support) releases target companies seeking long-term support. By installing Hadoop on Linux, you can naturally build the operating system and development environment your business needs. Since Eclipse and IntelliJ can be used within the same environment, you can start working toward the dream of data science with big data right now.

Ubuntu helps users by offering a GUI (Graphical User Interface) environment similar to the one Windows provides for installation and operation.

Recommended for these people

Who is this course right for?

  • Enthusiastic students who want to learn the basics of big data from scratch

  • Those thirsty for big data principles and applications

  • Those who want to learn Hadoop to handle big data at their company

  • Those who have basic knowledge of Java

Need to know before starting?

  • The Concept of Big Data (Understanding Big Data)

  • Virtual Machine

  • Data set terminology

  • Understanding Linux (Ubuntu)

  • Java 15


583 Learners ∙ 37 Reviews ∙ 69 Answers ∙ 4.6 Rating ∙ 2 Courses

Hello, I'm Billy Lee, CEO of NeoAvenue.

After returning to Korea with my family in September 2022, I provided TA consulting on a Hyundai Motor big data project (September to November 2022), then served as project manager (PMO) on an agile big data C-ITS system build, leading the Hadoop ecosystem and machine learning / deep learning work. After that, I worked on the innovation data platform team at AIA Life Insurance, handling data management with Azure Data Factory & Azure Databricks and pursuing deep exploration and passion as a data scientist.

From 2012 to 2020 I studied at Centennial College, graduating as a Software Eng. Technician. Before that, I built nine years of IT experience in Korea, working on many financial-sector projects (finance, banking, and big data).

In 1999, I worked for a year as a P.T.S. network engineering volunteer in the Dasmarinas region of the Philippines, building global IT and networking knowledge. Returning to Korea in 2000, I developed Warehouse Inventory Control and Management in the Clarion 4GL language and the PIS Operational Test PCS in C/C++ at K.M.C.

After completing the LG-SOFT SCHOOL Java expert course in 2001, I spent about two years at CNMTechnologies on e-CRM/e-SFA R&D, covering a variety of projects (Korea Development Bank / Daejeon Government Complex / Youngjin Pharmaceutical).

From 2004 until moving to Canada in 2012, I developed on and led many projects, including SKT/SK C&C (IMOS), Standard Chartered First Bank (TBC), Prudential Life (PFMS), AXA Kyobo Life Insurance Account Management, and Kookmin Bank's Financial Management Reconstruction (NGM).

 

Since late 2012 I have lived in Canada, a father of three and a Scrum Master, with hands-on North American experience using agile development to build a handyman app, an e-commerce app, product development, and a recipe app.

Curriculum

All

85 lectures ∙ (6hr 39min)

Course Materials:

Lecture resources
Published: 
Last updated: 

Reviews

All

36 reviews ∙ 4.6 average rating

  • hadoop3bigdata

    Reviews 3

    Average Rating 5.0

    5

    93% enrolled

    I created this course wanting to train learners into Hadoop experts who can handle big data. Rather than using a comprehensive on-premises distribution such as Cloudera, we install Hadoop from scratch and move on to extracting, moving, and loading datasets. Hadoop, which began with version 1.x, has gained many features up through version 3.3 and become a very extensive platform. I hope this course, which covers many tools, helps train you into a big data expert.

    • kentucky8612311057

      Reviews 4

      Average Rating 5.0

      5

      100% enrolled

      Pros: You can learn the basics of Hadoop MapReduce. It seems to be the only Hadoop course in Korean. What I missed: topics I was curious about weren't covered, such as extracting to one common key with two mappers, using two keys, and how to configure a comparator directly. Cons: The instructor's pronunciation isn't always clear, and the background music was loud, so I had to replay some parts to catch what was said. --------------------------------------- After reading the instructor's reply, I'm revising my rating to 5 stars.

      • hadoop3bigdata
        Instructor

        Thank you for the kind and detailed review. There are updated lectures as well, so I hope you'll listen in a quiet moment and go on to become a Hadoop expert.

    • seaking79727

      Reviews 38

      Average Rating 4.7

      5

      59% enrolled

      Good for Hadoop beginners. It seems just right for studying before picking up a book.

      • hadoop3bigdata
        Instructor

        Yes, thank you for the kind review. I put emphasis on running YARN applications. Thank you, and I look forward to your progress.

    • dlgnsxo1239897

      Reviews 56

      Average Rating 5.0

      5

      100% enrolled

      The Hadoop course was really great! I hope you open a Spark course too. Thank you!

      • hadoop3bigdata
        Instructor

        I hope this course helps you grow more familiar with Hadoop and serves as a good starting point.

    • jason

      Reviews 28

      Average Rating 5.0

      5

      31% enrolled

      Instructor, thank you for offering such a high-quality course at a reasonable price! I'm a new data engineer. While agonizing over how to get started with big data frameworks like Hadoop and Spark, and eyeing thick books, I came back to this course, which I had bought a while ago. I've never done Java, so it takes a little time to understand, but I'll see it through! I hear a Spark course is planned for late this year or early next year, and I'm really looking forward to it! I hope it's PySpark-based and explained as accessibly as this course!

      $42.90
