
Data Engineering Course (1): Installing Big Data Hadoop Yourself

Students who want to learn Hadoop and big data will get to experience the world of big data firsthand through this course!

(4.5) 39 reviews

567 learners

  • Billy Lee

Tags: Hadoop, Big Data, MapReduce, Big Data Technology in Practice, Data Engineering, Java

Reviews from Early Learners

What you will learn!

  • Encountering big data technology in everyday life

  • Handling big data with Hadoop

  • Distributed processing technology for handling big data with Hadoop

  • Handling big data in Hadoop using the Java language

  • Techniques for overcoming the limits of relational data processing with Hadoop

  • Hadoop's various projects and interfaces

It's the era of big data! 👨‍💻
Become an expert with Hadoop.

At the center of data science,
Hadoop is the trend!

Many IT giants, social media services, and others are competing to use Hadoop (Apache Hadoop) for big data analysis and processing. Hadoop is a Java-based framework designed to process massive amounts of data at low cost, enabling distributed storage and processing of large data sets. What if you could reach the level of a big data expert through Hadoop?

Through data analysis, companies can pioneer new markets, create unique value, and give consumers real-time access to essential information. Big data is also a crucial skill for small and medium-sized businesses, so this is welcome news for anyone seeking employment or a career change in big data.

Big Data with Hadoop

Hadoop is used for data analysis by many companies, including Google, Yahoo, Facebook, IBM, Instagram, and Twitter.
Let's build a big data distributed system infrastructure with Hadoop, a representative big data solution.

This course begins with an understanding of big data terminology and then gives you a taste of handling big data with the open-source software Hadoop. Through this course, students will experience the world of big data technology and the Fourth Industrial Revolution at the same time.

What is Hadoop?

  • Hadoop is open source software that anyone can use for free.
    In this lecture, we will cover big data using Hadoop version 3.2.1 .

Understanding big data and how to use Hadoop, all at once.

  • Essential understanding of big data terminology
  • Introduction to Hadoop's concepts and usage
  • Tutorials on big data processing with Hadoop

I recommend this to these people!

Of course, those who don't fit this category are also welcome. (Beginners are doubly welcome ✌)

  • Data science aspirants considering employment, a career change, or a future in IT
  • Those who want to handle big data with Java/Python
  • Anyone who wants to experience big data out of interest and curiosity
  • Working professionals who want to try out a Hadoop 3.x data environment

Before taking the class, please check your knowledge!

  • Prerequisites are the basics of the Java programming language, familiarity with big data, virtual machine, and dataset terminology, and a basic understanding of Linux (Ubuntu).

You will learn the following.

1. Taking on virtualization technology and understanding guest operating systems

We will learn about virtualization technology, which is advantageous for server consolidation, and how OS-level virtualization isolates multiple servers on a single OS. Anyone can take on the challenge of creating and operating many servers using Ubuntu, an open-source solution that supports Linux virtualization. We will also build knowledge of guest operating systems and gain practical experience in distributing big data across multiple servers. With server virtualization, you can run multiple operating systems as highly efficient virtual machines on a single physical server.

  • Learn about the definition of Big Data and its practical applications.
  • Let's understand the terminology related to Hadoop, the data processing software preferred by businesses.
(Images: data sizes; the big data landscape)

2. How to install Hadoop on Ubuntu 20.04 LTS and use commands

We'll cover the basics of the Linux CLI (Command Line Interface) tools that front-end developers naturally encounter when developing web applications, and then move smoothly into the Linux terminal for Hadoop. We'll also cover the basics of using Ubuntu in a GUI environment other than Windows, going beyond a basic understanding of Linux systems, such as shell configuration files, toward an intermediate level. (A quick way to check the result of the installation is sketched after this section.)

  • Let's install and set up Linux (Ubuntu 20.04 LTS) as a virtual machine on a Windows 10-based laptop.
  • Install Hadoop version 3.2.1 on a Linux virtual machine.
(Images: Hadoop 2.x architecture; Hadoop 2.x vs. 3.x)
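To make the result of a single-node installation concrete, here is a minimal sketch of my own (it is not part of the course materials). It assumes Hadoop 3.2.1 has been started with start-dfs.sh and start-yarn.sh and that the JDK's jps tool is on the PATH; it simply reports which of the usual single-node daemons are running.

```python
# Minimal sketch (assumptions: single-node Hadoop started with start-dfs.sh /
# start-yarn.sh, and the JDK's `jps` tool available on PATH).
import subprocess

# Daemons typically expected on a single-node (pseudo-distributed) setup.
EXPECTED = {"NameNode", "DataNode", "SecondaryNameNode",
            "ResourceManager", "NodeManager"}

def running_daemons():
    """Return the set of Java process names reported by `jps`."""
    out = subprocess.run(["jps"], capture_output=True, text=True, check=True).stdout
    # Each jps line looks like "<pid> <ProcessName>".
    return {parts[1] for parts in (line.split() for line in out.splitlines())
            if len(parts) > 1}

if __name__ == "__main__":
    found = running_daemons()
    print("running:", ", ".join(sorted(found & EXPECTED)) or "none")
    print("missing:", ", ".join(sorted(EXPECTED - found)) or "none")
```

If all five daemons show up, the HDFS web UI (port 9870 in Hadoop 3.x) and the YARN ResourceManager UI (port 8088) should also be reachable from a browser.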

3. Hadoop 3.2.1 guide and understanding the core architecture

The starting point for big data processing of unstructured data is understanding the Hadoop Distributed File System (HDFS), which is modeled on the Google File System, along with MapReduce and YARN for cluster scaling and resource management. We will examine the architectures of Hadoop versions 1, 2, and 3 one by one, giving students a visual picture of the history of Hadoop technology. (A toy illustration of the MapReduce data flow follows this section.)

  • Understand and integrate with the Hadoop Distributed File System (HDFS).
  • Understand the principles of the Map/Reduce framework and analyze data based on it.
(Images: HDFS architecture; YARN core components)
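As a companion to the theory above, here is a toy illustration of my own (plain Python, not Hadoop code) of the map → shuffle → reduce data flow that the framework runs across a cluster: the mapper emits (key, value) pairs, the shuffle groups values by key, and the reducer aggregates each group.

```python
# Toy illustration of the MapReduce data flow in plain Python (no Hadoop involved).
from collections import defaultdict

def map_phase(lines):
    # Emit (word, 1) for every word, like a WordCount mapper.
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def shuffle(pairs):
    # Group values by key -- the work the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Sum the values for each key, like a WordCount reducer.
    return {key: sum(values) for key, values in groups.items()}

if __name__ == "__main__":
    lines = ["Hadoop stores data in HDFS",
             "YARN schedules MapReduce jobs on Hadoop"]
    print(reduce_phase(shuffle(map_phase(lines))))
```

In real Hadoop, the map and reduce steps run as tasks on many nodes, HDFS holds the input splits and the output, and YARN schedules and allocates resources for those tasks.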

4. HDFS Shell Operation Guide and Building MapReduce Applications with Java/Python

While data manipulation techniques vary, the foundation of big data analysis lies in building MapReduce applications. From a basic WordCount MapReduce application in Python to a COVID-19 application built with Eclipse-based Java, building a variety of big data MapReduce applications is no longer optional; it's a necessary step forward. (A sketch of a Python WordCount for Hadoop Streaming follows this section.)

  • Let's connect Hadoop with Java and implement an application.
  • Let's connect Hadoop with Python and implement an application.
(Images: Python Map/Reduce WordCount application; Java Map/Reduce WordCount application)
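To make the Python side concrete, here is a minimal WordCount sketch of my own in the spirit of the course's Python MapReduce application (it is not the instructor's code). It is written for Hadoop Streaming, which drives the mapper and reducer through stdin/stdout; the HDFS paths and streaming-jar location in the comments are assumptions for a typical single-node Hadoop 3.2.1 install and should be adjusted to your setup.

```python
#!/usr/bin/env python3
# wordcount.py -- a minimal Hadoop Streaming WordCount sketch (illustrative).
#
# Local dry run, simulating the shuffle with sort:
#   cat input.txt | python3 wordcount.py map | sort | python3 wordcount.py reduce
#
# On Hadoop (paths below are assumptions for a typical 3.2.1 tarball install):
#   hdfs dfs -mkdir -p /user/hadoop/input
#   hdfs dfs -put input.txt /user/hadoop/input/
#   hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-3.2.1.jar \
#     -files wordcount.py \
#     -mapper "python3 wordcount.py map" \
#     -reducer "python3 wordcount.py reduce" \
#     -input /user/hadoop/input -output /user/hadoop/output
import sys

def mapper():
    # Emit one "word<TAB>1" line per word read from stdin.
    for line in sys.stdin:
        for word in line.split():
            print(f"{word.lower()}\t1")

def reducer():
    # Streaming delivers mapper output sorted by key, so counts for a word are contiguous.
    current, count = None, 0
    for line in sys.stdin:
        word, _, value = line.rstrip("\n").partition("\t")
        if word != current:
            if current is not None:
                print(f"{current}\t{count}")
            current, count = word, 0
        count += int(value)
    if current is not None:
        print(f"{current}\t{count}")

if __name__ == "__main__":
    mapper() if len(sys.argv) > 1 and sys.argv[1] == "map" else reducer()
```

A Java version follows the same contract: a Mapper that emits (word, 1) pairs and a Reducer that sums them, packaged as a jar and submitted to the cluster through YARN.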

Expected Questions Q&A!

Q. What is big data? Is its definition necessary when using Hadoop?

Yes, working with Hadoop does require a brief definition and understanding of big data. This doesn't have to be a complete, in-depth understanding, but it does need to reach the level essential for working with Hadoop.

Big data involves handling extremely large datasets with tools such as Hadoop. These datasets serve as the foundation for analyzing patterns and trends across many businesses, and they are closely linked to human social behavior, patterns, and the value created through interactions.

Image source: TechTarget

Q. What is Hadoop? What are its components? What is the Hadoop stack?

Hadoop helps with the mission of handling data from large-scale social sites, ranging from terabytes and petabytes up to zettabytes. The Hadoop stack refers to an open-source framework for handling this big data.

Simply put, "Hadoop" is called the "Hadoop stack." Hadoop and the Hadoop stack help you build clusters using inexpensive, common commodity hardware and handle large-scale processing within these massive clusters of servers. The Hadoop stack, also known as "simple batch processing," is a Java-based "distributed computing platform." It allows individuals to batch process as much data as they want, periodically, distributing the data in the desired format to produce results.

Q. Is programming knowledge required?

Even if you have no programming knowledge or coding experience, that's okay. I teach Java and Python thoroughly, assuming you are seeing them for the first time. While the lecture materials are in English, I teach in Korean so you can follow along without difficulty. I occasionally give explanations in English, but I believe anyone with high-school-level English will be able to follow them. (Just as I achieved my dream even with my limited English skills.)

Q. How relevant is big data to Hadoop?

This course naturally covers Hadoop. Beyond simple RDBMSs like Oracle, MSSQL, or MySQL, it aims to address essential business requirements, starting with large-scale processing, data processing speed, and cost-effectiveness. Specifically, Hadoop handles not only structured data, the relational data managed by row- and column-based RDBMSs, but also unstructured data such as images, audio, and word processing files.

When we talk about semi-structured data, we mean data related to communication and data integration with web servers, such as email, CSV, XML, and JSON. HTML, websites, and NoSQL databases are also included. The accumulated datasets used for computer-to-computer transfer of business documents, known as EDI, also fall into this category.

Image source: MonkeyLearn Blog

Q. What level of content is covered?

This course will guide users through the installation of Hadoop 3.2.1 on Ubuntu 20.04 LTS. Even if you have no prior Unix or Linux experience, you'll naturally learn installation techniques and the Linux operating system. Beyond the basics of Hadoop's CLI and the languages used, this course will also help you become familiar with the distributed file system and MapReduce technologies that originated at Google. YARN is covered at the level of basic theory; we anticipate a more in-depth study of YARN when you install a cluster in the Hadoop 3.3.0 intermediate course.

Q. Is there a reason you are using Ubuntu 20.04 LTS as a practice environment?

Ubuntu is free to use, and its LTS (Long-Term Support) releases target organizations seeking long-term support. By installing Hadoop on Linux, you can naturally build the operating system and development environment your business needs. Because Eclipse and IntelliJ can be used within the same environment, you can start working toward the dream of data science with big data right now.

Ubuntu provides a GUI (Graphical User Interface) environment similar to the Windows operating system, which helps users install and operate it.

Recommended for these people

Who is this course right for?

  • Enthusiastic students who want to learn the basics of big data from scratch

  • Those who are eager to learn big data principles and applications

  • Those who want to learn Hadoop to handle big data at their company

  • Those who have basic knowledge of Java

Need to know before starting?

  • The Concept of Big Data (Understanding Big Data)

  • Virtual Machine

  • Data set terminology

  • Understanding Linux (Ubuntu)

  • Java 15

588 learners ∙ 40 reviews ∙ 69 answers ∙ 4.5 rating ∙ 2 courses

Hello, I am Billy Lee, CEO of NeoAvenue (네오아베뉴).

After returning to Korea with my family in September 2022, I provided TA consulting on a Hyundai Motor big data project (September to November 2022) and then served as project manager (PMO) on a big data C-ITS system build, working as an agile PM and leading the Hadoop ecosystem and machine learning / deep learning work. Since then, I have worked on the AIA Life Insurance innovation data platform team, applying data management technologies with Azure Data Factory & Azure Databricks and pursuing data science with deep curiosity and passion.

From 2012 to 2020 I studied in Canada, graduating from Centennial College as a Software Eng. Technician. In Korea I have nine years of IT experience, much of it in the financial sector (finance and banking projects and big data work).

In 1999 I spent a year as a P.T.S. network engineering volunteer in the Dasmarinas area of the Philippines, building up knowledge of the global IT world and networking. After returning to Korea in 2000, I developed Warehouse Inventory Control and Management in the Clarion 4GL language and PIS Operational Test PCS in C/C++ at K.M.C.

After completing the LG-SOFT SCHOOL Java expert course in 2001, I spent about two years on e-CRM/e-SFA R&D at CNMTechnologies, covering a variety of projects (Korea Development Bank, Government Complex Daejeon, Youngjin Pharmaceutical).

From 2004 until moving to Canada in 2012, I developed for and led numerous projects, including SKT/SK C&C (IMOS), SC First Bank (TBC), Prudential Life (PFMS), AXA Kyobo Life Insurance Account Management, and Kookmin Bank Financial Management Reconstruction (NGM).

Since late 2012 I have lived in Canada as a father of three and a Scrum Master, adopting agile development to build a handyman app, an e-commerce app, product development, and a recipe app, with hands-on experience in the North American and Canadian market.

Curriculum

All

85 lectures ∙ (6hr 39min)

Course Materials:

Lecture resources

Reviews

All

39 reviews ∙ Average rating 4.5

  • hadoop3bigdata

    Reviews 3

    Average Rating 5.0

    5

    93% enrolled

    This course was created with the intention of training you to become a Hadoop expert who handles big data. Rather than relying on a packaged on-premises distribution like Cloudera, it takes you through installing Hadoop from scratch and extracting, moving, and loading datasets. Hadoop, which started from version 1.x, has become a very heavy platform with many features added up to version 3.3, but I hope this course fulfills its aim of training you into a big data expert by getting you hands-on with many of its tools.

    • kentucky8612311057

      Reviews 4

      Average Rating 5.0

      5

      100% enrolled

      Pros: You can learn the basics of Hadoop MapReduce. It seems to be the only Hadoop lecture in Korean. Cons: Topics I was curious about were not covered, such as joining on a common key with two mappers, working with two keys, and how to set a comparator directly. Also, the instructor's Korean pronunciation is not clear and the background music is loud, so I had to listen to some parts several times. I will change the rating to 5 stars after seeing the teacher's answer.

      • hadoop3bigdata
        Instructor

        Thank you for your kind and detailed evaluation. Hadoop theory is so vast that I cannot cover everything, and it is even harder to understand all of Hadoop just by listening to my lecture. I removed the background music and re-recorded with a clearer voice, so I would appreciate it if you could take the lecture again. There are also updated lectures, so I hope you will listen to them when you have quiet time and grow into a Hadoop expert.

    • seaking79727

      Reviews 47

      Average Rating 4.5

      5

      59% enrolled

      It's good for Hadoop beginners. It seems like a good idea to learn it first before reading the book.

      • hadoop3bigdata
        Instructor

        Yes, thank you for the good review. It is not easy for beginners who are new to Hadoop to follow the books currently on the market. In that sense, my lecture emphasizes running Hadoop, HDFS, and YARN applications on a single node so that you can learn before buying a book, as Taekyung Kim mentioned in his review. I am glad if it was helpful. I will see you again with a better lecture, and I hope you grow into a Hadoop expert.

    • dlgnsxo1239897

      Reviews 56

      Average Rating 5.0

      5

      100% enrolled

      The Hadoop lecture was really good! I wish there was a Spark lecture too. Thank you!

      • hadoop3bigdata
        Instructor

        I hope this lecture will be an opportunity to approach Hadoop in a more friendly way. I also hope that the Spark lecture will be delivered to you. I support you from Toronto to become a Hadoop expert.

    • jason

      Reviews 28

      Average Rating 5.0

      5

      31% enrolled

      Instructor, thank you so much for providing such a high-quality lecture at such a low price! I am a new data engineer, and I was considering a thick book while thinking about how to start with big data frameworks such as Hadoop and Spark, and I came across this lecture that I had purchased in the past. I have never used Java before, so it will take some time to understand, but I will complete it well! You said you are planning a Spark lecture at the end of this year or early next year, so I am really, really looking forward to it! Please, I hope that this lecture can be explained as easily as this one based on pyspark!

Limited time deal: $33.00 (23% off $42.90)
