Data Engineering Course (1): Installing Hadoop for Big Data Yourself

This course is for students who want to learn Hadoop and experience the world of big data first-hand!

(4.5) 39 reviews

567 learners

  • Billy Lee

What you will learn!

  • Recognize big data technology in everyday life

  • Handle big data with Hadoop

  • Learn the distributed processing technology Hadoop uses to handle big data

  • Process big data in Hadoop using the Java language

  • Learn techniques for overcoming the limitations of relational data processing with Hadoop

  • Learn about Hadoop's various projects and interfaces

It's the era of big data! 👨‍💻
Become an expert with Hadoop.

Hadoop, at the center of data science, is the trend!

Many IT giants, social media services, and other companies are racing to adopt Hadoop (Apache Hadoop) for big data analysis and processing. Hadoop is a Java-based framework designed to process massive amounts of data at low cost, enabling distributed storage and processing of large data sets. What if you could reach the level of a big data expert through Hadoop?

Through data analysis, companies can pioneer new markets, create unique value, and give consumers real-time access to essential information. Big data skills are also crucial for small and medium-sized businesses, so this is welcome news for anyone seeking a job or a career change in big data.

Big Data with Hadoop

Hadoop is used for data analysis by many companies, including Google, Yahoo, Facebook, IBM, Instagram, and Twitter.
Let's build a distributed big data infrastructure with Hadoop, a representative big data solution.

This course begins with an understanding of big data terminology and then gives you a first taste of handling big data with the open source software Hadoop. Through this course, students will experience both the world of big data technology and the Fourth Industrial Revolution.

What is Hadoop?

  • Hadoop is open source software that anyone can use for free.
    In this course, we will cover big data using Hadoop version 3.2.1.

Understand big data and learn how to use Hadoop, all in one course.

  • An essential understanding of big data terminology
  • An introduction, in Korean, to Hadoop's concepts and usage
  • Tutorials on big data processing with Hadoop

Recommended for these people!

Of course, those who don't fit these categories are also welcome. (Beginners are doubly welcome ✌)

  • Aspiring data scientists considering a first job or a career change in IT
  • Those who want to work with big data using Java or Python
  • Anyone who is curious about big data and wants to try it
  • Working professionals who want to try a Hadoop 3.x data environment

Before taking the course, please check the prerequisites!

  • Prerequisite knowledge: the basics of the Java programming language, familiarity with terminology related to big data, virtual machines, and datasets, and a basic understanding of Linux (Ubuntu).

You will learn the following.

1. Taking on virtualization technology and understanding guest operating systems

We will learn virtualization technology, which is well suited to server consolidation, and how OS-level virtualization isolates multiple servers on a single OS. Using Ubuntu, an open-source Linux distribution that supports virtualization, anyone can take on the challenge of creating and operating many servers. You will also learn about guest operating systems and gain technical experience in distributing big data across multiple servers. With server virtualization, multiple operating systems run as efficient virtual machines on a single physical server or host operating system.

  • Learn about the definition of Big Data and its practical applications.
  • Let's understand the terminology related to Hadoop, the data processing software preferred by businesses.
Data Sizes
The Landscape: Big Data

2. How to install Hadoop on Ubuntu 20.04 LTS and use commands

We'll cover the basics of the Linux CLI (command-line interface) tools that front-end developers routinely encounter when building web applications, then move on to using the Linux terminal for Hadoop. We'll also cover the basics of using Ubuntu's GUI environment, which differs from Windows, and go beyond basic Linux knowledge, such as shell configuration files, toward an intermediate level.

  • Let's install and set up Linux (Ubuntu 20.04 LTS) as a virtual machine on a Windows 10-based laptop.
  • Install Hadoop version 3.2.1 on a Linux virtual machine.
Hadoop 2.x Architecture
Hadoop 2.x vs. 3.x

3. Guide to Hadoop 3.2.1 and Its Core Architecture

The starting point for processing unstructured big data is understanding three components: the Hadoop Distributed File System (HDFS), which is modeled on the Google File System; MapReduce; and YARN, which handles cluster scaling and resource management. We will examine the architectures of Hadoop versions 1, 2, and 3 one by one, giving students a visual history of Hadoop technology.

  • Understand and integrate with the Hadoop Distributed File System (HDFS).
  • Understand the principles of the Map/Reduce framework and analyze data based on it.
HDFS Architecture
YARN Core Components
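
To make the HDFS storage model a little more concrete, here is a minimal Python sketch of how a file's size translates into blocks and replicas. It assumes the stock defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`); a single-node practice install is often configured with a replication factor of 1 instead. The function name `hdfs_footprint` is hypothetical and exists only for this illustration.

```python
import math

# Assumed defaults (not taken from the course material):
BLOCK_SIZE = 128 * 1024 * 1024  # dfs.blocksize default in Hadoop 2.x/3.x
REPLICATION = 3                 # dfs.replication default


def hdfs_footprint(file_size_bytes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return (blocks, block replicas, raw bytes stored) for one file.

    HDFS does not pad the final block, so the raw bytes stored are
    simply the file size multiplied by the replication factor.
    """
    blocks = math.ceil(file_size_bytes / block_size) if file_size_bytes > 0 else 0
    return blocks, blocks * replication, file_size_bytes * replication


if __name__ == "__main__":
    one_gib = 1024 ** 3
    blocks, replicas, raw = hdfs_footprint(one_gib)
    print(f"blocks={blocks} replicas={replicas} raw_bytes={raw}")
```

Under these assumptions, a 1 GiB file is split into 8 blocks, stored as 24 block replicas across the cluster.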

4. HDFS Shell Operation Guide and Building MapReduce Applications with Java/Python

While data manipulation techniques vary, the foundation of big data analysis lies in building MapReduce applications. From a basic WordCount MapReduce application in Python to a COVID-19 application written in Java with Eclipse, building a variety of big data MapReduce applications is no longer optional; it is a necessary step forward.

  • Let's connect Hadoop with Java and implement an application.
  • Let's connect Hadoop with Python and implement an application.
Python Map/Reduce WordCount Application
Java Map/Reduce WordCount Application
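
As a rough illustration of the WordCount pattern named above, the sketch below mimics the map, shuffle-and-sort, and reduce phases in plain Python. It is not the course's code: the function names are hypothetical, and the in-memory `sorted` call merely stands in for the shuffle that Hadoop performs between mappers and reducers.

```python
from itertools import groupby


def mapper(lines):
    """Map phase: emit a (word, 1) pair for every whitespace-separated word."""
    for line in lines:
        for word in line.strip().lower().split():
            yield word, 1


def reducer(sorted_pairs):
    """Reduce phase: sum the counts per word.

    The input must already be sorted by key, as it would be after the shuffle.
    """
    for word, group in groupby(sorted_pairs, key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


def word_count(lines):
    """Run map, a simulated shuffle-and-sort, and reduce in one process."""
    shuffled = sorted(mapper(lines))  # stand-in for Hadoop's shuffle-and-sort
    return dict(reducer(shuffled))


if __name__ == "__main__":
    sample = ["Hadoop stores data", "Hadoop processes data"]
    for word, count in word_count(sample).items():
        print(f"{word}\t{count}")
```

With Hadoop Streaming, the mapper and reducer would typically be separate scripts that read standard input and write tab-separated key/value lines; the logic of each phase stays the same.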

Frequently Asked Questions

Q. What is big data? Is its definition necessary when using Hadoop?

Yes. Working with Hadoop requires a brief definition and understanding of big data. This doesn't mean a complete, in-depth understanding, only the level that is essential for working with Hadoop.

Big data refers to extremely large datasets, which tools such as Hadoop are built to handle. These datasets serve as the foundation for analyzing patterns and trends across many businesses. They are closely linked to human social behavior and to the value created through interactions.

Image source: TechTarget

Q. What is Hadoop? What are its components? What is the Hadoop stack?

Large-scale social sites generate data ranging from terabytes to petabytes and even zettabytes, and Hadoop helps with the task of handling it. The Hadoop stack refers to an open-source framework for handling big data.

Simply put, "Hadoop" and the "Hadoop stack" refer to the same thing. Hadoop lets you build clusters from inexpensive commodity hardware and handle large-scale processing within these massive clusters of servers. The Hadoop stack, often described as "simple batch processing," is a Java-based "distributed computing platform." It lets you batch process as much data as you want, periodically, distributing the data in the desired format to produce results.

Q. Is programming knowledge required?

It's okay even if you have no programming knowledge or coding experience. I explain Java and Python thoroughly, assuming you are seeing them for the first time. The lecture documentation is in English, but I teach in Korean so that you can follow along without difficulty. I occasionally give explanations in English, but I believe anyone with high school level English will be able to follow them. (Just as I achieved my dream, even with my limited English skills.)

Q. How relevant is big data to Hadoop?

This course focuses on Hadoop. Hadoop goes beyond conventional RDBMSs such as Oracle, MSSQL, or MySQL to address essential business requirements, starting with large-scale processing, data processing speed, and cost-effectiveness. Specifically, Hadoop handles not only structured data (the relational data managed by row- and column-based RDBMSs) but also unstructured data, such as images, audio, and word processing files.

Semi-structured data refers to data used in communication and data integration with web servers, such as email, CSV, XML, and JSON. HTML, websites, and NoSQL databases are also included, as are the accumulated datasets from EDI, the computer-to-computer transfer of business documents.

Image source: MonkeyLearn Blog
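
To illustrate what makes formats such as CSV, JSON, and XML only loosely structured, the following Python sketch parses the same made-up record from all three into plain dictionaries. The sample data and the helper names (`from_csv`, `from_json`, `from_xml`) are hypothetical; only standard-library modules are used.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def from_csv(text):
    """Parse CSV text with a header row into a list of dicts."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def from_json(text):
    """Parse a JSON array of objects into a list of dicts."""
    return json.loads(text)


def from_xml(text):
    """Parse <rows><row>...</row></rows> into a list of dicts."""
    root = ET.fromstring(text)
    return [{field.tag: field.text for field in row} for row in root]


# The same hypothetical record in three formats.
CSV_TEXT = "id,name\n1,hadoop\n"
JSON_TEXT = '[{"id": "1", "name": "hadoop"}]'
XML_TEXT = "<rows><row><id>1</id><name>hadoop</name></row></rows>"

if __name__ == "__main__":
    for parse, text in ((from_csv, CSV_TEXT), (from_json, JSON_TEXT), (from_xml, XML_TEXT)):
        print(parse.__name__, parse(text))
```

Each format carries its own field names alongside the values rather than relying on a fixed table schema, which is why such data fits less neatly into a row- and column-based RDBMS.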

Q. What level of content is covered?

This course guides you through installing Hadoop 3.2.1 on Ubuntu 20.04 LTS. Even if you have no prior Unix or Linux experience, you'll pick up installation techniques and the Linux operating system along the way. Beyond the basics of Hadoop's CLI and user commands, the course also helps you become familiar with the distributed file system and MapReduce technologies that originated at Google. YARN is covered only at the level of basic theory; a more in-depth study of YARN is planned for the Hadoop 3.3.0 intermediate course, where you install a cluster.

Q. Is there a reason you are using Ubuntu 20.04 LTS as a practice environment?

Ubuntu is free to use, and its LTS (Long-Term Support) releases target companies that need long-term support. By installing Hadoop on Linux, you naturally build the operating system and development environment your business needs. Because Eclipse and IntelliJ can be used within the same environment, you can start pursuing data science with big data right away.

Ubuntu also provides a GUI (graphical user interface) environment similar to Windows, which helps users install and operate it.

Who is this course right for?

  • Enthusiastic students who want to learn the basics of big data from scratch

  • Those who are eager to learn big data principles and applications

  • Those who want to learn Hadoop to handle big data at their companies

  • Those who have basic knowledge of Java

What do you need to know before starting?

  • The Concept of Big Data (Understanding Big Data)

  • Virtual Machine

  • Data set terminology

  • Understanding Linux (Ubuntu)

  • Java 15

Hello, this is Billy Lee.

588 learners ∙ 40 reviews ∙ 69 answers ∙ 4.5 rating ∙ 2 courses

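I am Billy Lee, head of Neo Avenue.

After returning to Korea with my whole family in September 2022, I provided TA consulting on a Hyundai Motor big data project (September to November 2022) and served as project manager (PMO), working as an Agile PM and leading the Hadoop ecosystem, machine learning, and deep learning work for a big data C-ITS system build. After that, I worked on the Innovation Data Platform team at AIA Life Insurance, applying data management technology with Azure Data Factory and Azure Databricks and pouring my curiosity and passion into the work as a data scientist.

Between 2012 and 2020 I studied hard and graduated from Centennial College's Software Eng. Technician program. In Korea I have nine years of IT experience, much of it in the financial sector (accounting and finance projects and big data work).

In 1999 I spent a year in Dasmarinas, the Philippines, as a volunteer network engineer at P.T.S., building knowledge of networking and the global IT world. I returned to Korea in 2000 and, at K.M.C., developed Warehouse Inventory Control and Management in the Clarion 4GL language and PIS Operational Test PCS in C/C++.

In 2001 I completed the Java expert course at LG-SOFT SCHOOL, then spent about two years at CNMTechnologies on e-CRM/e-SFA research and development, working on a variety of projects (Korea Development Bank, Daejeon Government Complex, Yungjin Pharm).

From 2004 until I moved to Canada in 2012, I developed on and led many projects, including SKT/SK C&C (IMOS), SC First Bank (TBC), Prudential Life (PFMS), AXA Kyobo Life Insurance Account Management, and Kookmin Bank Financial Management Reconstruction NGM.

Living in Canada from the end of 2012, as a father of three and a Scrum Master, I adopted Agile development practices and built a handyman app, an e-commerce app, product development projects, and a recipe app, gaining hands-on experience in North America.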

Curriculum

85 lectures ∙ (6hr 39min)

Reviews

39 reviews ∙ 4.5 rating

  • Billy Lee

    Reviews 3

    Average Rating 5.0

    5

    93% enrolled

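    I created this course out of a desire to train Hadoop experts who work with big data. Rather than using a comprehensive on-premises distribution such as Cloudera (On-Premise Distribution Software: OPD), it takes you through installing Hadoop yourself from scratch and then extracting, moving, and loading datasets. Hadoop, which started at version 1.x, has become a very heavy platform as many features were added up to version 3.3, but I hope this course leaves you eager to work with its many tools and grow into a big data expert.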

    • 성실한개발자

      Reviews 4

      Average Rating 5.0

      5

      100% enrolled

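      Pros: You can learn the basics of Hadoop MapReduce. It seems to be the only Hadoop course in Korean. Wish list: I was disappointed that topics I was curious about were missing, such as using two mappers to extract by a single common key, using two keys, or configuring a comparator yourself. Cons: The instructor's Korean pronunciation is not clear and the background music is loud, so I had to replay many parts to catch what was being said. --------------------------------------- After reading the instructor's reply, I am changing my rating to 5 stars.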

      • Billy Lee
        Instructor

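        Thank you for the kind and detailed review. Hadoop theory is so vast that it is not possible to touch on everything, and it is harder still to understand all of Hadoop from my course alone. I have removed the background music and re-recorded the lectures with a clearer voice, so I would appreciate it if you took the course again. There are updated lectures as well; I hope you listen to them at a quiet time and go on to become a Hadoop expert.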

    • 김태경

      Reviews 47

      Average Rating 4.5

      5

      59% enrolled

      Good for Hadoop beginners. It seems just right to study before reading a book.

      • Billy Lee
        Instructor

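        Thank you for the kind review. For beginners encountering Hadoop for the first time, the books currently on the market are not easy to follow. In that respect, as 김태경 notes, my course is meant to be studied before buying a book, and it emphasizes running Hadoop, HDFS, and YARN applications on a single node. I am grateful if it helped. I will see you again with an even better course, and I hope you grow into a Hadoop expert.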

    • 이훈태 남자

      Reviews 56

      Average Rating 5.0

      5

      100% enrolled

      I really enjoyed the Hadoop course! I hope a Spark course opens too. Thank you!

      • Billy Lee
        Instructor

        I hope this course helps you feel more at home with Hadoop. I also hope to bring you a Spark course. I am cheering from Toronto for you to become a Hadoop expert.

    • 홍태경

      Reviews 30

      Average Rating 5.0

      5

      31% enrolled

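      Thank you so much for offering such a high-quality course at such a low price! After becoming a junior data engineer, I was wondering how to get started with big data frameworks such as Hadoop and Spark and was considering a thick book when I came back to this course, which I had bought earlier. I have never used Java, so it will take me some time to understand, but I will see it through! You mentioned that a Spark course is planned for the end of this year or early next year, and I am really looking forward to it! I hope it is based on PySpark and explained as simply as this course!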

      $42.90
