Data Engineering Course (1): Installing Big Data Hadoop Yourself

Students who want to learn Hadoop and Big Data will experience the rapidly advancing world of Big Data firsthand through this course!

(4.6) 35 reviews

536 students

Big Data
Hadoop
Data Engineering

This course is designed for beginner-level learners.

What you will learn!

  • How big data technology shows up in everyday life

  • Handling big data with Hadoop

  • Distributed processing techniques for handling big data with Hadoop

  • Processing big data in Hadoop using the Java language

  • Techniques for overcoming the limits of relational data processing with Hadoop

  • Hadoop's various projects and interfaces

It's the era of big data! 👨‍💻
Become an expert with Hadoop.

At the center of data science, the trend is Hadoop!

Many IT giants, social media services, and others use Hadoop (Apache Hadoop) for big data analysis and processing. Hadoop is a Java-based framework created to process large amounts of data at low cost by distributing both the storage and the processing of large data sets. Why not rise to the ranks of big data experts through Hadoop?

Through data analysis, companies can open up new markets, deliver unique value, and give consumers the information they need in real time. Since big data is an essential issue that even small and medium-sized businesses must address, this is great news for anyone who dreams of getting a job, or changing jobs, in a big data-related field.

Big Data with Hadoop

Google, Yahoo, Facebook, IBM, Instagram, Twitter, and many other companies use Hadoop for data analysis. Through Hadoop, the representative big data solution, let's build a big data distributed system infrastructure.

This course begins with an understanding of big data terminology, then lets you experience the process of handling big data through the open-source software Hadoop. Along the way, students will experience the world of big data technology and of the 4th industrial revolution at the same time.

What is Hadoop?

  • Hadoop is open source software that anyone can use for free.
    In this lecture, we will cover big data using Hadoop version 3.2.1 .

Understanding big data and how to use Hadoop, all at once.

  • Essential understanding of big data terminology
  • Introduction to Hadoop's concepts and use
  • Tutorials on big data processing through Hadoop

I recommend this to these people!

Of course, those who don't fit this category are also welcome. (Beginners are doubly welcome ✌)

  • Those preparing for data science with employment, a job change, or a future in IT in mind
  • Those who want to handle big data via Java/Python
  • Anyone with interest and curiosity who wants to experience big data
  • Workers who want to experience a Hadoop 3.x data environment

Before taking the class, please check your prerequisite knowledge!

  • Prerequisites are the basics of the Java programming language, familiarity with big data, virtual machine, and data set terminology, and a basic understanding of Linux (Ubuntu).

You will learn the following content.

1. Understanding virtualization technology challenges and guest operating systems

We will learn about virtualization technology, which is advantageous for server consolidation, and how OS-level virtualization runs multiple servers on a single OS. Anyone will be able to build and operate a large number of servers through Ubuntu, an open-source solution and a virtualization-friendly Linux distribution. Along the way, you will accumulate knowledge about guest operating systems as well as plenty of hands-on experience in distributing big data across many servers. With server virtualization, you can enjoy the privilege(?) of experiencing a variety of operating systems in highly efficient virtual machines on a single physical server or operating system. (A brief VM-creation sketch follows this section.)

  • Learn about the definition and practical application cases of Big Data.
  • Understand the terminology related to Hadoop, the data processing software preferred by enterprises.
(Images: Data Sizes · The Landscape: Big Data)
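The lecture sets up the guest OS step by step on screen; purely for orientation, here is a minimal command-line sketch of creating an Ubuntu guest, assuming VirtualBox as the hypervisor (the VM name, memory, and disk sizes are illustrative, not the lecture's exact values):

```bash
# Create and register a 64-bit Ubuntu guest (name and sizes are illustrative)
VBoxManage createvm --name "ubuntu2004" --ostype Ubuntu_64 --register
VBoxManage modifyvm "ubuntu2004" --memory 4096 --cpus 2

# Give the guest a ~25 GB virtual disk on a SATA controller
VBoxManage createmedium disk --filename ubuntu2004.vdi --size 25000
VBoxManage storagectl "ubuntu2004" --name "SATA" --add sata
VBoxManage storageattach "ubuntu2004" --storagectl "SATA" \
    --port 0 --device 0 --type hdd --medium ubuntu2004.vdi

# Boot the VM (attach the Ubuntu 20.04 installer ISO first, via the GUI or storageattach)
VBoxManage startvm "ubuntu2004"
```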

2. How to install Hadoop on Ubuntu 20.04 LTS and manipulate commands

We will learn the basics of the Linux CLI (Command Line Interface) tools that front-end developers naturally encounter when developing web applications, and become comfortable with the Linux terminal used to drive Hadoop. Of course, we will also learn to use Ubuntu smoothly in its non-Windows GUI environment, and move naturally past a basic understanding of the Linux system to an intermediate level, covering topics such as the shell configuration file.

  • Let's install and set up Linux (Ubuntu 20.04 LTS) as a virtual machine on a Windows 10-based laptop.
  • Install Hadoop version 3.2.1 on the Linux virtual machine (a command-line sketch follows below).
(Images: Hadoop 2.x Architecture · Hadoop 2.x vs. 3.x)
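Purely for orientation, the installation steps in this section boil down to something like the sketch below; the install path and the download mirror are assumptions, and the lecture may arrange them differently:

```bash
# Download and unpack Hadoop 3.2.1 (Apache archive URL; any mirror works)
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
tar -xzf hadoop-3.2.1.tar.gz
sudo mv hadoop-3.2.1 /usr/local/hadoop

# Point the shell at Java and Hadoop (add these lines to ~/.bashrc to persist them)
export JAVA_HOME=$(dirname $(dirname $(readlink -f $(which java))))
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

# Verify the installation
hadoop version
```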

3. Hadoop 3.2.1 Latest Direction Guide & Understanding Core Architecture Structure

The gateway to big data for unstructured data processing is understanding the Hadoop Distributed File System (HDFS), modeled on Google's file system, together with MapReduce and YARN for cluster expansion and resource management. We will look at the architecture of Hadoop versions 1, 2, and 3 one by one, giving students a picture of the history of Hadoop technology.

  • Understand and work with the Hadoop Distributed File System (HDFS) (basic shell operations are sketched below).
  • Understand the principles of the Map/Reduce framework and analyze data based on it.
(Images: HDFS Architecture · YARN Core Components)
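As a taste of what working with HDFS looks like once the daemons are running, here is a minimal sketch using the stock HDFS shell (the file and directory names are illustrative):

```bash
# Start the HDFS and YARN daemons (single-node setup assumed)
start-dfs.sh
start-yarn.sh

# Create a home directory in HDFS and copy a local file into it
hdfs dfs -mkdir -p /user/$USER/input
hdfs dfs -put ~/sample.txt /user/$USER/input/

# List the directory and read the file back out of HDFS
hdfs dfs -ls /user/$USER/input
hdfs dfs -cat /user/$USER/input/sample.txt
```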

4. HDFS Shell Operation Guide and Creating MapReduce Applications with Java/Python

Many technologies are used to manipulate data, but the foundation of big data analysis lies in writing MapReduce applications. From a basic WordCount MapReduce application in the Python programming language to a COVID-19 application in Eclipse-based Java, building a variety of big data MapReduce applications is now less a choice than a necessity.

  • Let's connect Hadoop with Java and implement an application.
  • Let's connect Hadoop with Python and implement an application (job submission is sketched below).
(Images: Python Map/Reduce WordCount Application · Java Map/Reduce WordCount Application)
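To make the workflow concrete, here is a hedged sketch of submitting WordCount jobs: first the Java example bundled with Hadoop 3.2.1, then a Python pair via Hadoop Streaming (mapper.py and reducer.py stand in for the scripts written in the lecture):

```bash
# Run the Java WordCount example that ships with Hadoop 3.2.1
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar \
    wordcount /user/$USER/input /user/$USER/output

# Submit a Python mapper/reducer pair through Hadoop Streaming
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-3.2.1.jar \
    -files mapper.py,reducer.py \
    -input /user/$USER/input -output /user/$USER/output-py \
    -mapper mapper.py -reducer reducer.py

# Inspect the output partition written by the reducer
hdfs dfs -cat /user/$USER/output/part-r-00000
```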

Expected Questions Q&A!

Q. What is big data? Is its definition necessary when using Hadoop?

Yes, of course. A brief definition and understanding of big data is required when dealing with Hadoop. It does not have to be perfect, in-depth knowledge; what you need is the level of understanding that is essential for working with Hadoop.

Big data means the very large data sets handled with tools such as Hadoop. These data sets are the raw material that many companies analyze to identify patterns and trends, and they arise from human social behavior, patterns, and interactions.

(Image source: TechTarget)

Q. What is Hadoop? What are its components? What is the Hadoop stack?

Hadoop helps with the mission of processing data from large-scale social sites, ranging from terabytes to petabytes. The Hadoop stack refers to the open-source framework approach for handling such big data.

Simply put, Hadoop together with its surrounding projects is called the 'Hadoop stack'. Hadoop and the Hadoop stack help you build a cluster out of cheap, common commodity hardware and run large-scale processing inside that cluster, which is a collection of many servers. The Hadoop stack is a Java-based 'distributed computing platform', often described as 'simple batch processing': it processes as much data as you want in periodic batches, distributing the work and shaping the data into the desired form to produce results.

Q. Is programming knowledge required?

It's okay if you have no programming knowledge or experience writing code. I will teach with enough depth that Java or Python can be your very first experience. The documents used in the lectures are in English, but I will teach in Korean so that you can follow along without difficulty. I will occasionally explain in English, but a high-school level of English is enough to interpret it. (Just as I achieved my dream despite my poor English skills.)

Q. How much does big data have to do with Hadoop?

This course naturally deals with Hadoop. Going beyond RDBMSs such as Oracle, MSSQL, or MySQL, we aim to cover the elements essential to businesses: large-scale processing, data processing speed, and low cost. In particular, Hadoop handles not only structured data, the row-and-column relational data an RDBMS deals with, but also unstructured data such as images, audio, and word-processing files.

When we speak of semi-structured data, we mean data tied to communication and data exchange with web servers, such as Email, CSV, XML, and JSON. HTML, web sites, and NoSQL databases are also included here. So, of course, are the data sets used in EDI, the computer-to-computer movement of business documents.

(Image source: MonkeyLearn Blog)

Q. What level of content is covered?

This course helps users install Hadoop 3.2.1 on Ubuntu 20.04 LTS. Even if you have no Unix or Linux experience, you will naturally pick up the installation techniques and the Linux operating system by following along. It will also make you familiar with Google's DFS and MapReduce technologies, beyond the basics of the CLI commands Hadoop uses. YARN is covered only at a basic level; we hope you will look forward to studying YARN in depth while installing a cluster in the Hadoop 3.3.0 intermediate course.

Q. Is there a reason why you are using Ubuntu 20.04 LTS as a practice environment?

Ubuntu is free to use, and through LTS (Long-Term Support) releases it targets companies that need long-term support, so installing Hadoop on Linux helps build exactly the operating system and development environment companies require. Being able to use Eclipse or IntelliJ within the same environment makes this a good opportunity to work toward the dream of data science that deals with big data right now.

Ubuntu offers an environment similar to the Windows operating system, i.e. a GUI (Graphical User Interface), which helps users install and operate it.

Recommended for these people!

Who is this course right for?

  • Enthusiastic students who want to learn the basics of big data from scratch

  • Those who are thirsty for big data principles and applications

  • Those who want to learn Hadoop to handle big data at their companies

  • Those who have basic knowledge of Java

Need to know before starting?

  • The Concept of Big Data (Understanding Big Data)

  • Virtual Machine

  • Data set terminology

  • Understanding Linux (Ubuntu)

  • Java 15

Hello!

557 students ∙ 36 reviews ∙ 69 answers ∙ 4.6 rating ∙ 2 courses

This is Billy Lee, CEO of NeoAvenue.

After returning to Korea with my family in September 2022, I provided TA consulting for a Hyundai Motor big data project (Sept.–Nov. 2022), then served as project manager (PMO), leading the Hadoop ecosystem and machine learning/deep learning work as an Agile PM building a big data C-ITS system. Afterwards, working on the AIA Life Insurance innovation data platform team, I pursued data management technology with Azure Data Factory & Azure Databricks, burning with deep curiosity and passion as a data scientist.

From 2012 to 2020, I was a diligent student who graduated from Centennial College's Software Eng. Technician program; before that, I built nine years of IT experience in Korea, working on many financial-sector projects (finance, banking, and big data related).

In 1999, I worked for a year as a P.T.S. network engineering volunteer in the Dasmarinas region of the Philippines, building up global IT and networking knowledge. After returning to Korea in 2000, I developed Warehouse Inventory Control and Management in the Clarion 4GL language and the PIS Operational Test PCS in C/C++ at K.M.C.

After completing the LG-SOFT SCHOOL Java expert course in 2001, I spent about two years at CNMTechnologies on e-CRM/e-SFA R&D, working across a variety of projects (Korea Development Bank / Daejeon Government Complex / Youngjin Pharmaceutical).

From 2004 until moving to Canada in 2012, I participated in and led development on many projects, including SKT/SK C&C (IMOS), SC First Bank (TBC), Prudential Life (PFMS), AXA Kyobo Life Insurance Account Management, and Kookmin Bank's Financial Management Reconstruction (NGM).

 

Since settling in Canada at the end of 2012, as a father of three and a Scrum Master, I adopted Agile development to build a handyman app, an e-commerce app, and a recipe app, with hands-on product development experience in the US and Canada.

Curriculum

85 lectures ∙ (6hr 39min)

Course materials: lecture resources

Reviews

Not enough reviews.
Become the author of a review that helps everyone!