
HADOOP ECOSYSTEM: PRACTICAL OPTIMIZATION OF BIG DATA PROCESSING

This course aims to overcome the technical limitations practitioners face in vast big data environments and to build systematic data management capabilities through the Hadoop ecosystem. Through this course, students will gain the following core values:

  • Practical Problem Solving: Understand the limitations of existing systems as data scales, and learn efficient distributed processing methods using Hadoop.

  • Ensuring Data Stability: Learn practical optimization techniques that safely protect original data through Hive's external table design, even if a table is accidentally deleted.

  • Maximizing Analysis Efficiency: Acquire expert-level skills to freely control and analyze large-scale data using HiveQL, without complex programming.

  • Expert Guidance: An instructor with years of IT education experience and know-how will directly pass on practical skills that can be applied immediately in the field.

Join this 30-lecture journey that will transform you into a distinguished data engineer in the massive wave of big data.

4 learners are taking this course

Level Intermediate

Course period Unlimited

Java
SQL
Hadoop
Linux
HiveQL

What you will gain after the course

  • Acquiring practical big data infrastructure construction and management skills: Beyond mere theory, you can fully master system operation techniques that are immediately applicable in the field, such as NameNode formatting, firewall configuration, and service execution within Hadoop 1.0.4 and Hive 0.9.0 environments.

  • Strengthening Data Stability and Efficient Analysis Design Capabilities: You will gain a clear understanding of the differences between Hive's internal and external tables, and acquire practical optimization design techniques—specifically utilizing the LOCATION option—to safely preserve original data even if the table structure is deleted.

  • Acquiring large-scale data control skills without complex coding: You can develop expert-level capabilities to freely analyze and manage terabyte-scale or larger data using HiveQL—a familiar SQL-based approach—without having to directly perform complex Java-based MapReduce programming.

  • Infrastructure Setup: The entire process from HDFS NameNode formatting to firewall configuration and service activation

  • Data Stability: Securing data persistence through external table design

  • Practical Analysis: Metadata management and structured data processing techniques using HiveQL

  • Business Value: Completing a large-scale data processing pipeline that goes beyond the limits of Excel

1. Problem Statement: "Data is overflowing, so why do we still feel limited in its utilization?"

As corporate data scales beyond Terabytes (TB) into the Petabyte (PB) era, traditional Relational Database Management Systems (RDBMS) can no longer keep up in terms of processing speed or cost. In particular, practitioners often face practical barriers such as "where and how to store data safely" and "whether large-scale data can be analyzed using only SQL, without complex coding." Anxiety over data loss and decreased management efficiency lead directly to the failure of big data projects.

2. Result-Oriented Solution: "Achieving both data sovereignty and analysis efficiency through the combination of Hive and Hadoop"

This course aims to provide a perfect understanding of the core mechanisms of big data infrastructure through Hadoop 1.0.4 and Hive 0.9.0 environments. Through hands-on practice, students will clearly distinguish between internal and external tables and acquire practical optimization design techniques, such as using the LOCATION option to preserve original data even if the table structure is deleted. As a result, students will transform into analysis experts who can freely control large-scale data through HiveQL without the need for complex MapReduce programming.
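The internal-versus-external table distinction described above can be sketched in a few lines of HiveQL (table names and the HDFS path are illustrative, not taken from the course materials):

```sql
-- Internal (managed) table: DROP TABLE deletes both the metadata
-- and the underlying data files in Hive's warehouse directory.
CREATE TABLE logs_managed (id INT, msg STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- External table: the LOCATION option points Hive at existing HDFS files.
CREATE EXTERNAL TABLE logs_external (id INT, msg STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/hadoop/raw_logs';

-- Dropping the external table removes only Hive's metadata;
-- the original files under /user/hadoop/raw_logs remain untouched.
DROP TABLE logs_external;
```

This is the design technique the course calls "securing data persistence": by declaring raw data as an external table, an accidental DROP costs you only the table definition, never the source files.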

3. Message from the Instructor: "Delivering skills that go beyond theory and are immediately applicable in the field"

" Hello, I am Young-hwan Jang, an IT technology education expert dedicated to helping you grow. This 30-session curriculum is not just a list of knowledge, but a distillation of the numerous trials, errors, and know-how I have experienced in the field. The foundation of AI and machine learning, the core of the Fourth Industrial Revolution, is ultimately 'data.' Through this course, I hope you will equip yourself with a powerful weapon to confidently navigate the vast flow of big data. I will be your reliable guide on your data engineering journey.

Recommended for these people

Who is this course right for?

  • Engineers who want to design stable infrastructure without data loss: This is suitable for those who want to accurately understand the concept of Hive's External Tables and learn practical design techniques to safely preserve original data even in the event of system failures or accidental data deletion.

  • Analysts who want to process large-scale data without complex coding: Recommended for those who want to gain the ability to freely analyze and control large-scale data of terabyte-level or higher by utilizing HiveQL, a familiar SQL-based method, instead of complex Java-based MapReduce programming.

  • Beginners aiming to fully master the Hadoop ecosystem from the basics to practical operation: This is useful for those who want to systematically organize the overall flow of big data engineering by directly practicing the entire process of building a Hadoop environment, including NameNode formatting, firewall configuration, and service operation.

Need to know before starting?

  • Basic Linux Operating Skills: You must be familiar with executing shell-based commands such as start-all.sh to run Hadoop services, and a basic understanding of firewall settings and log file management on Linux systems is required.

  • Basic knowledge of SQL (Structured Query Language): Since Hive uses HiveQL, which is similar to SQL, to process data, you must be familiar with basic query language structures such as creating tables (CREATE), querying data (SELECT), and dropping tables (DROP).

  • HDFS and MapReduce Concepts: If you have a prior understanding of how the Hadoop Distributed File System (HDFS) works and the flow of MapReduce jobs, you can more quickly grasp the mechanisms by which Hive manages data within the Hadoop ecosystem.

  • Database Design Basics: Since this includes a practice session on designing internal and external tables separately to improve data analysis efficiency, basic concepts regarding table schema and data path (location) settings will be helpful.

  • Basic VirtualBox Usage: Familiarity with VirtualBox is needed to set up the practice environment.

  • Essential Prerequisite Knowledge for Hadoop Practice
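The shell-level prerequisites above boil down to a handful of commands. As a minimal sketch, assuming a configured Hadoop 1.x installation with `$HADOOP_HOME/bin` on the PATH (the log path pattern is illustrative and varies by user and host name):

```shell
# One-time initialization: format the HDFS NameNode
# (warning: this destroys any existing HDFS metadata)
hadoop namenode -format

# Start all Hadoop 1.x daemons (NameNode, DataNode, JobTracker, TaskTracker)
start-all.sh

# Verify that the daemon processes are running
jps

# Browse HDFS, and inspect the logs if a daemon fails to start
hadoop fs -ls /
tail -n 50 $HADOOP_HOME/logs/hadoop-*-namenode-*.log

# Enter the Hive CLI to run HiveQL
hive
```

If `jps` does not list the expected daemons, the Linux firewall settings and log files mentioned above are the first places to look.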

Hello, this is ywjang23583.

I worked as a developer at LG Electronics for about 27 years. Since retiring, I have been teaching liberal arts software coding at various universities, as well as conducting lectures at vocational schools and government offices. Currently, I am teaching an IoT course at a vocational training school.

I would like to record and share lectures on the following topics.

1. R Statistics Basic/Advanced Course

2. Arduino for the sensor data collection part of IoT (Internet of Things) technical methods

3. Raspberry Pi Technology

4. Basic/Advanced Course for AI Utilization (Understanding Basic Algorithms and Tool Usage)

5. Systemic platform implementation techniques for smart farm configuration

6. Visualization techniques using Tableau and PowerBI technology

7. Six Sigma techniques in the field

8. Building a Big Data Analysis Hadoop Ecosystem


Curriculum


4 lectures ∙ (1hr 22min)

Course Materials:

Lecture resources

Reviews

Not enough reviews.

Limited time deal ends in 8 days: $20.90 (69% off the regular price of $68.20)