
[Side Project After Work] Big Data Analysis Certification Practical Exam (Type 1, 2, 3)

We guide non-majors and beginners to quickly obtain the Big Data Analysis Certification (Practical Exam)! Keep the theory light and the practice solid—focusing on core points that are guaranteed to appear on the exam through past questions, without the need for complex background knowledge.

(4.9) 768 reviews

4,957 learners

Level Beginner

Course period 12 months

roadmap

Engineer Big Data Analysis

Big Data

Python

Pandas

Machine Learning(ML)


News

77 articles

roadmap
2 months ago
Have a great end of the year and a happy Christmas and New Year! 🙇🏼‍♂️🙇🏼‍♂️🙇🏼‍♂️
The final results for the 11th Big Data Analytics Engineer Practical Exam have been announced!
Congratulations to those who passed. If you received disappointing results, let's use this experience as a stepping stone and join us again next year with the determination to grow even more!!
I will also reflect on this exam content and the feedback you've provided, and come back next year with an even more updated course. 💪💪💪
Also, I'm a bit embarrassed to say it, but thanks to all of you, I received an award at the Inflearn Awards yesterday! Thank you so much :)
Wrap up the year well and have a happy Christmas and New Year! 🙇🏼‍♂️🙇🏼‍♂️🙇🏼‍♂️
1
roadmap
3 months ago
Big Data Analytics Engineer Certification Exam Round 11: Did I pass?
We'll have to wait for the results, but I've put together a video reviewing the 11th exam.
https://youtu.be/X_fcHPYcPMo
0
roadmap
3 months ago
Great job on completing the 11th exam. 👏👏👏
Congratulations to everyone who took the Big Data Analytics Engineer exam - great job! 😊
Setting aside the t-test and sensitivity questions, how did you find it compared to previous exams? I've heard that it was similar to past questions and relatively manageable, but I'm curious about your experience! 🤔
5
roadmap
3 months ago
･
Edited
Why is equal_var=True when the problem doesn't mention equal variance?
Thank you to Song** for your question.
In sub-question 3 of the Type 3 practice problem,
the term "equal variance" does not appear anywhere in the problem text.
However, the solution is written as follows:
#3
from scipy import stats
result = stats.ttest_ind(df[cond1]['Resistin'], df[cond2]['Resistin'], equal_var=True)
print(round(result.pvalue, 3))
I used the equal variance assumption (Student's t-test).
The reasons are as follows.
The problem was a typical three-stage test, structured as follows:
1. Check whether the variances of the two groups differ with an F-test
2. Calculate the pooled variance estimator
3. Perform an independent-samples t-test using the pooled variance
The very statement of calculating pooled variance already presupposes the assumption that the variances of the two groups are equal.
Therefore, I approached the solution using equal_var=True.
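
The three-stage flow above can be sketched as follows. This is only a minimal illustration on made-up data: the arrays `a` and `b` are hypothetical stand-ins for the two Resistin groups, and the two-sided F-test convention used here (larger variance in the numerator) is an assumption, not necessarily the one the practice problem expects.

```python
import numpy as np
from scipy import stats

# Hypothetical sample data standing in for the two groups
rng = np.random.default_rng(0)
a = rng.normal(loc=10.0, scale=2.0, size=30)
b = rng.normal(loc=11.0, scale=2.0, size=25)
n_a, n_b = len(a), len(b)

# Step 1: F-test on the ratio of sample variances (larger variance on top)
var_a, var_b = a.var(ddof=1), b.var(ddof=1)
if var_a >= var_b:
    f_stat, dfn, dfd = var_a / var_b, n_a - 1, n_b - 1
else:
    f_stat, dfn, dfd = var_b / var_a, n_b - 1, n_a - 1
f_pvalue = min(2 * stats.f.sf(f_stat, dfn, dfd), 1.0)  # two-sided p-value

# Step 2: pooled variance estimator
pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)

# Step 3: independent-samples t-test under the equal-variance assumption
result = stats.ttest_ind(a, b, equal_var=True)

# The same t statistic computed by hand from the pooled variance
t_manual = (a.mean() - b.mean()) / np.sqrt(pooled_var * (1 / n_a + 1 / n_b))

print(round(f_stat, 3), round(f_pvalue, 3))
print(round(pooled_var, 3))
print(round(result.statistic, 3), round(result.pvalue, 3))
```

The hand-computed `t_manual` matches `result.statistic`, which shows that `equal_var=True` is exactly the test built on the pooled variance from step 2.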

Additionally:
One-sample t-test: no equal variance test required (there are no two groups to compare)
Paired t-test: no equal variance test required (only the differences are used)
Independent-samples t-test: equal variance must be considered
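
As a rough sketch, the three cases map onto scipy as shown here. The data are made up, and the use of Levene's test with a 0.05 threshold to decide `equal_var` is an assumption for illustration; an exam problem may specify a different variance test or simply state the assumption.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements
rng = np.random.default_rng(1)
before = rng.normal(50, 5, size=20)
after = before + rng.normal(1, 2, size=20)
other = rng.normal(52, 5, size=22)

# One-sample t-test: a single group against a fixed mean, no variance check
one_sample = stats.ttest_1samp(before, popmean=50)

# Paired t-test: works on the differences only, no variance check
paired = stats.ttest_rel(before, after)
on_differences = stats.ttest_1samp(before - after, popmean=0)

# Independent-samples t-test: check equal variance first (assumed: Levene, 0.05)
levene = stats.levene(before, other)
equal_var = bool(levene.pvalue >= 0.05)
independent = stats.ttest_ind(before, other, equal_var=equal_var)

print(round(one_sample.pvalue, 3))
print(round(paired.pvalue, 3))
print(equal_var, round(independent.pvalue, 3))
```

The paired test gives the same statistic as a one-sample test on `before - after`, which is why no equal variance check applies to it.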
0
roadmap
3 months ago
Summary of Example Problem Wordings for Practical Type 3
Tomorrow is the Big Data Analytics Engineer exam.
I've put together examples of how the practical Type 3 questions are worded.
Good luck on your exam 👏👏
Example Problem Type Learning
- Non-parametric methods are excluded due to low priority
0
roadmap
3 months ago
･
Edited
✅ Practical Exam Type 2: When do you delete columns?
Differences between Past Exam Questions vs Practice Problems
In past exam questions or example problems, there were no cases where columns were deleted.
However, when dealing with more complex data in practice/mock problems, situations arise where column deletion becomes necessary.
1⃣ When all values are unique
# Example: ID, customer number, order number, etc.
df['customer_id'].nunique() == len(df)  # Consider deletion if True
Numeric type: even if left as is, the model will usually give it low importance
- No major issues even if it is not deleted
String type: deletion recommended, because encoding causes a dimension explosion! ⚠
- Label encoding creates a meaningless ordinal relationship
- One-hot encoding adds roughly one column per row, so the column count explodes (and the code still has to finish running within 1 minute)
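
A minimal sketch of this check, assuming a small made-up DataFrame (all column names here are hypothetical): find the columns where every value is unique, then drop only the non-numeric ones.

```python
import pandas as pd

# Hypothetical data: a string ID, a numeric ID, and two ordinary features
df = pd.DataFrame({
    "customer_id": ["C001", "C002", "C003", "C004"],
    "order_no": [1001, 1002, 1003, 1004],
    "grade": ["A", "B", "A", "C"],
    "amount": [10.5, 20.0, 10.5, 7.25],
})

# Columns in which every value is unique
all_unique = [c for c in df.columns if df[c].nunique() == len(df)]

# Of those, only the non-numeric ones are deletion candidates
string_ids = [c for c in all_unique if not pd.api.types.is_numeric_dtype(df[c])]

# One-hot encoding a unique string column yields one column per row
n_dummy_cols = pd.get_dummies(df["customer_id"]).shape[1]

df_base = df.drop(columns=string_ids)
print(all_unique, string_ids, n_dummy_cols)
print(list(df_base.columns))
```

Here the numeric `order_no` is left in place, matching the advice that a unique numeric column does little harm.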
2⃣ When encoding is difficult
# Example: Free text, addresses, emails, etc.
df['comment'].head()  # "Fast delivery", "Clean packaging", "Will repurchase"...
Baseline: delete the column first and run the model
Advanced strategy: if you have time left, look for ways to salvage the column
- Create derived variables such as text length or the presence of specific keywords
- e.g. Flight number (KE1234) → airline (KE) and flight number (1234) extracted separately
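
The derived-variable ideas above might look like this in pandas. The DataFrame, the column names, and the keyword "delivery" are all hypothetical, and the regular expression assumes flight codes consist of uppercase letters followed by digits.

```python
import pandas as pd

# Hypothetical hard-to-encode columns
df = pd.DataFrame({
    "comment": ["Fast delivery", "Clean packaging", "Will repurchase"],
    "flight": ["KE1234", "OZ567", "KE89"],
})

# Text length and presence of a specific keyword
df["comment_len"] = df["comment"].str.len()
df["has_delivery"] = df["comment"].str.contains("delivery", case=False).astype(int)

# Split the flight code into airline (letters) and flight number (digits)
parts = df["flight"].str.extract(r"^([A-Z]+)(\d+)$")
df["airline"] = parts[0]
df["flight_no"] = parts[1].astype(int)

# The original free-text columns can now be dropped
df = df.drop(columns=["comment", "flight"])
print(df)
```

The resulting `airline` column has few distinct values, so it can be encoded normally, unlike the raw flight code.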
3⃣ When there are excessively many missing values (80-90% or more)
df['column'].isnull().sum() / len(df)  # proportion of missing values
Baseline: delete the column first and play it safe
Advanced strategy: if you have time left, look for ways to salvage the column
- Fill the missing values with an arbitrary placeholder value
- Compare the evaluation metric after deleting the column with the metric after filling it
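
A minimal sketch of the missing-value check, on made-up data with hypothetical column names. The 0.8 cutoff follows the 80-90% guideline above; the extra `memo_missing` flag in the advanced variant is one assumed way to keep the information, not a method stated in the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "memo" is missing in 9 of 10 rows
df = pd.DataFrame({
    "age": [25, 31, np.nan, 40, 52, 38, 29, 45, 33, 27],
    "memo": [np.nan] * 9 + ["call back"],
    "income": [300, 420, 380, np.nan, 510, 450, 320, 480, 400, 350],
})

# Proportion of missing values per column
missing_ratio = df.isnull().sum() / len(df)

# Baseline: drop columns at or above the cutoff
to_drop = missing_ratio[missing_ratio >= 0.8].index.tolist()
df_base = df.drop(columns=to_drop)

# Advanced (if time permits): keep the column with a placeholder value
df_adv = df.copy()
df_adv["memo_missing"] = df_adv["memo"].isnull().astype(int)
df_adv["memo"] = df_adv["memo"].fillna("missing")

print(missing_ratio)
print(to_drop)
```

Fitting the same model on `df_base` and on `df_adv` and comparing the evaluation metric shows whether salvaging the column was worth it.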
💡 What if you encounter columns that are difficult to process like the above?
Phase 1: Quickly Complete the Baseline (30~40 minutes)
For cases 2 and 3, delete the columns without hesitation
For case 1, delete the column if it is a string type; if it is a numeric type, it is fine to leave it as is
Finish code that can be submitted first
Phase 2: Advanced Topics if Time Permits (only when there's spare time)
Attempting to recover deleted columns
Performance improvement verification
⚠ Precautions
Time management is the top priority! Submittable code is more important than perfect preprocessing
Delete the columns in the baseline and make your 1st submission; if there is time left, try the advanced steps and make a 2nd submission!
0
roadmap
3 months ago
Practical Problem Type 3 Frequently Asked Questions: When to use C()?
✅1. ANOVA / Two-way ANOVA / One-way ANOVA
→ For categorical factors, C() is the standard practice
Example:
model = ols("y ~ C(group)", data=df).fit()
anova_lm(model)
ANOVA is by nature an analysis that compares "differences in means between groups", so its factors are categorical.
Therefore, even if the problem doesn't explicitly say "categorical", the factor itself is a group variable, so C() is the default.
In other words,
✔ Even if it's in numbers → C()
✔ Even if it's in text → C()
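
To see why C() matters when the groups are coded as numbers, here is a small sketch on made-up data (the values and the column names `group` and `y` are hypothetical). Without C(), the numeric group code is treated as a single continuous predictor.

```python
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Hypothetical data: three groups coded as the numbers 1, 2, 3
df = pd.DataFrame({
    "group": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    "y": [5.1, 4.9, 5.3, 5.0, 6.2, 6.0, 6.4, 6.1, 5.6, 5.8, 5.5, 5.7],
})

# With C(): group is a factor with 3 levels, so it has 2 degrees of freedom
table_cat = anova_lm(ols("y ~ C(group)", data=df).fit())

# Without C(): group is one continuous slope, so it has 1 degree of freedom
table_num = anova_lm(ols("y ~ group", data=df).fit())

print(table_cat)
print(table_num)
```

The degrees of freedom in the two tables differ (2 versus 1), so the F statistic and p-value differ as well. Only the C() version is a one-way ANOVA across the three groups.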
❌2. Regression Analysis (OLS)
➡Only variables explicitly specified as categorical in the problem should use C()
Example:
ols("y ~ x1 + region", data=df)
A variable being stored as numbers does not mean it should automatically be treated as categorical; doing so is a mistake.
Treat numeric variables as continuous unless the problem specifically states that they are "categorical variables".
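
The effect of adding C() to a numeric variable in a regression can be sketched like this. The data and the column name `floor` are hypothetical. Note also that the statsmodels formula interface already treats string columns as categorical on its own, so C() mainly changes things for numeric columns.

```python
import pandas as pd
from statsmodels.formula.api import ols

# Hypothetical data: "floor" is stored as numbers
df = pd.DataFrame({
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "floor": [1, 2, 3, 1, 2, 3, 1, 2],
    "y": [2.1, 4.3, 6.2, 7.9, 10.4, 12.1, 13.8, 16.5],
})

# Default: floor is continuous and gets a single slope coefficient
continuous = ols("y ~ x1 + floor", data=df).fit()

# Only if the problem says floor is categorical: one dummy per non-baseline level
categorical = ols("y ~ x1 + C(floor)", data=df).fit()

print(list(continuous.params.index))
print(list(categorical.params.index))
```

The two models have different numbers of coefficients, so every requested value (coefficients, p-values, predictions) can change depending on whether C() is used.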
❌3. Logistic Regression (logit)
➡Same principle as ols
Example:
logit("target ~ x1 + job_type", data=df)
logit needs C() only when the problem explicitly states "categorical".
Otherwise, never add C() automatically.
0
roadmap
3 months ago
Chrome browser shortcuts for exam environment
Unfortunately, there is no shortcut for running code.
Comment: Ctrl + /
Multi-line comment: Select block then Ctrl + /
Zoom In: Ctrl + '+'
Zoom out: Ctrl + '-' (useful if the monitor is small)
Move to beginning of line: Ctrl + Left arrow key (mainly used when adding brackets)
Move to end of line: Ctrl + Right arrow key (mainly used when adding brackets)
Find (Search): Ctrl + f
- Ctrl + f can also be used in the basic data tab
- To search the output of the dir and help commands, copy and paste it into 'Notepad' (copying must be done with the mouse), where search is available
- Search is not possible within the execution results (output) themselves
Hands-on Experience Link
https://dataq.goorm.io/exam/3/%EB%B9%85%EB%8D%B0%EC%9D%B4%ED%84%B0%EB%B6%84%EC%84%9D%EA%B8%B0%EC%82%AC-%EC%8B%A4%EA%B8%B0-%EC%B2%B4%ED%97%98/quiz/2%3Fembed
0
