inflearn logo

[Side Project After Work] Big Data Analysis Certification Practical Exam (Type 1, 2, 3)

We guide non-majors and beginners to quickly obtain the Big Data Analysis Certification (Practical Exam)! Keep the theory light and the practice solid—focusing on core points that are guaranteed to appear on the exam through past questions, without the need for complex background knowledge.

(4.9) 768 reviews

4,957 learners

Level Beginner

Course period 12 months

Engineer Big Data Analysis
Big Data
Python
Pandas
Machine Learning(ML)

News

77 articles

  • roadmap's profile image

    The final results for the 11th Big Data Analytics Engineer Practical Exam have been announced!

    Congratulations to those who passed. If you received disappointing results, let's use this experience as a stepping stone and join us again next year with the determination to grow even more!!

    I will also reflect on this exam content and the feedback you've provided, and come back next year with an even more updated course. 💪💪💪

    And

    I'm a bit embarrassed, but thanks to all of you, I received an award at the Inflearn Awards yesterday! Thank you so much :)

    Wrap up the year well and have a happy Christmas and New Year! 🙇🏼‍♂️🙇🏼‍♂️🙇🏼‍♂️


  • roadmap's profile image

    We'll have to wait and see how the results turn out, but I've put together a video reviewing the 11th exam.

    https://youtu.be/X_fcHPYcPMo

  • roadmap's profile image

    Congratulations to everyone who took the Big Data Analytics Engineer exam - great job! 😊

    Setting aside the t-test and sensitivity questions, how did you find it compared to previous exams? I've heard opinions that it was similar to past questions and relatively manageable, but I'm curious about your experience! 🤔

  • roadmap's profile image

    Edited

    Why is equal_var=True when the problem doesn't mention equal variance?
    Thank you to Song** for your question.

    In sub-question 3 of the Type 3 practice problem,
    the term "equal variance" does not appear directly in the problem text.

    However, the solution reads as follows:

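    # Sub-question 3: independent samples t-test under the equal variance assumption
    # (cond1 and cond2 are assumed to be the group conditions defined earlier in the solution)
    from scipy import stats
    result = stats.ttest_ind(df[cond1]['Resistin'], df[cond2]['Resistin'], equal_var=True)
    print(round(result.pvalue, 3))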

    I used the equal variance assumption (Student's t-test).
    The reasons are as follows.

    The problem was a typical three-stage testing problem with the following flow:

    • Check whether the variances of the two groups differ, using an F-test

    • Calculate the pooled variance estimate

    • Perform an independent samples t-test using the pooled variance

    Being asked to calculate a pooled variance already presupposes that the variances of the two groups are equal.

    Therefore, I solved it with equal_var=True.
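
    The three-stage flow can be sketched roughly as follows. This is a minimal illustration, not the course's solution: the two groups are randomly generated stand-ins, and taking the F statistic as the larger variance over the smaller one is an assumption, since the convention depends on the problem statement.

```python
import numpy as np
from scipy import stats

# Hypothetical sample data standing in for the two groups in the problem
rng = np.random.default_rng(0)
group1 = rng.normal(loc=10.0, scale=2.0, size=30)
group2 = rng.normal(loc=11.0, scale=2.0, size=35)
n1, n2 = len(group1), len(group2)

# Stage 1: F statistic as a ratio of sample variances
# (larger over smaller is an assumed convention)
var1 = group1.var(ddof=1)
var2 = group2.var(ddof=1)
f_stat = max(var1, var2) / min(var1, var2)

# Stage 2: pooled variance estimate, which only makes sense under equal variances
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)

# Stage 3: independent samples t-test with the pooled variance
result = stats.ttest_ind(group1, group2, equal_var=True)

# The t statistic computed by hand from the pooled variance matches equal_var=True
t_manual = (group1.mean() - group2.mean()) / np.sqrt(pooled_var * (1 / n1 + 1 / n2))

print(round(f_stat, 3), round(pooled_var, 3), round(result.pvalue, 3))
print(np.isclose(t_manual, result.statistic))
```

    The last line checks that the hand-computed statistic agrees with scipy, which is why a pooled variance in the problem points to equal_var=True.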


    Additionally,

    • One-sample t-test: Equal variance test not required (there are no two groups to compare)

    • Paired t-test: Equal variance test not required (it uses only the difference values)

    • Independent samples t-test: Equal variance test should be considered
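
    As a rough sketch of the three cases with scipy (the data and the reference mean of 50 are made up for illustration):

```python
import numpy as np
from scipy import stats

# Hypothetical measurements
rng = np.random.default_rng(1)
before = rng.normal(50, 5, 20)
after = before + rng.normal(1, 2, 20)   # same subjects, measured again
other = rng.normal(52, 5, 25)           # a separate, independent group

# One-sample: a single group against a reference mean, no variance comparison
one_sample = stats.ttest_1samp(before, popmean=50)

# Paired: equivalent to a one-sample test on the differences
paired = stats.ttest_rel(before, after)
paired_as_one_sample = stats.ttest_1samp(before - after, popmean=0)

# Independent samples: the only case where equal_var has to be decided
independent = stats.ttest_ind(before, other, equal_var=True)

print(round(one_sample.pvalue, 3), round(paired.pvalue, 3), round(independent.pvalue, 3))
```

    Only ttest_ind takes an equal_var argument, which matches the point that the equal variance question arises only for independent samples.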

  • roadmap's profile image

    Tomorrow is the Big Data Analytics Engineer exam.

    I wish you well on your exam, and I've put together examples of how the practical Type 3 questions are typically worded.

    Good luck on your exam 👏👏


    Example Problem Type Learning

    - Non-parametric methods are excluded due to low priority

  • roadmap's profile image

    Edited

    Differences between Past Exam Questions and Practice Problems

    In past exam questions or example problems, there were no cases where columns were deleted.

    However, when dealing with more complex data in practice/mock problems, situations arise where column deletion becomes necessary.

    1⃣ When all values are unique

    # Example: ID, customer number, order number, etc.
    df['customer_id'].nunique() == len(df)  # Consider deletion if True
    • Numeric: Even if left as is, the model usually assigns it low importance

      • No major issues even if not deleted

    • String type: Deletion recommended, because encoding causes a dimension explosion!

      • Label Encoding creates meaningless ordinal relationships

      • With One-Hot Encoding, the number of columns grows to roughly the number of rows. (The code has to finish running within 1 minute.)
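
    A minimal sketch of this check, assuming a small made-up DataFrame (all column names are hypothetical): find the columns where every value is unique, then drop only the non-numeric ones.

```python
import pandas as pd

# Hypothetical data: customer_id (string) and order_no (numeric) are both all-unique
df = pd.DataFrame({
    "customer_id": ["C001", "C002", "C003", "C004"],
    "order_no": [101, 102, 103, 104],
    "grade": ["A", "B", "A", "C"],
    "amount": [10.0, 20.0, 10.0, 15.0],
})

# Columns where every value is unique
all_unique = [col for col in df.columns if df[col].nunique() == len(df)]

# Drop only the non-numeric ones; numeric ID-like columns can stay
to_drop = [col for col in all_unique if not pd.api.types.is_numeric_dtype(df[col])]
df_model = df.drop(columns=to_drop)

print(all_unique)
print(list(df_model.columns))
```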

    2⃣ When encoding is difficult

    # Example: Free text, addresses, emails, etc.
    df['comment'].head()
    # "Fast delivery", "Clean packaging", "Will repurchase"...
    • Baseline: Delete first and run the model

    • Advanced strategy: If time remains, look for ways to salvage the column

      • Create derived variables such as text length or the presence of specific keywords

      • e.g. Flight number (KE1234) → airline (KE) and flight number (1234) extracted separately
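
    The derived-variable idea could look like the sketch below. The column names, the keyword "delivery", and the assumption that every flight code starts with a two-letter airline prefix are all illustrative.

```python
import pandas as pd

# Hypothetical free-text and flight-code columns
df = pd.DataFrame({
    "comment": ["Fast delivery", "Clean packaging", "Will repurchase"],
    "flight": ["KE1234", "OZ567", "KE89"],
})

# Text length and a keyword flag as numeric features
df["comment_len"] = df["comment"].str.len()
df["has_delivery"] = df["comment"].str.contains("delivery", case=False).astype(int)

# Split the flight code, assuming a two-letter airline prefix
df["airline"] = df["flight"].str[:2]
df["flight_no"] = df["flight"].str[2:].astype(int)

# The original hard-to-encode columns can now be dropped
df_model = df.drop(columns=["comment", "flight"])
print(df_model)
```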

    3⃣ When there are excessively many missing values (80-90% or more)

    df['column'].isnull().sum() / len(df)  # share of missing values in the column
    • Baseline: Delete first and play it safe

    • Advanced strategy: If time remains, look for ways to salvage the column

      • Replace the missing values with an arbitrary value, so that missingness itself becomes a value

        Then compare the evaluation metric with the column deleted against the metric after filling
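
    A minimal sketch of the two candidates to compare, assuming a made-up column named memo and the placeholder string "missing" (both hypothetical). Which one to keep would depend on the evaluation metric each produces.

```python
import pandas as pd

# Hypothetical column with 80% missing values
df = pd.DataFrame({
    "memo": [None] * 8 + ["vip", "vip"],
    "amount": list(range(10)),
})

missing_ratio = df["memo"].isnull().sum() / len(df)
print(missing_ratio)

# Candidate 1 (baseline): drop the column
df_dropped = df.drop(columns=["memo"])

# Candidate 2: keep the column, filling missing entries with an arbitrary placeholder
df_filled = df.copy()
df_filled["memo"] = df_filled["memo"].fillna("missing")

print(df_filled["memo"].value_counts())
```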

    💡 What if you encounter columns that are difficult to process, like the ones above?

    1. Phase 1: Quickly complete the baseline (30~40 minutes)

      • Boldly delete the columns in cases 2 and 3

      • For case 1, delete the column if it is a string type; a numeric type can be left as is

      • Finish code that can be submitted as it stands

    2. Phase 2: Advanced work (only if time remains)

      • Try to recover the deleted columns

      • Verify whether performance improves

    Precautions

    • Time management is the top priority! Submittable code matters more than perfect preprocessing

    • Delete the columns in the baseline and make the 1st submission; if time remains, try to recover them and make a 2nd submission

  • roadmap's profile image

    1. ANOVA / Two-way ANOVA / One-way ANOVA

    → For categorical factors, wrapping with C() is standard practice

    Example:

    from statsmodels.formula.api import ols
    from statsmodels.stats.anova import anova_lm
    model = ols("y ~ C(group)", data=df).fit()
    anova_lm(model)
    • ANOVA compares differences in means between groups, so its factors are categorical.

    • Therefore, even if the problem does not explicitly say "categorical",

    • the factor itself is a grouping variable, so C() is the default.

    In other words,
    Factor coded as numbers → C()
    Factor coded as text → C()
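
    To see why C() matters when the factor is coded as numbers, here is a small sketch on made-up data (three groups coded 1, 2, 3). With C() the factor has 2 degrees of freedom, as expected for three groups; without it, the codes are read as one continuous variable with 1 degree of freedom.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Hypothetical data: three groups coded as the numbers 1, 2, 3
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "group": np.repeat([1, 2, 3], 10),
    "y": np.concatenate([rng.normal(m, 1.0, 10) for m in (5.0, 6.0, 8.0)]),
})

# With C(): group is a factor with three levels
table_factor = anova_lm(ols("y ~ C(group)", data=df).fit())

# Without C(): group is read as a continuous slope, which is not a one-way ANOVA
table_numeric = anova_lm(ols("y ~ group", data=df).fit())

print(table_factor.loc["C(group)", "df"])
print(table_numeric.loc["group", "df"])
```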


    2. Regression Analysis (OLS)

    Use C() only for variables that the problem explicitly specifies as categorical

    Example:

    ols("y ~ x1 + region", data=df)

    • A variable being coded as numbers does not mean it should automatically be treated as categorical.

    • Treat numeric variables as continuous unless the problem specifically states that they are "categorical variables"


    3. Logistic Regression (logit)

    Same principle as ols

    Example:

    logit("target ~ x1 + job_type", data=df)

    • With logit, C() is needed only when the problem explicitly states "categorical".
      Otherwise, do not add C() automatically.
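
    The practical effect of adding or omitting C() in ols and logit shows up in the number of estimated coefficients. The sketch below uses made-up data in which job_type is coded as the numbers 1 to 3; that numeric coding is an assumption for illustration.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import logit, ols

# Hypothetical data: job_type is coded as the numbers 1, 2, 3
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "x1": rng.normal(size=n),
    "job_type": rng.integers(1, 4, size=n),
})
df["y"] = 2.0 * df["x1"] + rng.normal(size=n)
prob = 1 / (1 + np.exp(-0.5 * df["x1"]))
df["target"] = (rng.random(n) < prob).astype(int)

# Default: numeric job_type is continuous (Intercept, x1, job_type)
ols_numeric = ols("y ~ x1 + job_type", data=df).fit()
logit_numeric = logit("target ~ x1 + job_type", data=df).fit(disp=0)

# Only if the problem says "categorical": C() adds one dummy per extra level
ols_categorical = ols("y ~ x1 + C(job_type)", data=df).fit()
logit_categorical = logit("target ~ x1 + C(job_type)", data=df).fit(disp=0)

print(len(ols_numeric.params), len(ols_categorical.params))
print(len(logit_numeric.params), len(logit_categorical.params))
```

    Because the coefficients and p-values differ between the two specifications, adding C() when the problem does not ask for it would change the answer.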

  • roadmap's profile image

    Unfortunately, there is no keyboard shortcut for running code.

    • Comment: Ctrl + /
      Multi-line comment: Select the block, then Ctrl + /

    • Zoom in: Ctrl + '+'

    • Zoom out: Ctrl + '-' (if the monitor is small)

    • Move to beginning of line: Ctrl + Left arrow key (mainly used when adding brackets)

    • Move to end of line: Ctrl + Right arrow key (mainly used when adding brackets)

    • Find (search): Ctrl + f

      • Ctrl + f can also be used in the basic data tab

      • To search the output of the dir and help commands, copy and paste it into 'Notepad' (this must be done with the mouse)

      • Search is available there

      • Search is not possible within the execution results (output) itself

    Practical exam trial environment link

    https://dataq.goorm.io/exam/3/%EB%B9%85%EB%8D%B0%EC%9D%B4%ED%84%B0%EB%B6%84%EC%84%9D%EA%B8%B0%EC%82%AC-%EC%8B%A4%EA%B8%B0-%EC%B2%B4%ED%97%98/quiz/2%3Fembed


$93.50