[Side Project After Work] Big Data Analysis Certification Practical Exam (Type 1, 2, 3)
We guide non-majors and beginners to quickly obtain the Big Data Analysis Certification (Practical Exam)! Keep the theory light and the practice solid, focusing on core points that are guaranteed to appear on the exam through past questions, without the need for complex background knowledge.

Practical Problem Type 3 Frequently Asked Questions: When to use C()?
1. ANOVA (one-way ANOVA, two-way ANOVA)
→ For categorical factors, C() is standard practice.
Example:
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm
model = ols("y ~ C(group)", data=df).fit()
anova_lm(model)
ANOVA is by definition an analysis that compares differences in means between groups, so its factors are categorical.
Therefore, even if the problem does not explicitly say "categorical", the factor is itself a grouping variable, so C() is the default.
In other words:
Factor coded as numbers → C()
Factor coded as text → C()
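To see why C() matters for a numerically coded factor, here is a minimal sketch. The data frame and its values are made up for illustration and do not come from an exam problem. It fits the same data with and without C() and compares the degrees of freedom that anova_lm reports for the factor.

```python
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Hypothetical data: three groups coded as the numbers 1, 2, 3
df = pd.DataFrame({
    "group": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "y":     [5.1, 4.9, 5.3, 6.8, 7.1, 7.0, 5.9, 6.1, 6.2],
})

# With C(): group is a factor with 3 levels, so it uses 3 - 1 = 2 degrees of freedom
table_with_c = anova_lm(ols("y ~ C(group)", data=df).fit())

# Without C(): group is read as one continuous predictor (a single slope)
table_without_c = anova_lm(ols("y ~ group", data=df).fit())

dof_with_c = table_with_c.loc["C(group)", "df"]
dof_without_c = table_without_c.loc["group", "df"]
print(dof_with_c, dof_without_c)
```

If the factor row shows 1 degree of freedom for a factor with more than two levels, the model most likely fit a straight line through the group codes instead of comparing group means.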
2. Regression Analysis (OLS)
→ Use C() only for variables that the problem explicitly specifies as categorical.
Example:
from statsmodels.formula.api import ols
ols("y ~ x1 + region", data=df)
A numeric variable should not be treated as categorical automatically.
Treat numeric variables as continuous unless the problem specifically states that they are categorical variables.
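The difference between the two treatments shows up in the fitted coefficients. The sketch below uses made-up data and a hypothetical numeric column x2 (not from the original example) to compare the coefficient names produced with and without C().

```python
import pandas as pd
from statsmodels.formula.api import ols

# Hypothetical data: x2 is numeric and takes only the values 1, 2, 3
df = pd.DataFrame({
    "x1": [1.0, 2.0, 3.5, 4.0, 5.5, 6.0, 7.5, 8.0, 9.0, 10.5, 11.0, 12.5],
    "x2": [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3],
    "y":  [2.1, 3.9, 6.2, 5.0, 7.1, 8.8, 8.2, 9.9, 12.1, 11.0, 12.8, 15.2],
})

# Default: x2 is continuous, so it gets a single slope coefficient
continuous = ols("y ~ x1 + x2", data=df).fit()

# Only if the problem states x2 is categorical: one dummy per non-reference level
categorical = ols("y ~ x1 + C(x2)", data=df).fit()

print(list(continuous.params.index))
print(list(categorical.params.index))
```

Because the number of coefficients changes, so do the estimates, p-values, and any answer derived from them. That is why the problem statement, not the look of the column, should decide whether C() is used.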
3. Logistic Regression (logit)
→ Same principle as ols.
Example:
from statsmodels.formula.api import logit
logit("target ~ x1 + job_type", data=df)
With logit, use C() only when the problem explicitly states that a variable is categorical.
Otherwise, do not add C() automatically.
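One detail that may explain why job_type can appear without C() above: when a column holds text, the formula interface dummy-codes it on its own, so adding C() should only change the coefficient labels. The following is a rough sketch with made-up data, assuming job_type is stored as strings.

```python
import pandas as pd
from statsmodels.formula.api import logit

# Hypothetical data: job_type is a text column with three levels
df = pd.DataFrame({
    "x1": list(range(1, 25)),
    "job_type": ["A", "B", "C"] * 8,
    "target": [0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1,
               0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0],
})

# Text column without C(): dummy-coded automatically
plain = logit("target ~ x1 + job_type", data=df).fit(disp=0)

# Same column wrapped in C(): same design matrix, different labels
wrapped = logit("target ~ x1 + C(job_type)", data=df).fit(disp=0)

print(list(plain.params.index))
print(list(wrapped.params.index))

# Largest difference between the two sets of estimates
max_diff = float(abs(plain.params.values - wrapped.params.values).max())
print(max_diff)
```

For text columns the choice is therefore mostly cosmetic. The rule above matters for numerically coded columns, where C() changes the model itself.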




