[Side Project After Work] Big Data Analysis Certification Practical Exam (Type 1, 2, 3)
We guide non-majors and beginners to quickly obtain the Big Data Analysis Certification (Practical Exam)! Keep the theory light and the practice solid—focusing on core points that are guaranteed to appear on the exam through past questions, without the need for complex background knowledge.

✅ Practical Exam Type 2: When do you delete columns?
Past Exam Questions vs. Practice Problems
The past exam questions and example problems never required deleting a column.
The more complex data in practice and mock problems, however, sometimes makes column deletion necessary.
1⃣ When all values are unique
# Example: ID, customer number, order number, etc.
df['customer_id'].nunique() == len(df)  # Consider deletion if True
Numeric type: Even if left as is, the model tends to assign it low importance
No major issues even if not deleted
String type: Deletion recommended because encoding causes a dimension explosion! ⚠
Label Encoding creates a meaningless ordinal relationship
One-Hot Encoding adds about one column per row, so the column count grows as fast as the row count (the code has to finish running within 1 minute)
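The check above can be sketched as a small helper. This is only a minimal illustration: the toy DataFrame, the column names (`customer_id`, `order_no`, `grade`), and the helper name `find_all_unique_columns` are hypothetical and are not taken from any exam dataset.

```python
import pandas as pd

def find_all_unique_columns(df: pd.DataFrame) -> list[str]:
    """Return columns whose number of distinct values equals the row count."""
    return [col for col in df.columns if df[col].nunique() == len(df)]

# Hypothetical toy data: one string ID, one numeric ID, one ordinary feature.
df = pd.DataFrame({
    "customer_id": ["C001", "C002", "C003", "C004"],
    "order_no": [101, 102, 103, 104],
    "grade": ["A", "B", "A", "B"],
})

unique_cols = find_all_unique_columns(df)

# Only non-numeric all-unique columns are dropped; numeric ones may stay.
string_unique_cols = [
    col for col in unique_cols
    if not pd.api.types.is_numeric_dtype(df[col])
]

# One-hot encoding the string ID would add one column per row.
n_onehot_cols = pd.get_dummies(df[["customer_id"]], columns=["customer_id"]).shape[1]

df_baseline = df.drop(columns=string_unique_cols)
```

With 4 rows, one-hot encoding `customer_id` already produces 4 columns, so on a dataset with thousands of rows the same column would produce thousands.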
2⃣ When encoding is difficult
# Example: Free text, addresses, emails, etc.
df['comment'].head()
# "Fast delivery", "Clean packaging", "Will repurchase"...
Baseline: Delete first and run the model
Advanced Strategy: If you have time left, think about ways to salvage the column
Create derived variables such as text length or the presence of specific keywords
e.g., Flight number (KE1234) → airline code (KE) + flight number (1234), extracted separately
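The derived-variable ideas above could look roughly like this. It is a sketch under assumptions: the toy data, the `flight_no` column, the keyword "delivery", and the new column names are all hypothetical choices made for illustration.

```python
import pandas as pd

# Hypothetical toy data; column names are illustrative only.
df = pd.DataFrame({
    "comment": ["Fast delivery", "Clean packaging", "Will repurchase"],
    "flight_no": ["KE1234", "OZ0321", "KE0907"],
})

# Text length as a numeric feature.
df["comment_len"] = df["comment"].str.len()

# Presence of a specific keyword (the keyword itself is an arbitrary choice).
df["has_delivery"] = df["comment"].str.contains("delivery", case=False).astype(int)

# Split the flight number into airline code (letters) and number (digits).
parts = df["flight_no"].str.extract(r"^([A-Za-z]+)(\d+)$")
df["airline"] = parts[0]
df["flight_num"] = parts[1].astype(int)

# The raw text columns can now be dropped; the derived ones are easy to encode.
df = df.drop(columns=["comment", "flight_no"])
```

The airline code has only a few distinct values, so it can be label- or one-hot-encoded without the dimension problem that the raw flight number would cause.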
3⃣ When there are excessively many missing values (80-90% or more)
df['column'].isnull().sum() / len(df)
Baseline: Delete first and play it safe
Advanced Strategy: If you have time left, think about ways to salvage the column
Fill the missing values with an arbitrary value so that the missing status itself is kept as information
Compare the evaluation metric with the column deleted against the metric after filling
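Both the baseline and the advanced strategy for missing values can be sketched as follows. The toy data, the 0.8 cutoff (picked from the "80-90% or more" guideline), and the placeholder string "missing" are assumptions for illustration, not fixed rules.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 'memo' is 90% missing, 'age' is 10% missing.
df = pd.DataFrame({
    "memo": [np.nan] * 9 + ["vip"],
    "age": [25.0, 31.0, np.nan, 40.0, 22.0, 35.0, 28.0, 50.0, 44.0, 39.0],
})

# Missing ratio per column (same idea as isnull().sum() / len(df)).
missing_ratio = df.isnull().mean()

THRESHOLD = 0.8  # assumed cutoff, taken from the "80-90% or more" guideline
high_missing_cols = missing_ratio[missing_ratio >= THRESHOLD].index.tolist()

# Baseline: drop the heavily missing columns.
df_baseline = df.drop(columns=high_missing_cols)

# Advanced: keep the column, record "missing or not" as its own feature,
# then fill the gaps with an arbitrary placeholder value.
df_advanced = df.copy()
df_advanced["memo_missing"] = df_advanced["memo"].isnull().astype(int)
df_advanced["memo"] = df_advanced["memo"].fillna("missing")
```

Whether the advanced version is worth keeping depends on the validation score, so it makes sense to try it only after the baseline is already submittable.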
💡 What if you encounter columns that are difficult to process like the above?
Phase 1: Quickly Complete the Baseline (30~40 minutes)
Boldly delete the columns from Cases 2 and 3
For Case 1, delete the column if it is a string type; a numeric type can be left as is
First finish code that can be submitted as it stands
Phase 2: Advanced Topics if Time Permits (only when there's spare time)
Try to recover the deleted columns
Verify whether performance actually improves
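One way to run the verification step is to score the baseline and the recovered-column version on the same hold-out split. The sketch below is only an assumption-laden example: the synthetic data, the choice of `RandomForestClassifier`, and ROC-AUC as the metric are illustrative, and the actual problem statement decides which metric applies.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def validation_auc(X: pd.DataFrame, y: pd.Series) -> float:
    """Train on a hold-out split and return ROC-AUC on the validation part."""
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, test_size=0.25, random_state=0, stratify=y
    )
    model = RandomForestClassifier(n_estimators=50, random_state=0)
    model.fit(X_tr, y_tr)
    return roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])

# Hypothetical synthetic data standing in for the exam's train set.
rng = np.random.default_rng(0)
n = 200
train = pd.DataFrame({
    "amount": rng.normal(100, 20, n),
    # Mostly missing text column: about 15% "vip", the rest NaN.
    "memo": pd.Series(["vip"] * n).where(rng.random(n) < 0.15),
})
y = pd.Series(rng.integers(0, 2, n), name="target")

# Phase 1 baseline: the hard column is dropped.
X_base = train.drop(columns=["memo"])

# Phase 2 attempt: recover the column as a simple missing-indicator feature.
X_adv = X_base.assign(memo_missing=train["memo"].isnull().astype(int))

scores = {
    "baseline": validation_auc(X_base, y),
    "advanced": validation_auc(X_adv, y),
}
# Keep the recovered column only if it strictly improves the validation score.
chosen = max(scores, key=scores.get)
```

Because the baseline is listed first, a tie keeps the simpler baseline, which matches the idea of resubmitting only when the extra work clearly pays off.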
⚠ Precautions
Time management is the top priority! Submittable code matters more than perfect preprocessing
Delete the columns in the baseline and make the 1st submission first; if time remains, try again and make a 2nd submission!




