Inflearn Community Q&A

ycann

[퇴근후딴짓] Big Data Analysis Engineer Practical Exam (Task Types 1, 2, 3)

Question about "Mock Exam 2"

Resolved question

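Hello, instructor!

Thank you for always answering my questions so quickly.

I have a couple of questions about "Mock Exam 2".

The evaluation metric is F1. Is a higher F1 score also better? And if I tune hyperparameters, should I use the setting that gives the highest F1 score to predict on the test data and write the CSV file?
I am practicing "Mock Exam 2" in the exam environment. Random forest ran without any real problems, but when I switch to the xgboost model it does not go through and I get an error.
What is the cause, and how should I fix it?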

python, machine learning, big data, pandas, Big Data Analysis Engineer (빅데이터분석기사)

2 answers

퇴근후딴짓

Instructor

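Yes, a higher F1 score is better.
For classification metrics, higher is better in most cases.
Regression metrics with an "e" in the name (MAE, MSE, RMSE) are error values, so for most of them smaller is better. R2 is the exception: higher is better.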
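To make the "higher is better" point concrete, here is a minimal sketch rather than anything from the course. It computes binary F1 by hand (the same quantity sklearn's f1_score returns with its default binary averaging) for two candidate settings and keeps the one with the higher validation score. The labels, the predictions, and the names f1_binary, setting_a, and setting_b are all made up for illustration.

```python
def f1_binary(y_true, y_pred, positive=1):
    """F1 for one positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical validation labels and predictions from two candidate settings.
y_val = [1, 0, 1, 1, 0, 0, 1, 0]
candidates = {
    "setting_a": [1, 0, 0, 1, 0, 1, 1, 0],
    "setting_b": [1, 0, 1, 1, 0, 0, 0, 0],
}

# Score every candidate on the validation set and keep the highest F1.
scores = {name: f1_binary(y_val, pred) for name, pred in candidates.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

In practice the setting that wins on the validation split would be the one used to predict on test and write the CSV.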
Could you post your full code?

ycann

Asker

Here is the code I wrote.

import pandas as pd

train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

pd.set_option('display.max_columns', None)

# print(train.shape, test.shape)
# print(train.head(3))
# print(train.info())
# print(train.describe())
# print(train.isnull().sum())

from sklearn.model_selection import train_test_split

X_tr, X_val, y_tr, y_val = train_test_split(
    train.drop('target', axis=1),
    train['target'],
    test_size=0.1,
    random_state=2022,
)

# print(X_tr.shape, X_val.shape, y_tr.shape, y_val.shape)

# from sklearn.metrics import f1_score
# from sklearn.ensemble import RandomForestClassifier
# model = RandomForestClassifier(random_state=2022)
# model.fit(X_tr, y_tr)
# pred = model.predict(X_val)
# print(pred[:10])
# print(f1_score(y_val, pred))

from sklearn.metrics import f1_score
from xgboost import XGBClassifier

xgb = XGBClassifier(random_state=2022)
xgb.fit(X_tr, y_tr)
pred = xgb.predict(X_val)
print(pred[:10])

Here is the error output.

Process started. (Please enter input values directly.)

> [23:06:53] WARNING: ../src/learner.cc:1095: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'binary:logistic' was changed from 'error' to 'logloss'. Explicitly set eval_metric if you'd like to restore the old behavior.

[0 1 0 0 0 1 0 1 1 1]

/usr/local/lib/python3.9/dist-packages/xgboost/compat.py:31: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.

from pandas import MultiIndex, Int64Index

/usr/local/lib/python3.9/dist-packages/xgboost/sklearn.py:1146: UserWarning: The use of label encoder in XGBClassifier is deprecated and will be removed in a future release. To remove this warning, do the following: 1) Pass option use_label_encoder=False when constructing XGBClassifier object; and 2) Encode your labels (y) as integers starting with 0, i.e. 0, 1, 2, ..., [num_class - 1].

warnings.warn(label_encoder_deprecation_msg, UserWarning)

/usr/local/lib/python3.9/dist-packages/xgboost/data.py:208: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.

from pandas import MultiIndex, Int64Index

Process finished.
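Note that every message in the log above is labeled WARNING, FutureWarning, or UserWarning, and the prediction array is still printed in between. A minimal sketch, assuming the goal is only a quieter log: Python's standard warnings module can filter those categories. noisy_predict below is a hypothetical stand-in for the library call and is not part of xgboost.

```python
import warnings


def noisy_predict():
    # Hypothetical stand-in: emits a deprecation warning, then returns normally.
    warnings.warn("pandas.Int64Index is deprecated", FutureWarning)
    return [0, 1, 0]


# Inside this block the two warning categories seen in the log are ignored.
# The filters are restored automatically when the block exits.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=FutureWarning)
    warnings.simplefilter("ignore", category=UserWarning)
    pred = noisy_predict()

print(pred)
```

A warning does not stop execution, so the call still returns its result with or without the filter.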

퇴근후딴짓

Instructor

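Hello. Besides changing the xgb parameter values, I think the pandas index would also need to be fixed.
I mostly use lightgbm, so I'd recommend that instead of xgb!
(When I recorded the videos, lightgbm was not supported yet, so I couldn't include it.)

LightGBM is a popular boosting model alongside XGBoost.
Its training and prediction are generally faster than XGBoost's.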

####### Classification #######

import lightgbm as lgb

model = lgb.LGBMClassifier()

model.fit(X_train, y_train)

y_pred = model.predict(X_test)

####### Regression #######