예시문제 작업형2 test 데이터 예측시 발생하는 오류

Question

안녕하세요! 복습하는 도중에 이런 에러가 발생되어 질문드립니다 ㅠㅠ import pandas as pd X_train = pd.read_csv('X_train.csv',encoding="euc-kr") y_train = pd.read_csv('y_train.csv') X_test = pd.read_csv('X_test.csv',encoding="euc-kr") print(X_train.shape,y_train.shape) # X_train.head() # y_train.head() # X_train.info() #X_train 환불금액 결측치, 오브젝트 두개 # y_train.info() X_train = X_train.fillna(0) X_train.isnull().sum() X_train = X_train.drop(['cust_id'],axis=1) cust_id = X_test.pop('cust_id') #라벨인코딩 from sklearn.preprocessing import LabelEncoder cols = X_train.select_dtypes( include = 'object').columns cols for col in cols : le = LabelEncoder() X_train[col] = le.fit_transform(X_train[col]) X_test[col] = le.transform(X_test[col]) X_train.head() #검증데이터 분리 from sklearn.model_selection import train_test_split X_tr, X_val, y_tr, y_val = train_test_split(X_train, y_train['gender'], test_size = 0.2, random_state = 2022) X_tr.shape, X_val.shape, y_tr.shape, y_val.shape #모델링 - 랜덤포레스트 from sklearn.ensemble import RandomForestClassifier from sklearn.metrics import roc_auc_score model = RandomForestClassifier(random_state=2022) model.fit(X_tr,y_tr) pred = model.predict_proba(X_val) roc_auc_score(y_val,pred[:,1]) pred = model.predict_proba(X_test) <------------------이 과정에서 발생되는 오류입니다 pred ValueError: Input X contains NaN. RandomForestClassifier does not accept missing values encoded as NaN natively. 라는 에러가 발생합니다 ㅠㅠ 코드를 검토해봐도 이상은 없는거같은데.. 뭐가문제일까요? ㅠㅠ

ji_nhee · Answer

아!!! X_test 데이터에 결측치를 제거안해서 그런거네요 ㅠㅠ

퇴근후딴짓 · Answer

화이팅입니다:)