ValueError: Number of labels=712 does not match number of samples=713

Question

Hello. I'm studying the Titanic section, and I get the following error in the cross-validation part. I believe I typed the code exactly as given.

    from sklearn.model_selection import KFold

    def exec_kfold(clf, folds=5):
        kfold = KFold(n_splits=folds)
        scores = []

        for iter_count, (train_index, test_index) in enumerate(kfold.split(X_titanic_df)):
            X_train, X_test = X_titanic_df.values[train_index], X_titanic_df.values[test_index]
            Y_train, y_test = y_titanic_df.values[train_index], y_titanic_df.values[test_index]

            clf.fit(X_train, y_train)
            predictions = clf.predict(X_test)
            accuracy = accuracy_score(y_test, predictions)
            scores.append(accuracy)
            print("Cross-validation {0} accuracy: {1:.4f}".format(iter_count, accuracy))

        mean_score = np.mean(scores)
        print("Mean accuracy: {0:.4f}".format(mean_score))
    exec_kfold(dt_clf, folds=5)

The error output:

    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
         17     mean_score = np.mean(scores)
         18     print("Mean accuracy: {0:.4f}".format(mean_score))
    ---> 19 exec_kfold(dt_clf, folds=5)

    in exec_kfold(clf, folds)
          9         Y_train, y_test = y_titanic_df.values[train_index], y_titanic_df.values[test_index]
         10
    ---> 11         clf.fit(X_train, y_train)
         12         predictions = clf.predict(X_test)
         13         accuracy = accuracy_score(y_test, predictions)

    ~/opt/anaconda3/lib/python3.7/site-packages/sklearn/tree/_classes.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
        875             sample_weight=sample_weight,
        876             check_input=check_input,
    --> 877             X_idx_sorted=X_idx_sorted)
        878         return self
        879

    ~/opt/anaconda3/lib/python3.7/site-packages/sklearn/tree/_classes.py in fit(self, X, y, sample_weight, check_input, X_idx_sorted)
        263         if len(y) != n_samples:
        264             raise ValueError("Number of labels=%d does not match "
    --> 265                              "number of samples=%d" % (len(y), n_samples))
        266         if not 0 <= self.min_weight_fraction_leaf <= 0.5:
        267             raise ValueError("min_weight_fraction_leaf must in [0, 0.5]")

    ValueError: Number of labels=712 does not match number of samples=713

권 철민 · Answer

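Hello. Judging from the error message alone, the training feature (sample) data has 713 rows while the target (label) data has 712 rows, so the two do not pair up. It looks like data that was modified somewhere along the way is still loaded in memory. Please restart the kernel, reload the Titanic data, and run the code again. Thank you.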
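
One detail in the posted code may also be worth checking. The loop assigns the labels to `Y_train` (capital Y), but the next statement calls `clf.fit(X_train, y_train)` (lowercase y). If a `y_train` of length 712 is still defined from an earlier cell, `fit` would receive that stale array instead of the labels for the current fold. Where that earlier `y_train` came from is a guess (for example, a previous `train_test_split` call), so treat this as a possible cause rather than a confirmed one. The sketch below is not the course code: it uses synthetic data in place of the Titanic DataFrame, assumes 891 rows (inferred from 713 + 178), and passes `X` and `y` as arguments, which the posted function does not do. It shows the training-set size of each fold and a version of the loop with consistent variable names.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the Titanic features and labels.
# Assumption: 891 rows; the feature values here are random, not real data.
rng = np.random.RandomState(0)
X = rng.rand(891, 5)
y = (X[:, 0] > 0.5).astype(int)

# Training-set size per fold for 5 unshuffled folds over 891 rows.
train_sizes = [len(train_index) for train_index, _ in KFold(n_splits=5).split(X)]
print(train_sizes)  # [712, 713, 713, 713, 713]


def exec_kfold(clf, X, y, folds=5):
    """Run K-fold cross-validation and return the accuracy of each fold."""
    kfold = KFold(n_splits=folds)
    scores = []
    for iter_count, (train_index, test_index) in enumerate(kfold.split(X)):
        X_train, X_test = X[train_index], X[test_index]
        # Lowercase y_train, so fit() below uses the labels of this fold.
        y_train, y_test = y[train_index], y[test_index]
        # Guard against the mismatch reported in the error message.
        assert len(X_train) == len(y_train)
        clf.fit(X_train, y_train)
        predictions = clf.predict(X_test)
        accuracy = accuracy_score(y_test, predictions)
        scores.append(accuracy)
        print("Fold {0} accuracy: {1:.4f}".format(iter_count, accuracy))
    print("Mean accuracy: {0:.4f}".format(np.mean(scores)))
    return scores


scores = exec_kfold(DecisionTreeClassifier(random_state=11), X, y)
```

With 891 rows, only the first fold has 712 training rows; the others have 713. A stale `y_train` of length 712 would therefore pass the length check on the first fold (while training on the wrong labels) and fail on the second with exactly "Number of labels=712 does not match number of samples=713". Restarting the kernel would remove the stale variable, but the code would then stop with a `NameError` on `y_train` until the capitalization is made consistent.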