Facing Error Messages

Question

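In the fourth error, xtrain and ytrain are combined and then rows are deleted. The row order (index order?) of the two datasets looks like it could differ, so is it okay to delete after combining them?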

퇴근후딴짓 · Answer

Hello :) Combining the target with the features and then deleting rows works fine too. train_test_split keeps each row's original index label, so X_train and y_train share the same index and the rows stay matched. The video deletes rows by index value instead, and you could also do it as shown here. Good luck!!

    import pandas as pd
    from sklearn.datasets import load_wine
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestClassifier

    # Load the wine data
    wine = load_wine()
    df = pd.DataFrame(wine.data, columns=wine.feature_names)
    df['target'] = wine.target

    # Split into training and test data (the test target is not used here)
    X_train, X_test, y_train, _ = train_test_split(
        df.drop('target', axis=1), df['target'],
        test_size=0.2, random_state=2022)

    # Find the rows where the 'proline' column is 1500 or more
    outlier_indices = X_train[X_train['proline'] >= 1500].index
    print("Number of rows with proline >= 1500:", len(outlier_indices))

    # Drop the outlier rows from both features and target by index label
    X_train = X_train.drop(outlier_indices)
    y_train = y_train.drop(outlier_indices)

    # Train a random forest model
    model = RandomForestClassifier()
    model.fit(X_train, y_train)
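For the combine-then-delete approach the question asks about, here is a minimal sketch. It is not the code from the video. The small DataFrame and Series are made-up toy data standing in for X_train and y_train, and y_train is deliberately put in a different row order to show that pd.concat with axis=1 matches rows by index label rather than by position. It assumes the index labels are unique and that neither object has had its index reset.

```python
import pandas as pd

# Hypothetical toy data standing in for X_train / y_train after a shuffled split.
X_train = pd.DataFrame(
    {"proline": [1680, 520, 1510, 735], "alcohol": [14.2, 12.4, 13.9, 12.9]},
    index=[7, 2, 9, 4],
)
# Same index labels as X_train, deliberately listed in a different order.
y_train = pd.Series([2, 0, 1, 0], index=[4, 9, 2, 7], name="target")

# axis=1 concatenation aligns rows on index labels, not on row position.
combined = pd.concat([X_train, y_train], axis=1)

# Keep only rows whose proline value is below 1500 (drops the outliers).
filtered = combined[combined["proline"] < 1500]

# Split back into features and target; both now share the same index.
X_clean = filtered.drop(columns="target")
y_clean = filtered["target"]
```

If the index had been reset on only one of the two (for example with reset_index(drop=True)), the labels would no longer match and this alignment would pair up the wrong rows, so in that case deleting by a shared boolean mask or re-splitting would be safer.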