Resolved question
This is from the <Summary> part of the last lecture video.
Loading the data
import pandas as pd
X_train = pd.read_csv("https://raw.githubusercontent.com/lovedlim/inf/main/p2/data_atype/X_train.csv")
y_train = pd.read_csv("https://raw.githubusercontent.com/lovedlim/inf/main/p2/data_atype/y_train.csv")
X_test = pd.read_csv("https://raw.githubusercontent.com/lovedlim/inf/main/p2/data_atype/X_test.csv")
Splitting the data into numeric and categorical columns
n_train = X_train.select_dtypes(exclude='object').copy()
n_test = X_test.select_dtypes(exclude='object').copy()
c_train = X_train.select_dtypes(include='object').copy()
c_test = X_test.select_dtypes(include='object').copy()
Min-max scaling of the numeric columns
cols = ['age', 'fnlwgt', 'education.num', 'capital.gain', 'capital.loss', 'hours.per.week']
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
n_train[cols] = scaler.fit_transform(n_train[cols])
n_test[cols] = scaler.transform(n_test[cols])
Label encoding
cols = ['workclass', 'education', 'marital.status', 'occupation', 'relationship', 'race', 'sex', 'native.country']
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
for col in cols:
    le = LabelEncoder()
    c_train[col] = le.fit_transform(c_train[col])
    c_test[col] = le.transform(c_test[col])
The label encoding step raises the following error:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~\anaconda3\lib\site-packages\sklearn\preprocessing\_label.py in _encode(values, uniques, encode, check_unknown)
    112         try:
--> 113             res = _encode_python(values, uniques, encode)
    114         except TypeError:

~\anaconda3\lib\site-packages\sklearn\preprocessing\_label.py in _encode_python(values, uniques, encode)
     60     if uniques is None:
---> 61         uniques = sorted(set(values))
     62         uniques = np.array(uniques, dtype=values.dtype)

TypeError: '<' not supported between instances of 'str' and 'float'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-95-295cc9604042> in <module>
      7 for col in cols:
      8     le = LabelEncoder()
----> 9     c_train[col] = le.fit_transform(c_train[col])
     10     c_test[col] = le.transform(c_test[col])

~\anaconda3\lib\site-packages\sklearn\preprocessing\_label.py in fit_transform(self, y)
    254         """
    255         y = column_or_1d(y, warn=True)
--> 256         self.classes_, y = _encode(y, encode=True)
    257         return y
    258

~\anaconda3\lib\site-packages\sklearn\preprocessing\_label.py in _encode(values, uniques, encode, check_unknown)
    115             types = sorted(t.__qualname__
    116                            for t in set(type(v) for v in values))
--> 117             raise TypeError("Encoders require their input to be uniformly "
    118                             f"strings or numbers. Got {types}")
    119         return res

TypeError: Encoders require their input to be uniformly strings or numbers. Got ['float', 'str']
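The first TypeError in the traceback can probably be reproduced in plain Python, without scikit-learn. A missing value in a pandas object column is stored as NaN, which is a float, so sorting the unique values of such a column compares a str with a float. The sketch below uses a made-up `values` list, not the course data, and mimics the `sorted(set(values))` call shown at line 61 of `_label.py`.

```python
# Made-up sample values: two category strings plus NaN (a float),
# which is how a missing entry appears in a pandas object column.
values = ["Private", "State-gov", float("nan")]

try:
    # Same expression as the failing line in the traceback
    sorted(set(values))
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Type names found in the column, as listed at the end of the error message
type_names = sorted({type(v).__name__ for v in values})

print(error_message)
print(type_names)
```

If this is what is happening, the 'float' in the error message refers to missing values and not to real numeric entries in the categorical columns.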
1 answer
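This looks like it happens because the preprocessing from the previous session has not been applied.
For example, the missing values do not appear to have been handled.
The notebook I shared has a section titled "Data preprocessing (previous session's work)".
Please run that section first and then continue 🙂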
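As a rough sketch of what handling the missing values before label encoding could look like: the tiny `c_train` / `c_test` frames and the placeholder string "Missing" below are assumptions for illustration only. The preprocessing section in the shared notebook may fill the gaps differently, for example with the most frequent value.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-ins for the categorical frames (not the course data)
c_train = pd.DataFrame({
    "workclass": ["Private", np.nan, "State-gov", "Private"],
    "occupation": ["Sales", "Tech-support", np.nan, "Sales"],
})
c_test = pd.DataFrame({
    "workclass": ["State-gov", np.nan],
    "occupation": [np.nan, "Sales"],
})

# 1) Diagnose: count the missing values in each column
print(c_train.isnull().sum())

# 2) Fill missing values with a placeholder string
#    (assumption: the notebook may use another strategy)
c_train = c_train.fillna("Missing")
c_test = c_test.fillna("Missing")

# 3) Label-encode each column: fit on train, reuse the mapping on test
for col in c_train.columns:
    le = LabelEncoder()
    c_train[col] = le.fit_transform(c_train[col])
    c_test[col] = le.transform(c_test[col])

print(c_train)
print(c_test)
```

With the real data, `X_train.isnull().sum()` should show which columns still contain missing values before the encoding loop runs.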