Professor, thank you. Would it be possible to download the csv file?
Thank you for the excellent lecture.
A question came up while studying, so I am writing to you by email. I would really appreciate an answer.
I am working through the exercises in Visual Studio Code.
Could you please explain the cause of, and a fix for, the error that occurs
in the final save-and-check step of the regression exercise file?
Code and error message:
# Model evaluation
from sklearn.metrics import mean_squared_error
print("\n")
print('Linear regression MSE', mean_squared_error(y_valid, pred1))
print('Random forest MSE', mean_squared_error(y_valid, pred2))
print('Stacking MSE', mean_squared_error(y_valid, pred3))
print("\n")
print('Linear regression RMSE', np.sqrt(mean_squared_error(y_valid, pred1)))
print('Random forest RMSE', np.sqrt(mean_squared_error(y_valid, pred2)))
print('Stacking RMSE', np.sqrt(mean_squared_error(y_valid, pred3)))
'''
Linear regression MSE 12.966610337469962
Random forest MSE 9.82210290624999
Stacking MSE 11.662706015625002
Linear regression RMSE 3.6009179853851103
Random forest RMSE 3.134023437412361
Stacking RMSE 3.415070426158881
'''
# 10. Hyperparameter tuning
from sklearn.model_selection import GridSearchCV
parameters = {'n_estimators':[50, 100], 'max_depth':[4, 6]}
model4 = RandomForestRegressor()
clf = GridSearchCV(estimator = model4, param_grid = parameters, cv = 3)
clf.fit(X_train, y_train)
print('Best parameters', clf.best_params_)
'''
Best parameters {'max_depth': 6, 'n_estimators': 100}
'''
# print(X_train.isna().sum())
# 11. Save file
result = pd.DataFrame(model2.predict(X_test))
result = result.iloc[:,1]
print(result)
pd.DataFrame({'id':X_test.index, 'result': result}).to_csv('D:/Bigdata_engineer/Bigdata_engineer(python)/bigdata_freelec/0000307.csv', index=False)
check = pd.read_csv('D:/Bigdata_engineer/Bigdata_engineer(python)/bigdata_freelec/0000307.csv')
print(check.head())
# 11. Save file
result = pd.DataFrame(model3.predict(X_test))
result = result.iloc[:,1]
pd.DataFrame({'id':X_test.index, 'result': result}).to_csv('D:/Bigdata_engineer/Bigdata_engineer(python)/bigdata_freelec/0000304.csv', index=False)
# Check
check = pd.read_csv('D:/Bigdata_engineer/Bigdata_engineer(python)/bigdata_freelec/0000304.csv')
print(check.head())
(254, 11)
(64, 11)
Linear regression MSE 12.966610337469962
Random forest MSE 9.294580265624989
Stacking MSE 11.197939625000004
Linear regression RMSE 3.6009179853851103
Random forest RMSE 3.048701406439304
Stacking RMSE 3.346332264584616
Best parameters {'max_depth': 6, 'n_estimators': 100}
Traceback (most recent call last):
File "c:/Users/user/Desktop/pythonworkstation/big_data_engineer(Infleon_free)/6_ml_regression.py", line 260, in <module>
result = pd.DataFrame(model2.predict(X_test))
File "C:\Users\user\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py", line 784, in predict
X = self._validate_X_predict(X)
File "C:\Users\user\anaconda3\lib\site-packages\sklearn\ensemble\_forest.py", line 422, in _validate_X_predict
return self.estimators_[0]._validate_X_predict(X, check_input=True)
File "C:\Users\user\anaconda3\lib\site-packages\sklearn\tree\_classes.py", line 402, in _validate_X_predict
X = self._validate_data(X, dtype=DTYPE, accept_sparse="csr",
File "C:\Users\user\anaconda3\lib\site-packages\sklearn\base.py", line 421, in _validate_data
X = check_array(X, **check_params)
File "C:\Users\user\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 63, in inner_f
return f(*args, **kwargs)
File "C:\Users\user\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 663, in check_array
_assert_all_finite(array,
File "C:\Users\user\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 103, in _assert_all_finite
raise ValueError(
ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
(base) C:\Users\user\Desktop\pythonworkstation>
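For reference, the traceback above says the input to predict contains NaN, which RandomForestRegressor.predict rejects. A minimal sketch of one way to check for and impute missing values before predicting, on synthetic data (the column names and the mean-imputation strategy are illustrative assumptions, not the course's method):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer

# Synthetic stand-ins for X_train / X_test; column names are illustrative only
X_train = pd.DataFrame({'a': [1.0, 2.0, 3.0, 4.0], 'b': [0.5, 1.5, 2.5, 3.5]})
y_train = np.array([1.0, 2.0, 3.0, 4.0])
X_test = pd.DataFrame({'a': [2.0, np.nan], 'b': [1.0, 2.0]})  # contains NaN

print(X_test.isna().sum())  # shows which columns contain NaN

# Fill missing values with the training-set means, then predict
imputer = SimpleImputer(strategy='mean').fit(X_train)
X_test_filled = pd.DataFrame(imputer.transform(X_test), columns=X_test.columns)

model = RandomForestRegressor(random_state=0).fit(X_train, y_train)
pred = model.predict(X_test_filled)
print(pred.shape)  # a 1-D array, here (2,)
```

The commented-out `print(X_train.isna().sum())` in the script is the right instinct; running the same check on X_test would reveal the offending column.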
However, the classification problem worked correctly.
# 7. Model training, ensemble
from sklearn.linear_model import LogisticRegression
model1 = LogisticRegression()
model1.fit(X_train, y_train)
pred1 = pd.DataFrame(model1.predict_proba(X_valid))
from sklearn.ensemble import RandomForestClassifier
model2 = RandomForestClassifier()
model2.fit(X_train, y_train)
pred2 = pd.DataFrame(model2.predict_proba(X_valid))
from sklearn.ensemble import VotingClassifier
model3 = VotingClassifier(estimators = [('logistic', model1), ('random', model2)], voting='soft')
model3.fit(X_train, y_train)
pred3 = pd.DataFrame(model3.predict_proba(X_valid))
print(pred3)
'''
0 1
0 0.123002 0.876998
1 0.712284 0.287716
2 0.164530 0.835470
3 0.965229 0.034771
4 0.872625 0.127375
.. ... ...
138 0.908079 0.091921
139 0.914812 0.085188
140 0.919630 0.080370
141 0.027397 0.972603
142 0.535595 0.464405
'''
# 9. Model evaluation
from sklearn.metrics import roc_auc_score
print('Logistic', roc_auc_score(y_valid, pred1.iloc[:, 1]))
print('Random forest', roc_auc_score(y_valid, pred2.iloc[:, 1]))
print('Voting', roc_auc_score(y_valid, pred3.iloc[:, 1]))
'''
Logistic 0.8560950413223141
Random forest 0.8599173553719007
Voting 0.8662190082644629
'''
# 10. Hyperparameter tuning
from sklearn.model_selection import GridSearchCV
parameters = {'n_estimators':[50, 100], 'max_depth':[4, 6]}
model5 = RandomForestClassifier()
clf = GridSearchCV(estimator = model5, param_grid = parameters, cv = 3)
clf.fit(X_train, y_train)
print('Best parameters', clf.best_params_)
'''
Best parameters {'max_depth': 4, 'n_estimators': 50}
'''
# 11. Save file
result = pd.DataFrame(model3.predict_proba(X_test))
result = result.iloc[:,1]
pd.DataFrame({'id':X_test.index, 'result': result}).to_csv('D:/Bigdata_engineer/Bigdata_engineer(python)/bigdata_freelec/0000304.csv', index=False)
# Check
check = pd.read_csv('D:/Bigdata_engineer/Bigdata_engineer(python)/bigdata_freelec/0000304.csv')
print(check.head())
'''
Best parameters {'max_depth': 4, 'n_estimators': 50}
id result
0 565 0.185380
1 160 0.088969
2 553 0.074872
3 860 0.081200
4 241 0.739738
'''
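One difference between the two scripts may also matter here: a classifier's predict_proba returns one column per class, so `iloc[:, 1]` is valid, while a regressor's predict returns a 1-D array, whose DataFrame has only column 0, so `iloc[:, 1]` raises IndexError. A minimal sketch on synthetic data (not the course's dataset):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Tiny synthetic data, illustrative only
X = pd.DataFrame({'a': [1.0, 2.0, 3.0, 4.0], 'b': [0.5, 1.5, 2.5, 3.5]})
y_cls = np.array([0, 1, 0, 1])
y_reg = np.array([1.0, 2.0, 3.0, 4.0])

# Classifier: predict_proba yields one column per class -> iloc[:, 1] exists
proba = pd.DataFrame(RandomForestClassifier(random_state=0).fit(X, y_cls).predict_proba(X))
print(proba.shape)  # (4, 2)

# Regressor: predict yields a 1-D array -> the DataFrame has only column 0
pred = pd.DataFrame(RandomForestRegressor(random_state=0).fit(X, y_reg).predict(X))
print(pred.shape)   # (4, 1); pred.iloc[:, 1] would raise IndexError
```

So even after the NaN issue is resolved, the regression save step would need `result.iloc[:, 0]` (or just the raw predict array), whereas `iloc[:, 1]` was only correct for the classification case.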