Visualizing regression coefficients by feature

Question

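In the lecture "Regression Practice 1: Bike Sharing Demand Prediction - 02", around 19:52, there is a part that visualizes the linear regression coefficients for each feature. I'd like to ask whether those coefficient values can come out differently. I downloaded the Jupyter notebook code from GitHub and ran it as-is. The RMSLE, RMSE, and MAE for LinearRegression, Lasso, and Ridge all match the lecture exactly, but when I inspect the coefficients with lr_reg.coef_, the values are different. Common sense says a regression model shouldn't behave like this, and I can't figure out why, so I'm asking here. Thank you!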

Jaehyun Lee · Answer

Oh... I also ran the professor's code as-is,
and the model results do come out slightly different:

    ### RandomForestRegressor ###
    RMSLE: 0.355, RMSE: 50.466, MAE: 31.198
    ### GradientBoostingRegressor ###
    RMSLE: 0.330, RMSE: 53.342, MAE: 32.750
    ### XGBRegressor ###
    RMSLE: 0.339, RMSE: 51.475, MAE: 31.357
    [LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000588 seconds.
    You can set force_row_wise=true to remove the overhead.
    And if memory is not enough, you can set force_col_wise=true.
    [LightGBM] [Info] Total Bins 348
    [LightGBM] [Info] Number of data points in the train set: 7620, number of used features: 72
    [LightGBM] [Info] Start training from score 4.582043
    ### LGBMRegressor ###
    RMSLE: 0.319, RMSE: 47.215, MAE: 29.029
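The thread doesn't say why the tree-ensemble scores differ slightly from the lecture. As a hedged check, the sketch below uses synthetic data from `make_regression` (not the bike dataset) to show that a `RandomForestRegressor` with a fixed `random_state` gives identical predictions when refit in the same environment, while a different seed changes them. If results still differ from the lecture with the same seed and the same code, library versions are a likely suspect.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data (stand-in for the bike data)
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

def fit_predict(seed):
    # Fixed random_state controls bootstrapping and feature sampling
    model = RandomForestRegressor(n_estimators=50, random_state=seed)
    model.fit(X, y)
    return model.predict(X[:5])

same_a = fit_predict(0)
same_b = fit_predict(0)
other = fit_predict(1)

print("same seed, identical:", np.array_equal(same_a, same_b))
print("different seed, identical:", np.array_equal(same_a, other))
```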

자스민 · Answer

Hello. I'm really enjoying this great course. I have a question about an error in the Bike Sharing Demand example source code.

[Log transformation, feature encoding, model training/prediction/evaluation]

    from sklearn.model_selection import train_test_split, GridSearchCV
    from sklearn.linear_model import LinearRegression, Ridge, Lasso

    y_target = bike_df['count']
    X_features = bike_df.drop(['count'], axis=1, inplace=False)

    X_train, X_test, y_train, y_test = train_test_split(X_features, y_target, test_size=0.3, random_state=0)

    lr_reg = LinearRegression()
    lr_reg.fit(X_train, y_train)
    pred = lr_reg.predict(X_test)

    evaluate_regr(y_test, pred)

Error:

    ---------------------------------------------------------------------------
    DTypePromotionError                       Traceback (most recent call last)
    ~\AppData\Local\Temp\ipykernel_19124\3974685920.py in
         11 lr_reg = LinearRegression()
         12
    ---> 13 lr_reg.fit(X_train, y_train)
         14 pred = lr_reg.predict(X_test)
         15

    D:\dev03\anaconda\lib\site-packages\sklearn\linear_model\_base.py in fit(self, X, y, sample_weight)
        660 accept_sparse = False if self.positive else ["csr", "csc", "coo"]
        661
    --> 662 X, y = self._validate_data(
        663     X, y, accept_sparse=accept_sparse, y_numeric=True, multi_output=True
        664 )

    D:\dev03\anaconda\lib\site-packages\sklearn\base.py in _validate_data(self, X, y, reset, validate_separately, **check_params)
        579 y = check_array(y, **check_y_params)
        580 else:
    --> 581 X, y = check_X_y(X, y, **check_params)
        582 out = X, y
        583

    D:\dev03\anaconda\lib\site-packages\sklearn\utils\validation.py in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, estimator)
        962 raise ValueError("y cannot be None")
        963
    --> 964 X = check_array(
        965     X,
        966     accept_sparse=accept_sparse,

    D:\dev03\anaconda\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
        663
        664 if all(isinstance(dtype, np.dtype) for dtype in dtypes_orig):
    --> 665     dtype_orig = np.result_type(*dtypes_orig)
        666
        667 if dtype_numeric:

    DTypePromotionError: The DType could not be promoted by . This means that no common DType exists for the given inputs. For example they cannot be stored in a single array unless the dtype is `object`. The full list of DTypes is: (, , , , , , , , , , , , , , )
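The thread does not diagnose this error, and the dtype names in the message above were lost, so the cause is not confirmed. One plausible cause is a non-numeric column, such as a leftover `datetime64` column, sitting next to numeric columns in `X_features`; NumPy has no common dtype for those, which is what `np.result_type` reports. The hypothetical helper `find_non_numeric_columns` below lists such columns so they can be dropped or converted before `fit`. The toy frame and its column names are illustrative assumptions, not the course data.

```python
import pandas as pd

def find_non_numeric_columns(df: pd.DataFrame) -> list:
    # Hypothetical helper: pandas treats int, float and bool columns as numeric;
    # anything else (e.g. datetime64, object/strings) is returned here
    return [col for col in df.columns if not pd.api.types.is_numeric_dtype(df[col])]

# Hypothetical toy frame mimicking a leftover datetime column
toy = pd.DataFrame({
    "datetime": pd.to_datetime(["2011-01-01 00:00", "2011-01-01 01:00"]),
    "temp": [9.84, 9.02],
    "season_1": [True, True],
})

bad_cols = find_non_numeric_columns(toy)
print(bad_cols)

# Drop the offending columns and cast the rest to float before fitting
X_clean = toy.drop(columns=bad_cols).astype(float)
print(X_clean.dtypes)
```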

권 철민 · Answer

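Hello. When you run the code exactly as in the practice notebook, are you getting feature importance values that differ from the lecture's results? If so, could you tell me which scikit-learn version you are currently using? You can check it by running the following:

    import sklearn
    print(sklearn.__version__)

Thank you.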
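Since `LinearRegression` relies on NumPy and SciPy for the least-squares solve and the notebook prepares data with pandas, it may help to report those versions alongside scikit-learn. A minimal sketch, assuming all four packages are installed; scikit-learn's own `sklearn.show_versions()` prints a more detailed report including BLAS information.

```python
import numpy as np
import pandas as pd
import scipy
import sklearn

# Print the versions of the libraries that can influence fitted coefficients
for name, module in [("scikit-learn", sklearn), ("numpy", np),
                     ("pandas", pd), ("scipy", scipy)]:
    print(f"{name}: {module.__version__}")
```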