Decision Tree Overfitting

Question

from sklearn.datasets import make_classification
import matplotlib.pyplot as plt
%matplotlib inline

plt.title("3 Class values with 2 Features Sample data creation")

# Generate sample classification data with 2 features (for 2D visualization) and 3 target classes.
X_features, y_labels = make_classification(n_features=2, n_redundant=0, n_informative=2,
                                           n_classes=3, n_clusters_per_class=1, random_state=0)

# Scatter-plot the 2 features as 2D coordinates; each class value is drawn in a different color.
plt.scatter(X_features[:, 0], X_features[:, 1], marker='o', c=y_labels, s=25, cmap='rainbow', edgecolor='k')

-------------------------------------

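This code is from p. 199 of the book. It was not covered in class, so I am posting a question about it.

My question is about this line: plt.scatter(X_features[:, 0], X_features[:, 1], marker='o', c=y_labels, s=25, cmap='rainbow', edgecolor='k')

1) Is X_features[:, 0] the predicted probability for class 0, and X_features[:, 1] the predicted probability for class 1?

I am asking because I am confusing this with the ndarray returned by predict_proba(), which we learned about earlier.

Thank you in advance for your answer.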

Answer

Hello,

Calling X_features, y_labels = make_classification(n_features=2, ..., n_classes=3) creates a dataset with 2 features and 3 distinct label values.

X_features[:, 0] holds the values of the first feature, and X_features[:, 1] holds the values of the second feature. They are input data, not predictions.
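
To make the layout concrete, here is a minimal sketch that inspects the arrays returned by make_classification with the same arguments as the book's code. The variable names first_feature and second_feature are only illustrative, and the sample count relies on the default n_samples=100.

```python
from sklearn.datasets import make_classification

# Same arguments as the book's example; n_samples is left at its default of 100.
X_features, y_labels = make_classification(n_features=2, n_redundant=0, n_informative=2,
                                           n_classes=3, n_clusters_per_class=1, random_state=0)

# X_features is (n_samples, n_features): one row per sample, one column per feature.
print(X_features.shape)
# y_labels is (n_samples,): one class label (0, 1, or 2) per sample.
print(y_labels.shape)

# Column slicing picks out a single feature across all samples (illustrative names).
first_feature = X_features[:, 0]    # x-axis values in the scatter plot
second_feature = X_features[:, 1]   # y-axis values in the scatter plot
print(first_feature[:5], second_feature[:5])
```

No model is involved here, so neither column can be a predicted probability.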

The predicted probabilities for class 0 and class 1 are something different: they are contained in the array returned when you call the model's predict_proba(test dataset).

Thank you.
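
For comparison, here is a rough sketch of where predicted probabilities do come from. It assumes a DecisionTreeClassifier and an 80/20 stratified split, neither of which is part of the book's snippet; both are chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X_features, y_labels = make_classification(n_features=2, n_redundant=0, n_informative=2,
                                           n_classes=3, n_clusters_per_class=1, random_state=0)

# Assumption: an 80/20 stratified split, used here only for illustration.
X_train, X_test, y_train, y_test = train_test_split(
    X_features, y_labels, test_size=0.2, random_state=0, stratify=y_labels)

# Assumption: a decision tree classifier; any classifier with predict_proba() would do.
dt_clf = DecisionTreeClassifier(random_state=0)
dt_clf.fit(X_train, y_train)

# predict_proba() returns (n_test_samples, n_classes): one column per class.
proba = dt_clf.predict_proba(X_test)
print(proba.shape)
print(proba[:3])   # each row holds the probabilities for classes 0, 1, and 2
```

Here proba[:, 0] is the predicted probability of class 0 for each test sample, whereas X_features[:, 0] is simply the first input feature.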