# Boosting Models (Boost)

- Weak learner: a model that predicts only slightly better than a coin flip
- Boosting: a method that combines weak learners into an ensemble to build a strong predictive model
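
The two definitions above can be made concrete with a small sketch. It uses AdaBoost with one-split decision stumps as the weak learners, written in plain Python. The toy data, the helper names (`stump_predict`, `best_stump`, `adaboost`, `ensemble_predict`) and the choice of three rounds are all assumptions made for illustration; this is not the scikit-learn implementation.

```python
import math

# Hypothetical 1-D toy data: no single threshold separates the two classes.
X = list(range(10))
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]


def stump_predict(x, threshold, polarity):
    """Weak learner: predict `polarity` when x < threshold, otherwise -polarity."""
    return polarity if x < threshold else -polarity


def best_stump(X, y, weights):
    """Return (weighted_error, threshold, polarity) of the lowest-error stump."""
    best = None
    for threshold in [x + 0.5 for x in X[:-1]]:
        for polarity in (1, -1):
            err = sum(w for xi, yi, w in zip(X, y, weights)
                      if stump_predict(xi, threshold, polarity) != yi)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def adaboost(X, y, n_rounds):
    """Fit `n_rounds` stumps, re-weighting misclassified points after each round."""
    weights = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(n_rounds):
        err, threshold, polarity = best_stump(X, y, weights)
        err = max(err, 1e-10)  # guard against division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)  # vote strength of this stump
        ensemble.append((alpha, threshold, polarity))
        # Increase the weight of misclassified points, decrease the rest.
        weights = [w * math.exp(-alpha * yi * stump_predict(xi, threshold, polarity))
                   for xi, yi, w in zip(X, y, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def ensemble_predict(ensemble, x):
    """Strong learner: sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1


def accuracy(preds, y):
    return sum(p == yi for p, yi in zip(preds, y)) / len(y)


# Best single stump under uniform weights.
_, t0, p0 = best_stump(X, y, [1.0 / len(X)] * len(X))
stump_acc = accuracy([stump_predict(x, t0, p0) for x in X], y)

# Three boosted stumps.
ensemble = adaboost(X, y, n_rounds=3)
ensemble_acc = accuracy([ensemble_predict(ensemble, x) for x in X], y)

print('single stump accuracy: {:.2f}'.format(stump_acc))
print('boosted (3 stumps) accuracy: {:.2f}'.format(ensemble_acc))
```

On this toy data the best single stump classifies 7 of the 10 points correctly, while the weighted vote of three stumps classifies all 10. The gain comes from the re-weighting step: each new stump concentrates on the points the previous ones got wrong.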

![](https://upload.wikimedia.org/wikipedia/commons/b/b5/Ensemble_Boosting.svg)

- Evolution of boosting models
  - AdaBoost
  - Gradient Boosting - Decision Tree
  - Stochastic Gradient Boosting (SGB) - Random Forest
  - XGBoost - Optimization

## Environment Setup

In [1]:
import pandas as pd
import numpy as np

from sklearn import preprocessing # preprocessing

from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.metrics import accuracy_score, f1_score

from sklearn.ensemble import GradientBoostingClassifier

## Dataset

In [2]:
cancer_df = pd.read_csv('data/breast_cancer.csv')

# list(cancer_df.columns)
y = cancer_df[['diagnosis']]
X = cancer_df.loc[:, 'radius_mean':'fractal_dimension_worst']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=1)
y_train = np.ravel(y_train, order='C') # flatten to 1-D; avoids sklearn's "A column-vector y was passed when a 1d array was expected" warning

## Machine Learning - Gradient Boosting

In [5]:
clf_sgb = GradientBoostingClassifier(n_estimators = 100,
                                    max_depth    = 2,
                                    subsample    = 0.8,
                                    max_features = 0.5,
                                    random_state = 777)

clf_sgb.fit(X_train, y_train)

GradientBoostingClassifier(max_depth=2, max_features=0.5, random_state=777,
                           subsample=0.8)
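
Setting `subsample` below 1.0 is what makes the fit stochastic: each tree is trained on a random fraction of the rows, and `max_features = 0.5` likewise restricts each split to a random half of the columns. The sketch below shows one way to watch the ensemble improve as trees are added, using `staged_predict`. It assumes scikit-learn's bundled breast cancer data as a stand-in for `data/breast_cancer.csv`, so the numbers it prints will not necessarily match this notebook; the variable names are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Assumption: the bundled dataset stands in for data/breast_cancer.csv.
X_demo, y_demo = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=1)

clf_demo = GradientBoostingClassifier(n_estimators=100,
                                      max_depth=2,
                                      subsample=0.8,
                                      max_features=0.5,
                                      random_state=777)
clf_demo.fit(X_tr, y_tr)

# staged_predict yields test-set predictions after 1, 2, ..., n_estimators trees.
staged_acc = [accuracy_score(y_te, y_stage)
              for y_stage in clf_demo.staged_predict(X_te)]

for n_trees in (1, 10, 50, 100):
    print('{:3d} trees: test accuracy = {:.3f}'.format(n_trees, staged_acc[n_trees - 1]))

# Because subsample < 1.0, the rows left out of each tree give an
# out-of-bag estimate of the loss improvement per boosting stage.
print('OOB improvement entries:', clf_demo.oob_improvement_.shape[0])
```

A curve of `staged_acc` that flattens early suggests fewer trees would do; one that is still rising at the end suggests trying a larger `n_estimators`.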

## Predictive Performance

In [6]:
y_pred = clf_sgb.predict(X_test)

print('Test F1: {:.3f}'.format(f1_score(y_test, y_pred, pos_label = 'M')))

Test F1: 0.925
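
A single hold-out split gives only one estimate of the F1 score. A possible complement is k-fold cross-validation with `cross_val_score`, which is imported above but not otherwise used. The sketch below assumes scikit-learn's bundled breast cancer data in place of `data/breast_cancer.csv`; in that dataset malignant is coded as `0`, so `pos_label=0` plays the role of `pos_label = 'M'`. The names `malignant_f1`, `clf_cv` and `cv_scores` are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import cross_val_score

# Assumption: the bundled dataset stands in for data/breast_cancer.csv.
# Its target is coded 0 = malignant, 1 = benign.
X_cv, y_cv = load_breast_cancer(return_X_y=True)

# Score the malignant class as the positive label, mirroring pos_label='M'.
malignant_f1 = make_scorer(f1_score, pos_label=0)

clf_cv = GradientBoostingClassifier(n_estimators=100,
                                    max_depth=2,
                                    subsample=0.8,
                                    max_features=0.5,
                                    random_state=777)

# With an integer cv and a classifier, the folds are stratified by class.
cv_scores = cross_val_score(clf_cv, X_cv, y_cv, cv=5, scoring=malignant_f1)

print('CV F1: {:.3f} +/- {:.3f}'.format(cv_scores.mean(), cv_scores.std()))
```

The spread across folds indicates how much the single-split score could vary with a different `random_state` in `train_test_split`.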
