# Ensemble Models (Bagging)

- Comparing Voting and Bagging classifiers
  - Voting: the same training data, different algorithms
  - Bagging: different training data, a single algorithm

![](https://upload.wikimedia.org/wikipedia/commons/c/c8/Ensemble_Bagging.svg)
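
To make the "different training data" idea concrete, the sketch below draws one bootstrap sample with NumPy and measures how many distinct rows it contains. The sample size `n_samples` and the seed are arbitrary values chosen for illustration, and this is not scikit-learn's internal implementation.

```python
import numpy as np

rng = np.random.default_rng(0)   # seed chosen arbitrarily for illustration
n_samples = 10_000               # hypothetical training-set size

# Draw one bootstrap sample: n_samples row indices, sampled with replacement.
bootstrap_idx = rng.integers(0, n_samples, size=n_samples)

# Rows that were never drawn are "out-of-bag" for this particular estimator.
in_bag = np.unique(bootstrap_idx)
unique_fraction = in_bag.size / n_samples
oob_fraction = 1.0 - unique_fraction

print(f"distinct rows in the bag: {unique_fraction:.3f}")
print(f"out-of-bag rows:          {oob_fraction:.3f}")
```

For large samples the expected share of distinct rows approaches 1 - 1/e, roughly 63%, so each estimator sees a different subset and leaves about 37% of the rows out of its bag.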

## Environment Setup

In [1]:
import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.metrics import accuracy_score, f1_score

from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

## Dataset

In [7]:
cancer_df = pd.read_csv('data/breast_cancer.csv')

# list(cancer_df.columns)
y = cancer_df[['diagnosis']]
X = cancer_df.loc[:, 'radius_mean':'fractal_dimension_worst']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=1)
y_train = np.ravel(y_train, order='C')  # flatten to 1-D; otherwise estimators warn "A column-vector y was passed when a 1d array was expected"
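
As a quick check on what `stratify=y` does, the following sketch splits a small made-up table and compares class proportions in the two parts. The DataFrame `toy_df` and its 30/70 class mix are hypothetical and stand in for the real CSV, which is not reproduced here. It also selects the target with single brackets, which yields a 1-D Series and makes the `np.ravel` step unnecessary.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 rows, 30% "M" (malignant) and 70% "B" (benign).
toy_df = pd.DataFrame({
    "feature": np.arange(100, dtype=float),
    "diagnosis": ["M"] * 30 + ["B"] * 70,
})

X_toy = toy_df[["feature"]]
y_toy = toy_df["diagnosis"]   # single brackets give a 1-D Series

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, stratify=y_toy, random_state=1
)

# With stratification, both parts keep roughly the original class mix.
train_ratio = y_tr.value_counts(normalize=True)
test_ratio = y_te.value_counts(normalize=True)
print(train_ratio)
print(test_ratio)
```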

## Machine Learning - CV


In [11]:
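# Base learner: a shallow tree whose leaves must each hold at least 10% of the samples
clf_base = DecisionTreeClassifier(max_depth=4, min_samples_leaf=0.1, random_state=777)

# 300 trees, each fit on a bootstrap sample; oob_score=True keeps an out-of-bag accuracy estimate
# Note: scikit-learn 1.2 renamed `base_estimator` to `estimator`, and 1.4 removed the old name
clf_bagging = BaggingClassifier(base_estimator=clf_base, n_estimators=300, oob_score=True, n_jobs=-1)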

clf_bagging.fit(X_train, y_train)

BaggingClassifier(base_estimator=DecisionTreeClassifier(max_depth=4,
                                                        min_samples_leaf=0.1,
                                                        random_state=777),
                  n_estimators=300, n_jobs=-1, oob_score=True)
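
The model above is built with `oob_score=True`, and `cross_val_score` is imported, so a natural next step is to estimate accuracy without touching the test set. The sketch below shows both estimates side by side. It assumes scikit-learn's bundled breast cancer data as a stand-in for `data/breast_cancer.csv`, uses 100 estimators to keep the run short, and fixes `random_state` on the ensemble; none of these choices come from the original notebook.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: the bundled dataset stands in for the CSV used in this notebook.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1
)

base = DecisionTreeClassifier(max_depth=4, min_samples_leaf=0.1, random_state=777)
# The base estimator is passed positionally because its keyword name
# differs between scikit-learn releases (base_estimator vs. estimator).
bagging = BaggingClassifier(base, n_estimators=100, oob_score=True, random_state=777)

# Estimate 1: out-of-bag accuracy, available after a single fit.
bagging.fit(X_train, y_train)
oob_accuracy = bagging.oob_score_

# Estimate 2: 5-fold cross-validated accuracy on the training split.
cv_scores = cross_val_score(bagging, X_train, y_train, cv=5, scoring="accuracy")

print(f"OOB accuracy: {oob_accuracy:.3f}")
print(f"CV accuracy:  {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
```

The out-of-bag estimate costs one fit, while 5-fold cross-validation refits the whole ensemble five times; the two figures usually land close to each other.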

## Prediction Performance

In [12]:
y_pred = clf_bagging.predict(X_test)
print('Bagging Classifier: {:.3f}'.format(f1_score(y_test, y_pred, pos_label="M")))

Bagging Classifier: 0.923
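
A single F1 value hides which kind of error the model makes. The sketch below reports accuracy and a confusion matrix next to the F1 score for the malignant class. It again assumes scikit-learn's bundled breast cancer data in place of the CSV, relabelled to "M"/"B" to match the notebook, so its numbers will not necessarily equal the 0.923 shown above.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

data = load_breast_cancer()
X = data.data
# In the bundled data, target 0 is malignant and 1 is benign; relabel to "M"/"B".
y = np.where(data.target == 0, "M", "B")

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1
)

base = DecisionTreeClassifier(max_depth=4, min_samples_leaf=0.1, random_state=777)
model = BaggingClassifier(base, n_estimators=100, random_state=777)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

acc = accuracy_score(y_test, y_pred)
f1_m = f1_score(y_test, y_pred, pos_label="M")
# Rows are true labels, columns are predictions, both ordered as ["M", "B"].
cm = confusion_matrix(y_test, y_pred, labels=["M", "B"])

print(f"accuracy: {acc:.3f}")
print(f"F1 (M):   {f1_m:.3f}")
print(cm)
```

In the printed matrix, the top-right cell counts malignant cases predicted as benign, which is typically the costlier mistake in this setting.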
