{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Hyperparameter Tuning - Random Forest\n", "\n", "\n", "Reference: [Hyperparameter Tuning, emseoyk.log](https://velog.io/@emseoyk/%ED%95%98%EC%9D%B4%ED%8D%BC%ED%8C%8C%EB%9D%BC%EB%AF%B8%ED%84%B0-%ED%8A%9C%EB%8B%9D)\n", "\n", "## Setup" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "\n", "from sklearn import preprocessing # preprocessing\n", "\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.model_selection import cross_val_score\n", "from sklearn.metrics import accuracy_score, f1_score\n", "\n", "from sklearn.ensemble import RandomForestClassifier" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dataset" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "C:\\Users\\statkclee\\anaconda3\\lib\\site-packages\\sklearn\\preprocessing\\_label.py:115: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n", " y = column_or_1d(y, warn=True)\n" ] } ], "source": [ "cancer_df = pd.read_csv('data/breast_cancer.csv')\n", "\n", "# list(cancer_df.columns)\n", "y = cancer_df[['diagnosis']]\n", "X = cancer_df.loc[:, 'radius_mean':'fractal_dimension_worst']\n", "\n", "le = preprocessing.LabelEncoder()\n", "y = le.fit_transform(y)\n", "\n", "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=1)\n", "y_train = np.ravel(y_train, order='C') # flatten y to 1d to avoid: A column-vector y was passed when a 1d array was expected" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Machine Learning\n", "\n", "### 1. Hyperparameters for Random Forest" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", " | Initial value | \n", "
---|---|
bootstrap | \n", "True | \n", "
ccp_alpha | \n", "0.0 | \n", "
class_weight | \n", "None | \n", "
criterion | \n", "gini | \n", "
max_depth | \n", "None | \n", "
max_features | \n", "auto | \n", "
max_leaf_nodes | \n", "None | \n", "
max_samples | \n", "None | \n", "
min_impurity_decrease | \n", "0.0 | \n", "
min_samples_leaf | \n", "1 | \n", "
min_samples_split | \n", "2 | \n", "
min_weight_fraction_leaf | \n", "0.0 | \n", "
n_estimators | \n", "100 | \n", "
n_jobs | \n", "None | \n", "
oob_score | \n", "False | \n", "
random_state | \n", "777 | \n", "
verbose | \n", "0 | \n", "
warm_start | \n", "False | \n", "