
PyCaret is an open-source, low-code machine learning library in Python that automates machine learning workflows. https://pycaret.org/

```
pip install pycaret
```

```python
from pycaret.utils import enable_colab
enable_colab()

from pycaret.datasets import get_data
dataset = get_data('iris')
```

|  | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |

```python
data = dataset.sample(frac=0.9, random_state=786)
data_unseen = dataset.drop(data.index)
data.reset_index(drop=True, inplace=True)
data_unseen.reset_index(drop=True, inplace=True)
print('Data for Modeling: ' + str(data.shape))
print('Unseen Data For Predictions: ' + str(data_unseen.shape))
```

Preprocess the data with setup().
```python
from pycaret.classification import *
exp_mclf101 = setup(data = data, target = 'species', session_id=123)
```

|  | Description | Value |
|---|---|---|
| 0 | session_id | 123 |
| 1 | Target | species |
| 2 | Target Type | Multiclass |
| 3 | Label Encoded | Iris-setosa: 0, Iris-versicolor: 1, Iris-virginica: 2 |
| 4 | Original Data | (135, 5) |
| 5 | Missing Values | False |
| 6 | Numeric Features | 4 |
| 7 | Categorical Features | 0 |
| 8 | Ordinal Features | False |
| 9 | High Cardinality Features | False |
| 10 | High Cardinality Method | None |
| 11 | Transformed Train Set | (94, 4) |
| 12 | Transformed Test Set | (41, 4) |
| 13 | Shuffle Train-Test | True |
| 14 | Stratify Train-Test | False |
| 15 | Fold Generator | StratifiedKFold |
| 16 | Fold Number | 10 |
| 17 | CPU Jobs | -1 |
| 18 | Use GPU | False |
| 19 | Log Experiment | False |
| 20 | Experiment Name | clf-default-name |
| 21 | USI | caab |
| 22 | Imputation Type | simple |
| 23 | Iterative Imputation Iteration | None |
| 24 | Numeric Imputer | mean |
| 25 | Iterative Imputation Numeric Model | None |
| 26 | Categorical Imputer | constant |
| 27 | Iterative Imputation Categorical Model | None |
| 28 | Unknown Categoricals Handling | least_frequent |
| 29 | Normalize | False |
| 30 | Normalize Method | None |
| 31 | Transformation | False |
| 32 | Transformation Method | None |
| 33 | PCA | False |
| 34 | PCA Method | None |
| 35 | PCA Components | None |
| 36 | Ignore Low Variance | False |
| 37 | Combine Rare Levels | False |
| 38 | Rare Level Threshold | None |
| 39 | Numeric Binning | False |
| 40 | Remove Outliers | False |
| 41 | Outliers Threshold | None |
| 42 | Remove Multicollinearity | False |
| 43 | Multicollinearity Threshold | None |
| 44 | Remove Perfect Collinearity | True |
| 45 | Clustering | False |
| 46 | Clustering Iteration | None |
| 47 | Polynomial Features | False |
| 48 | Polynomial Degree | None |
| 49 | Trignometry Features | False |
| 50 | Polynomial Threshold | None |
| 51 | Group Features | False |
| 52 | Feature Selection | False |
| 53 | Feature Selection Method | classic |
| 54 | Features Selection Threshold | None |
| 55 | Feature Interaction | False |
| 56 | Feature Ratio | False |
| 57 | Interaction Threshold | None |
| 58 | Fix Imbalance | False |
| 59 | Fix Imbalance Method | SMOTE |
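The Fold Generator row above shows that PyCaret defaults to 10-fold StratifiedKFold cross-validation. As a rough illustration of what stratified folding means (using scikit-learn directly, independent of PyCaret):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold

X, y = load_iris(return_X_y=True)

# Each test fold preserves the class proportions of the full dataset:
# 150 balanced samples / 10 folds = 15 per fold, 5 per class.
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=123)
for i, (train_idx, test_idx) in enumerate(skf.split(X, y)):
    counts = np.bincount(y[test_idx])
    print(f"fold {i}: test size={len(test_idx)}, per-class counts={counts}")
```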

```python
best = compare_models()
```

|  | Model | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC | TT (Sec) |
|---|---|---|---|---|---|---|---|---|---|
| lda | Linear Discriminant Analysis | 0.9678 | 0.9963 | 0.9667 | 0.9758 | 0.9669 | 0.9515 | 0.9560 | 0.0100 |
| nb | Naive Bayes | 0.9578 | 0.9897 | 0.9556 | 0.9713 | 0.9546 | 0.9364 | 0.9442 | 0.0090 |
| qda | Quadratic Discriminant Analysis | 0.9567 | 1.0000 | 0.9556 | 0.9708 | 0.9533 | 0.9348 | 0.9433 | 0.0100 |
| lr | Logistic Regression | 0.9478 | 0.9963 | 0.9444 | 0.9638 | 0.9444 | 0.9212 | 0.9304 | 0.2640 |
| knn | K Neighbors Classifier | 0.9467 | 0.9926 | 0.9444 | 0.9630 | 0.9432 | 0.9197 | 0.9291 | 0.1190 |
| lightgbm | Light Gradient Boosting Machine | 0.9456 | 0.9852 | 0.9444 | 0.9625 | 0.9419 | 0.9182 | 0.9282 | 0.0290 |
| ada | Ada Boost Classifier | 0.9256 | 0.9809 | 0.9222 | 0.9505 | 0.9194 | 0.8879 | 0.9026 | 0.0620 |
| gbc | Gradient Boosting Classifier | 0.9256 | 0.9815 | 0.9222 | 0.9505 | 0.9194 | 0.8879 | 0.9026 | 0.1270 |
| et | Extra Trees Classifier | 0.9256 | 0.9926 | 0.9222 | 0.9505 | 0.9194 | 0.8879 | 0.9026 | 0.4650 |
| dt | Decision Tree Classifier | 0.9144 | 0.9369 | 0.9111 | 0.9366 | 0.9086 | 0.8712 | 0.8843 | 0.0100 |
| rf | Random Forest Classifier | 0.9144 | 0.9852 | 0.9111 | 0.9305 | 0.9101 | 0.8712 | 0.8813 | 0.5310 |
| svm | SVM - Linear Kernel | 0.8522 | 0.0000 | 0.8361 | 0.8261 | 0.8197 | 0.7755 | 0.8099 | 0.0640 |
| ridge | Ridge Classifier | 0.8300 | 0.0000 | 0.8222 | 0.8544 | 0.8178 | 0.7433 | 0.7648 | 0.0080 |
| dummy | Dummy Classifier | 0.3822 | 0.5000 | 0.3333 | 0.1480 | 0.2128 | 0.0000 | 0.0000 | 0.0070 |
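Conceptually, compare_models just cross-validates every candidate estimator and ranks the results. A minimal sketch of the same idea in plain scikit-learn (the estimator list and scoring below are illustrative, not PyCaret's internals):

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
candidates = {
    'lda': LinearDiscriminantAnalysis(),
    'nb': GaussianNB(),
    'lr': LogisticRegression(max_iter=1000),
}

# Score every candidate with 10-fold CV and sort by mean accuracy.
scores = {name: cross_val_score(est, X, y, cv=10).mean()
          for name, est in candidates.items()}
for name, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {acc:.4f}")
```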
Build a model with create_model(). Here we create a Logistic Regression model ('lr').

```python
lr = create_model('lr')
```

| Fold | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC |
|---|---|---|---|---|---|---|---|
| 0 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| 1 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| 2 | 0.9000 | 1.0000 | 0.8889 | 0.9250 | 0.8971 | 0.8485 | 0.8616 |
| 3 | 0.8000 | 1.0000 | 0.7778 | 0.8800 | 0.7750 | 0.6970 | 0.7435 |
| 4 | 0.8889 | 0.9630 | 0.8889 | 0.9167 | 0.8857 | 0.8333 | 0.8492 |
| 5 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| 6 | 0.8889 | 1.0000 | 0.8889 | 0.9167 | 0.8857 | 0.8333 | 0.8492 |
| 7 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| 8 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| 9 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| Mean | 0.9478 | 0.9963 | 0.9444 | 0.9638 | 0.9444 | 0.9212 | 0.9304 |
| Std | 0.0689 | 0.0111 | 0.0745 | 0.0456 | 0.0751 | 0.1041 | 0.0905 |

```python
# the trained model object is stored in the variable 'lr'
print(lr)
```

Run hyperparameter tuning with tune_model(). According to the docs it performs a random grid search and optimizes Accuracy by default.

```python
tuned_lr = tune_model(lr)
```

| Fold | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC |
|---|---|---|---|---|---|---|---|
| 0 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| 1 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| 2 | 0.9000 | 1.0000 | 0.8889 | 0.9250 | 0.8971 | 0.8485 | 0.8616 |
| 3 | 0.8000 | 1.0000 | 0.7778 | 0.8800 | 0.7750 | 0.6970 | 0.7435 |
| 4 | 0.8889 | 1.0000 | 0.8889 | 0.9167 | 0.8857 | 0.8333 | 0.8492 |
| 5 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| 6 | 0.8889 | 1.0000 | 0.8889 | 0.9167 | 0.8857 | 0.8333 | 0.8492 |
| 7 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| 8 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| 9 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| Mean | 0.9478 | 1.0000 | 0.9444 | 0.9638 | 0.9444 | 0.9212 | 0.9304 |
| Std | 0.0689 | 0.0000 | 0.0745 | 0.0456 | 0.0751 | 0.1041 | 0.0905 |
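tune_model's random grid search can be approximated with scikit-learn's RandomizedSearchCV; the parameter distribution below is purely illustrative, not PyCaret's actual search space:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)

# Randomly sample regularization strengths and keep the best
# configuration by mean 10-fold accuracy.
search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions={'C': np.logspace(-3, 3, 100)},
    n_iter=10, cv=10, scoring='accuracy', random_state=123,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 4))
```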

```python
# the tuned model object is stored in the variable 'tuned_lr'
print(tuned_lr)
```

Plot the results with plot_model().
```python
plot_model(tuned_lr, plot = 'confusion_matrix')
plot_model(tuned_lr, plot = 'class_report')
plot_model(tuned_lr, plot = 'boundary')
plot_model(tuned_lr, plot = 'error')
```

With evaluate_model() you can browse these plots through an interactive user interface.

```python
evaluate_model(tuned_lr)
```

| Parameter | Value |
|---|---|
| C | 2.833 |
| class_weight | balanced |
| dual | False |
| fit_intercept | True |
| intercept_scaling | 1 |
| l1_ratio | None |
| max_iter | 1000 |
| multi_class | auto |
| n_jobs | None |
| penalty | l2 |
| random_state | 123 |
| solver | lbfgs |
| tol | 0.0001 |
| verbose | 0 |
| warm_start | False |
As shown in the image, you can select the Plot Type in this interface; the example in the image visualizes a Decision Tree.
predict_model() scores the model on the hold-out test set.

```python
predict_model(tuned_lr);
```

|  | Model | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC |
|---|---|---|---|---|---|---|---|---|
| 0 | Logistic Regression | 0.9512 | 1.0000 | 0.9556 | 0.9566 | 0.9509 | 0.9253 | 0.9287 |
Finalize the model with finalize_model().
```python
final_lr = finalize_model(tuned_lr)
print(final_lr)
```

```python
unseen_predictions = predict_model(final_lr, data=data_unseen)
unseen_predictions.head()
```

|  | Model | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC |
|---|---|---|---|---|---|---|---|---|
| 0 | Logistic Regression | 0.0000 | 1.0000 | 0 | 0 | 0 | 0 | 0 |

|  | sepal_length | sepal_width | petal_length | petal_width | species | Label | Score |
|---|---|---|---|---|---|---|---|
| 0 | 5.4 | 3.9 | 1.7 | 0.4 | Iris-setosa | Iris-setosa | 0.9831 |
| 1 | 5.4 | 3.4 | 1.7 | 0.2 | Iris-setosa | Iris-setosa | 0.9644 |
| 2 | 5.1 | 3.3 | 1.7 | 0.5 | Iris-setosa | Iris-setosa | 0.9699 |
| 3 | 4.8 | 3.1 | 1.6 | 0.2 | Iris-setosa | Iris-setosa | 0.9781 |
| 4 | 6.9 | 3.1 | 4.9 | 1.5 | Iris-versicolor | Iris-versicolor | 0.8227 |
You can see that Label and Score columns have been produced. Use save_model() to save the model and load_model() to load it back.
```python
save_model(final_lr, 'Final lr Model')
saved_final_lr = load_model('Final lr Model')
new_prediction = predict_model(saved_final_lr, data=data_unseen)
new_prediction.head()
```

|  | Model | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC |
|---|---|---|---|---|---|---|---|---|
| 0 | Logistic Regression | 0.0000 | 1.0000 | 0 | 0 | 0 | 0 | 0 |

|  | sepal_length | sepal_width | petal_length | petal_width | species | Label | Score |
|---|---|---|---|---|---|---|---|
| 0 | 5.4 | 3.9 | 1.7 | 0.4 | Iris-setosa | Iris-setosa | 0.9831 |
| 1 | 5.4 | 3.4 | 1.7 | 0.2 | Iris-setosa | Iris-setosa | 0.9644 |
| 2 | 5.1 | 3.3 | 1.7 | 0.5 | Iris-setosa | Iris-setosa | 0.9699 |
| 3 | 4.8 | 3.1 | 1.6 | 0.2 | Iris-setosa | Iris-setosa | 0.9781 |
| 4 | 6.9 | 3.1 | 4.9 | 1.5 | Iris-versicolor | Iris-versicolor | 0.8227 |
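save_model() serializes the whole preprocessing-plus-model pipeline to disk as a pickle file, and load_model() reads it back. The round trip is essentially standard Python pickling; a library-agnostic sketch using scikit-learn and the standard pickle module (not PyCaret's actual implementation):

```python
import pickle
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Serialize the fitted model, reload it, and confirm the reloaded
# copy produces identical predictions.
with open('final_lr_model.pkl', 'wb') as f:
    pickle.dump(model, f)
with open('final_lr_model.pkl', 'rb') as f:
    reloaded = pickle.load(f)
print((reloaded.predict(X) == model.predict(X)).all())
```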