PyCaret is an open-source, low-code machine learning library in Python that automates machine learning workflows (https://pycaret.org/). Install it with:

pip install pycaret
# enable_colab() improves display rendering when running in Google Colab
from pycaret.utils import enable_colab
enable_colab()
from pycaret.datasets import get_data
dataset = get_data('iris')
| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
data = dataset.sample(frac=0.9, random_state=786)
data_unseen = dataset.drop(data.index)
data.reset_index(drop=True, inplace=True)
data_unseen.reset_index(drop=True, inplace=True)
print('Data for Modeling: ' + str(data.shape))
print('Unseen Data For Predictions: ' + str(data_unseen.shape))
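The 90/10 split above can be reproduced without PyCaret. A minimal sketch using scikit-learn's copy of the iris data (which uses integer target codes rather than the string labels in PyCaret's `get_data('iris')`):

```python
import pandas as pd
from sklearn.datasets import load_iris

# 150 rows: 4 feature columns plus the target column
dataset = load_iris(as_frame=True).frame

# Sample 90% for modeling; keep the remaining rows as truly unseen data.
data = dataset.sample(frac=0.9, random_state=786)
data_unseen = dataset.drop(data.index)
data = data.reset_index(drop=True)
data_unseen = data_unseen.reset_index(drop=True)

print('Data for Modeling:', data.shape)               # (135, 5)
print('Unseen Data For Predictions:', data_unseen.shape)  # (15, 5)
```

Because `drop` removes exactly the sampled index labels, the two frames partition the original 150 rows with no overlap.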
Preprocessing is done with `setup()`:

from pycaret.classification import *
exp_mclf101 = setup(data = data, target = 'species', session_id=123)
| | Description | Value |
|---|---|---|
| 0 | session_id | 123 |
| 1 | Target | species |
| 2 | Target Type | Multiclass |
| 3 | Label Encoded | Iris-setosa: 0, Iris-versicolor: 1, Iris-virginica: 2 |
| 4 | Original Data | (135, 5) |
| 5 | Missing Values | False |
| 6 | Numeric Features | 4 |
| 7 | Categorical Features | 0 |
| 8 | Ordinal Features | False |
| 9 | High Cardinality Features | False |
| 10 | High Cardinality Method | None |
| 11 | Transformed Train Set | (94, 4) |
| 12 | Transformed Test Set | (41, 4) |
| 13 | Shuffle Train-Test | True |
| 14 | Stratify Train-Test | False |
| 15 | Fold Generator | StratifiedKFold |
| 16 | Fold Number | 10 |
| 17 | CPU Jobs | -1 |
| 18 | Use GPU | False |
| 19 | Log Experiment | False |
| 20 | Experiment Name | clf-default-name |
| 21 | USI | caab |
| 22 | Imputation Type | simple |
| 23 | Iterative Imputation Iteration | None |
| 24 | Numeric Imputer | mean |
| 25 | Iterative Imputation Numeric Model | None |
| 26 | Categorical Imputer | constant |
| 27 | Iterative Imputation Categorical Model | None |
| 28 | Unknown Categoricals Handling | least_frequent |
| 29 | Normalize | False |
| 30 | Normalize Method | None |
| 31 | Transformation | False |
| 32 | Transformation Method | None |
| 33 | PCA | False |
| 34 | PCA Method | None |
| 35 | PCA Components | None |
| 36 | Ignore Low Variance | False |
| 37 | Combine Rare Levels | False |
| 38 | Rare Level Threshold | None |
| 39 | Numeric Binning | False |
| 40 | Remove Outliers | False |
| 41 | Outliers Threshold | None |
| 42 | Remove Multicollinearity | False |
| 43 | Multicollinearity Threshold | None |
| 44 | Remove Perfect Collinearity | True |
| 45 | Clustering | False |
| 46 | Clustering Iteration | None |
| 47 | Polynomial Features | False |
| 48 | Polynomial Degree | None |
| 49 | Trignometry Features | False |
| 50 | Polynomial Threshold | None |
| 51 | Group Features | False |
| 52 | Feature Selection | False |
| 53 | Feature Selection Method | classic |
| 54 | Features Selection Threshold | None |
| 55 | Feature Interaction | False |
| 56 | Feature Ratio | False |
| 57 | Interaction Threshold | None |
| 58 | Fix Imbalance | False |
| 59 | Fix Imbalance Method | SMOTE |
best = compare_models()
| | Model | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC | TT (Sec) |
|---|---|---|---|---|---|---|---|---|---|
| lda | Linear Discriminant Analysis | 0.9678 | 0.9963 | 0.9667 | 0.9758 | 0.9669 | 0.9515 | 0.9560 | 0.0100 |
| nb | Naive Bayes | 0.9578 | 0.9897 | 0.9556 | 0.9713 | 0.9546 | 0.9364 | 0.9442 | 0.0090 |
| qda | Quadratic Discriminant Analysis | 0.9567 | 1.0000 | 0.9556 | 0.9708 | 0.9533 | 0.9348 | 0.9433 | 0.0100 |
| lr | Logistic Regression | 0.9478 | 0.9963 | 0.9444 | 0.9638 | 0.9444 | 0.9212 | 0.9304 | 0.2640 |
| knn | K Neighbors Classifier | 0.9467 | 0.9926 | 0.9444 | 0.9630 | 0.9432 | 0.9197 | 0.9291 | 0.1190 |
| lightgbm | Light Gradient Boosting Machine | 0.9456 | 0.9852 | 0.9444 | 0.9625 | 0.9419 | 0.9182 | 0.9282 | 0.0290 |
| ada | Ada Boost Classifier | 0.9256 | 0.9809 | 0.9222 | 0.9505 | 0.9194 | 0.8879 | 0.9026 | 0.0620 |
| gbc | Gradient Boosting Classifier | 0.9256 | 0.9815 | 0.9222 | 0.9505 | 0.9194 | 0.8879 | 0.9026 | 0.1270 |
| et | Extra Trees Classifier | 0.9256 | 0.9926 | 0.9222 | 0.9505 | 0.9194 | 0.8879 | 0.9026 | 0.4650 |
| dt | Decision Tree Classifier | 0.9144 | 0.9369 | 0.9111 | 0.9366 | 0.9086 | 0.8712 | 0.8843 | 0.0100 |
| rf | Random Forest Classifier | 0.9144 | 0.9852 | 0.9111 | 0.9305 | 0.9101 | 0.8712 | 0.8813 | 0.5310 |
| svm | SVM - Linear Kernel | 0.8522 | 0.0000 | 0.8361 | 0.8261 | 0.8197 | 0.7755 | 0.8099 | 0.0640 |
| ridge | Ridge Classifier | 0.8300 | 0.0000 | 0.8222 | 0.8544 | 0.8178 | 0.7433 | 0.7648 | 0.0080 |
| dummy | Dummy Classifier | 0.3822 | 0.5000 | 0.3333 | 0.1480 | 0.2128 | 0.0000 | 0.0000 | 0.0070 |
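Conceptually, `compare_models()` runs k-fold cross-validation for every candidate estimator and ranks them by the chosen metric. A hand-rolled sketch of that idea with scikit-learn, using two of the models from the leaderboard above (the exact scores will differ from PyCaret's, which uses its own internal train split):

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)
# 10-fold stratified CV, mirroring the Fold Generator / Fold Number settings
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=123)

candidates = {
    'lda': LinearDiscriminantAnalysis(),
    'lr': LogisticRegression(max_iter=1000),
}
scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring='accuracy').mean()
    for name, model in candidates.items()
}

# Sort descending by mean accuracy, like the leaderboard above.
for name, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f'{name}: {acc:.4f}')
```

PyCaret also lets you narrow or reorder the comparison via `compare_models()`'s `include` and `sort` parameters.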
Next, build a single model with `create_model()`. Here we create a Logistic Regression model ('lr'):

lr = create_model('lr')
| Fold | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC |
|---|---|---|---|---|---|---|---|
| 0 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| 1 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| 2 | 0.9000 | 1.0000 | 0.8889 | 0.9250 | 0.8971 | 0.8485 | 0.8616 |
| 3 | 0.8000 | 1.0000 | 0.7778 | 0.8800 | 0.7750 | 0.6970 | 0.7435 |
| 4 | 0.8889 | 0.9630 | 0.8889 | 0.9167 | 0.8857 | 0.8333 | 0.8492 |
| 5 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| 6 | 0.8889 | 1.0000 | 0.8889 | 0.9167 | 0.8857 | 0.8333 | 0.8492 |
| 7 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| 8 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| 9 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| Mean | 0.9478 | 0.9963 | 0.9444 | 0.9638 | 0.9444 | 0.9212 | 0.9304 |
| Std | 0.0689 | 0.0111 | 0.0745 | 0.0456 | 0.0751 | 0.1041 | 0.0905 |
#trained model object is stored in the variable 'lr'.
print(lr)
Hyperparameter tuning is done with `tune_model()`, which uses a random grid search and, by default, optimizes Accuracy:

tuned_lr = tune_model(lr)
| Fold | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC |
|---|---|---|---|---|---|---|---|
| 0 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| 1 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| 2 | 0.9000 | 1.0000 | 0.8889 | 0.9250 | 0.8971 | 0.8485 | 0.8616 |
| 3 | 0.8000 | 1.0000 | 0.7778 | 0.8800 | 0.7750 | 0.6970 | 0.7435 |
| 4 | 0.8889 | 1.0000 | 0.8889 | 0.9167 | 0.8857 | 0.8333 | 0.8492 |
| 5 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| 6 | 0.8889 | 1.0000 | 0.8889 | 0.9167 | 0.8857 | 0.8333 | 0.8492 |
| 7 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| 8 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| 9 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| Mean | 0.9478 | 1.0000 | 0.9444 | 0.9638 | 0.9444 | 0.9212 | 0.9304 |
| Std | 0.0689 | 0.0000 | 0.0745 | 0.0456 | 0.0751 | 0.1041 | 0.0905 |
#tuned model object is stored in the variable 'tuned_lr'.
print(tuned_lr)
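Under the hood, `tune_model()`'s random grid search resembles scikit-learn's `RandomizedSearchCV`. A sketch of the same idea (the grid values here are illustrative, not PyCaret's internal grid; PyCaret's default `n_iter` for `tune_model` is also 10):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold

X, y = load_iris(return_X_y=True)

search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions={
        'C': np.logspace(-3, 2, 50),       # regularization strength candidates
        'class_weight': [None, 'balanced'],
    },
    n_iter=10,                             # 10 random draws from the grid
    scoring='accuracy',                    # the metric being optimized
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=123),
    random_state=123,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 4))
```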
Visualize the model with `plot_model()`:

plot_model(tuned_lr, plot = 'confusion_matrix')
plot_model(tuned_lr, plot = 'class_report')
plot_model(tuned_lr, plot = 'boundary')
plot_model(tuned_lr, plot = 'error')

`evaluate_model()` shows these plots in an interactive user interface:

evaluate_model(tuned_lr)
| Parameter | Value |
|---|---|
| C | 2.833 |
| class_weight | balanced |
| dual | False |
| fit_intercept | True |
| intercept_scaling | 1 |
| l1_ratio | None |
| max_iter | 1000 |
| multi_class | auto |
| n_jobs | None |
| penalty | l2 |
| random_state | 123 |
| solver | lbfgs |
| tol | 0.0001 |
| verbose | 0 |
| warm_start | False |
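For reference, the tuned hyperparameters in the table above correspond to roughly this plain scikit-learn estimator (a sketch; PyCaret's 'lr' wraps sklearn's LogisticRegression, and the unlisted arguments are sklearn defaults):

```python
from sklearn.linear_model import LogisticRegression

# Equivalent of the tuned model's hyperparameter table
tuned_lr_equivalent = LogisticRegression(
    C=2.833,                  # found by the random grid search
    class_weight='balanced',
    max_iter=1000,
    solver='lbfgs',
    random_state=123,
)
print(tuned_lr_equivalent)
```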
Check performance on the hold-out test set with `predict_model()`:

predict_model(tuned_lr);

| | Model | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC |
|---|---|---|---|---|---|---|---|---|
| 0 | Logistic Regression | 0.9512 | 1.0000 | 0.9556 | 0.9566 | 0.9509 | 0.9253 | 0.9287 |
Finalize the model with `finalize_model()`, which retrains it on the full dataset, including the hold-out set:

final_lr = finalize_model(tuned_lr)
print(final_lr)
unseen_predictions = predict_model(final_lr, data=data_unseen)
unseen_predictions.head()

| | Model | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC |
|---|---|---|---|---|---|---|---|---|
| 0 | Logistic Regression | 0.0000 | 1.0000 | 0 | 0 | 0 | 0 | 0 |

| | sepal_length | sepal_width | petal_length | petal_width | species | Label | Score |
|---|---|---|---|---|---|---|---|
| 0 | 5.4 | 3.9 | 1.7 | 0.4 | Iris-setosa | Iris-setosa | 0.9831 |
| 1 | 5.4 | 3.4 | 1.7 | 0.2 | Iris-setosa | Iris-setosa | 0.9644 |
| 2 | 5.1 | 3.3 | 1.7 | 0.5 | Iris-setosa | Iris-setosa | 0.9699 |
| 3 | 4.8 | 3.1 | 1.6 | 0.2 | Iris-setosa | Iris-setosa | 0.9781 |
| 4 | 6.9 | 3.1 | 4.9 | 1.5 | Iris-versicolor | Iris-versicolor | 0.8227 |
You can see that Label and Score columns have been added to the predictions. Save the model with `save_model()` and load it back with `load_model()`:

save_model(final_lr, 'Final lr Model')
saved_final_lr = load_model('Final lr Model')
new_prediction = predict_model(saved_final_lr, data=data_unseen)
new_prediction.head()
| | Model | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC |
|---|---|---|---|---|---|---|---|---|
| 0 | Logistic Regression | 0.0000 | 1.0000 | 0 | 0 | 0 | 0 | 0 |

| | sepal_length | sepal_width | petal_length | petal_width | species | Label | Score |
|---|---|---|---|---|---|---|---|
| 0 | 5.4 | 3.9 | 1.7 | 0.4 | Iris-setosa | Iris-setosa | 0.9831 |
| 1 | 5.4 | 3.4 | 1.7 | 0.2 | Iris-setosa | Iris-setosa | 0.9644 |
| 2 | 5.1 | 3.3 | 1.7 | 0.5 | Iris-setosa | Iris-setosa | 0.9699 |
| 3 | 4.8 | 3.1 | 1.6 | 0.2 | Iris-setosa | Iris-setosa | 0.9781 |
| 4 | 6.9 | 3.1 | 4.9 | 1.5 | Iris-versicolor | Iris-versicolor | 0.8227 |
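`save_model()`/`load_model()` essentially pickle the whole preprocessing-plus-model pipeline to disk and restore it. A minimal equivalent of that round trip with joblib and a plain scikit-learn pipeline (a sketch of the mechanism, not PyCaret's exact file format):

```python
import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipeline.fit(X, y)

joblib.dump(pipeline, 'final_lr_pipeline.pkl')    # save_model() analogue
restored = joblib.load('final_lr_pipeline.pkl')   # load_model() analogue

# The restored pipeline predicts identically to the original.
assert (restored.predict(X) == pipeline.predict(X)).all()
```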