Automated Machine Learning with Scikit-Learn

4. Automated Machine Learning with Scikit-Learn

Open In Colab

In this tutorial, we will show how to automatically search among different machine learning algorithms from Scikit-Learn. Automated machine learning only requires the user to link the data with a predifined problem and run function that we provide.

Let us start by installing DeepHyper.

[1]:
!pip install deephyper
Requirement already satisfied: deephyper in /usr/local/lib/python3.7/dist-packages (0.3.3)
Requirement already satisfied: networkx in /usr/local/lib/python3.7/dist-packages (from deephyper) (2.6.3)
Requirement already satisfied: tensorflow>=2.0.0 in /usr/local/lib/python3.7/dist-packages (from deephyper) (2.7.0)
Requirement already satisfied: tqdm in /usr/local/lib/python3.7/dist-packages (from deephyper) (4.62.3)
Requirement already satisfied: statsmodels in /usr/local/lib/python3.7/dist-packages (from deephyper) (0.10.2)
Requirement already satisfied: scikit-learn>=0.23.1 in /usr/local/lib/python3.7/dist-packages (from deephyper) (1.0.1)
Requirement already satisfied: pydot in /usr/local/lib/python3.7/dist-packages (from deephyper) (1.3.0)
Requirement already satisfied: pandas>=0.24.2 in /usr/local/lib/python3.7/dist-packages (from deephyper) (1.1.5)
Requirement already satisfied: tensorflow-probability in /usr/local/lib/python3.7/dist-packages (from deephyper) (0.14.1)
Requirement already satisfied: ray[default]>=1.3.0 in /usr/local/lib/python3.7/dist-packages (from deephyper) (1.8.0)
Requirement already satisfied: xgboost in /usr/local/lib/python3.7/dist-packages (from deephyper) (0.90)
Requirement already satisfied: matplotlib>=3.0.3 in /usr/local/lib/python3.7/dist-packages (from deephyper) (3.2.2)
Requirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from deephyper) (1.19.5)
Requirement already satisfied: Jinja2 in /usr/local/lib/python3.7/dist-packages (from deephyper) (2.11.3)
Requirement already satisfied: typeguard in /usr/local/lib/python3.7/dist-packages (from deephyper) (2.7.1)
Requirement already satisfied: dh-scikit-optimize==0.9.4 in /usr/local/lib/python3.7/dist-packages (from deephyper) (0.9.4)
Requirement already satisfied: joblib>=0.10.3 in /usr/local/lib/python3.7/dist-packages (from deephyper) (1.1.0)
Requirement already satisfied: openml==0.10.2 in /usr/local/lib/python3.7/dist-packages (from deephyper) (0.10.2)
Requirement already satisfied: ConfigSpace>=0.4.18 in /usr/local/lib/python3.7/dist-packages (from deephyper) (0.4.20)
Requirement already satisfied: pyaml>=16.9 in /usr/local/lib/python3.7/dist-packages (from dh-scikit-optimize==0.9.4->deephyper) (21.10.1)
Requirement already satisfied: scipy>=0.19.1 in /usr/local/lib/python3.7/dist-packages (from dh-scikit-optimize==0.9.4->deephyper) (1.4.1)
Requirement already satisfied: liac-arff>=2.4.0 in /usr/local/lib/python3.7/dist-packages (from openml==0.10.2->deephyper) (2.5.0)
Requirement already satisfied: xmltodict in /usr/local/lib/python3.7/dist-packages (from openml==0.10.2->deephyper) (0.12.0)
Requirement already satisfied: requests in /usr/local/lib/python3.7/dist-packages (from openml==0.10.2->deephyper) (2.23.0)
Requirement already satisfied: python-dateutil in /usr/local/lib/python3.7/dist-packages (from openml==0.10.2->deephyper) (2.8.2)
Requirement already satisfied: pyparsing in /usr/local/lib/python3.7/dist-packages (from ConfigSpace>=0.4.18->deephyper) (2.4.7)
Requirement already satisfied: cython in /usr/local/lib/python3.7/dist-packages (from ConfigSpace>=0.4.18->deephyper) (0.29.24)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=3.0.3->deephyper) (1.3.2)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=3.0.3->deephyper) (0.11.0)
Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.7/dist-packages (from pandas>=0.24.2->deephyper) (2018.9)
Requirement already satisfied: PyYAML in /usr/local/lib/python3.7/dist-packages (from pyaml>=16.9->dh-scikit-optimize==0.9.4->deephyper) (3.13)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.7/dist-packages (from python-dateutil->openml==0.10.2->deephyper) (1.15.0)
Requirement already satisfied: grpcio>=1.28.1 in /usr/local/lib/python3.7/dist-packages (from ray[default]>=1.3.0->deephyper) (1.41.1)
Requirement already satisfied: click>=7.0 in /usr/local/lib/python3.7/dist-packages (from ray[default]>=1.3.0->deephyper) (7.1.2)
Requirement already satisfied: protobuf>=3.15.3 in /usr/local/lib/python3.7/dist-packages (from ray[default]>=1.3.0->deephyper) (3.17.3)
Requirement already satisfied: redis>=3.5.0 in /usr/local/lib/python3.7/dist-packages (from ray[default]>=1.3.0->deephyper) (4.0.1)
Requirement already satisfied: filelock in /usr/local/lib/python3.7/dist-packages (from ray[default]>=1.3.0->deephyper) (3.3.2)
Requirement already satisfied: msgpack<2.0.0,>=1.0.0 in /usr/local/lib/python3.7/dist-packages (from ray[default]>=1.3.0->deephyper) (1.0.2)
Requirement already satisfied: jsonschema in /usr/local/lib/python3.7/dist-packages (from ray[default]>=1.3.0->deephyper) (2.6.0)
Requirement already satisfied: attrs in /usr/local/lib/python3.7/dist-packages (from ray[default]>=1.3.0->deephyper) (21.2.0)
Requirement already satisfied: colorful in /usr/local/lib/python3.7/dist-packages (from ray[default]>=1.3.0->deephyper) (0.5.4)
Requirement already satisfied: prometheus-client>=0.7.1 in /usr/local/lib/python3.7/dist-packages (from ray[default]>=1.3.0->deephyper) (0.12.0)
Requirement already satisfied: aiohttp>=3.7 in /usr/local/lib/python3.7/dist-packages (from ray[default]>=1.3.0->deephyper) (3.8.1)
Requirement already satisfied: aioredis<2 in /usr/local/lib/python3.7/dist-packages (from ray[default]>=1.3.0->deephyper) (1.3.1)
Requirement already satisfied: aiohttp-cors in /usr/local/lib/python3.7/dist-packages (from ray[default]>=1.3.0->deephyper) (0.7.0)
Requirement already satisfied: gpustat>=1.0.0b1 in /usr/local/lib/python3.7/dist-packages (from ray[default]>=1.3.0->deephyper) (1.0.0b1)
Requirement already satisfied: py-spy>=0.2.0 in /usr/local/lib/python3.7/dist-packages (from ray[default]>=1.3.0->deephyper) (0.3.11)
Requirement already satisfied: opencensus in /usr/local/lib/python3.7/dist-packages (from ray[default]>=1.3.0->deephyper) (0.8.0)
Requirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.7/dist-packages (from aiohttp>=3.7->ray[default]>=1.3.0->deephyper) (1.2.0)
Requirement already satisfied: yarl<2.0,>=1.0 in /usr/local/lib/python3.7/dist-packages (from aiohttp>=3.7->ray[default]>=1.3.0->deephyper) (1.7.2)
Requirement already satisfied: typing-extensions>=3.7.4 in /usr/local/lib/python3.7/dist-packages (from aiohttp>=3.7->ray[default]>=1.3.0->deephyper) (3.10.0.2)
Requirement already satisfied: async-timeout<5.0,>=4.0.0a3 in /usr/local/lib/python3.7/dist-packages (from aiohttp>=3.7->ray[default]>=1.3.0->deephyper) (4.0.1)
Requirement already satisfied: charset-normalizer<3.0,>=2.0 in /usr/local/lib/python3.7/dist-packages (from aiohttp>=3.7->ray[default]>=1.3.0->deephyper) (2.0.7)
Requirement already satisfied: asynctest==0.13.0 in /usr/local/lib/python3.7/dist-packages (from aiohttp>=3.7->ray[default]>=1.3.0->deephyper) (0.13.0)
Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.7/dist-packages (from aiohttp>=3.7->ray[default]>=1.3.0->deephyper) (1.2.0)
Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.7/dist-packages (from aiohttp>=3.7->ray[default]>=1.3.0->deephyper) (5.2.0)
Requirement already satisfied: hiredis in /usr/local/lib/python3.7/dist-packages (from aioredis<2->ray[default]>=1.3.0->deephyper) (2.0.0)
Requirement already satisfied: blessed>=1.17.1 in /usr/local/lib/python3.7/dist-packages (from gpustat>=1.0.0b1->ray[default]>=1.3.0->deephyper) (1.19.0)
Requirement already satisfied: nvidia-ml-py3>=7.352.0 in /usr/local/lib/python3.7/dist-packages (from gpustat>=1.0.0b1->ray[default]>=1.3.0->deephyper) (7.352.0)
Requirement already satisfied: psutil in /usr/local/lib/python3.7/dist-packages (from gpustat>=1.0.0b1->ray[default]>=1.3.0->deephyper) (5.4.8)
Requirement already satisfied: wcwidth>=0.1.4 in /usr/local/lib/python3.7/dist-packages (from blessed>=1.17.1->gpustat>=1.0.0b1->ray[default]>=1.3.0->deephyper) (0.2.5)
Requirement already satisfied: deprecated in /usr/local/lib/python3.7/dist-packages (from redis>=3.5.0->ray[default]>=1.3.0->deephyper) (1.2.13)
Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.7/dist-packages (from scikit-learn>=0.23.1->deephyper) (3.0.0)
Requirement already satisfied: flatbuffers<3.0,>=1.12 in /usr/local/lib/python3.7/dist-packages (from tensorflow>=2.0.0->deephyper) (2.0)
Requirement already satisfied: keras<2.8,>=2.7.0rc0 in /usr/local/lib/python3.7/dist-packages (from tensorflow>=2.0.0->deephyper) (2.7.0)
Requirement already satisfied: wheel<1.0,>=0.32.0 in /usr/local/lib/python3.7/dist-packages (from tensorflow>=2.0.0->deephyper) (0.37.0)
Requirement already satisfied: libclang>=9.0.1 in /usr/local/lib/python3.7/dist-packages (from tensorflow>=2.0.0->deephyper) (12.0.0)
Requirement already satisfied: h5py>=2.9.0 in /usr/local/lib/python3.7/dist-packages (from tensorflow>=2.0.0->deephyper) (3.1.0)
Requirement already satisfied: opt-einsum>=2.3.2 in /usr/local/lib/python3.7/dist-packages (from tensorflow>=2.0.0->deephyper) (3.3.0)
Requirement already satisfied: keras-preprocessing>=1.1.1 in /usr/local/lib/python3.7/dist-packages (from tensorflow>=2.0.0->deephyper) (1.1.2)
Requirement already satisfied: tensorflow-estimator<2.8,~=2.7.0rc0 in /usr/local/lib/python3.7/dist-packages (from tensorflow>=2.0.0->deephyper) (2.7.0)
Requirement already satisfied: tensorflow-io-gcs-filesystem>=0.21.0 in /usr/local/lib/python3.7/dist-packages (from tensorflow>=2.0.0->deephyper) (0.22.0)
Requirement already satisfied: google-pasta>=0.1.1 in /usr/local/lib/python3.7/dist-packages (from tensorflow>=2.0.0->deephyper) (0.2.0)
Requirement already satisfied: astunparse>=1.6.0 in /usr/local/lib/python3.7/dist-packages (from tensorflow>=2.0.0->deephyper) (1.6.3)
Requirement already satisfied: termcolor>=1.1.0 in /usr/local/lib/python3.7/dist-packages (from tensorflow>=2.0.0->deephyper) (1.1.0)
Requirement already satisfied: tensorboard~=2.6 in /usr/local/lib/python3.7/dist-packages (from tensorflow>=2.0.0->deephyper) (2.7.0)
Requirement already satisfied: wrapt>=1.11.0 in /usr/local/lib/python3.7/dist-packages (from tensorflow>=2.0.0->deephyper) (1.13.3)
Requirement already satisfied: absl-py>=0.4.0 in /usr/local/lib/python3.7/dist-packages (from tensorflow>=2.0.0->deephyper) (0.12.0)
Requirement already satisfied: gast<0.5.0,>=0.2.1 in /usr/local/lib/python3.7/dist-packages (from tensorflow>=2.0.0->deephyper) (0.4.0)
Requirement already satisfied: cached-property in /usr/local/lib/python3.7/dist-packages (from h5py>=2.9.0->tensorflow>=2.0.0->deephyper) (1.5.2)
Requirement already satisfied: markdown>=2.6.8 in /usr/local/lib/python3.7/dist-packages (from tensorboard~=2.6->tensorflow>=2.0.0->deephyper) (3.3.4)
Requirement already satisfied: tensorboard-data-server<0.7.0,>=0.6.0 in /usr/local/lib/python3.7/dist-packages (from tensorboard~=2.6->tensorflow>=2.0.0->deephyper) (0.6.1)
Requirement already satisfied: tensorboard-plugin-wit>=1.6.0 in /usr/local/lib/python3.7/dist-packages (from tensorboard~=2.6->tensorflow>=2.0.0->deephyper) (1.8.0)
Requirement already satisfied: werkzeug>=0.11.15 in /usr/local/lib/python3.7/dist-packages (from tensorboard~=2.6->tensorflow>=2.0.0->deephyper) (1.0.1)
Requirement already satisfied: setuptools>=41.0.0 in /usr/local/lib/python3.7/dist-packages (from tensorboard~=2.6->tensorflow>=2.0.0->deephyper) (57.4.0)
Requirement already satisfied: google-auth<3,>=1.6.3 in /usr/local/lib/python3.7/dist-packages (from tensorboard~=2.6->tensorflow>=2.0.0->deephyper) (1.35.0)
Requirement already satisfied: google-auth-oauthlib<0.5,>=0.4.1 in /usr/local/lib/python3.7/dist-packages (from tensorboard~=2.6->tensorflow>=2.0.0->deephyper) (0.4.6)
Requirement already satisfied: rsa<5,>=3.1.4 in /usr/local/lib/python3.7/dist-packages (from google-auth<3,>=1.6.3->tensorboard~=2.6->tensorflow>=2.0.0->deephyper) (4.7.2)
Requirement already satisfied: pyasn1-modules>=0.2.1 in /usr/local/lib/python3.7/dist-packages (from google-auth<3,>=1.6.3->tensorboard~=2.6->tensorflow>=2.0.0->deephyper) (0.2.8)
Requirement already satisfied: cachetools<5.0,>=2.0.0 in /usr/local/lib/python3.7/dist-packages (from google-auth<3,>=1.6.3->tensorboard~=2.6->tensorflow>=2.0.0->deephyper) (4.2.4)
Requirement already satisfied: requests-oauthlib>=0.7.0 in /usr/local/lib/python3.7/dist-packages (from google-auth-oauthlib<0.5,>=0.4.1->tensorboard~=2.6->tensorflow>=2.0.0->deephyper) (1.3.0)
Requirement already satisfied: importlib-metadata in /usr/local/lib/python3.7/dist-packages (from markdown>=2.6.8->tensorboard~=2.6->tensorflow>=2.0.0->deephyper) (4.8.2)
Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in /usr/local/lib/python3.7/dist-packages (from pyasn1-modules>=0.2.1->google-auth<3,>=1.6.3->tensorboard~=2.6->tensorflow>=2.0.0->deephyper) (0.4.8)
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests->openml==0.10.2->deephyper) (3.0.4)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests->openml==0.10.2->deephyper) (2.10)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests->openml==0.10.2->deephyper) (1.24.3)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests->openml==0.10.2->deephyper) (2021.10.8)
Requirement already satisfied: oauthlib>=3.0.0 in /usr/local/lib/python3.7/dist-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<0.5,>=0.4.1->tensorboard~=2.6->tensorflow>=2.0.0->deephyper) (3.1.1)
Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata->markdown>=2.6.8->tensorboard~=2.6->tensorflow>=2.0.0->deephyper) (3.6.0)
Requirement already satisfied: MarkupSafe>=0.23 in /usr/local/lib/python3.7/dist-packages (from Jinja2->deephyper) (2.0.1)
Requirement already satisfied: opencensus-context==0.1.2 in /usr/local/lib/python3.7/dist-packages (from opencensus->ray[default]>=1.3.0->deephyper) (0.1.2)
Requirement already satisfied: google-api-core<3.0.0,>=1.0.0 in /usr/local/lib/python3.7/dist-packages (from opencensus->ray[default]>=1.3.0->deephyper) (1.26.3)
Requirement already satisfied: googleapis-common-protos<2.0dev,>=1.6.0 in /usr/local/lib/python3.7/dist-packages (from google-api-core<3.0.0,>=1.0.0->opencensus->ray[default]>=1.3.0->deephyper) (1.53.0)
Requirement already satisfied: packaging>=14.3 in /usr/local/lib/python3.7/dist-packages (from google-api-core<3.0.0,>=1.0.0->opencensus->ray[default]>=1.3.0->deephyper) (21.2)
Requirement already satisfied: patsy>=0.4.0 in /usr/local/lib/python3.7/dist-packages (from statsmodels->deephyper) (0.5.2)
Requirement already satisfied: cloudpickle>=1.3 in /usr/local/lib/python3.7/dist-packages (from tensorflow-probability->deephyper) (1.3.0)
Requirement already satisfied: dm-tree in /usr/local/lib/python3.7/dist-packages (from tensorflow-probability->deephyper) (0.1.6)
Requirement already satisfied: decorator in /usr/local/lib/python3.7/dist-packages (from tensorflow-probability->deephyper) (4.4.2)

Warning

By design asyncio does not allow nested event loops. Jupyter is using Tornado which already starts an event loop. Therefore the following patch is required to run this tutorial.

[2]:
!pip install nest_asyncio

import nest_asyncio
nest_asyncio.apply()
Requirement already satisfied: nest_asyncio in /usr/local/lib/python3.7/dist-packages (1.5.1)

4.1. Classification

On this part of the tutorial we focus on the classification case.

Create run function to train and evaluate the model corresponding to the configuration generated by the search. This function has to return a scalar value (typically, validation accuracy), which will be maximized by the search algorithm. In the case of automated machine learning we use the run function provided at deephyper.sklearn.classifier.run_autosklearn1 and wrap it with our data such as:

[3]:
from deephyper.sklearn.classifier import run_autosklearn1


def load_data():
    from sklearn.datasets import load_breast_cancer

    X, y = load_breast_cancer(return_X_y=True)

    return X, y


def run(config):
    return run_autosklearn1(config, load_data)

We are ready to go! But, let us look at the problem provided by DeepHyper in deephyper.sklearn.classifier.problem_autosklearn1 to understand better what is happening under the hood.

[4]:
from deephyper.sklearn.classifier import problem_autosklearn1

problem_autosklearn1
[4]:
Configuration space object:
  Hyperparameters:
    C, Type: UniformFloat, Range: [1e-05, 10.0], Default: 0.01, on log-scale
    alpha, Type: UniformFloat, Range: [1e-05, 10.0], Default: 0.01, on log-scale
    classifier, Type: Categorical, Choices: {RandomForest, Logistic, AdaBoost, KNeighbors, MLP, SVC, XGBoost}, Default: RandomForest
    gamma, Type: UniformFloat, Range: [1e-05, 10.0], Default: 0.01, on log-scale
    kernel, Type: Categorical, Choices: {linear, poly, rbf, sigmoid}, Default: linear
    max_depth, Type: UniformInteger, Range: [2, 100], Default: 14, on log-scale
    n_estimators, Type: UniformInteger, Range: [1, 2000], Default: 45, on log-scale
    n_neighbors, Type: UniformInteger, Range: [1, 100], Default: 50
  Conditions:
    (C | classifier == 'Logistic' || C | classifier == 'SVC')
    (gamma | kernel == 'rbf' || gamma | kernel == 'poly' || gamma | kernel == 'sigmoid')
    (n_estimators | classifier == 'RandomForest' || n_estimators | classifier == 'AdaBoost')
    alpha | classifier == 'MLP'
    kernel | classifier == 'SVC'
    max_depth | classifier == 'RandomForest'
    n_neighbors | classifier == 'KNeighbors'

Create an Evaluator object using the ray backend to distribute the evaluation of the run-function defined previously.

[5]:
from deephyper.evaluator import Evaluator
from deephyper.evaluator.callback import LoggerCallback

evaluator = Evaluator.create(run,
                 method="ray",
                 method_kwargs={
                     "address": None,
                     "num_cpus": 1,
                     "num_cpus_per_task": 1,
                     "callbacks": [LoggerCallback()]
                 })

print("Number of workers: ", evaluator.num_workers)
Number of workers:  1

Tip

If you execute this tutorial locally, you can open the ray-dashboard at an address like http://127.0.0.1:port in a browser to monitor the CPU usage of the execution.

Finally, you can define a Bayesian optimization search called AMBS (for Asynchronous Model-Based Search) and link to it the defined problem_autosklearn1 and evaluator.

[6]:
from deephyper.search.hps import AMBS

search = AMBS(problem_autosklearn1, evaluator)
[7]:
results = search.search(10)
[00001] -- best objective: -1.00000 -- received objective: -1.00000
[00002] -- best objective: 0.96277 -- received objective: 0.96277
[00003] -- best objective: 0.96277 -- received objective: 0.94149
[00004] -- best objective: 0.97872 -- received objective: 0.97872
(pid=733) /usr/local/lib/python3.7/dist-packages/sklearn/neural_network/_multilayer_perceptron.py:696: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet.
(pid=733)   ConvergenceWarning,
[00005] -- best objective: 0.97872 -- received objective: 0.97872
(pid=733) /usr/local/lib/python3.7/dist-packages/sklearn/neural_network/_multilayer_perceptron.py:696: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet.
(pid=733)   ConvergenceWarning,
[00006] -- best objective: 0.97872 -- received objective: 0.95745
[00007] -- best objective: 0.97872 -- received objective: 0.96809
[00008] -- best objective: 0.97872 -- received objective: 0.97340
[00009] -- best objective: 0.97872 -- received objective: 0.96277
[00010] -- best objective: 0.97872 -- received objective: 0.96809

Once the search is over, a file named results.csv is saved in the current directory. The same dataframe is returned by the search.search(...) call. It contains the hyperparameters configurations evaluated during the search and their corresponding objective value (i.e, validation accuracy), duration of computation and time of computation with elapsed_sec.

[8]:
results
[8]:
classifier C alpha kernel max_depth n_estimators n_neighbors gamma id objective elapsed_sec duration
0 AdaBoost NaN NaN NaN NaN 1.0 NaN NaN 1 -1.000000 13.452292 3.579814
1 KNeighbors NaN NaN NaN NaN NaN 12.0 NaN 2 0.962766 15.119490 0.128026
2 KNeighbors NaN NaN NaN NaN NaN 26.0 NaN 3 0.941489 16.815022 0.126511
3 MLP NaN 0.001721 NaN NaN NaN NaN NaN 4 0.978723 18.776632 0.450363
4 MLP NaN 0.719261 NaN NaN NaN NaN NaN 5 0.978723 20.717520 0.451758
5 RandomForest NaN NaN NaN 7.0 1284.0 NaN NaN 6 0.957447 25.709774 3.390466
6 RandomForest NaN NaN NaN 18.0 140.0 NaN NaN 7 0.968085 27.733081 0.481195
7 AdaBoost NaN NaN NaN NaN 256.0 NaN NaN 8 0.973404 30.094701 0.792026
8 AdaBoost NaN NaN NaN NaN 13.0 NaN NaN 9 0.962766 31.660862 0.056839
9 AdaBoost NaN NaN NaN NaN 7.0 NaN NaN 10 0.968085 33.250499 0.037967

The deephyper-analytics command line is a way of analyzing this type of file. For example, we want to output the best configuration we can use the topk functionnality.

[9]:
!deephyper-analytics topk results.csv -k 3
'0': {C: null, alpha: 0.0017211808, classifier: MLP, duration: 0.4503633976, elapsed_sec: 18.7766315937,
  gamma: null, id: 4, kernel: null, max_depth: null, n_estimators: null, n_neighbors: null,
  objective: 0.9787234043}
'1': {C: null, alpha: 0.7192606656, classifier: MLP, duration: 0.4517583847, elapsed_sec: 20.7175195217,
  gamma: null, id: 5, kernel: null, max_depth: null, n_estimators: null, n_neighbors: null,
  objective: 0.9787234043}
'2': {C: null, alpha: null, classifier: AdaBoost, duration: 0.7920260429, elapsed_sec: 30.0947012901,
  gamma: null, id: 8, kernel: null, max_depth: null, n_estimators: 256.0, n_neighbors: null,
  objective: 0.9734042553}

4.2. Regression

On this part of the tutorial we focus on the regression case.

Create run function to train and evaluate the model corresponding to the configuration generated by the search. This function has to return a scalar value (typically, validation \(R^2\)), which will be maximized by the search algorithm. In the case of automated machine learning we use the run-function provided at deephyper.sklearn.regressor.run_autosklearn1 and wrap it with our data such as:

[10]:
from deephyper.sklearn.regressor import run_autosklearn1


def load_data():
    from sklearn.datasets import fetch_california_housing

    X, y = fetch_california_housing(return_X_y=True)
    return X, y


def run(config):
    return run_autosklearn1(config, load_data)

We are ready to go! But, let us look at the problem provided by DeepHyper to understand better what is happening under the hood.

[11]:
from deephyper.sklearn.regressor import problem_autosklearn1

problem_autosklearn1
[11]:
Configuration space object:
  Hyperparameters:
    C, Type: UniformFloat, Range: [1e-05, 10.0], Default: 0.01, on log-scale
    alpha, Type: UniformFloat, Range: [1e-05, 10.0], Default: 0.01, on log-scale
    gamma, Type: UniformFloat, Range: [1e-05, 10.0], Default: 0.01, on log-scale
    kernel, Type: Categorical, Choices: {linear, poly, rbf, sigmoid}, Default: linear
    max_depth, Type: UniformInteger, Range: [2, 100], Default: 14, on log-scale
    n_estimators, Type: UniformInteger, Range: [1, 2000], Default: 45, on log-scale
    n_neighbors, Type: UniformInteger, Range: [1, 100], Default: 50
    regressor, Type: Categorical, Choices: {RandomForest, Linear, AdaBoost, KNeighbors, MLP, SVR, XGBoost}, Default: RandomForest
  Conditions:
    (gamma | kernel == 'rbf' || gamma | kernel == 'poly' || gamma | kernel == 'sigmoid')
    (n_estimators | regressor == 'RandomForest' || n_estimators | regressor == 'AdaBoost')
    C | regressor == 'SVR'
    alpha | regressor == 'MLP'
    kernel | regressor == 'SVR'
    max_depth | regressor == 'RandomForest'
    n_neighbors | regressor == 'KNeighbors'

Create an Evaluator object using the ray backend to distribute the evaluation of the run-function defined previously.

[12]:
from deephyper.evaluator import Evaluator
from deephyper.evaluator.callback import LoggerCallback

evaluator = Evaluator.create(run,
                 method="ray",
                 method_kwargs={
                     "address": None,
                     "num_cpus": 2,
                     "num_cpus_per_task": 1,
                     "callbacks": [LoggerCallback()]
                 })

print("Number of workers: ", evaluator.num_workers)
Number of workers:  1

Tip

You can open the ray-dashboard at an address like http://127.0.0.1:port in a browser to monitor the CPU usage of the execution.

Finally, you can define a Bayesian optimization search called AMBS (for Asynchronous Model-Based Search) and link to it the defined Problem and evaluator.

[13]:
from deephyper.search.hps import AMBS

search = AMBS(problem_autosklearn1, evaluator)
[14]:
results = search.search(10)
[00001] -- best objective: -0.05886 -- received objective: -0.05886
[00002] -- best objective: 0.56451 -- received objective: 0.56451
[00003] -- best objective: 0.56451 -- received objective: 0.44250
(pid=733) [14:49:38] WARNING: /workspace/src/objective/regression_obj.cu:152: reg:linear is now deprecated in favor of reg:squarederror.
[00004] -- best objective: 0.78391 -- received objective: 0.78391
[00005] -- best objective: 0.78391 -- received objective: 0.76766
[00006] -- best objective: 0.78391 -- received objective: 0.57538
[00007] -- best objective: 0.78391 -- received objective: 0.47485
[00008] -- best objective: 0.78391 -- received objective: 0.77684
(pid=733) /usr/local/lib/python3.7/dist-packages/sklearn/neural_network/_multilayer_perceptron.py:696: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet.
(pid=733)   ConvergenceWarning,
[00009] -- best objective: 0.80735 -- received objective: 0.80735
[00010] -- best objective: 0.80735 -- received objective: 0.71937

Once the search is over, a file named results.csv is saved in the current directory. The same dataframe is returned by the search.search(...) call. It contains the hyperparameters configurations evaluated during the search and their corresponding objective value (i.e, validation accuracy), duration of computation and time of computation with elapsed_sec.

[16]:
results
[16]:
regressor C alpha kernel max_depth n_estimators n_neighbors gamma id objective elapsed_sec duration
0 SVR 0.000317 NaN sigmoid NaN NaN NaN 0.000371 1 -0.058856 26.522054 22.169491
1 AdaBoost NaN NaN NaN NaN 13.0 NaN NaN 2 0.564506 28.500448 0.489093
2 AdaBoost NaN NaN NaN NaN 322.0 NaN NaN 3 0.442501 31.241331 1.240041
3 XGBoost NaN NaN NaN NaN NaN NaN NaN 4 0.783908 33.657469 1.129844
4 MLP NaN 0.074061 NaN NaN NaN NaN NaN 5 0.767663 46.345952 11.166475
5 SVR 1.562227 NaN linear NaN NaN NaN NaN 6 0.575381 72.394516 24.590795
6 SVR 0.008012 NaN rbf NaN NaN NaN 0.012148 7 0.474847 91.398495 17.416007
7 MLP NaN 0.000198 NaN NaN NaN NaN NaN 8 0.776843 104.560521 11.689193
8 RandomForest NaN NaN NaN 29.0 988.0 NaN NaN 9 0.807347 182.046024 75.990584
9 RandomForest NaN NaN NaN 7.0 70.0 NaN NaN 10 0.719367 186.149946 2.577926

The deephyper-analytics command line is a way of analyzing this type of file. For example, we want to output the best configuration we can use the topk functionnality.

[17]:
!deephyper-analytics topk results.csv -k 3
'0': {C: null, alpha: null, duration: 75.9905841351, elapsed_sec: 182.0460243225,
  gamma: null, id: 9, kernel: null, max_depth: 29.0, n_estimators: 988.0, n_neighbors: null,
  objective: 0.8073473164, regressor: RandomForest}
'1': {C: null, alpha: null, duration: 1.1298441887, elapsed_sec: 33.6574685574, gamma: null,
  id: 4, kernel: null, max_depth: null, n_estimators: null, n_neighbors: null, objective: 0.7839078942,
  regressor: XGBoost}
'2': {C: null, alpha: 0.0001976475, duration: 11.6891927719, elapsed_sec: 104.560520649,
  gamma: null, id: 8, kernel: null, max_depth: null, n_estimators: null, n_neighbors: null,
  objective: 0.7768434694, regressor: MLP}