5. Automated Machine Learning with Scikit-Learn#

Open In Colab

In this tutorial, we will show how to automatically search among different machine learning algorithms from Scikit-Learn. Automated machine learning only requires the user to link the data with a predifined problem and run function that we provide.

Let us start by installing DeepHyper.

[1]:
!pip install deephyper["popt"]
!pip install ray
Requirement already satisfied: deephyper in /usr/local/lib/python3.7/dist-packages (0.3.3)
Requirement already satisfied: networkx in /usr/local/lib/python3.7/dist-packages (from deephyper) (2.6.3)
Requirement already satisfied: tensorflow>=2.0.0 in /usr/local/lib/python3.7/dist-packages (from deephyper) (2.7.0)
Requirement already satisfied: tqdm in /usr/local/lib/python3.7/dist-packages (from deephyper) (4.62.3)
Requirement already satisfied: statsmodels in /usr/local/lib/python3.7/dist-packages (from deephyper) (0.10.2)
Requirement already satisfied: scikit-learn>=0.23.1 in /usr/local/lib/python3.7/dist-packages (from deephyper) (1.0.1)
Requirement already satisfied: pydot in /usr/local/lib/python3.7/dist-packages (from deephyper) (1.3.0)
Requirement already satisfied: pandas>=0.24.2 in /usr/local/lib/python3.7/dist-packages (from deephyper) (1.1.5)
Requirement already satisfied: tensorflow-probability in /usr/local/lib/python3.7/dist-packages (from deephyper) (0.14.1)
Requirement already satisfied: ray[default]>=1.3.0 in /usr/local/lib/python3.7/dist-packages (from deephyper) (1.8.0)
Requirement already satisfied: xgboost in /usr/local/lib/python3.7/dist-packages (from deephyper) (0.90)
Requirement already satisfied: matplotlib>=3.0.3 in /usr/local/lib/python3.7/dist-packages (from deephyper) (3.2.2)
Requirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from deephyper) (1.19.5)
Requirement already satisfied: Jinja2 in /usr/local/lib/python3.7/dist-packages (from deephyper) (2.11.3)
Requirement already satisfied: typeguard in /usr/local/lib/python3.7/dist-packages (from deephyper) (2.7.1)
Requirement already satisfied: dh-scikit-optimize==0.9.4 in /usr/local/lib/python3.7/dist-packages (from deephyper) (0.9.4)
Requirement already satisfied: joblib>=0.10.3 in /usr/local/lib/python3.7/dist-packages (from deephyper) (1.1.0)
Requirement already satisfied: openml==0.10.2 in /usr/local/lib/python3.7/dist-packages (from deephyper) (0.10.2)
Requirement already satisfied: ConfigSpace>=0.4.18 in /usr/local/lib/python3.7/dist-packages (from deephyper) (0.4.20)
Requirement already satisfied: pyaml>=16.9 in /usr/local/lib/python3.7/dist-packages (from dh-scikit-optimize==0.9.4->deephyper) (21.10.1)
Requirement already satisfied: scipy>=0.19.1 in /usr/local/lib/python3.7/dist-packages (from dh-scikit-optimize==0.9.4->deephyper) (1.4.1)
Requirement already satisfied: liac-arff>=2.4.0 in /usr/local/lib/python3.7/dist-packages (from openml==0.10.2->deephyper) (2.5.0)
Requirement already satisfied: xmltodict in /usr/local/lib/python3.7/dist-packages (from openml==0.10.2->deephyper) (0.12.0)
Requirement already satisfied: requests in /usr/local/lib/python3.7/dist-packages (from openml==0.10.2->deephyper) (2.23.0)
Requirement already satisfied: python-dateutil in /usr/local/lib/python3.7/dist-packages (from openml==0.10.2->deephyper) (2.8.2)
Requirement already satisfied: pyparsing in /usr/local/lib/python3.7/dist-packages (from ConfigSpace>=0.4.18->deephyper) (2.4.7)
Requirement already satisfied: cython in /usr/local/lib/python3.7/dist-packages (from ConfigSpace>=0.4.18->deephyper) (0.29.24)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=3.0.3->deephyper) (1.3.2)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/dist-packages (from matplotlib>=3.0.3->deephyper) (0.11.0)
Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.7/dist-packages (from pandas>=0.24.2->deephyper) (2018.9)
Requirement already satisfied: PyYAML in /usr/local/lib/python3.7/dist-packages (from pyaml>=16.9->dh-scikit-optimize==0.9.4->deephyper) (3.13)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.7/dist-packages (from python-dateutil->openml==0.10.2->deephyper) (1.15.0)
Requirement already satisfied: grpcio>=1.28.1 in /usr/local/lib/python3.7/dist-packages (from ray[default]>=1.3.0->deephyper) (1.41.1)
Requirement already satisfied: click>=7.0 in /usr/local/lib/python3.7/dist-packages (from ray[default]>=1.3.0->deephyper) (7.1.2)
Requirement already satisfied: protobuf>=3.15.3 in /usr/local/lib/python3.7/dist-packages (from ray[default]>=1.3.0->deephyper) (3.17.3)
Requirement already satisfied: redis>=3.5.0 in /usr/local/lib/python3.7/dist-packages (from ray[default]>=1.3.0->deephyper) (4.0.1)
Requirement already satisfied: filelock in /usr/local/lib/python3.7/dist-packages (from ray[default]>=1.3.0->deephyper) (3.3.2)
Requirement already satisfied: msgpack<2.0.0,>=1.0.0 in /usr/local/lib/python3.7/dist-packages (from ray[default]>=1.3.0->deephyper) (1.0.2)
Requirement already satisfied: jsonschema in /usr/local/lib/python3.7/dist-packages (from ray[default]>=1.3.0->deephyper) (2.6.0)
Requirement already satisfied: attrs in /usr/local/lib/python3.7/dist-packages (from ray[default]>=1.3.0->deephyper) (21.2.0)
Requirement already satisfied: colorful in /usr/local/lib/python3.7/dist-packages (from ray[default]>=1.3.0->deephyper) (0.5.4)
Requirement already satisfied: prometheus-client>=0.7.1 in /usr/local/lib/python3.7/dist-packages (from ray[default]>=1.3.0->deephyper) (0.12.0)
Requirement already satisfied: aiohttp>=3.7 in /usr/local/lib/python3.7/dist-packages (from ray[default]>=1.3.0->deephyper) (3.8.1)
Requirement already satisfied: aioredis<2 in /usr/local/lib/python3.7/dist-packages (from ray[default]>=1.3.0->deephyper) (1.3.1)
Requirement already satisfied: aiohttp-cors in /usr/local/lib/python3.7/dist-packages (from ray[default]>=1.3.0->deephyper) (0.7.0)
Requirement already satisfied: gpustat>=1.0.0b1 in /usr/local/lib/python3.7/dist-packages (from ray[default]>=1.3.0->deephyper) (1.0.0b1)
Requirement already satisfied: py-spy>=0.2.0 in /usr/local/lib/python3.7/dist-packages (from ray[default]>=1.3.0->deephyper) (0.3.11)
Requirement already satisfied: opencensus in /usr/local/lib/python3.7/dist-packages (from ray[default]>=1.3.0->deephyper) (0.8.0)
Requirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.7/dist-packages (from aiohttp>=3.7->ray[default]>=1.3.0->deephyper) (1.2.0)
Requirement already satisfied: yarl<2.0,>=1.0 in /usr/local/lib/python3.7/dist-packages (from aiohttp>=3.7->ray[default]>=1.3.0->deephyper) (1.7.2)
Requirement already satisfied: typing-extensions>=3.7.4 in /usr/local/lib/python3.7/dist-packages (from aiohttp>=3.7->ray[default]>=1.3.0->deephyper) (3.10.0.2)
Requirement already satisfied: async-timeout<5.0,>=4.0.0a3 in /usr/local/lib/python3.7/dist-packages (from aiohttp>=3.7->ray[default]>=1.3.0->deephyper) (4.0.1)
Requirement already satisfied: charset-normalizer<3.0,>=2.0 in /usr/local/lib/python3.7/dist-packages (from aiohttp>=3.7->ray[default]>=1.3.0->deephyper) (2.0.7)
Requirement already satisfied: asynctest==0.13.0 in /usr/local/lib/python3.7/dist-packages (from aiohttp>=3.7->ray[default]>=1.3.0->deephyper) (0.13.0)
Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.7/dist-packages (from aiohttp>=3.7->ray[default]>=1.3.0->deephyper) (1.2.0)
Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.7/dist-packages (from aiohttp>=3.7->ray[default]>=1.3.0->deephyper) (5.2.0)
Requirement already satisfied: hiredis in /usr/local/lib/python3.7/dist-packages (from aioredis<2->ray[default]>=1.3.0->deephyper) (2.0.0)
Requirement already satisfied: blessed>=1.17.1 in /usr/local/lib/python3.7/dist-packages (from gpustat>=1.0.0b1->ray[default]>=1.3.0->deephyper) (1.19.0)
Requirement already satisfied: nvidia-ml-py3>=7.352.0 in /usr/local/lib/python3.7/dist-packages (from gpustat>=1.0.0b1->ray[default]>=1.3.0->deephyper) (7.352.0)
Requirement already satisfied: psutil in /usr/local/lib/python3.7/dist-packages (from gpustat>=1.0.0b1->ray[default]>=1.3.0->deephyper) (5.4.8)
Requirement already satisfied: wcwidth>=0.1.4 in /usr/local/lib/python3.7/dist-packages (from blessed>=1.17.1->gpustat>=1.0.0b1->ray[default]>=1.3.0->deephyper) (0.2.5)
Requirement already satisfied: deprecated in /usr/local/lib/python3.7/dist-packages (from redis>=3.5.0->ray[default]>=1.3.0->deephyper) (1.2.13)
Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.7/dist-packages (from scikit-learn>=0.23.1->deephyper) (3.0.0)
Requirement already satisfied: flatbuffers<3.0,>=1.12 in /usr/local/lib/python3.7/dist-packages (from tensorflow>=2.0.0->deephyper) (2.0)
Requirement already satisfied: keras<2.8,>=2.7.0rc0 in /usr/local/lib/python3.7/dist-packages (from tensorflow>=2.0.0->deephyper) (2.7.0)
Requirement already satisfied: wheel<1.0,>=0.32.0 in /usr/local/lib/python3.7/dist-packages (from tensorflow>=2.0.0->deephyper) (0.37.0)
Requirement already satisfied: libclang>=9.0.1 in /usr/local/lib/python3.7/dist-packages (from tensorflow>=2.0.0->deephyper) (12.0.0)
Requirement already satisfied: h5py>=2.9.0 in /usr/local/lib/python3.7/dist-packages (from tensorflow>=2.0.0->deephyper) (3.1.0)
Requirement already satisfied: opt-einsum>=2.3.2 in /usr/local/lib/python3.7/dist-packages (from tensorflow>=2.0.0->deephyper) (3.3.0)
Requirement already satisfied: keras-preprocessing>=1.1.1 in /usr/local/lib/python3.7/dist-packages (from tensorflow>=2.0.0->deephyper) (1.1.2)
Requirement already satisfied: tensorflow-estimator<2.8,~=2.7.0rc0 in /usr/local/lib/python3.7/dist-packages (from tensorflow>=2.0.0->deephyper) (2.7.0)
Requirement already satisfied: tensorflow-io-gcs-filesystem>=0.21.0 in /usr/local/lib/python3.7/dist-packages (from tensorflow>=2.0.0->deephyper) (0.22.0)
Requirement already satisfied: google-pasta>=0.1.1 in /usr/local/lib/python3.7/dist-packages (from tensorflow>=2.0.0->deephyper) (0.2.0)
Requirement already satisfied: astunparse>=1.6.0 in /usr/local/lib/python3.7/dist-packages (from tensorflow>=2.0.0->deephyper) (1.6.3)
Requirement already satisfied: termcolor>=1.1.0 in /usr/local/lib/python3.7/dist-packages (from tensorflow>=2.0.0->deephyper) (1.1.0)
Requirement already satisfied: tensorboard~=2.6 in /usr/local/lib/python3.7/dist-packages (from tensorflow>=2.0.0->deephyper) (2.7.0)
Requirement already satisfied: wrapt>=1.11.0 in /usr/local/lib/python3.7/dist-packages (from tensorflow>=2.0.0->deephyper) (1.13.3)
Requirement already satisfied: absl-py>=0.4.0 in /usr/local/lib/python3.7/dist-packages (from tensorflow>=2.0.0->deephyper) (0.12.0)
Requirement already satisfied: gast<0.5.0,>=0.2.1 in /usr/local/lib/python3.7/dist-packages (from tensorflow>=2.0.0->deephyper) (0.4.0)
Requirement already satisfied: cached-property in /usr/local/lib/python3.7/dist-packages (from h5py>=2.9.0->tensorflow>=2.0.0->deephyper) (1.5.2)
Requirement already satisfied: markdown>=2.6.8 in /usr/local/lib/python3.7/dist-packages (from tensorboard~=2.6->tensorflow>=2.0.0->deephyper) (3.3.4)
Requirement already satisfied: tensorboard-data-server<0.7.0,>=0.6.0 in /usr/local/lib/python3.7/dist-packages (from tensorboard~=2.6->tensorflow>=2.0.0->deephyper) (0.6.1)
Requirement already satisfied: tensorboard-plugin-wit>=1.6.0 in /usr/local/lib/python3.7/dist-packages (from tensorboard~=2.6->tensorflow>=2.0.0->deephyper) (1.8.0)
Requirement already satisfied: werkzeug>=0.11.15 in /usr/local/lib/python3.7/dist-packages (from tensorboard~=2.6->tensorflow>=2.0.0->deephyper) (1.0.1)
Requirement already satisfied: setuptools>=41.0.0 in /usr/local/lib/python3.7/dist-packages (from tensorboard~=2.6->tensorflow>=2.0.0->deephyper) (57.4.0)
Requirement already satisfied: google-auth<3,>=1.6.3 in /usr/local/lib/python3.7/dist-packages (from tensorboard~=2.6->tensorflow>=2.0.0->deephyper) (1.35.0)
Requirement already satisfied: google-auth-oauthlib<0.5,>=0.4.1 in /usr/local/lib/python3.7/dist-packages (from tensorboard~=2.6->tensorflow>=2.0.0->deephyper) (0.4.6)
Requirement already satisfied: rsa<5,>=3.1.4 in /usr/local/lib/python3.7/dist-packages (from google-auth<3,>=1.6.3->tensorboard~=2.6->tensorflow>=2.0.0->deephyper) (4.7.2)
Requirement already satisfied: pyasn1-modules>=0.2.1 in /usr/local/lib/python3.7/dist-packages (from google-auth<3,>=1.6.3->tensorboard~=2.6->tensorflow>=2.0.0->deephyper) (0.2.8)
Requirement already satisfied: cachetools<5.0,>=2.0.0 in /usr/local/lib/python3.7/dist-packages (from google-auth<3,>=1.6.3->tensorboard~=2.6->tensorflow>=2.0.0->deephyper) (4.2.4)
Requirement already satisfied: requests-oauthlib>=0.7.0 in /usr/local/lib/python3.7/dist-packages (from google-auth-oauthlib<0.5,>=0.4.1->tensorboard~=2.6->tensorflow>=2.0.0->deephyper) (1.3.0)
Requirement already satisfied: importlib-metadata in /usr/local/lib/python3.7/dist-packages (from markdown>=2.6.8->tensorboard~=2.6->tensorflow>=2.0.0->deephyper) (4.8.2)
Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in /usr/local/lib/python3.7/dist-packages (from pyasn1-modules>=0.2.1->google-auth<3,>=1.6.3->tensorboard~=2.6->tensorflow>=2.0.0->deephyper) (0.4.8)
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests->openml==0.10.2->deephyper) (3.0.4)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests->openml==0.10.2->deephyper) (2.10)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests->openml==0.10.2->deephyper) (1.24.3)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests->openml==0.10.2->deephyper) (2021.10.8)
Requirement already satisfied: oauthlib>=3.0.0 in /usr/local/lib/python3.7/dist-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<0.5,>=0.4.1->tensorboard~=2.6->tensorflow>=2.0.0->deephyper) (3.1.1)
Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata->markdown>=2.6.8->tensorboard~=2.6->tensorflow>=2.0.0->deephyper) (3.6.0)
Requirement already satisfied: MarkupSafe>=0.23 in /usr/local/lib/python3.7/dist-packages (from Jinja2->deephyper) (2.0.1)
Requirement already satisfied: opencensus-context==0.1.2 in /usr/local/lib/python3.7/dist-packages (from opencensus->ray[default]>=1.3.0->deephyper) (0.1.2)
Requirement already satisfied: google-api-core<3.0.0,>=1.0.0 in /usr/local/lib/python3.7/dist-packages (from opencensus->ray[default]>=1.3.0->deephyper) (1.26.3)
Requirement already satisfied: googleapis-common-protos<2.0dev,>=1.6.0 in /usr/local/lib/python3.7/dist-packages (from google-api-core<3.0.0,>=1.0.0->opencensus->ray[default]>=1.3.0->deephyper) (1.53.0)
Requirement already satisfied: packaging>=14.3 in /usr/local/lib/python3.7/dist-packages (from google-api-core<3.0.0,>=1.0.0->opencensus->ray[default]>=1.3.0->deephyper) (21.2)
Requirement already satisfied: patsy>=0.4.0 in /usr/local/lib/python3.7/dist-packages (from statsmodels->deephyper) (0.5.2)
Requirement already satisfied: cloudpickle>=1.3 in /usr/local/lib/python3.7/dist-packages (from tensorflow-probability->deephyper) (1.3.0)
Requirement already satisfied: dm-tree in /usr/local/lib/python3.7/dist-packages (from tensorflow-probability->deephyper) (0.1.6)
Requirement already satisfied: decorator in /usr/local/lib/python3.7/dist-packages (from tensorflow-probability->deephyper) (4.4.2)

5.1. Classification#

On this part of the tutorial we focus on the classification case.

Create run function to train and evaluate the model corresponding to the configuration generated by the search. This function has to return a scalar value (typically, validation accuracy), which will be maximized by the search algorithm. In the case of automated machine learning we use the run function provided at deephyper.sklearn.classifier.run_autosklearn1 and wrap it with our data such as:

[1]:
from deephyper.sklearn.classifier import run_autosklearn1


def load_data():
    from sklearn.datasets import load_breast_cancer

    X, y = load_breast_cancer(return_X_y=True)

    return X, y


def run(config):
    return run_autosklearn1(config, load_data)

We are ready to go! But, let us look at the problem provided by DeepHyper in deephyper.sklearn.classifier.problem_autosklearn1 to understand better what is happening under the hood.

[2]:
from deephyper.sklearn.classifier import problem_autosklearn1

problem_autosklearn1
[2]:
Configuration space object:
  Hyperparameters:
    C, Type: UniformFloat, Range: [1e-05, 10.0], Default: 0.01, on log-scale
    alpha, Type: UniformFloat, Range: [1e-05, 10.0], Default: 0.01, on log-scale
    classifier, Type: Categorical, Choices: {RandomForest, Logistic, AdaBoost, KNeighbors, MLP, SVC, XGBoost}, Default: RandomForest
    gamma, Type: UniformFloat, Range: [1e-05, 10.0], Default: 0.01, on log-scale
    kernel, Type: Categorical, Choices: {linear, poly, rbf, sigmoid}, Default: linear
    max_depth, Type: UniformInteger, Range: [2, 100], Default: 14, on log-scale
    n_estimators, Type: UniformInteger, Range: [1, 2000], Default: 45, on log-scale
    n_neighbors, Type: UniformInteger, Range: [1, 100], Default: 50
  Conditions:
    (C | classifier == 'Logistic' || C | classifier == 'SVC')
    (gamma | kernel == 'rbf' || gamma | kernel == 'poly' || gamma | kernel == 'sigmoid')
    (n_estimators | classifier == 'RandomForest' || n_estimators | classifier == 'AdaBoost')
    alpha | classifier == 'MLP'
    kernel | classifier == 'SVC'
    max_depth | classifier == 'RandomForest'
    n_neighbors | classifier == 'KNeighbors'

Create an Evaluator object using the ray backend to distribute the evaluation of the run-function defined previously.

[3]:
from deephyper.evaluator import Evaluator
from deephyper.evaluator.callback import TqdmCallback

evaluator = Evaluator.create(run,
                 method="ray",
                 method_kwargs={
                     "address": None,
                     "num_cpus": 1,
                     "num_cpus_per_task": 1,
                     "callbacks": [TqdmCallback()]
                 })

print("Number of workers: ", evaluator.num_workers)
Number of workers:  1

Finally, you can define a Bayesian optimization search called CBO (for Centralized Bayesian Optimization) and link to it the defined problem_autosklearn1 and evaluator.

[4]:
from deephyper.search.hps import CBO

search = CBO(problem_autosklearn1, evaluator, log_dir="exp-automl-2")
[5]:
results = search.search(100)
(pid=35708) /Users/romainegele/miniforge3/envs/dh-arm/lib/python3.9/site-packages/xgboost/compat.py:31: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
(pid=35708)   from pandas import MultiIndex, Int64Index
(run pid=35708) /Users/romainegele/miniforge3/envs/dh-arm/lib/python3.9/site-packages/sklearn/neighbors/_classification.py:228: FutureWarning: Unlike other reduction functions (e.g. `skew`, `kurtosis`), the default behavior of `mode` typically preserves the axis it acts along. In SciPy 1.11.0, this behavior will change: the default value of `keepdims` will become False, the `axis` over which the statistic is taken will be eliminated, and the value None will no longer be accepted. Set `keepdims` to True or False to avoid this warning.
(run pid=35708)   mode, _ = stats.mode(_y[neigh_ind, k], axis=1)

Once the search is over, a file named results.csv is saved in the current directory. The same dataframe is returned by the search.search(...) call. It contains the hyperparameters configurations evaluated during the search and their corresponding objective value (i.e, validation accuracy), timestamp_submit the time when the evaluator submitted the configuration to be evaluated and timestamp_gather the time when the evaluator received the configuration once evaluated (both are relative times with respect to the creation of the Evaluator instance).

[6]:
results
[6]:
p:classifier p:C p:alpha p:kernel p:max_depth p:n_estimators p:n_neighbors p:gamma objective job_id m:timestamp_submit m:timestamp_gather
0 Logistic 0.000986 NaN NaN NaN NaN NaN NaN 0.893617 0 1.847845 4.197699
1 KNeighbors NaN NaN NaN NaN NaN 41.0 NaN 0.946809 1 4.348262 4.359891
2 RandomForest NaN NaN NaN 48.0 51.0 NaN NaN 0.957447 2 4.588290 4.632172
3 Logistic 0.000341 NaN NaN NaN NaN NaN NaN 0.819149 3 4.764461 4.772226
4 SVC 0.000063 NaN linear NaN NaN NaN NaN 0.643617 4 4.907278 4.917221
... ... ... ... ... ... ... ... ... ... ... ... ...
95 MLP NaN 1.782067 NaN NaN NaN NaN NaN 0.989362 95 37.920646 38.037480
96 MLP NaN 1.742599 NaN NaN NaN NaN NaN 0.989362 96 38.265214 38.381637
97 MLP NaN 1.769931 NaN NaN NaN NaN NaN 0.989362 97 38.609994 38.726015
98 MLP NaN 2.019310 NaN NaN NaN NaN NaN 0.989362 98 39.031003 39.148036
99 MLP NaN 1.862691 NaN NaN NaN NaN NaN 0.989362 99 39.378585 39.494249

100 rows × 12 columns

Now that we have the full list of results we can display the top-3.

[7]:
results.nlargest(n=3, columns="objective")
[7]:
p:classifier p:C p:alpha p:kernel p:max_depth p:n_estimators p:n_neighbors p:gamma objective job_id m:timestamp_submit m:timestamp_gather
10 MLP NaN 3.685755 NaN NaN NaN NaN NaN 0.989362 10 6.240106 6.363712
12 MLP NaN 3.717318 NaN NaN NaN NaN NaN 0.989362 12 7.013312 7.129871
13 MLP NaN 2.902145 NaN NaN NaN NaN NaN 0.989362 13 7.361486 7.477184

5.2. Regression#

On this part of the tutorial we focus on the regression case.

Create run function to train and evaluate the model corresponding to the configuration generated by the search. This function has to return a scalar value (typically, validation \(R^2\)), which will be maximized by the search algorithm. In the case of automated machine learning we use the run-function provided at deephyper.sklearn.regressor.run_autosklearn1 and wrap it with our data such as:

[8]:
from deephyper.sklearn.regressor import run_autosklearn1


def load_data():
    from sklearn.datasets import fetch_california_housing

    X, y = fetch_california_housing(return_X_y=True)
    return X, y


def run(config):
    return run_autosklearn1(config, load_data)

We are ready to go! But, let us look at the problem provided by DeepHyper to understand better what is happening under the hood.

[9]:
from deephyper.sklearn.regressor import problem_autosklearn1

problem_autosklearn1
[9]:
Configuration space object:
  Hyperparameters:
    C, Type: UniformFloat, Range: [1e-05, 10.0], Default: 0.01, on log-scale
    alpha, Type: UniformFloat, Range: [1e-05, 10.0], Default: 0.01, on log-scale
    gamma, Type: UniformFloat, Range: [1e-05, 10.0], Default: 0.01, on log-scale
    kernel, Type: Categorical, Choices: {linear, poly, rbf, sigmoid}, Default: linear
    max_depth, Type: UniformInteger, Range: [2, 100], Default: 14, on log-scale
    n_estimators, Type: UniformInteger, Range: [1, 2000], Default: 45, on log-scale
    n_neighbors, Type: UniformInteger, Range: [1, 100], Default: 50
    regressor, Type: Categorical, Choices: {RandomForest, Linear, AdaBoost, KNeighbors, MLP, SVR, XGBoost}, Default: RandomForest
  Conditions:
    (gamma | kernel == 'rbf' || gamma | kernel == 'poly' || gamma | kernel == 'sigmoid')
    (n_estimators | regressor == 'RandomForest' || n_estimators | regressor == 'AdaBoost')
    C | regressor == 'SVR'
    alpha | regressor == 'MLP'
    kernel | regressor == 'SVR'
    max_depth | regressor == 'RandomForest'
    n_neighbors | regressor == 'KNeighbors'

Create an Evaluator object using the ray backend to distribute the evaluation of the run-function defined previously.

[10]:
from deephyper.evaluator import Evaluator
from deephyper.evaluator.callback import TqdmCallback

evaluator = Evaluator.create(run,
                 method="ray",
                 method_kwargs={
                     "address": None,
                     "num_cpus": 1,
                     "num_cpus_per_task": 1,
                     "callbacks": [TqdmCallback()]
                 })

print("Number of workers: ", evaluator.num_workers)
Number of workers:  1

Finally, you can define a Bayesian optimization search called CBO (for Centralized Bayesian Optimization) and link to it the defined Problem and evaluator.

[11]:
from deephyper.search.hps import CBO

search = CBO(problem_autosklearn1, evaluator)
[12]:
results = search.search(10)

Once the search is over, a file named results.csv is saved in the current directory. The same dataframe is returned by the search.search(...) call. It contains the hyperparameters configurations evaluated during the search and their corresponding objective value (i.e, validation \(R^2\)), timestamp_submit the time when the evaluator submitted the configuration to be evaluated and timestamp_gather the time when the evaluator received the configuration once evaluated (both are relative times with respect to the creation of the Evaluator instance).

[13]:
results
[13]:
p:regressor p:C p:alpha p:kernel p:max_depth p:n_estimators p:n_neighbors p:gamma objective job_id m:timestamp_submit m:timestamp_gather
0 Linear NaN NaN NaN NaN NaN NaN NaN 0.597049 0 0.135866 0.154057
1 KNeighbors NaN NaN NaN NaN NaN 41.0 NaN 0.666496 1 0.363004 0.664028
2 RandomForest NaN NaN NaN 48.0 51.0 NaN NaN 0.802510 2 0.788540 3.497786
3 RandomForest NaN NaN NaN 7.0 245.0 NaN NaN 0.719056 3 3.625416 10.249726
4 SVR 0.000063 NaN linear NaN NaN NaN NaN 0.322115 4 10.450442 13.565911
5 SVR 0.000016 NaN sigmoid NaN NaN NaN 0.004180 -0.059354 5 13.690964 17.727409
6 SVR 0.422234 NaN sigmoid NaN NaN NaN 2.779419 -321050.500503 6 17.852672 24.922506
7 RandomForest NaN NaN NaN 91.0 15.0 NaN NaN 0.796552 7 25.120666 25.918885
8 MLP NaN 1.350762 NaN NaN NaN NaN NaN 0.708333 8 26.042982 28.260683
9 MLP NaN 0.033863 NaN NaN NaN NaN NaN 0.771833 9 28.383097 31.670986

Now that we have the full list of results we can display the top-3.

[14]:
results.nlargest(n=3, columns="objective")
[14]:
p:regressor p:C p:alpha p:kernel p:max_depth p:n_estimators p:n_neighbors p:gamma objective job_id m:timestamp_submit m:timestamp_gather
2 RandomForest NaN NaN NaN 48.0 51.0 NaN NaN 0.802510 2 0.788540 3.497786
7 RandomForest NaN NaN NaN 91.0 15.0 NaN NaN 0.796552 7 25.120666 25.918885
9 MLP NaN 0.033863 NaN NaN NaN NaN NaN 0.771833 9 28.383097 31.670986