5. Automated Machine Learning with Scikit-Learn
In this tutorial, we show how to automatically search among different machine learning algorithms from Scikit-Learn. Automated machine learning only requires the user to link their data with a predefined problem and a run function, both of which DeepHyper provides.
Let us start by installing DeepHyper.
[1]:
!pip install deephyper["popt"]
!pip install ray
Requirement already satisfied: deephyper in /usr/local/lib/python3.7/dist-packages (0.3.3)
5.1. Classification
In this part of the tutorial we focus on the classification case.
Create a run function to train and evaluate the model corresponding to the configuration generated by the search. This function has to return a scalar value (typically, validation accuracy), which will be maximized by the search algorithm. In the case of automated machine learning, we use the run function provided at deephyper.sklearn.classifier.run_autosklearn1 and wrap it with our data as follows:
[1]:
from deephyper.sklearn.classifier import run_autosklearn1


def load_data():
    from sklearn.datasets import load_breast_cancer

    X, y = load_breast_cancer(return_X_y=True)
    return X, y


def run(config):
    return run_autosklearn1(config, load_data)
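To make the role of the run function more concrete, here is a minimal hand-written sketch of the same idea: receive a configuration, build the matching Scikit-Learn model, and return validation accuracy. The name sketch_run, the train/validation split, and the subset of supported classifiers are assumptions made for illustration; this is not the implementation of run_autosklearn1.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def sketch_run(config):
    """Hypothetical run function: train one model and return validation accuracy."""
    X, y = load_breast_cancer(return_X_y=True)
    # Hold out part of the data for validation (split ratio is an assumption).
    X_train, X_valid, y_train, y_valid = train_test_split(
        X, y, test_size=0.33, random_state=42
    )

    # Build the model selected by the "classifier" hyperparameter.
    if config["classifier"] == "RandomForest":
        model = RandomForestClassifier(
            n_estimators=config["n_estimators"],
            max_depth=config["max_depth"],
            random_state=42,
        )
    elif config["classifier"] == "KNeighbors":
        model = KNeighborsClassifier(n_neighbors=config["n_neighbors"])
    else:
        raise ValueError(f"Unsupported classifier in this sketch: {config['classifier']}")

    model.fit(X_train, y_train)
    # The search maximizes the returned scalar.
    return float(model.score(X_valid, y_valid))


print(sketch_run({"classifier": "KNeighbors", "n_neighbors": 5}))
```

Only the hyperparameters relevant to the chosen classifier are read from the configuration, which is why a configuration for KNeighbors does not need n_estimators or max_depth.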
We are ready to go! But first, let us look at the problem provided by DeepHyper in deephyper.sklearn.classifier.problem_autosklearn1 to better understand what is happening under the hood.
[2]:
from deephyper.sklearn.classifier import problem_autosklearn1
problem_autosklearn1
[2]:
Configuration space object:
  Hyperparameters:
    C, Type: UniformFloat, Range: [1e-05, 10.0], Default: 0.01, on log-scale
    alpha, Type: UniformFloat, Range: [1e-05, 10.0], Default: 0.01, on log-scale
    classifier, Type: Categorical, Choices: {RandomForest, Logistic, AdaBoost, KNeighbors, MLP, SVC, XGBoost}, Default: RandomForest
    gamma, Type: UniformFloat, Range: [1e-05, 10.0], Default: 0.01, on log-scale
    kernel, Type: Categorical, Choices: {linear, poly, rbf, sigmoid}, Default: linear
    max_depth, Type: UniformInteger, Range: [2, 100], Default: 14, on log-scale
    n_estimators, Type: UniformInteger, Range: [1, 2000], Default: 45, on log-scale
    n_neighbors, Type: UniformInteger, Range: [1, 100], Default: 50
  Conditions:
    (C | classifier == 'Logistic' || C | classifier == 'SVC')
    (gamma | kernel == 'rbf' || gamma | kernel == 'poly' || gamma | kernel == 'sigmoid')
    (n_estimators | classifier == 'RandomForest' || n_estimators | classifier == 'AdaBoost')
    alpha | classifier == 'MLP'
    kernel | classifier == 'SVC'
    max_depth | classifier == 'RandomForest'
    n_neighbors | classifier == 'KNeighbors'
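The conditions listed above mean that a hyperparameter is only active when its parent takes certain values. As a plain-Python sketch (the helper active_hyperparameters is hypothetical and simply transcribes the printed conditions; it is not part of DeepHyper or ConfigSpace), the active set for a given choice could be computed like this:

```python
def active_hyperparameters(classifier, kernel=None):
    """Return the hyperparameters that are active for a classifier choice.

    Transcribes the conditions printed for problem_autosklearn1.
    """
    active = {"classifier"}
    if classifier in ("Logistic", "SVC"):
        active.add("C")
    if classifier in ("RandomForest", "AdaBoost"):
        active.add("n_estimators")
    if classifier == "RandomForest":
        active.add("max_depth")
    if classifier == "MLP":
        active.add("alpha")
    if classifier == "KNeighbors":
        active.add("n_neighbors")
    if classifier == "SVC":
        active.add("kernel")
        # gamma depends on kernel, which itself depends on classifier == 'SVC'.
        if kernel in ("rbf", "poly", "sigmoid"):
            active.add("gamma")
    return active


print(sorted(active_hyperparameters("SVC", kernel="rbf")))
print(sorted(active_hyperparameters("XGBoost")))
```

Inactive hyperparameters are the ones that show up as NaN in the results table for a given row.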
Create an Evaluator object using the ray backend to distribute the evaluation of the run function defined previously.
[3]:
from deephyper.evaluator import Evaluator
from deephyper.evaluator.callback import TqdmCallback

evaluator = Evaluator.create(
    run,
    method="ray",
    method_kwargs={
        "address": None,
        "num_cpus": 1,
        "num_cpus_per_task": 1,
        "callbacks": [TqdmCallback()],
    },
)
print("Number of workers: ", evaluator.num_workers)
Number of workers: 1
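Conceptually, an evaluator submits configurations to workers and gathers the returned objectives. The following sketch mimics that submit/gather pattern with the standard library's ThreadPoolExecutor as a stand-in; toy_run, the record fields, and the use of threads are assumptions for illustration and do not reflect how DeepHyper or Ray are implemented.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def toy_run(config):
    # Toy objective with its maximum at x = 2 (hypothetical, for illustration).
    return -((config["x"] - 2.0) ** 2)


configs = [{"x": 0.0}, {"x": 1.5}, {"x": 3.0}]
start = time.time()
records = []

# One worker, mirroring "num_cpus": 1 and "num_cpus_per_task": 1.
with ThreadPoolExecutor(max_workers=1) as pool:
    pending = []
    for job_id, config in enumerate(configs):
        timestamp_submit = time.time() - start
        pending.append((job_id, config, timestamp_submit, pool.submit(toy_run, config)))

    for job_id, config, timestamp_submit, future in pending:
        objective = future.result()  # blocks until the job has finished
        records.append(
            {
                "job_id": job_id,
                "x": config["x"],
                "objective": objective,
                "timestamp_submit": timestamp_submit,
                "timestamp_gather": time.time() - start,
            }
        )

for record in records:
    print(record)
```

With more workers, several configurations would be evaluated concurrently, which is what makes the number of workers reported by the evaluator relevant to search throughput.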
Finally, you can define a Bayesian optimization search called CBO (for Centralized Bayesian Optimization) and link to it the defined problem_autosklearn1 and evaluator.
[4]:
from deephyper.search.hps import CBO
search = CBO(problem_autosklearn1, evaluator, log_dir="exp-automl-2")
[5]:
results = search.search(100)
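For intuition about what a model-based search does at each iteration, here is a toy sketch of a sequential surrogate-guided loop on a one-dimensional problem. The objective, the random-forest surrogate, the candidate sampling, and the upper-confidence-bound rule are all assumptions chosen for illustration; this is a simplified stand-in and not the CBO implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def objective(x):
    # Toy function to maximize; its optimum is at x = 2.
    return -((x - 2.0) ** 2)


rng = np.random.default_rng(0)

# Start from a few random evaluations.
X = [float(x) for x in rng.uniform(-5.0, 5.0, size=5)]
y = [objective(x) for x in X]
best_initial = max(y)

for _ in range(20):
    # 1. Fit a surrogate model on all evaluations seen so far.
    surrogate = RandomForestRegressor(n_estimators=50, random_state=0)
    surrogate.fit(np.array(X).reshape(-1, 1), y)

    # 2. Score random candidates with an upper confidence bound:
    #    mean prediction plus a multiple of the spread across trees.
    candidates = rng.uniform(-5.0, 5.0, size=200).reshape(-1, 1)
    per_tree = np.stack([tree.predict(candidates) for tree in surrogate.estimators_])
    ucb = per_tree.mean(axis=0) + 1.96 * per_tree.std(axis=0)

    # 3. Evaluate the most promising candidate and record the result.
    x_next = float(candidates[np.argmax(ucb), 0])
    X.append(x_next)
    y.append(objective(x_next))

best = max(y)
print("best objective found:", best)
```

The loop alternates between fitting a cheap model of the objective and evaluating the point that model considers most promising, which is the general pattern behind Bayesian optimization.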
Once the search is over, a file named results.csv is saved in the log directory of the search (exp-automl-2 here). The same dataframe is returned by the search.search(...) call. It contains the hyperparameter configurations evaluated during the search and their corresponding objective value (i.e., validation accuracy), along with timestamp_submit, the time when the evaluator submitted the configuration for evaluation, and timestamp_gather, the time when the evaluator received the evaluated configuration back (both are relative to the creation of the Evaluator instance).
[6]:
results
[6]:
 | p:classifier | p:C | p:alpha | p:kernel | p:max_depth | p:n_estimators | p:n_neighbors | p:gamma | objective | job_id | m:timestamp_submit | m:timestamp_gather |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Logistic | 0.000986 | NaN | NaN | NaN | NaN | NaN | NaN | 0.893617 | 0 | 1.847845 | 4.197699 |
1 | KNeighbors | NaN | NaN | NaN | NaN | NaN | 41.0 | NaN | 0.946809 | 1 | 4.348262 | 4.359891 |
2 | RandomForest | NaN | NaN | NaN | 48.0 | 51.0 | NaN | NaN | 0.957447 | 2 | 4.588290 | 4.632172 |
3 | Logistic | 0.000341 | NaN | NaN | NaN | NaN | NaN | NaN | 0.819149 | 3 | 4.764461 | 4.772226 |
4 | SVC | 0.000063 | NaN | linear | NaN | NaN | NaN | NaN | 0.643617 | 4 | 4.907278 | 4.917221 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
95 | MLP | NaN | 1.782067 | NaN | NaN | NaN | NaN | NaN | 0.989362 | 95 | 37.920646 | 38.037480 |
96 | MLP | NaN | 1.742599 | NaN | NaN | NaN | NaN | NaN | 0.989362 | 96 | 38.265214 | 38.381637 |
97 | MLP | NaN | 1.769931 | NaN | NaN | NaN | NaN | NaN | 0.989362 | 97 | 38.609994 | 38.726015 |
98 | MLP | NaN | 2.019310 | NaN | NaN | NaN | NaN | NaN | 0.989362 | 98 | 39.031003 | 39.148036 |
99 | MLP | NaN | 1.862691 | NaN | NaN | NaN | NaN | NaN | 0.989362 | 99 | 39.378585 | 39.494249 |
100 rows × 12 columns
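The returned dataframe can be analyzed with ordinary pandas operations. The sketch below rebuilds the first five rows shown above by hand (so that it runs on its own) and computes the evaluation duration of each job and the best objective per classifier; the variable names are illustrative. With a real run, the same operations would apply directly to results, or to the saved results.csv loaded with pandas.read_csv.

```python
import pandas as pd

# First five rows of the results table above, copied by hand.
results_head = pd.DataFrame(
    {
        "p:classifier": ["Logistic", "KNeighbors", "RandomForest", "Logistic", "SVC"],
        "objective": [0.893617, 0.946809, 0.957447, 0.819149, 0.643617],
        "m:timestamp_submit": [1.847845, 4.348262, 4.588290, 4.764461, 4.907278],
        "m:timestamp_gather": [4.197699, 4.359891, 4.632172, 4.772226, 4.917221],
    }
)

# Time spent between submission and gathering, in seconds.
results_head["duration"] = (
    results_head["m:timestamp_gather"] - results_head["m:timestamp_submit"]
)

# Best validation accuracy reached by each classifier family.
best_per_classifier = (
    results_head.groupby("p:classifier")["objective"].max().sort_values(ascending=False)
)

print(results_head[["p:classifier", "objective", "duration"]])
print(best_per_classifier)
```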
Now that we have the full list of results we can display the top-3.
[7]:
results.nlargest(n=3, columns="objective")
[7]:
 | p:classifier | p:C | p:alpha | p:kernel | p:max_depth | p:n_estimators | p:n_neighbors | p:gamma | objective | job_id | m:timestamp_submit | m:timestamp_gather |
---|---|---|---|---|---|---|---|---|---|---|---|---|
10 | MLP | NaN | 3.685755 | NaN | NaN | NaN | NaN | NaN | 0.989362 | 10 | 6.240106 | 6.363712 |
12 | MLP | NaN | 3.717318 | NaN | NaN | NaN | NaN | NaN | 0.989362 | 12 | 7.013312 | 7.129871 |
13 | MLP | NaN | 2.902145 | NaN | NaN | NaN | NaN | NaN | 0.989362 | 13 | 7.361486 | 7.477184 |
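To reuse the best configuration, it helps to turn the best row back into a dictionary that keeps only the active hyperparameters (the non-NaN columns prefixed with p:). The sketch below does this on a hand-built copy of a few columns from the top-3 table above; the variable names are illustrative.

```python
import pandas as pd

nan = float("nan")

# A few columns of the top-3 table above, copied by hand.
top3 = pd.DataFrame(
    {
        "p:classifier": ["MLP", "MLP", "MLP"],
        "p:C": [nan, nan, nan],
        "p:alpha": [3.685755, 3.717318, 2.902145],
        "p:n_neighbors": [nan, nan, nan],
        "objective": [0.989362, 0.989362, 0.989362],
        "job_id": [10, 12, 13],
    },
    index=[10, 12, 13],
)

# Row with the highest objective (the first one in case of ties).
best_row = top3.loc[top3["objective"].idxmax()]

# Keep only hyperparameter columns that are active, and strip the "p:" prefix.
best_config = {
    name[2:]: value
    for name, value in best_row.items()
    if name.startswith("p:") and pd.notna(value)
}

print(best_config)
```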
5.2. Regression
In this part of the tutorial we focus on the regression case.
Create a run function to train and evaluate the model corresponding to the configuration generated by the search. This function has to return a scalar value (typically, validation \(R^2\)), which will be maximized by the search algorithm. In the case of automated machine learning, we use the run function provided at deephyper.sklearn.regressor.run_autosklearn1 and wrap it with our data as follows:
[8]:
from deephyper.sklearn.regressor import run_autosklearn1


def load_data():
    from sklearn.datasets import fetch_california_housing

    X, y = fetch_california_housing(return_X_y=True)
    return X, y


def run(config):
    return run_autosklearn1(config, load_data)
We are ready to go! But, let us look at the problem provided by DeepHyper to understand better what is happening under the hood.
[9]:
from deephyper.sklearn.regressor import problem_autosklearn1
problem_autosklearn1
[9]:
Configuration space object:
  Hyperparameters:
    C, Type: UniformFloat, Range: [1e-05, 10.0], Default: 0.01, on log-scale
    alpha, Type: UniformFloat, Range: [1e-05, 10.0], Default: 0.01, on log-scale
    gamma, Type: UniformFloat, Range: [1e-05, 10.0], Default: 0.01, on log-scale
    kernel, Type: Categorical, Choices: {linear, poly, rbf, sigmoid}, Default: linear
    max_depth, Type: UniformInteger, Range: [2, 100], Default: 14, on log-scale
    n_estimators, Type: UniformInteger, Range: [1, 2000], Default: 45, on log-scale
    n_neighbors, Type: UniformInteger, Range: [1, 100], Default: 50
    regressor, Type: Categorical, Choices: {RandomForest, Linear, AdaBoost, KNeighbors, MLP, SVR, XGBoost}, Default: RandomForest
  Conditions:
    (gamma | kernel == 'rbf' || gamma | kernel == 'poly' || gamma | kernel == 'sigmoid')
    (n_estimators | regressor == 'RandomForest' || n_estimators | regressor == 'AdaBoost')
    C | regressor == 'SVR'
    alpha | regressor == 'MLP'
    kernel | regressor == 'SVR'
    max_depth | regressor == 'RandomForest'
    n_neighbors | regressor == 'KNeighbors'
Create an Evaluator object using the ray backend to distribute the evaluation of the run function defined previously.
[10]:
from deephyper.evaluator import Evaluator
from deephyper.evaluator.callback import TqdmCallback

evaluator = Evaluator.create(
    run,
    method="ray",
    method_kwargs={
        "address": None,
        "num_cpus": 1,
        "num_cpus_per_task": 1,
        "callbacks": [TqdmCallback()],
    },
)
print("Number of workers: ", evaluator.num_workers)
Number of workers: 1
Finally, you can define a Bayesian optimization search called CBO (for Centralized Bayesian Optimization) and link to it the defined problem_autosklearn1 and evaluator.
[11]:
from deephyper.search.hps import CBO
search = CBO(problem_autosklearn1, evaluator)
[12]:
results = search.search(10)
Once the search is over, a file named results.csv is saved in the current directory. The same dataframe is returned by the search.search(...) call. It contains the hyperparameter configurations evaluated during the search and their corresponding objective value (i.e., validation \(R^2\)), along with timestamp_submit, the time when the evaluator submitted the configuration for evaluation, and timestamp_gather, the time when the evaluator received the evaluated configuration back (both are relative to the creation of the Evaluator instance).
[13]:
results
[13]:
 | p:regressor | p:C | p:alpha | p:kernel | p:max_depth | p:n_estimators | p:n_neighbors | p:gamma | objective | job_id | m:timestamp_submit | m:timestamp_gather |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Linear | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.597049 | 0 | 0.135866 | 0.154057 |
1 | KNeighbors | NaN | NaN | NaN | NaN | NaN | 41.0 | NaN | 0.666496 | 1 | 0.363004 | 0.664028 |
2 | RandomForest | NaN | NaN | NaN | 48.0 | 51.0 | NaN | NaN | 0.802510 | 2 | 0.788540 | 3.497786 |
3 | RandomForest | NaN | NaN | NaN | 7.0 | 245.0 | NaN | NaN | 0.719056 | 3 | 3.625416 | 10.249726 |
4 | SVR | 0.000063 | NaN | linear | NaN | NaN | NaN | NaN | 0.322115 | 4 | 10.450442 | 13.565911 |
5 | SVR | 0.000016 | NaN | sigmoid | NaN | NaN | NaN | 0.004180 | -0.059354 | 5 | 13.690964 | 17.727409 |
6 | SVR | 0.422234 | NaN | sigmoid | NaN | NaN | NaN | 2.779419 | -321050.500503 | 6 | 17.852672 | 24.922506 |
7 | RandomForest | NaN | NaN | NaN | 91.0 | 15.0 | NaN | NaN | 0.796552 | 7 | 25.120666 | 25.918885 |
8 | MLP | NaN | 1.350762 | NaN | NaN | NaN | NaN | NaN | 0.708333 | 8 | 26.042982 | 28.260683 |
9 | MLP | NaN | 0.033863 | NaN | NaN | NaN | NaN | NaN | 0.771833 | 9 | 28.383097 | 31.670986 |
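Some objectives in this table are negative, one of them strongly so. Assuming the objective is the usual coefficient of determination, \(R^2 = 1 - SS_{res} / SS_{tot}\), this is expected: the score is 1 for perfect predictions, 0 for a model that always predicts the mean of the targets, and unbounded below for models that do worse than that. The helper r_squared and the toy data below are illustrative only.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)  # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return float(1.0 - ss_res / ss_tot)


y_true = [1.0, 2.0, 3.0, 4.0]

print(r_squared(y_true, y_true))  # perfect predictions
print(r_squared(y_true, [2.5, 2.5, 2.5, 2.5]))  # always predict the mean
print(r_squared(y_true, [10.0, 10.0, 10.0, 10.0]))  # far-off constant prediction
```

A large negative value therefore signals a configuration whose predictions are far from the targets, not a bug in the search.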
Now that we have the full list of results we can display the top-3.
[14]:
results.nlargest(n=3, columns="objective")
[14]:
 | p:regressor | p:C | p:alpha | p:kernel | p:max_depth | p:n_estimators | p:n_neighbors | p:gamma | objective | job_id | m:timestamp_submit | m:timestamp_gather |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | RandomForest | NaN | NaN | NaN | 48.0 | 51.0 | NaN | NaN | 0.802510 | 2 | 0.788540 | 3.497786 |
7 | RandomForest | NaN | NaN | NaN | 91.0 | 15.0 | NaN | NaN | 0.796552 | 7 | 25.120666 | 25.918885 |
9 | MLP | NaN | 0.033863 | NaN | NaN | NaN | NaN | NaN | 0.771833 | 9 | 28.383097 | 31.670986 |