2. Hyperparameter search for classification with Tabular data (Keras)#

In this tutorial we present how to use hyperparameter optimization on a basic example from the Keras documentation.

Reference: This tutorial is based on materials from the Keras Documentation: Structured data classification from scratch

Let us start by installing DeepHyper!

Warning

This tutorial should be run with tensorflow>=2.6.

[1]:
try:
    import deephyper
    print(deephyper.__version__)
except (ImportError, ModuleNotFoundError):
    !pip install deephyper

try:
    import ray
except (ImportError, ModuleNotFoundError):
    !pip install ray
0.6.0

Note

The following environment variables can be used to suppress the logging of some TensorFlow DEBUG, INFO, and WARNING statements.

[2]:
import os


os.environ["TF_CPP_MIN_LOG_LEVEL"] = str(3)
os.environ["AUTOGRAPH_VERBOSITY"] = str(0)

2.1. Imports#

Warning

It is important to follow the import strategy import tensorflow as tf to prevent serialization errors that would crash the search.

The import strategy from the original Keras tutorial (shown below):

from tensorflow import keras
from tensorflow.keras import layers
...
from tensorflow.keras.layers import IntegerLookup
from tensorflow.keras.layers import Normalization
from tensorflow.keras.layers import StringLookup

resulted in non-serializable data, preventing the search from executing in parallel.

[3]:
import json

import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow.keras.backend as K

Note

The following can be used to detect if GPU devices are available on the current host, so that this notebook automatically adapts its parallel execution to the locally available resources. However, this simple check will not detect the resources of multiple nodes.

[4]:
from tensorflow.python.client import device_lib


def get_available_gpus():
    local_device_protos = device_lib.list_local_devices()
    return [x.name for x in local_device_protos if x.device_type == "GPU"]


n_gpus = len(get_available_gpus())
if n_gpus > 1:
    n_gpus -= 1

is_gpu_available = n_gpus > 0

if is_gpu_available:
    print(f"{n_gpus} GPU{'s are' if n_gpus > 1 else ' is'} available.")
else:
    print("No GPU available")
No GPU available

2.2. The dataset (from Keras.io)#

The dataset is provided by the Cleveland Clinic Foundation for Heart Disease. It’s a CSV file with 303 rows. Each row contains information about a patient (a sample), and each column describes an attribute of the patient (a feature). We use the features to predict whether a patient has heart disease (binary classification).

Here’s the description of each feature:

| Column   | Description                                            | Feature Type                 |
|----------|--------------------------------------------------------|------------------------------|
| Age      | Age in years                                           | Numerical                    |
| Sex      | (1 = male; 0 = female)                                 | Categorical                  |
| CP       | Chest pain type (0, 1, 2, 3, 4)                        | Categorical                  |
| Trestbpd | Resting blood pressure (in mm Hg on admission)         | Numerical                    |
| Chol     | Serum cholesterol in mg/dl                             | Numerical                    |
| FBS      | Fasting blood sugar > 120 mg/dl (1 = true; 0 = false)  | Categorical                  |
| RestECG  | Resting electrocardiogram results (0, 1, 2)            | Categorical                  |
| Thalach  | Maximum heart rate achieved                            | Numerical                    |
| Exang    | Exercise induced angina (1 = yes; 0 = no)              | Categorical                  |
| Oldpeak  | ST depression induced by exercise relative to rest     | Numerical                    |
| Slope    | Slope of the peak exercise ST segment                  | Numerical                    |
| CA       | Number of major vessels (0-3) colored by fluoroscopy   | Both numerical & categorical |
| Thal     | 3 = normal; 6 = fixed defect; 7 = reversible defect    | Categorical                  |
| Target   | Diagnosis of heart disease (1 = true; 0 = false)       | Target                       |

[5]:
def load_data():
    file_url = "http://storage.googleapis.com/download.tensorflow.org/data/heart.csv"
    dataframe = pd.read_csv(file_url)

    val_dataframe = dataframe.sample(frac=0.2, random_state=1337)
    train_dataframe = dataframe.drop(val_dataframe.index)

    return train_dataframe, val_dataframe


def dataframe_to_dataset(dataframe):
    dataframe = dataframe.copy()
    labels = dataframe.pop("target")
    ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), labels))
    ds = ds.shuffle(buffer_size=len(dataframe))
    return ds

2.3. Preprocessing & encoding of features#

The next cells use tf.keras.layers.Normalization() to apply standard scaling to the numerical features.

Then, tf.keras.layers.StringLookup and tf.keras.layers.IntegerLookup are used to encode the categorical features.

[6]:
def encode_numerical_feature(feature, name, dataset):
    # Create a Normalization layer for our feature
    normalizer = tf.keras.layers.Normalization()

    # Prepare a Dataset that only yields our feature
    feature_ds = dataset.map(lambda x, y: x[name])
    feature_ds = feature_ds.map(lambda x: tf.expand_dims(x, -1))

    # Learn the statistics of the data
    normalizer.adapt(feature_ds)

    # Normalize the input feature
    encoded_feature = normalizer(feature)
    return encoded_feature


def encode_categorical_feature(feature, name, dataset, is_string):
    lookup_class = (
        tf.keras.layers.StringLookup if is_string else tf.keras.layers.IntegerLookup
    )
    # Create a lookup layer which will turn strings into integer indices
    lookup = lookup_class(output_mode="binary")

    # Prepare a Dataset that only yields our feature
    feature_ds = dataset.map(lambda x, y: x[name])
    feature_ds = feature_ds.map(lambda x: tf.expand_dims(x, -1))

    # Learn the set of possible string values and assign them a fixed integer index
    lookup.adapt(feature_ds)

    # Turn the string input into integer indices
    encoded_feature = lookup(feature)
    return encoded_feature

2.4. Define the run-function#

The run-function defines how the objective that we want to maximize is computed. It takes a config dictionary as input and returns a scalar value that we want to maximize (or, as in this example, a dictionary carrying this objective together with some metadata). The config contains a sampled value for each hyperparameter that we want to tune. In this example we will search for:

  • units (default value: 32)

  • activation (default value: "relu")

  • dropout_rate (default value: 0.5)

  • num_epochs (default value: 50)

  • batch_size (default value: 32)

  • learning_rate (default value: 1e-3)

A hyperparameter value can be accessed easily in the dictionary through the corresponding key, for example config["units"].

[7]:
def count_params(model: tf.keras.Model) -> dict:
    """Evaluate the number of parameters of a Keras model.

    Args:
        model (tf.keras.Model): a Keras model.

    Returns:
        dict: a dictionary with the number of trainable ``"num_parameters_train"`` and
        non-trainable parameters ``"num_parameters"``.
    """

    def count_or_null(p):
        try:
            return K.count_params(p)
        except Exception:
            return 0

    num_parameters_train = int(
        np.sum([count_or_null(p) for p in model.trainable_weights])
    )
    num_parameters = int(
        np.sum([count_or_null(p) for p in model.non_trainable_weights])
    )
    return {
        "num_parameters": num_parameters,
        "num_parameters_train": num_parameters_train,
    }
[8]:
def run(config: dict):
    tf.autograph.set_verbosity(0)
    # Load data and split into validation set
    train_dataframe, val_dataframe = load_data()
    train_ds = dataframe_to_dataset(train_dataframe)
    val_ds = dataframe_to_dataset(val_dataframe)
    train_ds = train_ds.batch(config["batch_size"])
    val_ds = val_ds.batch(config["batch_size"])

    # Categorical features encoded as integers
    sex = tf.keras.Input(shape=(1,), name="sex", dtype="int64")
    cp = tf.keras.Input(shape=(1,), name="cp", dtype="int64")
    fbs = tf.keras.Input(shape=(1,), name="fbs", dtype="int64")
    restecg = tf.keras.Input(shape=(1,), name="restecg", dtype="int64")
    exang = tf.keras.Input(shape=(1,), name="exang", dtype="int64")
    ca = tf.keras.Input(shape=(1,), name="ca", dtype="int64")

    # Categorical feature encoded as string
    thal = tf.keras.Input(shape=(1,), name="thal", dtype="string")

    # Numerical features
    age = tf.keras.Input(shape=(1,), name="age")
    trestbps = tf.keras.Input(shape=(1,), name="trestbps")
    chol = tf.keras.Input(shape=(1,), name="chol")
    thalach = tf.keras.Input(shape=(1,), name="thalach")
    oldpeak = tf.keras.Input(shape=(1,), name="oldpeak")
    slope = tf.keras.Input(shape=(1,), name="slope")

    all_inputs = [
        sex,
        cp,
        fbs,
        restecg,
        exang,
        ca,
        thal,
        age,
        trestbps,
        chol,
        thalach,
        oldpeak,
        slope,
    ]

    # Integer categorical features
    sex_encoded = encode_categorical_feature(sex, "sex", train_ds, False)
    cp_encoded = encode_categorical_feature(cp, "cp", train_ds, False)
    fbs_encoded = encode_categorical_feature(fbs, "fbs", train_ds, False)
    restecg_encoded = encode_categorical_feature(restecg, "restecg", train_ds, False)
    exang_encoded = encode_categorical_feature(exang, "exang", train_ds, False)
    ca_encoded = encode_categorical_feature(ca, "ca", train_ds, False)

    # String categorical features
    thal_encoded = encode_categorical_feature(thal, "thal", train_ds, True)

    # Numerical features
    age_encoded = encode_numerical_feature(age, "age", train_ds)
    trestbps_encoded = encode_numerical_feature(trestbps, "trestbps", train_ds)
    chol_encoded = encode_numerical_feature(chol, "chol", train_ds)
    thalach_encoded = encode_numerical_feature(thalach, "thalach", train_ds)
    oldpeak_encoded = encode_numerical_feature(oldpeak, "oldpeak", train_ds)
    slope_encoded = encode_numerical_feature(slope, "slope", train_ds)

    all_features = tf.keras.layers.concatenate(
        [
            sex_encoded,
            cp_encoded,
            fbs_encoded,
            restecg_encoded,
            exang_encoded,
            slope_encoded,
            ca_encoded,
            thal_encoded,
            age_encoded,
            trestbps_encoded,
            chol_encoded,
            thalach_encoded,
            oldpeak_encoded,
        ]
    )
    x = tf.keras.layers.Dense(config["units"], activation=config["activation"])(
        all_features
    )
    x = tf.keras.layers.Dropout(config["dropout_rate"])(x)
    output = tf.keras.layers.Dense(1, activation="sigmoid")(x)
    model = tf.keras.Model(all_inputs, output)

    optimizer = tf.keras.optimizers.Adam(learning_rate=config["learning_rate"])
    model.compile(optimizer, "binary_crossentropy", metrics=["accuracy"])

    history = model.fit(
        train_ds, epochs=config["num_epochs"], validation_data=val_ds, verbose=0
    )

    objective = history.history["val_accuracy"][-1]
    metadata = {
        "loss": history.history["loss"],
        "val_loss": history.history["val_loss"],
        "accuracy": history.history["accuracy"],
        "val_accuracy": history.history["val_accuracy"],
    }
    metadata = {k:json.dumps(v) for k,v in metadata.items()}
    metadata.update(count_params(model))

    return {"objective": objective, "metadata": metadata}
Note

The objective maximized by DeepHyper is the scalar value returned by the run-function (here, the "objective" entry of the returned dictionary).

In this tutorial it corresponds to the validation accuracy of the last epoch of training, which we retrieve from the History object returned by the model.fit(...) call.

...
history = model.fit(
    train_ds, epochs=config["num_epochs"], validation_data=val_ds, verbose=0
)
return history.history["val_accuracy"][-1]
...

Using an objective like max(history.history['val_accuracy']) can have undesired side effects.

In particular, the maximum over epochs may correspond to a transient spike in the validation curve rather than to the model obtained at the end of training, so the search would favor configurations whose reported score the final model cannot reproduce on new data. The toy example below illustrates the difference.
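
For instance, with a hypothetical validation curve (the values are made up for illustration):

val_accuracy = [0.70, 0.88, 0.74, 0.75]  # transient spike at the second epoch

print(max(val_accuracy))   # 0.88: not the accuracy of the returned model
print(val_accuracy[-1])    # 0.75: accuracy of the model after the last epoch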

2.5. Define the Hyperparameter optimization problem#

Hyperparameter ranges are defined using the following syntax:

  • Discrete integer ranges are generated from a tuple (lower: int, upper: int)

  • Continuous parameters are generated from a tuple (lower: float, upper: float)

  • Categorical or non-ordinal hyperparameter ranges can be given as a list of possible values [val1, val2, ...]

[9]:
from deephyper.problem import HpProblem


# Create the hyperparameter problem
problem = HpProblem()

# Discrete hyperparameter (sampled with uniform prior)
problem.add_hyperparameter((8, 128), "units", default_value=32)
problem.add_hyperparameter((10, 100), "num_epochs", default_value=50)


# Categorical hyperparameter (sampled with uniform prior)
ACTIVATIONS = [
    "elu", "gelu", "hard_sigmoid", "linear", "relu", "selu",
    "sigmoid", "softplus", "softsign", "swish", "tanh",
]
problem.add_hyperparameter(ACTIVATIONS, "activation", default_value="relu")


# Real hyperparameter (sampled with uniform prior)
problem.add_hyperparameter((0.0, 0.6), "dropout_rate", default_value=0.5)


# Discrete and Real hyperparameters (sampled with log-uniform)
problem.add_hyperparameter((8, 256, "log-uniform"), "batch_size", default_value=32)
problem.add_hyperparameter((1e-5, 1e-2, "log-uniform"), "learning_rate", default_value=1e-3)

problem
[9]:
Configuration space object:
  Hyperparameters:
    activation, Type: Categorical, Choices: {elu, gelu, hard_sigmoid, linear, relu, selu, sigmoid, softplus, softsign, swish, tanh}, Default: relu
    batch_size, Type: UniformInteger, Range: [8, 256], Default: 32, on log-scale
    dropout_rate, Type: UniformFloat, Range: [0.0, 0.6], Default: 0.5
    learning_rate, Type: UniformFloat, Range: [1e-05, 0.01], Default: 0.001, on log-scale
    num_epochs, Type: UniformInteger, Range: [10, 100], Default: 50
    units, Type: UniformInteger, Range: [8, 128], Default: 32

2.6. Evaluate a default configuration#

We evaluate the performance of the default set of hyperparameters provided in the Keras tutorial.

[10]:
import ray


# We launch the Ray run-time depending on the detected local resources
# and execute the `run` function with the default configuration.
# WARNING: in the case of GPUs it is important to follow this scheme
# to prevent multiple processes (Ray workers vs. the current process)
# from locking the same GPU.
if is_gpu_available:
    if not(ray.is_initialized()):
        ray.init(num_cpus=n_gpus, num_gpus=n_gpus, log_to_driver=False)

    run_default = ray.remote(num_cpus=1, num_gpus=1)(run)
    out = ray.get(run_default.remote(problem.default_configuration))
else:
    if not(ray.is_initialized()):
        ray.init(num_cpus=1, log_to_driver=False)
    run_default = run
    out = run_default(problem.default_configuration)

objective_default = out["objective"]
metadata_default = out["metadata"]

print(f"Accuracy Default Configuration:  {objective_default:.3f}")

print("Metadata Default Configuration")
for k,v in out["metadata"].items():
    print(f"\t- {k}: {v}")
2023-08-21 15:42:47,612 INFO worker.py:1612 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265 
WARNING:absl:At this time, the v2.11+ optimizer `tf.keras.optimizers.Adam` runs slowly on M1/M2 Macs, please use the legacy Keras optimizer instead, located at `tf.keras.optimizers.legacy.Adam`.
WARNING:absl:There is a known slowdown when using v2.11+ Keras optimizers on M1/M2 Macs. Falling back to the legacy Keras optimizer, i.e., `tf.keras.optimizers.legacy.Adam`.
Accuracy Default Configuration:  0.820
Metadata Default Configuration
        - loss: [0.7349893450737, 0.7119344472885132, 0.6686969995498657, 0.5915975570678711, 0.5431504845619202, 0.5246548056602478, 0.5342034697532654, 0.4842718839645386, 0.4590907692909241, 0.4603484272956848, 0.4597550332546234, 0.44408705830574036, 0.42013710737228394, 0.42205527424812317, 0.4288969337940216, 0.39178749918937683, 0.4030506908893585, 0.3936038613319397, 0.35945186018943787, 0.37975654006004333, 0.3720925748348236, 0.3669050633907318, 0.38736027479171753, 0.374258428812027, 0.3758077025413513, 0.3406431972980499, 0.3294287323951721, 0.33325862884521484, 0.31861767172813416, 0.35496509075164795, 0.3087666630744934, 0.312923789024353, 0.30307966470718384, 0.32532209157943726, 0.3240409791469574, 0.3106668293476105, 0.31075918674468994, 0.31123799085617065, 0.31748679280281067, 0.3370455205440521, 0.30306127667427063, 0.30984699726104736, 0.2797146141529083, 0.29132482409477234, 0.3014650344848633, 0.32005640864372253, 0.2903410494327545, 0.28734707832336426, 0.28279826045036316, 0.2570137679576874]
        - val_loss: [0.7456178665161133, 0.6780385375022888, 0.625544548034668, 0.5850293636322021, 0.5513894557952881, 0.5260460376739502, 0.5064079165458679, 0.4892745316028595, 0.4756909906864166, 0.4652484059333801, 0.4565599262714386, 0.44933125376701355, 0.44344136118888855, 0.4388517737388611, 0.4375801086425781, 0.4364493191242218, 0.432796835899353, 0.43143969774246216, 0.42970556020736694, 0.4274940490722656, 0.42608213424682617, 0.42625632882118225, 0.42683979868888855, 0.42844000458717346, 0.42911961674690247, 0.4305785000324249, 0.43083715438842773, 0.43108829855918884, 0.4328072667121887, 0.4343793988227844, 0.4345102906227112, 0.43473729491233826, 0.4349192678928375, 0.4341476559638977, 0.43471893668174744, 0.4353316128253937, 0.4341714680194855, 0.4331118166446686, 0.4324823021888733, 0.4330752193927765, 0.432032972574234, 0.4320930242538452, 0.433055579662323, 0.43358126282691956, 0.4329754710197449, 0.4332190752029419, 0.4322751760482788, 0.43252894282341003, 0.43424201011657715, 0.4347805976867676]
        - accuracy: [0.4917355477809906, 0.5991735458374023, 0.6280992031097412, 0.7066115736961365, 0.7851239442825317, 0.7190082669258118, 0.7520661354064941, 0.7727272510528564, 0.8099173307418823, 0.8099173307418823, 0.8140496015548706, 0.8264462947845459, 0.78925621509552, 0.8057851195335388, 0.8140496015548706, 0.85537189245224, 0.85537189245224, 0.8264462947845459, 0.8347107172012329, 0.8429751992225647, 0.8388429880142212, 0.85537189245224, 0.8264462947845459, 0.8388429880142212, 0.8099173307418823, 0.85537189245224, 0.85537189245224, 0.8719007968902588, 0.8760330677032471, 0.8471074104309082, 0.8842975497245789, 0.8636363744735718, 0.8677685856819153, 0.8801652789115906, 0.8719007968902588, 0.8677685856819153, 0.8801652789115906, 0.8719007968902588, 0.8636363744735718, 0.8636363744735718, 0.8801652789115906, 0.8677685856819153, 0.8925619721412659, 0.8801652789115906, 0.8801652789115906, 0.8760330677032471, 0.8842975497245789, 0.8636363744735718, 0.8966942429542542, 0.9008264541625977]
        - val_accuracy: [0.49180328845977783, 0.6557376980781555, 0.7377049326896667, 0.7868852615356445, 0.7868852615356445, 0.8032786846160889, 0.8032786846160889, 0.7868852615356445, 0.7868852615356445, 0.7868852615356445, 0.7868852615356445, 0.7868852615356445, 0.8032786846160889, 0.8196721076965332, 0.8196721076965332, 0.8196721076965332, 0.8196721076965332, 0.8196721076965332, 0.8196721076965332, 0.8196721076965332, 0.8196721076965332, 0.8196721076965332, 0.8196721076965332, 0.8196721076965332, 0.8196721076965332, 0.8032786846160889, 0.8032786846160889, 0.8196721076965332, 0.8196721076965332, 0.8196721076965332, 0.8196721076965332, 0.8196721076965332, 0.8196721076965332, 0.8196721076965332, 0.8196721076965332, 0.8196721076965332, 0.8196721076965332, 0.8196721076965332, 0.8196721076965332, 0.8196721076965332, 0.8032786846160889, 0.8032786846160889, 0.8196721076965332, 0.8196721076965332, 0.8032786846160889, 0.8032786846160889, 0.8032786846160889, 0.8032786846160889, 0.8196721076965332, 0.8196721076965332]
        - num_parameters: 18
        - num_parameters_train: 1217

2.7. Define the evaluator object#

The Evaluator object allows you to change the parallelization backend used by DeepHyper.
It is a standalone object which schedules the execution of remote tasks. All evaluators need a run_function to be instantiated.
Then, the method keyword defines the backend (e.g., "ray") and method_kwargs corresponds to the keyword arguments of this chosen method.
evaluator = Evaluator.create(run_function, method="ray", method_kwargs={...})

Once created, evaluator.num_workers gives access to the number of available parallel workers.

Finally, to submit tasks to the evaluator and collect the finished ones, one just needs to use the following interface:

configs = [{"units": 8, ...}, ...]
evaluator.submit(configs)
...
# To collect the first finished task (asynchronous)
tasks_done = evaluator.gather("BATCH", size=1)

# To collect all of the pending tasks (synchronous)
tasks_done = evaluator.gather("ALL")

Warning

Each Evaluator saves its own state, therefore it is crucial to create a new evaluator when launching a fresh search.

[11]:
from deephyper.evaluator import Evaluator
from deephyper.evaluator.callback import TqdmCallback


def get_evaluator(run_function):
    # Default arguments for Ray: 1 CPU in total and 1 CPU per task (i.e., a single worker)
    method_kwargs = {
        "num_cpus": 1,
        "num_cpus_per_task": 1,
        "callbacks": [TqdmCallback()]
    }

    # If GPU devices are detected then 'n_gpus' workers are created,
    # each evaluation using 1 GPU and 1 CPU
    if is_gpu_available:
        method_kwargs["num_cpus"] = n_gpus
        method_kwargs["num_gpus"] = n_gpus
        method_kwargs["num_cpus_per_task"] = 1
        method_kwargs["num_gpus_per_task"] = 1

    evaluator = Evaluator.create(
        run_function,
        method="ray",
        method_kwargs=method_kwargs
    )
    print(f"Created new evaluator with {evaluator.num_workers} worker{'s' if evaluator.num_workers > 1 else ''} and config: {method_kwargs}", )

    return evaluator

evaluator_1 = get_evaluator(run)
Created new evaluator with 1 worker and config: {'num_cpus': 1, 'num_cpus_per_task': 1, 'callbacks': [<deephyper.evaluator.callback.TqdmCallback object at 0x36baafb10>]}
/Users/romainegele/Documents/Argonne/deephyper/deephyper/evaluator/_evaluator.py:127: UserWarning: Applying nest-asyncio patch for IPython Shell!
  warnings.warn(

2.8. Define and run the centralized Bayesian optimization search (CBO)#

A primary pillar of hyperparameter search in DeepHyper is the centralized Bayesian optimization search (henceforth CBO). CBO starts by evaluating an initial set of hyperparameter configurations (e.g., sampled at random, possibly completed with user-provided initial points).

Following the parallelized evaluation of these configurations, a low-fidelity, high-efficiency model (henceforth “the surrogate”) is fitted to reproduce the relationship between the inputs of the model (i.e., the choice of hyperparameters) and the outputs (generally a measure of validation accuracy).

After obtaining this surrogate of the validation accuracy, we may utilize ideas from classical methods in the Bayesian optimization literature to adaptively sample the search space of hyperparameters.

First, the surrogate is used to obtain an estimate of the mean value of the validation accuracy at a certain sampling location \(x\), in addition to an estimated variance. The latter requirement restricts us to high-efficiency data-driven modeling strategies with built-in variance estimates (such as a Gaussian process or a Random Forest regressor).

Regions where the mean is high represent opportunities for exploitation, and regions where the variance is high represent opportunities for exploration. An optimistic acquisition function called the Upper Confidence Bound (UCB) can be constructed from these two quantities:

\[L_{\text{UCB}}(x) = \mu(x) + \kappa \cdot \sigma(x)\]

The unevaluated hyperparameter configurations that maximize the acquisition function are chosen for the next batch of evaluations.

Note that the choice of the variance weighting parameter \(\kappa\) controls the degree of exploration in the hyperparameter search, with zero indicating pure exploitation (only unseen configurations where the predicted accuracy is highest will be sampled). A minimal sketch of this scoring follows.
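
The sketch below is illustrative only (it is not DeepHyper’s internal implementation): it scores candidates with UCB using a random-forest surrogate whose mean and standard deviation are estimated from the spread of the per-tree predictions. The 2D toy objective and the variable names are assumptions.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
X = rng.uniform(0.0, 1.0, size=(30, 2))  # 30 already-evaluated configurations
y = -((X - 0.5) ** 2).sum(axis=1)        # toy objective to maximize

# Fit the surrogate on the evaluated (configuration, objective) pairs.
surrogate = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Score unevaluated candidates with UCB = mu + kappa * sigma, where mu and
# sigma are computed from the per-tree predictions.
candidates = rng.uniform(0.0, 1.0, size=(1000, 2))
per_tree = np.stack([tree.predict(candidates) for tree in surrogate.estimators_])
mu, sigma = per_tree.mean(axis=0), per_tree.std(axis=0)

kappa = 1.96  # larger kappa favors exploration; kappa = 0 is pure exploitation
next_config = candidates[np.argmax(mu + kappa * sigma)]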

The top s configurations, i.e., those maximizing the acquisition function, are selected for the new batch.

The process of obtaining s configurations relies on the “constant liar” strategy, where a sampled configuration is temporarily mapped to a dummy output given by a bulk statistic of all the configurations evaluated thus far (such as the maximum, mean, or median validation accuracy).

Prior to sampling the next configuration by acquisition function maximization, the surrogate is retrained with the dummy output as a data point. As the true validation accuracy becomes available for one of the sampled configurations, the dummy output is replaced and the surrogate is updated.

This allows for scalable asynchronous (or batch-synchronous) sampling of new hyperparameter configurations, as the sketch below illustrates.
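
As an illustration only (again, not DeepHyper’s implementation), constant-liar batch selection could look like the following sketch, which reuses the imports, arrays, and surrogate settings from the previous snippet and “lies” with the maximum objective observed so far:

def select_batch(X, y, candidates, s, kappa=1.96):
    # Constant-liar batch selection: after each pick, append the candidate
    # with a dummy ("lied") objective and refit before the next pick.
    X, y = X.copy(), y.copy()
    batch = []
    for _ in range(s):
        model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
        per_tree = np.stack([tree.predict(candidates) for tree in model.estimators_])
        ucb = per_tree.mean(axis=0) + kappa * per_tree.std(axis=0)
        i = int(np.argmax(ucb))
        batch.append(candidates[i])
        X = np.vstack([X, candidates[i : i + 1]])  # pending configuration
        y = np.append(y, y.max())                  # the "lie": max seen so far
    return batch

pending_batch = select_batch(X, y, candidates, s=4)

When the true objective of a pending configuration becomes available, the dummy value is replaced and the surrogate is updated, as described above.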

2.8.1. Choice of surrogate model#

Users should note that our choice of surrogate is the Random Forest regressor, due to its ability to handle non-ordinal data (hyperparameter configurations may not be purely continuous or even numerical). Evidence of how it outperforms other methods (such as Gaussian processes) is also available in [1].


2.8.1.1. Setup CBO#

We create the CBO using the problem and evaluator defined above.

[12]:
from deephyper.search.hps import CBO
# Uncomment the following line to show the arguments of CBO.
# help(CBO)
[13]:
# Instantiate the search with the problem and the evaluator that we created before
search = CBO(problem, evaluator_1, initial_points=[problem.default_configuration])

Note

All of DeepHyper’s search algorithms have two stopping criteria, which can also be combined (see the sketch after this list):

  • max_evals (int): Defines the maximum number of evaluations that we want to perform. Defaults to -1 for an infinite number.

  • timeout (int): Defines a time budget (in seconds) before stopping the search. Defaults to None for an infinite time budget.
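
For example, both criteria can be combined (the values here are arbitrary):

# Stop after 100 evaluations or 30 minutes, whichever comes first.
results = search.search(max_evals=100, timeout=1800)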

[14]:
results = search.search(max_evals=25)

Warning

The search call does not output any information about the current status of the search. However, a results.csv file is created in the local directory and can be inspected to see the finished tasks.

The returned results is a Pandas DataFrame where the columns starting with "p:" are hyperparameters and the columns starting with "m:" are additional metadata (from the user or from the Evaluator), alongside the objective value and the job_id:

  • job_id is a unique identifier corresponding to the order of creation of the tasks.

  • objective is the value returned by the run-function.

  • m:timestamp_submit is the time (in seconds, counted from the creation of the evaluator) when the task was submitted by the evaluator.

  • m:timestamp_gather is the time (in seconds, counted from the creation of the evaluator) when the finished task was collected by the evaluator.

A few common manipulations of this DataFrame are sketched below.
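
For instance, using standard pandas operations (the variable names are illustrative), the hyperparameter columns can be selected and the best configurations listed:

param_cols = [c for c in results.columns if c.startswith("p:")]
top_5 = results.sort_values("objective", ascending=False).head(5)
top_5[param_cols + ["objective"]]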

[15]:
results
[15]:
p:activation p:batch_size p:dropout_rate p:learning_rate p:num_epochs p:units objective job_id m:timestamp_submit m:timestamp_gather m:loss m:val_loss m:accuracy m:val_accuracy m:num_parameters m:num_parameters_train
0 relu 32 0.500000 0.001000 50 32 0.819672 0 14.062015 18.644915 [0.932179868221283, 0.8672291040420532, 0.7659... [0.8616852760314941, 0.7722538113594055, 0.698... [0.3099173605442047, 0.35537189245224, 0.45041... [0.32786884903907776, 0.39344263076782227, 0.5... 18 1217
1 tanh 34 0.416012 0.000123 32 30 0.770492 1 18.689840 20.543970 [0.7169904708862305, 0.7176796197891235, 0.672... [0.6555299758911133, 0.6463007926940918, 0.637... [0.5785123705863953, 0.6198347210884094, 0.661... [0.6065573692321777, 0.6065573692321777, 0.606... 18 1141
2 selu 109 0.253249 0.008099 64 21 0.786885 2 20.569868 22.421012 [0.7877746224403381, 0.48364654183387756, 0.45... [0.5014718770980835, 0.41901537775993347, 0.39... [0.5330578684806824, 0.7520661354064941, 0.785... [0.7704917788505554, 0.7868852615356445, 0.786... 18 799
3 sigmoid 12 0.427878 0.001731 22 78 0.836066 3 22.447220 24.454956 [0.6061558723449707, 0.550484836101532, 0.4878... [0.4615059792995453, 0.41609662771224976, 0.39... [0.6694214940071106, 0.6942148804664612, 0.772... [0.7868852615356445, 0.7868852615356445, 0.770... 18 2965
4 sigmoid 184 0.563842 0.000918 44 16 0.770492 4 24.481104 26.153962 [0.7118028402328491, 0.6913579702377319, 0.675... [0.6253404021263123, 0.6192606687545776, 0.613... [0.56611567735672, 0.5826446413993835, 0.58264... [0.7704917788505554, 0.7868852615356445, 0.786... 18 609
5 sigmoid 22 0.392557 0.009368 41 53 0.786885 5 26.179926 28.258018 [0.6151015758514404, 0.4353742301464081, 0.357... [0.40750277042388916, 0.3809265196323395, 0.37... [0.6528925895690918, 0.7933884263038635, 0.863... [0.7868852615356445, 0.8360655903816223, 0.836... 18 2015
6 hard_sigmoid 132 0.339039 0.000391 80 116 0.803279 6 28.283772 30.233132 [1.384872317314148, 1.377575397491455, 1.33753... [1.4819616079330444, 1.4416449069976807, 1.401... [0.2851239740848541, 0.2851239740848541, 0.280... [0.2295081913471222, 0.2295081913471222, 0.229... 18 4409
7 relu 25 0.305810 0.000866 26 78 0.803279 7 30.259100 32.121436 [0.7404576539993286, 0.6233335137367249, 0.549... [0.6288543939590454, 0.5288269519805908, 0.462... [0.5, 0.6859503984451294, 0.702479362487793, 0... [0.6065573692321777, 0.7868852615356445, 0.852... 18 2965
8 softplus 103 0.454461 0.000099 97 82 0.754098 8 32.147675 34.291445 [0.6858380436897278, 0.6767894625663757, 0.682... [0.5281822085380554, 0.5256621837615967, 0.523... [0.6735537052154541, 0.6776859760284424, 0.677... [0.7704917788505554, 0.7704917788505554, 0.770... 18 3117
9 softsign 23 0.538048 0.003561 22 9 0.819672 9 34.317075 36.144872 [0.6336663961410522, 0.5650404095649719, 0.526... [0.5643037557601929, 0.4906170070171356, 0.446... [0.64462810754776, 0.7190082669258118, 0.73966... [0.7868852615356445, 0.7868852615356445, 0.819... 18 343
10 linear 9 0.483396 0.000016 22 91 0.672131 10 36.253704 38.300674 [0.8400824666023254, 0.8625382781028748, 0.812... [0.8302125930786133, 0.8189805746078491, 0.807... [0.39256197214126587, 0.41322314739227295, 0.4... [0.3606557250022888, 0.39344263076782227, 0.40... 18 3459
11 hard_sigmoid 9 0.435904 0.001394 19 85 0.852459 11 38.542972 40.685555 [0.7858327627182007, 0.580945611000061, 0.5286... [0.5257574319839478, 0.4599587619304657, 0.428... [0.4586776793003082, 0.6859503984451294, 0.776... [0.7704917788505554, 0.7868852615356445, 0.786... 18 3231
12 hard_sigmoid 9 0.436711 0.000341 22 71 0.770492 12 40.800419 42.788335 [0.6493225693702698, 0.6555594205856323, 0.620... [0.5575143694877625, 0.5404179096221924, 0.523... [0.6735537052154541, 0.6776859760284424, 0.702... [0.7704917788505554, 0.7704917788505554, 0.770... 18 2699
13 gelu 9 0.443914 0.004792 12 101 0.803279 13 42.902055 44.841357 [0.48601630330085754, 0.3251952826976776, 0.29... [0.36614474654197693, 0.4151676297187805, 0.40... [0.7520661354064941, 0.8595041036605835, 0.867... [0.8524590134620667, 0.8524590134620667, 0.836... 18 3839
14 linear 11 0.459591 0.000904 11 83 0.836066 14 44.955593 46.634954 [0.6354092955589294, 0.481924831867218, 0.3865... [0.4674466848373413, 0.386948823928833, 0.3739... [0.6115702390670776, 0.7479338645935059, 0.847... [0.7868852615356445, 0.7868852615356445, 0.819... 18 3155
15 linear 9 0.441282 0.000665 16 115 0.819672 15 46.749318 48.612520 [0.7145628333091736, 0.504156231880188, 0.3972... [0.5077534317970276, 0.3964807093143463, 0.371... [0.5206611752510071, 0.7355371713638306, 0.826... [0.7868852615356445, 0.8196721076965332, 0.819... 18 4371
16 hard_sigmoid 9 0.590323 0.001896 22 92 0.852459 16 48.727985 50.911089 [0.6284109354019165, 0.5719208121299744, 0.497... [0.4557706415653229, 0.41930386424064636, 0.39... [0.6404958963394165, 0.7190082669258118, 0.747... [0.7704917788505554, 0.8360655903816223, 0.770... 18 3497
17 hard_sigmoid 8 0.579865 0.001491 11 72 0.803279 17 51.027019 52.910879 [0.6137444376945496, 0.5901528000831604, 0.506... [0.47075191140174866, 0.4412596821784973, 0.41... [0.702479362487793, 0.7148760557174683, 0.7314... [0.7704917788505554, 0.7868852615356445, 0.786... 18 2737
18 gelu 9 0.108936 0.001583 21 92 0.786885 18 53.027879 55.028541 [0.5272975564002991, 0.3731013238430023, 0.328... [0.40461263060569763, 0.3767029941082001, 0.38... [0.7148760557174683, 0.8305785059928894, 0.851... [0.7868852615356445, 0.8360655903816223, 0.836... 18 3497
19 tanh 9 0.595015 0.001919 30 127 0.803279 19 55.145704 57.361813 [0.5116466283798218, 0.35454192757606506, 0.33... [0.36360660195350647, 0.3957943916320801, 0.40... [0.7479338645935059, 0.8181818127632141, 0.842... [0.8032786846160889, 0.8524590134620667, 0.836... 18 4827
20 hard_sigmoid 10 0.596673 0.002164 74 80 0.819672 20 57.478000 60.821340 [0.7859671115875244, 0.5692944526672363, 0.531... [0.4949828088283539, 0.44248247146606445, 0.40... [0.5041322112083435, 0.7231404781341553, 0.735... [0.7704917788505554, 0.7868852615356445, 0.770... 18 3041
21 hard_sigmoid 9 0.181801 0.001526 19 95 0.836066 21 60.939166 63.002136 [0.6060517430305481, 0.535541832447052, 0.4644... [0.48688802123069763, 0.4431024193763733, 0.41... [0.6528925895690918, 0.7190082669258118, 0.785... [0.7704917788505554, 0.7868852615356445, 0.786... 18 3611
22 hard_sigmoid 10 0.550896 0.001073 26 85 0.836066 22 63.120133 65.200209 [0.7356556057929993, 0.645238995552063, 0.5400... [0.5445694923400879, 0.4956490695476532, 0.466... [0.5206611752510071, 0.6570248007774353, 0.719... [0.7704917788505554, 0.7704917788505554, 0.770... 18 3231
23 hard_sigmoid 8 0.478580 0.001344 22 128 0.836066 23 65.413067 67.628386 [0.6741029024124146, 0.5203006267547607, 0.471... [0.46032172441482544, 0.41817986965179443, 0.3... [0.6033057570457458, 0.7479338645935059, 0.764... [0.7704917788505554, 0.8032786846160889, 0.770... 18 4865
24 linear 9 0.390427 0.001105 18 60 0.819672 24 67.749137 69.669918 [0.6414411664009094, 0.44960853457450867, 0.37... [0.4244444668292999, 0.3778190612792969, 0.385... [0.6115702390670776, 0.7851239442825317, 0.822... [0.8196721076965332, 0.8032786846160889, 0.852... 18 2281

The search can be continued without any issue.

[16]:
results = search.search(max_evals=5)

results
[16]:
p:activation p:batch_size p:dropout_rate p:learning_rate p:num_epochs p:units objective job_id m:timestamp_submit m:timestamp_gather m:loss m:val_loss m:accuracy m:val_accuracy m:num_parameters m:num_parameters_train
0 relu 32 0.500000 0.001000 50 32 0.819672 0 14.062015 18.644915 [0.932179868221283, 0.8672291040420532, 0.7659... [0.8616852760314941, 0.7722538113594055, 0.698... [0.3099173605442047, 0.35537189245224, 0.45041... [0.32786884903907776, 0.39344263076782227, 0.5... 18 1217
1 tanh 34 0.416012 0.000123 32 30 0.770492 1 18.689840 20.543970 [0.7169904708862305, 0.7176796197891235, 0.672... [0.6555299758911133, 0.6463007926940918, 0.637... [0.5785123705863953, 0.6198347210884094, 0.661... [0.6065573692321777, 0.6065573692321777, 0.606... 18 1141
2 selu 109 0.253249 0.008099 64 21 0.786885 2 20.569868 22.421012 [0.7877746224403381, 0.48364654183387756, 0.45... [0.5014718770980835, 0.41901537775993347, 0.39... [0.5330578684806824, 0.7520661354064941, 0.785... [0.7704917788505554, 0.7868852615356445, 0.786... 18 799
3 sigmoid 12 0.427878 0.001731 22 78 0.836066 3 22.447220 24.454956 [0.6061558723449707, 0.550484836101532, 0.4878... [0.4615059792995453, 0.41609662771224976, 0.39... [0.6694214940071106, 0.6942148804664612, 0.772... [0.7868852615356445, 0.7868852615356445, 0.770... 18 2965
4 sigmoid 184 0.563842 0.000918 44 16 0.770492 4 24.481104 26.153962 [0.7118028402328491, 0.6913579702377319, 0.675... [0.6253404021263123, 0.6192606687545776, 0.613... [0.56611567735672, 0.5826446413993835, 0.58264... [0.7704917788505554, 0.7868852615356445, 0.786... 18 609
5 sigmoid 22 0.392557 0.009368 41 53 0.786885 5 26.179926 28.258018 [0.6151015758514404, 0.4353742301464081, 0.357... [0.40750277042388916, 0.3809265196323395, 0.37... [0.6528925895690918, 0.7933884263038635, 0.863... [0.7868852615356445, 0.8360655903816223, 0.836... 18 2015
6 hard_sigmoid 132 0.339039 0.000391 80 116 0.803279 6 28.283772 30.233132 [1.384872317314148, 1.377575397491455, 1.33753... [1.4819616079330444, 1.4416449069976807, 1.401... [0.2851239740848541, 0.2851239740848541, 0.280... [0.2295081913471222, 0.2295081913471222, 0.229... 18 4409
7 relu 25 0.305810 0.000866 26 78 0.803279 7 30.259100 32.121436 [0.7404576539993286, 0.6233335137367249, 0.549... [0.6288543939590454, 0.5288269519805908, 0.462... [0.5, 0.6859503984451294, 0.702479362487793, 0... [0.6065573692321777, 0.7868852615356445, 0.852... 18 2965
8 softplus 103 0.454461 0.000099 97 82 0.754098 8 32.147675 34.291445 [0.6858380436897278, 0.6767894625663757, 0.682... [0.5281822085380554, 0.5256621837615967, 0.523... [0.6735537052154541, 0.6776859760284424, 0.677... [0.7704917788505554, 0.7704917788505554, 0.770... 18 3117
9 softsign 23 0.538048 0.003561 22 9 0.819672 9 34.317075 36.144872 [0.6336663961410522, 0.5650404095649719, 0.526... [0.5643037557601929, 0.4906170070171356, 0.446... [0.64462810754776, 0.7190082669258118, 0.73966... [0.7868852615356445, 0.7868852615356445, 0.819... 18 343
10 linear 9 0.483396 0.000016 22 91 0.672131 10 36.253704 38.300674 [0.8400824666023254, 0.8625382781028748, 0.812... [0.8302125930786133, 0.8189805746078491, 0.807... [0.39256197214126587, 0.41322314739227295, 0.4... [0.3606557250022888, 0.39344263076782227, 0.40... 18 3459
11 hard_sigmoid 9 0.435904 0.001394 19 85 0.852459 11 38.542972 40.685555 [0.7858327627182007, 0.580945611000061, 0.5286... [0.5257574319839478, 0.4599587619304657, 0.428... [0.4586776793003082, 0.6859503984451294, 0.776... [0.7704917788505554, 0.7868852615356445, 0.786... 18 3231
12 hard_sigmoid 9 0.436711 0.000341 22 71 0.770492 12 40.800419 42.788335 [0.6493225693702698, 0.6555594205856323, 0.620... [0.5575143694877625, 0.5404179096221924, 0.523... [0.6735537052154541, 0.6776859760284424, 0.702... [0.7704917788505554, 0.7704917788505554, 0.770... 18 2699
13 gelu 9 0.443914 0.004792 12 101 0.803279 13 42.902055 44.841357 [0.48601630330085754, 0.3251952826976776, 0.29... [0.36614474654197693, 0.4151676297187805, 0.40... [0.7520661354064941, 0.8595041036605835, 0.867... [0.8524590134620667, 0.8524590134620667, 0.836... 18 3839
14 linear 11 0.459591 0.000904 11 83 0.836066 14 44.955593 46.634954 [0.6354092955589294, 0.481924831867218, 0.3865... [0.4674466848373413, 0.386948823928833, 0.3739... [0.6115702390670776, 0.7479338645935059, 0.847... [0.7868852615356445, 0.7868852615356445, 0.819... 18 3155
15 linear 9 0.441282 0.000665 16 115 0.819672 15 46.749318 48.612520 [0.7145628333091736, 0.504156231880188, 0.3972... [0.5077534317970276, 0.3964807093143463, 0.371... [0.5206611752510071, 0.7355371713638306, 0.826... [0.7868852615356445, 0.8196721076965332, 0.819... 18 4371
16 hard_sigmoid 9 0.590323 0.001896 22 92 0.852459 16 48.727985 50.911089 [0.6284109354019165, 0.5719208121299744, 0.497... [0.4557706415653229, 0.41930386424064636, 0.39... [0.6404958963394165, 0.7190082669258118, 0.747... [0.7704917788505554, 0.8360655903816223, 0.770... 18 3497
17 hard_sigmoid 8 0.579865 0.001491 11 72 0.803279 17 51.027019 52.910879 [0.6137444376945496, 0.5901528000831604, 0.506... [0.47075191140174866, 0.4412596821784973, 0.41... [0.702479362487793, 0.7148760557174683, 0.7314... [0.7704917788505554, 0.7868852615356445, 0.786... 18 2737
18 gelu 9 0.108936 0.001583 21 92 0.786885 18 53.027879 55.028541 [0.5272975564002991, 0.3731013238430023, 0.328... [0.40461263060569763, 0.3767029941082001, 0.38... [0.7148760557174683, 0.8305785059928894, 0.851... [0.7868852615356445, 0.8360655903816223, 0.836... 18 3497
19 tanh 9 0.595015 0.001919 30 127 0.803279 19 55.145704 57.361813 [0.5116466283798218, 0.35454192757606506, 0.33... [0.36360660195350647, 0.3957943916320801, 0.40... [0.7479338645935059, 0.8181818127632141, 0.842... [0.8032786846160889, 0.8524590134620667, 0.836... 18 4827
20 hard_sigmoid 10 0.596673 0.002164 74 80 0.819672 20 57.478000 60.821340 [0.7859671115875244, 0.5692944526672363, 0.531... [0.4949828088283539, 0.44248247146606445, 0.40... [0.5041322112083435, 0.7231404781341553, 0.735... [0.7704917788505554, 0.7868852615356445, 0.770... 18 3041
21 hard_sigmoid 9 0.181801 0.001526 19 95 0.836066 21 60.939166 63.002136 [0.6060517430305481, 0.535541832447052, 0.4644... [0.48688802123069763, 0.4431024193763733, 0.41... [0.6528925895690918, 0.7190082669258118, 0.785... [0.7704917788505554, 0.7868852615356445, 0.786... 18 3611
22 hard_sigmoid 10 0.550896 0.001073 26 85 0.836066 22 63.120133 65.200209 [0.7356556057929993, 0.645238995552063, 0.5400... [0.5445694923400879, 0.4956490695476532, 0.466... [0.5206611752510071, 0.6570248007774353, 0.719... [0.7704917788505554, 0.7704917788505554, 0.770... 18 3231
23 hard_sigmoid 8 0.478580 0.001344 22 128 0.836066 23 65.413067 67.628386 [0.6741029024124146, 0.5203006267547607, 0.471... [0.46032172441482544, 0.41817986965179443, 0.3... [0.6033057570457458, 0.7479338645935059, 0.764... [0.7704917788505554, 0.8032786846160889, 0.770... 18 4865
24 linear 9 0.390427 0.001105 18 60 0.819672 24 67.749137 69.669918 [0.6414411664009094, 0.44960853457450867, 0.37... [0.4244444668292999, 0.3778190612792969, 0.385... [0.6115702390670776, 0.7851239442825317, 0.822... [0.8196721076965332, 0.8032786846160889, 0.852... 18 2281
25 linear 9 0.390427 0.001105 18 60 0.803279 25 76.058007 77.917548 [0.6900577545166016, 0.45549991726875305, 0.38... [0.44305503368377686, 0.375354528427124, 0.367... [0.586776852607727, 0.7685950398445129, 0.8264... [0.8032786846160889, 0.8032786846160889, 0.836... 18 2281
26 elu 10 0.465738 0.001587 19 86 0.786885 26 78.044782 80.116082 [0.5640519261360168, 0.39274558424949646, 0.35... [0.36559411883354187, 0.36404427886009216, 0.3... [0.7066115736961365, 0.8181818127632141, 0.826... [0.8196721076965332, 0.8524590134620667, 0.852... 18 3269
27 hard_sigmoid 11 0.558586 0.000470 16 97 0.770492 27 80.235434 82.186551 [0.6638611555099487, 0.6282634139060974, 0.578... [0.5112671256065369, 0.49481236934661865, 0.48... [0.6900826692581177, 0.6900826692581177, 0.727... [0.7704917788505554, 0.7704917788505554, 0.770... 18 3687
28 hard_sigmoid 9 0.470758 0.001626 13 93 0.819672 28 82.308676 84.119199 [0.6838790774345398, 0.5366182327270508, 0.500... [0.4881541430950165, 0.44332659244537354, 0.41... [0.6074380278587341, 0.7479338645935059, 0.752... [0.7704917788505554, 0.7868852615356445, 0.836... 18 3535
29 hard_sigmoid 14 0.443848 0.000821 18 91 0.868852 29 84.241275 86.165970 [0.635932981967926, 0.5662679076194763, 0.5535... [0.5330402851104736, 0.49324533343315125, 0.46... [0.6528925895690918, 0.7231404781341553, 0.710... [0.7704917788505554, 0.7704917788505554, 0.770... 18 3459

Now that the search is over, let us print the best configuration found during this run.

[17]:
i_max = results.objective.argmax()
best_job = results.iloc[i_max].to_dict()


print(f"The default configuration has an accuracy of {objective_default:.3f}. \n"
      f"The best configuration found by DeepHyper has an accuracy {results['objective'].iloc[i_max]:.3f}, \n"
      f"discovered after {results['m:timestamp_gather'].iloc[i_max]:.2f} secondes of search.\n")

best_job
The default configuration has an accuracy of 0.820.
The best configuration found by DeepHyper has an accuracy of 0.869,
discovered after 86.17 seconds of search.

[17]:
{'p:activation': 'hard_sigmoid',
 'p:batch_size': 14,
 'p:dropout_rate': 0.4438481734660434,
 'p:learning_rate': 0.0008205301701702,
 'p:num_epochs': 18,
 'p:units': 91,
 'objective': 0.868852436542511,
 'job_id': 29,
 'm:timestamp_submit': 84.2412748336792,
 'm:timestamp_gather': 86.16596984863281,
 'm:loss': '[0.635932981967926, 0.5662679076194763, 0.5535424947738647, 0.5280432105064392, 0.5272855162620544, 0.4944866895675659, 0.4731537699699402, 0.4632386565208435, 0.45537763833999634, 0.441008597612381, 0.43360987305641174, 0.41000625491142273, 0.3955390155315399, 0.3979620039463043, 0.4121660590171814, 0.3952913284301758, 0.37713170051574707, 0.37644729018211365]',
 'm:val_loss': '[0.5330402851104736, 0.49324533343315125, 0.46836990118026733, 0.4523628056049347, 0.43521663546562195, 0.4218882620334625, 0.41329118609428406, 0.40013813972473145, 0.39292314648628235, 0.38573184609413147, 0.3782324492931366, 0.37434619665145874, 0.3733466863632202, 0.3698613941669464, 0.36946991086006165, 0.36632686853408813, 0.369396448135376, 0.37207305431365967]',
 'm:accuracy': '[0.6528925895690918, 0.7231404781341553, 0.71074378490448, 0.7396694421768188, 0.7190082669258118, 0.7603305578231812, 0.7685950398445129, 0.7809917330741882, 0.7603305578231812, 0.797520637512207, 0.7851239442825317, 0.8140496015548706, 0.8016529083251953, 0.8016529083251953, 0.8140496015548706, 0.8057851195335388, 0.8512396812438965, 0.8223140239715576]',
 'm:val_accuracy': '[0.7704917788505554, 0.7704917788505554, 0.7704917788505554, 0.7868852615356445, 0.7868852615356445, 0.8032786846160889, 0.7868852615356445, 0.7704917788505554, 0.7704917788505554, 0.7704917788505554, 0.7704917788505554, 0.8032786846160889, 0.8032786846160889, 0.8032786846160889, 0.8196721076965332, 0.8196721076965332, 0.8196721076965332, 0.868852436542511]',
 'm:num_parameters': 18,
 'm:num_parameters_train': 3459}
[18]:
import matplotlib.pyplot as plt

plt.figure()
plt.plot(json.loads(metadata_default["val_accuracy"]), color="skyblue", label="Default (val)")
plt.plot(json.loads(metadata_default["accuracy"]), color="skyblue", linestyle="--", label="Default (train)")
plt.plot(json.loads(best_job["m:val_accuracy"]), color="coral", linewidth=2, label="Best Job (val)")
plt.plot(json.loads(best_job["m:accuracy"]), color="coral", linestyle="--", linewidth=2, label="Best Job (train)")
plt.legend()
plt.xlim(0,50)
plt.ylim(0.5, 0.9)
plt.grid()
plt.show()
(Figure: training and validation accuracy per epoch for the default configuration and the best job.)

2.9. Restart from a checkpoint#

It can often be useful to continue the search from previous results, for example if the requested allocation was not long enough or if an unexpected crash happened. The CBO search provides the fit_surrogate(dataframe_of_results) method for this use case.

To simulate this, we create a second evaluator evaluator_2 and start a fresh CBO search with strong exploitation (i.e., a small kappa=0.001).

[19]:
# Create a new evaluator
evaluator_2 = get_evaluator(run)

# Create a new CBO search with strong exploitation (i.e., small kappa)
search_from_checkpoint = CBO(problem, evaluator_2, kappa=0.001)

# Initialize the surrogate model of Bayesian optimization
# with the results of the previous search
search_from_checkpoint.fit_surrogate(results)
Created new evaluator with 1 worker and config: {'num_cpus': 1, 'num_cpus_per_task': 1, 'callbacks': [<deephyper.evaluator.callback.TqdmCallback object at 0x36aa85450>]}
[20]:
results_from_checkpoint = search_from_checkpoint.search(max_evals=25)
[21]:
results_from_checkpoint
[21]:
p:activation p:batch_size p:dropout_rate p:learning_rate p:num_epochs p:units objective job_id m:timestamp_submit m:timestamp_gather m:loss m:val_loss m:accuracy m:val_accuracy m:num_parameters m:num_parameters_train
0 linear 12 0.438976 0.001208 10 95 0.836066 0 6.397480 8.097164 [0.6585780382156372, 0.4198558032512665, 0.388... [0.4234312176704407, 0.3709704279899597, 0.375... [0.586776852607727, 0.8223140239715576, 0.8181... [0.8524590134620667, 0.8032786846160889, 0.852... 18 3611
1 hard_sigmoid 12 0.520554 0.002753 24 86 0.836066 1 8.325622 10.485447 [0.6523781418800354, 0.5374161005020142, 0.467... [0.4600506126880646, 0.418490469455719, 0.3926... [0.6322314143180847, 0.7355371713638306, 0.756... [0.7704917788505554, 0.8032786846160889, 0.770... 18 3269
2 hard_sigmoid 14 0.363236 0.001486 25 110 0.836066 2 10.607192 12.538232 [0.8731681108474731, 0.5844830870628357, 0.526... [0.5583247542381287, 0.463060587644577, 0.4330... [0.39256197214126587, 0.7231404781341553, 0.71... [0.7868852615356445, 0.7704917788505554, 0.786... 18 4181
3 selu 14 0.460495 0.002067 12 83 0.819672 3 12.660986 14.445949 [0.5941786766052246, 0.40573564171791077, 0.35... [0.36787426471710205, 0.3851782977581024, 0.39... [0.6900826692581177, 0.8140496015548706, 0.855... [0.8196721076965332, 0.8360655903816223, 0.836... 18 3155
4 hard_sigmoid 14 0.429233 0.001345 44 93 0.836066 4 14.567245 16.865442 [0.8053226470947266, 0.5926507711410522, 0.555... [0.5757733583450317, 0.4833868145942688, 0.449... [0.4586776793003082, 0.7355371713638306, 0.719... [0.7704917788505554, 0.7704917788505554, 0.770... 18 3535
5 hard_sigmoid 12 0.435679 0.002768 19 98 0.852459 5 16.987307 18.849204 [0.8416682481765747, 0.5458146333694458, 0.485... [0.49310293793678284, 0.4248492121696472, 0.40... [0.4834710657596588, 0.7190082669258118, 0.752... [0.7704917788505554, 0.7868852615356445, 0.819... 18 3725
6 hard_sigmoid 13 0.193879 0.002970 27 97 0.786885 6 18.970489 21.118783 [0.6036434769630432, 0.4864918291568756, 0.419... [0.44192907214164734, 0.39877942204475403, 0.3... [0.6322314143180847, 0.7520661354064941, 0.814... [0.7704917788505554, 0.7868852615356445, 0.803... 18 3687
7 hard_sigmoid 12 0.398339 0.003165 16 86 0.803279 7 21.239853 23.158434 [0.575615644454956, 0.4804820418357849, 0.4493... [0.4336657226085663, 0.38864782452583313, 0.36... [0.6983470916748047, 0.7520661354064941, 0.772... [0.7868852615356445, 0.7704917788505554, 0.803... 18 3269
8 hard_sigmoid 10 0.440850 0.001984 21 79 0.819672 8 23.280709 25.263713 [0.6699650883674622, 0.5623903274536133, 0.494... [0.4936249852180481, 0.43841785192489624, 0.40... [0.5909090638160706, 0.702479362487793, 0.7685... [0.7704917788505554, 0.7868852615356445, 0.786... 18 3003
9 selu 12 0.477594 0.001919 24 112 0.803279 9 25.386227 27.457640 [0.5322725176811218, 0.3245876729488373, 0.309... [0.38319069147109985, 0.3997686803340912, 0.42... [0.7231404781341553, 0.8429751992225647, 0.867... [0.8032786846160889, 0.8360655903816223, 0.819... 18 4257
10 linear 17 0.509973 0.002447 23 89 0.803279 10 27.581020 29.382152 [0.5835959911346436, 0.37069475650787354, 0.33... [0.3832534849643707, 0.3833093047142029, 0.397... [0.6818181872367859, 0.8347107172012329, 0.871... [0.8196721076965332, 0.8360655903816223, 0.852... 18 3383
11 hard_sigmoid 15 0.523164 0.001067 54 85 0.836066 11 29.505811 32.134817 [0.9218267798423767, 0.6668001413345337, 0.599... [0.7107035517692566, 0.5423887968063354, 0.487... [0.40909090638160706, 0.6074380278587341, 0.66... [0.32786884903907776, 0.7868852615356445, 0.77... 18 3231
12 hard_sigmoid 9 0.441817 0.001026 63 120 0.803279 12 32.257875 35.366201 [0.6784322261810303, 0.593529462814331, 0.5153... [0.512413740158081, 0.4700430631637573, 0.4388... [0.702479362487793, 0.6942148804664612, 0.7685... [0.7704917788505554, 0.7704917788505554, 0.786... 18 4561
13 hard_sigmoid 12 0.544595 0.000872 31 108 0.868852 13 35.591719 37.863411 [0.6894024014472961, 0.5956807136535645, 0.591... [0.5160852670669556, 0.4825969934463501, 0.459... [0.6570248007774353, 0.71074378490448, 0.70247... [0.7704917788505554, 0.7704917788505554, 0.770... 18 4105
14 hard_sigmoid 11 0.412163 0.000971 25 95 0.852459 14 37.986737 39.992904 [0.6809544563293457, 0.5772911906242371, 0.549... [0.5399053692817688, 0.4780106544494629, 0.451... [0.56611567735672, 0.7272727489471436, 0.71074... [0.7704917788505554, 0.7704917788505554, 0.770... 18 3611
15 hard_sigmoid 9 0.529147 0.001269 21 104 0.819672 15 40.117375 42.095783 [0.7127968668937683, 0.5936329960823059, 0.580... [0.5231141448020935, 0.4680721163749695, 0.439... [0.5619834661483765, 0.6942148804664612, 0.698... [0.7704917788505554, 0.7704917788505554, 0.803... 18 3953
16 hard_sigmoid 18 0.460586 0.000852 57 114 0.836066 16 42.219457 44.786083 [0.620365560054779, 0.5650091767311096, 0.5926... [0.5191843509674072, 0.4921819567680359, 0.471... [0.6652892827987671, 0.702479362487793, 0.7024... [0.7704917788505554, 0.7704917788505554, 0.770... 18 4333
17 hard_sigmoid 11 0.387598 0.000823 36 107 0.852459 17 44.909478 47.169109 [0.6506151556968689, 0.589153528213501, 0.5537... [0.5208263993263245, 0.47787296772003174, 0.44... [0.6694214940071106, 0.7272727489471436, 0.739... [0.7704917788505554, 0.7704917788505554, 0.786... 18 4067
18 hard_sigmoid 13 0.414306 0.001142 15 107 0.868852 18 47.295230 49.173612 [0.6849047541618347, 0.6010968089103699, 0.530... [0.5226393342018127, 0.4722970426082611, 0.444... [0.5619834661483765, 0.7066115736961365, 0.743... [0.7704917788505554, 0.7704917788505554, 0.786... 18 4067
19 hard_sigmoid 18 0.486091 0.000940 16 98 0.803279 19 49.300318 51.107571 [0.6370577216148376, 0.602827787399292, 0.5877... [0.5238470435142517, 0.49890682101249695, 0.47... [0.6652892827987671, 0.7148760557174683, 0.702... [0.7704917788505554, 0.7704917788505554, 0.770... 18 3725
20 hard_sigmoid 16 0.398807 0.000787 28 127 0.852459 20 51.235310 53.186919 [0.690682590007782, 0.597438395023346, 0.54904... [0.5534363985061646, 0.4879835247993469, 0.464... [0.6115702390670776, 0.6776859760284424, 0.710... [0.7704917788505554, 0.7704917788505554, 0.770... 18 4827
21 hard_sigmoid 17 0.384646 0.000786 38 106 0.852459 21 53.313367 56.106235 [0.6100438237190247, 0.5865045189857483, 0.532... [0.5186737179756165, 0.4805293381214142, 0.459... [0.6694214940071106, 0.6983470916748047, 0.739... [0.7704917788505554, 0.7704917788505554, 0.770... 18 4029
22 hard_sigmoid 14 0.356788 0.000776 36 93 0.852459 22 56.232855 58.367166 [0.8494999408721924, 0.6852762699127197, 0.605... [0.7045169472694397, 0.5688167214393616, 0.503... [0.3677685856819153, 0.5165289044380188, 0.723... [0.4590163826942444, 0.7704917788505554, 0.770... 18 3535
23 hard_sigmoid 13 0.397346 0.001412 58 106 0.803279 23 58.493287 61.266117 [0.5852853059768677, 0.5252292156219482, 0.463... [0.47994744777679443, 0.4468657970428467, 0.41... [0.6983470916748047, 0.7603305578231812, 0.752... [0.7704917788505554, 0.7868852615356445, 0.819... 18 4029
24 hard_sigmoid 11 0.592738 0.000858 36 102 0.852459 24 61.391166 63.710971 [0.7020736932754517, 0.6323720216751099, 0.635... [0.5165272355079651, 0.49130111932754517, 0.46... [0.6652892827987671, 0.6942148804664612, 0.665... [0.7704917788505554, 0.7704917788505554, 0.770... 18 3877
[24]:
i_max = results_from_checkpoint.objective.argmax()
best_job = results_from_checkpoint.iloc[i_max].to_dict()


print(f"The default configuration has an accuracy of {objective_default:.3f}. \n"
      f"The best configuration found by DeepHyper has an accuracy {results_from_checkpoint['objective'].iloc[i_max]:.3f}, \n"
      f"discovered after {results_from_checkpoint['m:timestamp_gather'].iloc[i_max]:.2f} secondes of search.\n")

best_job
The default configuration has an accuracy of 0.820.
The best configuration found by DeepHyper has an accuracy of 0.869,
discovered after 37.86 seconds of search.

[24]:
{'p:activation': 'hard_sigmoid',
 'p:batch_size': 12,
 'p:dropout_rate': 0.5445945061992221,
 'p:learning_rate': 0.0008723159649302,
 'p:num_epochs': 31,
 'p:units': 108,
 'objective': 0.868852436542511,
 'job_id': 13,
 'm:timestamp_submit': 35.591718912124634,
 'm:timestamp_gather': 37.86341094970703,
 'm:loss': '[0.6894024014472961, 0.5956807136535645, 0.5917392373085022, 0.5917980670928955, 0.5240165591239929, 0.5105810761451721, 0.48491740226745605, 0.4678659439086914, 0.4341070055961609, 0.44388189911842346, 0.46203067898750305, 0.4385513663291931, 0.373516708612442, 0.3806712031364441, 0.4247581362724304, 0.38694536685943604, 0.39678844809532166, 0.36177390813827515, 0.39248791337013245, 0.3694393038749695, 0.36910954117774963, 0.36527299880981445, 0.3246991038322449, 0.3702981472015381, 0.3380395472049713, 0.32118597626686096, 0.3399485945701599, 0.316026508808136, 0.3481677770614624, 0.332357257604599, 0.30446261167526245]',
 'm:val_loss': '[0.5160852670669556, 0.4825969934463501, 0.4596042037010193, 0.4401782751083374, 0.4263782799243927, 0.40957772731781006, 0.39775389432907104, 0.3879868686199188, 0.3824841380119324, 0.37751471996307373, 0.3767043650150299, 0.37132832407951355, 0.36980336904525757, 0.36811932921409607, 0.36348962783813477, 0.3667713403701782, 0.3646043539047241, 0.36585021018981934, 0.3656473755836487, 0.3691723048686981, 0.3709169328212738, 0.3743424415588379, 0.37509554624557495, 0.3765678405761719, 0.38067683577537537, 0.3765696585178375, 0.38010162115097046, 0.38514068722724915, 0.3826046586036682, 0.38491377234458923, 0.39010316133499146]',
 'm:accuracy': '[0.6570248007774353, 0.71074378490448, 0.702479362487793, 0.6818181872367859, 0.7355371713638306, 0.7561983466148376, 0.7520661354064941, 0.7768595218658447, 0.7809917330741882, 0.7933884263038635, 0.7685950398445129, 0.7933884263038635, 0.8347107172012329, 0.8512396812438965, 0.7851239442825317, 0.8057851195335388, 0.8140496015548706, 0.8347107172012329, 0.8057851195335388, 0.8223140239715576, 0.8181818127632141, 0.8429751992225647, 0.8595041036605835, 0.8264462947845459, 0.8471074104309082, 0.8636363744735718, 0.8595041036605835, 0.8636363744735718, 0.85537189245224, 0.8347107172012329, 0.8595041036605835]',
 'm:val_accuracy': '[0.7704917788505554, 0.7704917788505554, 0.7704917788505554, 0.7868852615356445, 0.8032786846160889, 0.7868852615356445, 0.8032786846160889, 0.7704917788505554, 0.7704917788505554, 0.7704917788505554, 0.8032786846160889, 0.8032786846160889, 0.8196721076965332, 0.8196721076965332, 0.8196721076965332, 0.8524590134620667, 0.8524590134620667, 0.8524590134620667, 0.8524590134620667, 0.868852436542511, 0.868852436542511, 0.868852436542511, 0.868852436542511, 0.868852436542511, 0.868852436542511, 0.868852436542511, 0.868852436542511, 0.868852436542511, 0.868852436542511, 0.868852436542511, 0.868852436542511]',
 'm:num_parameters': 18,
 'm:num_parameters_train': 4105}
[25]:
import matplotlib.pyplot as plt

plt.figure()
plt.plot(json.loads(metadata_default["val_accuracy"]), color="skyblue", label="Default (val)")
plt.plot(json.loads(metadata_default["accuracy"]), color="skyblue", linestyle="--", label="Default (train)")
plt.plot(json.loads(best_job["m:val_accuracy"]), color="coral", linewidth=2, label="Best Job(val)")
plt.plot(json.loads(best_job["m:accuracy"]), color="coral", linestyle="--", linewidth=2, label="Best Job (train)")
plt.legend()
plt.xlim(0,50)
plt.ylim(0.5, 0.9)
plt.grid()
plt.show()
[Figure: training and validation accuracy per epoch, comparing the default configuration to the best job found by the search.]

2.10. Add conditional hyperparameters#

Now we want to add the possibility to search for a second fully-connected layer. Only two new lines are needed in the run-function. Note the use of config.get("dense_2", False): inactive conditional hyperparameters are absent from the configuration (see the NaN entries in the results below), so we fall back to False when the key is missing:

if config.get("dense_2", False):
    x = tf.keras.layers.Dense(config["dense_2:units"], activation=config["dense_2:activation"])(x)
[35]:
def run_with_condition(config: dict):
    tf.autograph.set_verbosity(0)

    train_dataframe, val_dataframe = load_data()

    train_ds = dataframe_to_dataset(train_dataframe)
    val_ds = dataframe_to_dataset(val_dataframe)

    train_ds = train_ds.batch(config["batch_size"])
    val_ds = val_ds.batch(config["batch_size"])

    # Categorical features encoded as integers
    sex = tf.keras.Input(shape=(1,), name="sex", dtype="int64")
    cp = tf.keras.Input(shape=(1,), name="cp", dtype="int64")
    fbs = tf.keras.Input(shape=(1,), name="fbs", dtype="int64")
    restecg = tf.keras.Input(shape=(1,), name="restecg", dtype="int64")
    exang = tf.keras.Input(shape=(1,), name="exang", dtype="int64")
    ca = tf.keras.Input(shape=(1,), name="ca", dtype="int64")

    # Categorical feature encoded as string
    thal = tf.keras.Input(shape=(1,), name="thal", dtype="string")

    # Numerical features
    age = tf.keras.Input(shape=(1,), name="age")
    trestbps = tf.keras.Input(shape=(1,), name="trestbps")
    chol = tf.keras.Input(shape=(1,), name="chol")
    thalach = tf.keras.Input(shape=(1,), name="thalach")
    oldpeak = tf.keras.Input(shape=(1,), name="oldpeak")
    slope = tf.keras.Input(shape=(1,), name="slope")

    all_inputs = [
        sex,
        cp,
        fbs,
        restecg,
        exang,
        ca,
        thal,
        age,
        trestbps,
        chol,
        thalach,
        oldpeak,
        slope,
    ]

    # Integer categorical features
    sex_encoded = encode_categorical_feature(sex, "sex", train_ds, False)
    cp_encoded = encode_categorical_feature(cp, "cp", train_ds, False)
    fbs_encoded = encode_categorical_feature(fbs, "fbs", train_ds, False)
    restecg_encoded = encode_categorical_feature(restecg, "restecg", train_ds, False)
    exang_encoded = encode_categorical_feature(exang, "exang", train_ds, False)
    ca_encoded = encode_categorical_feature(ca, "ca", train_ds, False)

    # String categorical features
    thal_encoded = encode_categorical_feature(thal, "thal", train_ds, True)

    # Numerical features
    age_encoded = encode_numerical_feature(age, "age", train_ds)
    trestbps_encoded = encode_numerical_feature(trestbps, "trestbps", train_ds)
    chol_encoded = encode_numerical_feature(chol, "chol", train_ds)
    thalach_encoded = encode_numerical_feature(thalach, "thalach", train_ds)
    oldpeak_encoded = encode_numerical_feature(oldpeak, "oldpeak", train_ds)
    slope_encoded = encode_numerical_feature(slope, "slope", train_ds)

    all_features = tf.keras.layers.concatenate(
        [
            sex_encoded,
            cp_encoded,
            fbs_encoded,
            restecg_encoded,
            exang_encoded,
            slope_encoded,
            ca_encoded,
            thal_encoded,
            age_encoded,
            trestbps_encoded,
            chol_encoded,
            thalach_encoded,
            oldpeak_encoded,
        ]
    )
    x = tf.keras.layers.Dense(config["units"], activation=config["activation"])(
        all_features
    )

    ### START - NEW LINES
    if config.get("dense_2", False):
        x = tf.keras.layers.Dense(config["dense_2:units"], activation=config["dense_2:activation"])(x)
    ### END - NEW LINES

    x = tf.keras.layers.Dropout(config["dropout_rate"])(x)
    output = tf.keras.layers.Dense(1, activation="sigmoid")(x)
    model = tf.keras.Model(all_inputs, output)

    optimizer = tf.keras.optimizers.Adam(learning_rate=config["learning_rate"])
    model.compile(optimizer, "binary_crossentropy", metrics=["accuracy"])

    history = model.fit(
        train_ds, epochs=config["num_epochs"], validation_data=val_ds, verbose=0
    )

    objective = history.history["val_accuracy"][-1]
    metadata = {
        "loss": history.history["loss"],
        "val_loss": history.history["val_loss"],
        "accuracy": history.history["accuracy"],
        "val_accuracy": history.history["val_accuracy"],
    }
    # JSON-serialize the metric lists so they can be stored in the results file
    metadata = {k: json.dumps(v) for k, v in metadata.items()}
    metadata.update(count_params(model))

    return {"objective": objective, "metadata": metadata}

To define conditional hyperparameters we use ConfigSpace. We declare dense_2:units and dense_2:activation as active hyperparameters only when dense_2 == True; the EqualsCondition class (imported from deephyper.problem) expresses exactly this. Then we call

problem_with_condition.add_condition(condition)

to register each new condition to the HpProblem.

[36]:
from deephyper.problem import EqualsCondition

# Define the hyperparameter problem
problem_with_condition = HpProblem()


# Define the same hyperparameters as before
problem_with_condition.add_hyperparameter((8, 128), "units")
problem_with_condition.add_hyperparameter(ACTIVATIONS, "activation")
problem_with_condition.add_hyperparameter((0.0, 0.6), "dropout_rate")
problem_with_condition.add_hyperparameter((10, 100), "num_epochs")
problem_with_condition.add_hyperparameter((8, 256, "log-uniform"), "batch_size")
problem_with_condition.add_hyperparameter((1e-5, 1e-2, "log-uniform"), "learning_rate")


# Add a new hyperparameter "dense_2 (bool)" to decide if a second fully-connected layer should be created
hp_dense_2 = problem_with_condition.add_hyperparameter([True, False], "dense_2")
hp_dense_2_units = problem_with_condition.add_hyperparameter((8, 128), "dense_2:units")
hp_dense_2_activation = problem_with_condition.add_hyperparameter(ACTIVATIONS, "dense_2:activation")

problem_with_condition.add_condition(EqualsCondition(hp_dense_2_units, hp_dense_2, True))
problem_with_condition.add_condition(EqualsCondition(hp_dense_2_activation, hp_dense_2, True))


problem_with_condition
[36]:
Configuration space object:
  Hyperparameters:
    activation, Type: Categorical, Choices: {elu, gelu, hard_sigmoid, linear, relu, selu, sigmoid, softplus, softsign, swish, tanh}, Default: elu
    batch_size, Type: UniformInteger, Range: [8, 256], Default: 45, on log-scale
    dense_2, Type: Categorical, Choices: {True, False}, Default: True
    dense_2:activation, Type: Categorical, Choices: {elu, gelu, hard_sigmoid, linear, relu, selu, sigmoid, softplus, softsign, swish, tanh}, Default: elu
    dense_2:units, Type: UniformInteger, Range: [8, 128], Default: 68
    dropout_rate, Type: UniformFloat, Range: [0.0, 0.6], Default: 0.3
    learning_rate, Type: UniformFloat, Range: [1e-05, 0.01], Default: 0.0003162278, on log-scale
    num_epochs, Type: UniformInteger, Range: [10, 100], Default: 55
    units, Type: UniformInteger, Range: [8, 128], Default: 68
  Conditions:
    dense_2:activation | dense_2 == True
    dense_2:units | dense_2 == True
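
To check that the conditions behave as expected, we can sample a few configurations: inactive hyperparameters simply do not appear in the sampled dictionary. A small sketch, assuming the HpProblem exposes its underlying ConfigSpace object through its space attribute:

# Sample 3 configurations from the conditioned space
for config in problem_with_condition.space.sample_configuration(3):
    # When "dense_2" is False, the "dense_2:*" keys are absent
    # (with older ConfigSpace versions, use config.get_dictionary())
    print(dict(config))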

We create a new evaluator evaluator_3 and start a fresh CBO search with this new problem problem_with_condition.

[37]:
evaluator_3 = get_evaluator(run_with_condition)

search_with_condition = CBO(problem_with_condition, evaluator_3)
Created new evaluator with 1 worker and config: {'num_cpus': 1, 'num_cpus_per_task': 1, 'callbacks': [<deephyper.evaluator.callback.TqdmCallback object at 0x36e9ecb50>]}
[38]:
results_with_condition = search_with_condition.search(max_evals=25)
[39]:
results_with_condition
[39]:
p:activation p:batch_size p:dense_2 p:dropout_rate p:learning_rate p:num_epochs p:units p:dense_2:activation p:dense_2:units objective job_id m:timestamp_submit m:timestamp_gather m:loss m:val_loss m:accuracy m:val_accuracy m:num_parameters m:num_parameters_train
0 gelu 24 False 0.522131 0.001057 44 8 NaN NaN 0.819672 0 2.235576 4.550812 [0.924185037612915, 0.8747961521148682, 0.8326... [0.8763667941093445, 0.8163478970527649, 0.763... [0.43801653385162354, 0.5123966932296753, 0.54... [0.32786884903907776, 0.3606557250022888, 0.44... 18 305
1 softsign 47 True 0.377181 0.000019 47 127 elu 75.0 0.819672 1 4.884659 6.967428 [0.5733370184898376, 0.5688332319259644, 0.574... [0.5401382446289062, 0.5361412763595581, 0.532... [0.7148760557174683, 0.7396694421768188, 0.702... [0.7377049326896667, 0.7540983557701111, 0.770... 18 14375
2 hard_sigmoid 25 False 0.333931 0.000461 12 33 NaN NaN 0.770492 2 7.296740 8.877128 [0.6208266615867615, 0.6297191381454468, 0.623... [0.5321816802024841, 0.5246576070785522, 0.517... [0.7148760557174683, 0.7231404781341553, 0.706... [0.7704917788505554, 0.7704917788505554, 0.770... 18 1255
3 softsign 8 True 0.203617 0.000153 77 24 softplus 20.0 0.819672 3 9.212803 13.288342 [1.1443312168121338, 1.0988541841506958, 0.993... [1.1278998851776123, 1.0535285472869873, 0.983... [0.2933884263038635, 0.27272728085517883, 0.30... [0.2295081913471222, 0.2295081913471222, 0.229... 18 1409
4 tanh 232 False 0.544339 0.000236 17 15 NaN NaN 0.540984 4 13.726519 15.218855 [0.8239354491233826, 0.8208498358726501, 0.838... [0.7502322196960449, 0.7468000054359436, 0.743... [0.4917355477809906, 0.44214877486228943, 0.47... [0.49180328845977783, 0.49180328845977783, 0.4... 18 571
5 linear 31 True 0.431446 0.002055 27 114 tanh 69.0 0.803279 5 15.547276 17.921458 [0.512678325176239, 0.34136030077934265, 0.303... [0.3721773326396942, 0.44410887360572815, 0.46... [0.7148760557174683, 0.8512396812438965, 0.867... [0.7868852615356445, 0.8524590134620667, 0.836... 18 12223
6 swish 185 True 0.143728 0.000066 36 126 relu 88.0 0.786885 6 18.249087 19.944625 [0.7136270403862, 0.7156499624252319, 0.704962... [0.7125016450881958, 0.7087024450302124, 0.704... [0.3677685856819153, 0.40909090638160706, 0.47... [0.4590163826942444, 0.49180328845977783, 0.50... 18 15927
7 selu 85 True 0.513058 0.000975 20 24 elu 77.0 0.819672 7 20.271154 21.969523 [0.5428742170333862, 0.5066077709197998, 0.472... [0.5021791458129883, 0.4607888162136078, 0.434... [0.7520661354064941, 0.7355371713638306, 0.772... [0.8032786846160889, 0.8196721076965332, 0.803... 18 2891
8 softplus 256 False 0.541869 0.000032 44 123 NaN NaN 0.737705 8 22.299193 24.046926 [0.8337835669517517, 0.7673187255859375, 0.769... [0.6750521063804626, 0.6737582087516785, 0.672... [0.5330578684806824, 0.5413222908973694, 0.520... [0.6557376980781555, 0.6557376980781555, 0.672... 18 4675
9 hard_sigmoid 57 True 0.221024 0.000014 77 101 selu 56.0 0.770492 9 24.373393 26.569876 [1.2874423265457153, 1.2473254203796387, 1.279... [1.334114909172058, 1.3099576234817505, 1.2861... [0.2851239740848541, 0.28925618529319763, 0.28... [0.2295081913471222, 0.2295081913471222, 0.229... 18 9506
10 gelu 18 False 0.564763 0.000933 54 15 NaN NaN 0.819672 10 27.104312 29.572576 [0.5897572636604309, 0.57463538646698, 0.57585... [0.5318115949630737, 0.5115801095962524, 0.493... [0.6859503984451294, 0.6776859760284424, 0.710... [0.7868852615356445, 0.7868852615356445, 0.803... 18 571
11 gelu 142 False 0.579399 0.001457 89 25 NaN NaN 0.819672 11 30.006772 32.050175 [0.7384172081947327, 0.7030460834503174, 0.697... [0.7297533750534058, 0.7030223608016968, 0.677... [0.5619834661483765, 0.5743801593780518, 0.561... [0.5737704634666443, 0.6229507923126221, 0.655... 18 951
12 gelu 134 False 0.559896 0.000204 18 20 NaN NaN 0.557377 12 32.490820 34.208299 [0.7932754158973694, 0.8488675951957703, 0.829... [0.7328909039497375, 0.7296200394630432, 0.726... [0.4752066135406494, 0.42561984062194824, 0.41... [0.49180328845977783, 0.5081967115402222, 0.50... 18 761
13 gelu 242 False 0.515215 0.002007 75 35 NaN NaN 0.852459 13 34.639509 36.527177 [0.859049379825592, 0.8296051025390625, 0.8243... [0.7826507687568665, 0.7542334794998169, 0.727... [0.40082645416259766, 0.44214877486228943, 0.3... [0.3442623019218445, 0.4098360538482666, 0.459... 18 1331
14 gelu 226 False 0.031790 0.007343 58 40 NaN NaN 0.770492 14 36.960484 38.937768 [0.7312912940979004, 0.5623873472213745, 0.466... [0.5508450269699097, 0.44452348351478577, 0.39... [0.42561984062194824, 0.7851239442825317, 0.80... [0.7704917788505554, 0.8196721076965332, 0.803... 18 1521
15 softplus 203 False 0.554349 0.003512 93 37 NaN NaN 0.819672 15 39.373037 41.404819 [1.0284086465835571, 0.9733473658561707, 0.859... [0.8921995162963867, 0.7531617879867554, 0.649... [0.41322314739227295, 0.43388429284095764, 0.5... [0.24590164422988892, 0.3442623019218445, 0.62... 18 1407
16 gelu 232 False 0.362360 0.001611 74 51 NaN NaN 0.836066 16 41.948220 44.062609 [0.7161883115768433, 0.6824987530708313, 0.680... [0.6093551516532898, 0.5813453197479248, 0.557... [0.5413222908973694, 0.5785123705863953, 0.607... [0.688524603843689, 0.7049180269241333, 0.7049... 18 1939
17 gelu 234 False 0.494525 0.000187 81 34 NaN NaN 0.786885 17 44.499486 46.484084 [0.6872013807296753, 0.6597692966461182, 0.683... [0.6522549986839294, 0.6491743326187134, 0.646... [0.5619834661483765, 0.6033057570457458, 0.541... [0.7213114500045776, 0.7377049326896667, 0.737... 18 1293
18 gelu 197 False 0.260730 0.002156 72 65 NaN NaN 0.786885 18 46.924379 49.068913 [0.7648653388023376, 0.702664315700531, 0.6466... [0.6779869198799133, 0.62514728307724, 0.57954... [0.43801653385162354, 0.586776852607727, 0.636... [0.6393442749977112, 0.688524603843689, 0.7540... 18 2471
19 tanh 253 False 0.387836 0.005233 100 44 NaN NaN 0.803279 19 49.509489 51.511528 [0.9511580467224121, 0.8183630108833313, 0.676... [0.8083988428115845, 0.6910563111305237, 0.599... [0.37603306770324707, 0.42561984062194824, 0.5... [0.4262295067310333, 0.5573770403862, 0.688524... 18 1673
20 gelu 215 False 0.496468 0.000754 62 52 NaN NaN 0.819672 20 51.948622 53.995673 [0.6701352596282959, 0.630990743637085, 0.6785... [0.614578127861023, 0.6000330448150635, 0.5868... [0.6611570119857788, 0.702479362487793, 0.6198... [0.7540983557701111, 0.7540983557701111, 0.770... 18 1977
21 gelu 216 False 0.506153 0.001577 31 123 NaN NaN 0.852459 21 54.549624 56.165359 [0.7263947129249573, 0.6771824955940247, 0.611... [0.6366023421287537, 0.5854663252830505, 0.541... [0.4628099203109741, 0.5619834661483765, 0.706... [0.7704917788505554, 0.8032786846160889, 0.803... 18 4675
22 gelu 13 False 0.506717 0.006827 37 122 NaN NaN 0.803279 22 56.607589 58.999443 [0.4278903603553772, 0.3155879080295563, 0.283... [0.4139474332332611, 0.45352885127067566, 0.43... [0.8057851195335388, 0.8719007968902588, 0.892... [0.7868852615356445, 0.7868852615356445, 0.803... 18 4637
23 gelu 218 False 0.496672 0.002579 35 95 NaN NaN 0.803279 23 59.442295 61.092887 [0.6519522070884705, 0.5893392562866211, 0.535... [0.5795789957046509, 0.5213544368743896, 0.478... [0.6280992031097412, 0.7355371713638306, 0.768... [0.7868852615356445, 0.8032786846160889, 0.819... 18 3611
24 gelu 102 True 0.564560 0.001154 22 121 swish 58.0 0.836066 24 61.534962 63.325902 [0.6295450329780579, 0.5699268579483032, 0.503... [0.5654922127723694, 0.5030131936073303, 0.457... [0.6776859760284424, 0.7190082669258118, 0.760... [0.8032786846160889, 0.8032786846160889, 0.819... 18 11612
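
Since the search explored both branches of the condition, we can compare them directly by grouping the results on the dense_2 flag, for example:

# Mean and best objective for each branch of the conditional search space
print(results_with_condition.groupby("p:dense_2")["objective"].agg(["mean", "max", "count"]))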

Finally, let us print out the best configuration found from this conditioned search space.

[41]:
# Index of the evaluation with the highest objective (validation accuracy)
i_max = results_with_condition.objective.argmax()
best_job = results_with_condition.iloc[i_max].to_dict()


print(f"The default configuration has an accuracy of {objective_default:.3f}. \n"
      f"The best configuration found by DeepHyper has an accuracy {results_with_condition['objective'].iloc[i_max]:.3f}, \n"
      f"discovered after {results_with_condition['m:timestamp_gather'].iloc[i_max]:.2f} secondes of search.\n")

best_job
The default configuration has an accuracy of 0.820.
The best configuration found by DeepHyper has an accuracy of 0.852,
discovered after 36.53 seconds of search.

[41]:
{'p:activation': 'gelu',
 'p:batch_size': 242,
 'p:dense_2': False,
 'p:dropout_rate': 0.5152152399825081,
 'p:learning_rate': 0.0020066536451654,
 'p:num_epochs': 75,
 'p:units': 35,
 'p:dense_2:activation': nan,
 'p:dense_2:units': nan,
 'objective': 0.8524590134620667,
 'job_id': 13,
 'm:timestamp_submit': 34.639508962631226,
 'm:timestamp_gather': 36.52717709541321,
 'm:loss': '[0.859049379825592, 0.8296051025390625, 0.8243043422698975, 0.7973544597625732, 0.7397300004959106, 0.7041493654251099, 0.7235077619552612, 0.6488988995552063, 0.702877402305603, 0.6625365018844604, 0.655247151851654, 0.6482662558555603, 0.6332350373268127, 0.5944415926933289, 0.599096953868866, 0.5725361704826355, 0.5635514259338379, 0.5673025250434875, 0.5335801839828491, 0.5114941000938416, 0.5681918859481812, 0.5181658864021301, 0.5367188453674316, 0.4979115426540375, 0.4959726631641388, 0.4972151219844818, 0.49240514636039734, 0.4656968414783478, 0.46458086371421814, 0.47685131430625916, 0.4619789123535156, 0.42305096983909607, 0.4502629041671753, 0.45318228006362915, 0.4273068904876709, 0.45731934905052185, 0.4358069598674774, 0.4073839485645294, 0.40324676036834717, 0.39515185356140137, 0.3967740535736084, 0.42188364267349243, 0.3988163471221924, 0.414568156003952, 0.38948333263397217, 0.3933953642845154, 0.3803653120994568, 0.4041972756385803, 0.38209018111228943, 0.37077051401138306, 0.3959812521934509, 0.3823794722557068, 0.37696704268455505, 0.38598284125328064, 0.37342891097068787, 0.36935892701148987, 0.3629962205886841, 0.3568687438964844, 0.35987207293510437, 0.35740146040916443, 0.3505695164203644, 0.3561004102230072, 0.3582131266593933, 0.37328383326530457, 0.3326776325702667, 0.34751608967781067, 0.34388142824172974, 0.32585853338241577, 0.34427350759506226, 0.3316827714443207, 0.3136938512325287, 0.34206631779670715, 0.3135722279548645, 0.3292677700519562, 0.3271864652633667]',
 'm:val_loss': '[0.7826507687568665, 0.7542334794998169, 0.7272999286651611, 0.7016635537147522, 0.6774805188179016, 0.6546944379806519, 0.6334282159805298, 0.6135805249214172, 0.5949371457099915, 0.5774815678596497, 0.5610758066177368, 0.5457151532173157, 0.5314500331878662, 0.5182359218597412, 0.5059540867805481, 0.49457502365112305, 0.48407983779907227, 0.47442489862442017, 0.46552494168281555, 0.4573885202407837, 0.44996970891952515, 0.44316211342811584, 0.4370157718658447, 0.43129366636276245, 0.4260532855987549, 0.42123591899871826, 0.41692915558815, 0.4130058288574219, 0.4093940854072571, 0.40622448921203613, 0.40338370203971863, 0.4008386433124542, 0.3985663056373596, 0.3966066539287567, 0.3948517143726349, 0.39329880475997925, 0.39195170998573303, 0.390933632850647, 0.3902509808540344, 0.38974982500076294, 0.3894272744655609, 0.38932961225509644, 0.38939112424850464, 0.38970157504081726, 0.38995805382728577, 0.39017829298973083, 0.3904784023761749, 0.3908677101135254, 0.3912723660469055, 0.39176133275032043, 0.3920464813709259, 0.3923230767250061, 0.3927132785320282, 0.39302974939346313, 0.39317506551742554, 0.39328402280807495, 0.393261194229126, 0.39328503608703613, 0.3932672441005707, 0.39302825927734375, 0.39276590943336487, 0.39265578985214233, 0.3927079439163208, 0.3927791118621826, 0.39287132024765015, 0.3929784595966339, 0.39310818910598755, 0.3932960033416748, 0.3935457170009613, 0.39387473464012146, 0.3941606879234314, 0.3945646584033966, 0.3948483169078827, 0.39523547887802124, 0.39547473192214966]',
 'm:accuracy': '[0.40082645416259766, 0.44214877486228943, 0.39256197214126587, 0.41735535860061646, 0.4958677589893341, 0.5, 0.5123966932296753, 0.6157024502754211, 0.5785123705863953, 0.6198347210884094, 0.6033057570457458, 0.6280992031097412, 0.6239669322967529, 0.6983470916748047, 0.6570248007774353, 0.71074378490448, 0.71074378490448, 0.7148760557174683, 0.7190082669258118, 0.7355371713638306, 0.7148760557174683, 0.7479338645935059, 0.7355371713638306, 0.7809917330741882, 0.7727272510528564, 0.7561983466148376, 0.7520661354064941, 0.7933884263038635, 0.7727272510528564, 0.797520637512207, 0.8016529083251953, 0.85537189245224, 0.797520637512207, 0.8099173307418823, 0.8223140239715576, 0.797520637512207, 0.8512396812438965, 0.8347107172012329, 0.8305785059928894, 0.8305785059928894, 0.8429751992225647, 0.8223140239715576, 0.8223140239715576, 0.8057851195335388, 0.8264462947845459, 0.8388429880142212, 0.8347107172012329, 0.8305785059928894, 0.8429751992225647, 0.8595041036605835, 0.8305785059928894, 0.8388429880142212, 0.8305785059928894, 0.8140496015548706, 0.8512396812438965, 0.8305785059928894, 0.8471074104309082, 0.8388429880142212, 0.8388429880142212, 0.8429751992225647, 0.8636363744735718, 0.8429751992225647, 0.8388429880142212, 0.8636363744735718, 0.8677685856819153, 0.85537189245224, 0.8636363744735718, 0.8760330677032471, 0.8512396812438965, 0.8595041036605835, 0.8636363744735718, 0.8636363744735718, 0.8471074104309082, 0.8595041036605835, 0.85537189245224]',
 'm:val_accuracy': '[0.3442623019218445, 0.4098360538482666, 0.4590163826942444, 0.4754098355770111, 0.5245901346206665, 0.5409836173057556, 0.6393442749977112, 0.7213114500045776, 0.7377049326896667, 0.7704917788505554, 0.8032786846160889, 0.8032786846160889, 0.8196721076965332, 0.8360655903816223, 0.8360655903816223, 0.8360655903816223, 0.8360655903816223, 0.8360655903816223, 0.8196721076965332, 0.8196721076965332, 0.8196721076965332, 0.8196721076965332, 0.8196721076965332, 0.8196721076965332, 0.8196721076965332, 0.8196721076965332, 0.8196721076965332, 0.8196721076965332, 0.8196721076965332, 0.8196721076965332, 0.8196721076965332, 0.8196721076965332, 0.8196721076965332, 0.8196721076965332, 0.8196721076965332, 0.8196721076965332, 0.8196721076965332, 0.8196721076965332, 0.8196721076965332, 0.8196721076965332, 0.8196721076965332, 0.8196721076965332, 0.8196721076965332, 0.8196721076965332, 0.8524590134620667, 0.8524590134620667, 0.8524590134620667, 0.8524590134620667, 0.8524590134620667, 0.8524590134620667, 0.8524590134620667, 0.8524590134620667, 0.8524590134620667, 0.8524590134620667, 0.8524590134620667, 0.8524590134620667, 0.8524590134620667, 0.8524590134620667, 0.8524590134620667, 0.8524590134620667, 0.8524590134620667, 0.8524590134620667, 0.8524590134620667, 0.8524590134620667, 0.8524590134620667, 0.8524590134620667, 0.8524590134620667, 0.8524590134620667, 0.8524590134620667, 0.8524590134620667, 0.8524590134620667, 0.8524590134620667, 0.8524590134620667, 0.8524590134620667, 0.8524590134620667]',
 'm:num_parameters': 18,
 'm:num_parameters_train': 1331}
[42]:
import matplotlib.pyplot as plt

plt.figure()
plt.plot(json.loads(metadata_default["val_accuracy"]), color="skyblue", label="Default (val)")
plt.plot(json.loads(metadata_default["accuracy"]), color="skyblue", linestyle="--", label="Default (train)")
plt.plot(json.loads(best_job["m:val_accuracy"]), color="coral", linewidth=2, label="Best Job(val)")
plt.plot(json.loads(best_job["m:accuracy"]), color="coral", linestyle="--", linewidth=2, label="Best Job (train)")
plt.legend()
plt.xlim(0,50)
plt.ylim(0.5, 0.9)
plt.grid()
plt.show()
[Figure: training and validation accuracy per epoch, comparing the default configuration to the best job of the conditioned search.]
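
As a last comparison, one can plot the running best objective of both searches against time; a short sketch, assuming the two results dataframes are still in memory:

# Running best objective over time for both searches
plt.figure()
for label, df in [
    ("search without condition", results_from_checkpoint),
    ("search with condition", results_with_condition),
]:
    df = df.sort_values("m:timestamp_gather")
    plt.plot(df["m:timestamp_gather"], df["objective"].cummax(), label=label)
plt.xlabel("Time (s)")
plt.ylabel("Best objective so far")
plt.legend()
plt.grid()
plt.show()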