2. Hyperparameter search for classification with Tabular data (Keras)#

Open In Colab

In this tutorial we present how to use hyperparameter optimization on a basic example from the Keras documentation.

Reference: This tutorial is based on materials from the Keras Documentation: Structured data classification from scratch

Let us start with installing DeepHyper!

Warning

This tutorial should be run with tensorflow>=2.6.

[1]:
try:
    import deephyper
    print(deephyper.__version__)
except (ImportError, ModuleNotFoundError):
    !pip install deephyper

try:
    import ray
except (ImportError, ModuleNotFoundError):
    !pip install ray
0.4.2

Note

The following environment variables can be used to suppress some TensorFlow DEBUG, INFO and WARNING log messages.

[2]:
import os


os.environ["TF_CPP_MIN_LOG_LEVEL"] = str(3)
os.environ["AUTOGRAPH_VERBOSITY"] = str(0)

2.1. Imports#

Warning

It is important to use the import strategy import tensorflow as tf to prevent serialization errors that would crash the search.

The import strategy from the original Keras tutorial (shown below),

from tensorflow import keras
from tensorflow.keras import layers
...
from tensorflow.keras.layers import IntegerLookup
from tensorflow.keras.layers import Normalization
from tensorflow.keras.layers import StringLookup

resulted in non-serializable data, preventing the search from executing.
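
Concretely, the recommended strategy means referencing the same Keras symbols through the tf namespace. A minimal sketch (the layer classes below are the ones used later in this tutorial):

import tensorflow as tf

# Accessing the layers via the `tf` namespace instead of importing them directly
# avoids the serialization issue described above; the objects are the same.
normalizer = tf.keras.layers.Normalization()
int_lookup = tf.keras.layers.IntegerLookup(output_mode="binary")
str_lookup = tf.keras.layers.StringLookup(output_mode="binary")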

[3]:
import pandas as pd
import tensorflow as tf

Note

The following code detects whether GPU devices are available on the current host, so that this notebook can automatically adapt its parallel execution to the resources available locally. Note, however, that this simple check will not detect resources spread across multiple nodes.

[4]:
from tensorflow.python.client import device_lib


def get_available_gpus():
    local_device_protos = device_lib.list_local_devices()
    return [x.name for x in local_device_protos if x.device_type == "GPU"]


n_gpus = len(get_available_gpus())
if n_gpus > 1:
    n_gpus -= 1

is_gpu_available = n_gpus > 0

if is_gpu_available:
    print(f"{n_gpus} GPU{'s are' if n_gpus > 1 else ' is'} available.")
else:
    print("No GPU available")
No GPU available

2.2. The dataset (from Keras.io)#

The dataset is provided by the Cleveland Clinic Foundation for Heart Disease. It’s a CSV file with 303 rows. Each row contains information about a patient (a sample), and each column describes an attribute of the patient (a feature). We use the features to predict whether a patient has heart disease (binary classification).

Here’s the description of each feature:

| Column | Description | Feature Type |
| --- | --- | --- |
| Age | Age in years | Numerical |
| Sex | (1 = male; 0 = female) | Categorical |
| CP | Chest pain type (0, 1, 2, 3, 4) | Categorical |
| Trestbpd | Resting blood pressure (in mm Hg on admission) | Numerical |
| Chol | Serum cholesterol in mg/dl | Numerical |
| FBS | Fasting blood sugar > 120 mg/dl (1 = true; 0 = false) | Categorical |
| RestECG | Resting electrocardiogram results (0, 1, 2) | Categorical |
| Thalach | Maximum heart rate achieved | Numerical |
| Exang | Exercise induced angina (1 = yes; 0 = no) | Categorical |
| Oldpeak | ST depression induced by exercise relative to rest | Numerical |
| Slope | Slope of the peak exercise ST segment | Numerical |
| CA | Number of major vessels (0-3) colored by fluoroscopy | Both numerical & categorical |
| Thal | 3 = normal; 6 = fixed defect; 7 = reversible defect | Categorical |
| Target | Diagnosis of heart disease (1 = true; 0 = false) | Target |

[5]:
def load_data():
    file_url = "http://storage.googleapis.com/download.tensorflow.org/data/heart.csv"
    dataframe = pd.read_csv(file_url)

    val_dataframe = dataframe.sample(frac=0.2, random_state=1337)
    train_dataframe = dataframe.drop(val_dataframe.index)

    return train_dataframe, val_dataframe


def dataframe_to_dataset(dataframe):
    dataframe = dataframe.copy()
    labels = dataframe.pop("target")
    ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), labels))
    ds = ds.shuffle(buffer_size=len(dataframe))
    return ds

2.3. Preprocessing & encoding of features#

The next cells use tf.keras.layers.Normalization() to apply standard scaling on the features.

Then, the tf.keras.layers.StringLookup and tf.keras.layers.IntegerLookup are used to encode categorical variables.
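
As a quick illustration of how these layers learn their statistics with adapt(), here is a minimal, self-contained sketch on toy data (independent of the heart-disease dataset):

import tensorflow as tf

# Normalization learns the mean and variance of the data seen by adapt(),
# then standardizes its inputs accordingly.
normalizer = tf.keras.layers.Normalization()
normalizer.adapt(tf.constant([[10.0], [20.0], [30.0]]))
print(normalizer(tf.constant([[20.0]])))  # approximately 0 after standard scaling

# StringLookup builds a vocabulary with adapt() and, with output_mode="binary",
# returns a multi-hot encoding over that vocabulary.
lookup = tf.keras.layers.StringLookup(output_mode="binary")
lookup.adapt(tf.constant([["normal"], ["fixed"], ["reversible"]]))
print(lookup(tf.constant([["fixed"]])))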

[6]:
def encode_numerical_feature(feature, name, dataset):
    # Create a Normalization layer for our feature
    normalizer = tf.keras.layers.Normalization()

    # Prepare a Dataset that only yields our feature
    feature_ds = dataset.map(lambda x, y: x[name])
    feature_ds = feature_ds.map(lambda x: tf.expand_dims(x, -1))

    # Learn the statistics of the data
    normalizer.adapt(feature_ds)

    # Normalize the input feature
    encoded_feature = normalizer(feature)
    return encoded_feature


def encode_categorical_feature(feature, name, dataset, is_string):
    lookup_class = (
        tf.keras.layers.StringLookup if is_string else tf.keras.layers.IntegerLookup
    )
    # Create a lookup layer which will turn strings into integer indices
    lookup = lookup_class(output_mode="binary")

    # Prepare a Dataset that only yields our feature
    feature_ds = dataset.map(lambda x, y: x[name])
    feature_ds = feature_ds.map(lambda x: tf.expand_dims(x, -1))

    # Learn the set of possible string values and assign them a fixed integer index
    lookup.adapt(feature_ds)

    # Turn the string input into integer indices
    encoded_feature = lookup(feature)
    return encoded_feature

2.4. Define the run-function#

The run-function defines how the objective that we want to maximize is computed. It takes a config dictionary as input and returns a scalar value that we want to maximize. The config contains one sampled value for each hyperparameter that we want to tune. In this example we will search for:

  • units (default value: 32)

  • activation (default value: "relu")

  • dropout_rate (default value: 0.5)

  • num_epochs (default value: 50)

  • batch_size (default value: 32)

  • learning_rate (default value: 1e-3)

A hyperparameter value can be accessed easily in the dictionary through its corresponding key, for example config["units"].
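
For illustration, the default configuration listed above corresponds to the following dictionary (later obtained programmatically via problem.default_configuration):

config = {
    "units": 32,
    "activation": "relu",
    "dropout_rate": 0.5,
    "num_epochs": 50,
    "batch_size": 32,
    "learning_rate": 1e-3,
}
# Inside the run-function each value is read by its key, e.g., config["units"].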

[7]:
def run(config: dict):
    tf.autograph.set_verbosity(0)
    # Load data and split into validation set
    train_dataframe, val_dataframe = load_data()
    train_ds = dataframe_to_dataset(train_dataframe)
    val_ds = dataframe_to_dataset(val_dataframe)
    train_ds = train_ds.batch(config["batch_size"])
    val_ds = val_ds.batch(config["batch_size"])

    # Categorical features encoded as integers
    sex = tf.keras.Input(shape=(1,), name="sex", dtype="int64")
    cp = tf.keras.Input(shape=(1,), name="cp", dtype="int64")
    fbs = tf.keras.Input(shape=(1,), name="fbs", dtype="int64")
    restecg = tf.keras.Input(shape=(1,), name="restecg", dtype="int64")
    exang = tf.keras.Input(shape=(1,), name="exang", dtype="int64")
    ca = tf.keras.Input(shape=(1,), name="ca", dtype="int64")

    # Categorical feature encoded as string
    thal = tf.keras.Input(shape=(1,), name="thal", dtype="string")

    # Numerical features
    age = tf.keras.Input(shape=(1,), name="age")
    trestbps = tf.keras.Input(shape=(1,), name="trestbps")
    chol = tf.keras.Input(shape=(1,), name="chol")
    thalach = tf.keras.Input(shape=(1,), name="thalach")
    oldpeak = tf.keras.Input(shape=(1,), name="oldpeak")
    slope = tf.keras.Input(shape=(1,), name="slope")

    all_inputs = [
        sex,
        cp,
        fbs,
        restecg,
        exang,
        ca,
        thal,
        age,
        trestbps,
        chol,
        thalach,
        oldpeak,
        slope,
    ]

    # Integer categorical features
    sex_encoded = encode_categorical_feature(sex, "sex", train_ds, False)
    cp_encoded = encode_categorical_feature(cp, "cp", train_ds, False)
    fbs_encoded = encode_categorical_feature(fbs, "fbs", train_ds, False)
    restecg_encoded = encode_categorical_feature(restecg, "restecg", train_ds, False)
    exang_encoded = encode_categorical_feature(exang, "exang", train_ds, False)
    ca_encoded = encode_categorical_feature(ca, "ca", train_ds, False)

    # String categorical features
    thal_encoded = encode_categorical_feature(thal, "thal", train_ds, True)

    # Numerical features
    age_encoded = encode_numerical_feature(age, "age", train_ds)
    trestbps_encoded = encode_numerical_feature(trestbps, "trestbps", train_ds)
    chol_encoded = encode_numerical_feature(chol, "chol", train_ds)
    thalach_encoded = encode_numerical_feature(thalach, "thalach", train_ds)
    oldpeak_encoded = encode_numerical_feature(oldpeak, "oldpeak", train_ds)
    slope_encoded = encode_numerical_feature(slope, "slope", train_ds)

    all_features = tf.keras.layers.concatenate(
        [
            sex_encoded,
            cp_encoded,
            fbs_encoded,
            restecg_encoded,
            exang_encoded,
            slope_encoded,
            ca_encoded,
            thal_encoded,
            age_encoded,
            trestbps_encoded,
            chol_encoded,
            thalach_encoded,
            oldpeak_encoded,
        ]
    )
    x = tf.keras.layers.Dense(config["units"], activation=config["activation"])(
        all_features
    )
    x = tf.keras.layers.Dropout(config["dropout_rate"])(x)
    output = tf.keras.layers.Dense(1, activation="sigmoid")(x)
    model = tf.keras.Model(all_inputs, output)

    optimizer = tf.keras.optimizers.Adam(learning_rate=config["learning_rate"])
    model.compile(optimizer, "binary_crossentropy", metrics=["accuracy"])

    history = model.fit(
        train_ds, epochs=config["num_epochs"], validation_data=val_ds, verbose=0
    )

    return history.history["val_accuracy"][-1]
Note

The objective maximised by DeepHyper is the scalar value returned by the run-function.

In this tutorial it corresponds to the validation accuracy of the last epoch of training which we retrieve in the History object returned by the model.fit(...) call.

...
history = model.fit(
    train_ds, epochs=config["num_epochs"], validation_data=val_ds, verbose=0
)
return history.history["val_accuracy"][-1]
...

Using an objective like max(history.history['val_accuracy']) can have undesired side effects.

For example, the validation accuracy may transiently overshoot at a local maximum during training; rewarding that peak can select a configuration whose final model does not retain it and lacks the capacity to adapt flexibly to new data in the future.

2.5. Define the Hyperparameter optimization problem#

Hyperparameter ranges are defined using the following syntax:

  • Discrete integer ranges are generated from a tuple (lower: int, upper: int)

  • Continuous parameters are generated from a tuple (lower: float, upper: float)

  • Categorical or non-ordinal hyperparameter ranges can be given as a list of possible values [val1, val2, ...]

[8]:
from deephyper.problem import HpProblem


# Creation of a hyperparameter problem
problem = HpProblem()

# Discrete hyperparameter (sampled with uniform prior)
problem.add_hyperparameter((8, 128), "units", default_value=32)
problem.add_hyperparameter((10, 100), "num_epochs", default_value=50)


# Categorical hyperparameter (sampled with uniform prior)
ACTIVATIONS = [
    "elu", "gelu", "hard_sigmoid", "linear", "relu", "selu",
    "sigmoid", "softplus", "softsign", "swish", "tanh",
]
problem.add_hyperparameter(ACTIVATIONS, "activation", default_value="relu")


# Real hyperparameter (sampled with uniform prior)
problem.add_hyperparameter((0.0, 0.6), "dropout_rate", default_value=0.5)


# Discrete and Real hyperparameters (sampled with log-uniform)
problem.add_hyperparameter((8, 256, "log-uniform"), "batch_size", default_value=32)
problem.add_hyperparameter((1e-5, 1e-2, "log-uniform"), "learning_rate", default_value=1e-3)

problem
[8]:
Configuration space object:
  Hyperparameters:
    activation, Type: Categorical, Choices: {elu, gelu, hard_sigmoid, linear, relu, selu, sigmoid, softplus, softsign, swish, tanh}, Default: relu
    batch_size, Type: UniformInteger, Range: [8, 256], Default: 32, on log-scale
    dropout_rate, Type: UniformFloat, Range: [0.0, 0.6], Default: 0.5
    learning_rate, Type: UniformFloat, Range: [1e-05, 0.01], Default: 0.001, on log-scale
    num_epochs, Type: UniformInteger, Range: [10, 100], Default: 50
    units, Type: UniformInteger, Range: [8, 128], Default: 32

2.6. Evaluate a default configuration#

We evaluate the performance of the default set of hyperparameters provided in the Keras tutorial.

[9]:
import ray


# We launch the Ray run-time depending on the detected local resources
# and execute the `run` function with the default configuration.
# WARNING: in the case of GPUs it is important to follow this scheme
# to prevent multiple processes (Ray workers vs. the current process)
# from locking the same GPU.
if is_gpu_available:
    if not(ray.is_initialized()):
        ray.init(num_cpus=n_gpus, num_gpus=n_gpus, log_to_driver=False)

    run_default = ray.remote(num_cpus=1, num_gpus=1)(run)
    objective_default = ray.get(run_default.remote(problem.default_configuration))
else:
    if not(ray.is_initialized()):
        ray.init(num_cpus=1, log_to_driver=False)
    run_default = run
    objective_default = run_default(problem.default_configuration)

print(f"Accuracy Default Configuration:  {objective_default:.3f}")
2023-01-30 12:17:37,785 INFO worker.py:1509 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265 
Accuracy Default Configuration:  0.820

2.7. Define the evaluator object#

The Evaluator object allows you to change the parallelization backend used by DeepHyper.
It is a standalone object which schedules the execution of remote tasks. All evaluators need a run_function to be instantiated.
Then the method keyword defines the backend (e.g., "ray") and method_kwargs corresponds to the keyword arguments of this chosen method:
evaluator = Evaluator.create(run_function, method="ray", method_kwargs={...})

Once created, evaluator.num_workers gives the number of available parallel workers.

Finally, to submit tasks to the evaluator and collect their results, one just needs to use the following interface:

configs = [{"units": 8, ...}, ...]
evaluator.submit(configs)
...
# To collect the first finished task (asynchronous)
tasks_done = evaluator.gather("BATCH", size=1)

# To collect all of the pending tasks (synchronous)
tasks_done = evaluator.gather("ALL")

Warning

Each Evaluator saves its own state, therefore it is crucial to create a new evaluator when launching a fresh search.

[10]:
from deephyper.evaluator import Evaluator
from deephyper.evaluator.callback import TqdmCallback


def get_evaluator(run_function):
    # Default arguments for Ray: 1 worker and 1 worker per evaluation
    method_kwargs = {
        "num_cpus": 1,
        "num_cpus_per_task": 1,
        "callbacks": [TqdmCallback()]
    }

    # If GPU devices are detected then it will create 'n_gpus' workers
    # and use 1 worker for each evaluation
    if is_gpu_available:
        method_kwargs["num_cpus"] = n_gpus
        method_kwargs["num_gpus"] = n_gpus
        method_kwargs["num_cpus_per_task"] = 1
        method_kwargs["num_gpus_per_task"] = 1

    evaluator = Evaluator.create(
        run_function,
        method="ray",
        method_kwargs=method_kwargs
    )
    print(f"Created new evaluator with {evaluator.num_workers} worker{'s' if evaluator.num_workers > 1 else ''} and config: {method_kwargs}", )

    return evaluator

evaluator_1 = get_evaluator(run)
Created new evaluator with 1 worker and config: {'num_cpus': 1, 'num_cpus_per_task': 1, 'callbacks': [<deephyper.evaluator.callback.TqdmCallback object at 0x16a8533a0>]}
/Users/romainegele/Documents/Argonne/deephyper/deephyper/evaluator/_evaluator.py:126: UserWarning: Applying nest-asyncio patch for IPython Shell!
  warnings.warn(

2.8. Define and run the centralized Bayesian optimization search (CBO)#

A primary pillar of hyperparameter search in DeepHyper is the centralized Bayesian optimization search (henceforth CBO). CBO may be described by the following algorithm:

(Figure: schematic of the CBO algorithm.)


Following the parallelized evaluation of an initial set of configurations, a low-fidelity, high-efficiency model (henceforth “the surrogate”) is fit to reproduce the relationship between the input variables of the model (i.e., the choice of hyperparameters) and the outputs (which are generally a measure of validation accuracy).

After obtaining this surrogate of the validation accuracy, we may utilize ideas from the classical Bayesian optimization literature to adaptively sample the hyperparameter search space.

First, the surrogate is used to estimate the mean value of the validation accuracy at a given sampling location \(x\), together with an estimated variance. The latter requirement restricts us to efficient data-driven modeling strategies with built-in variance estimates (such as a Gaussian process or a Random Forest regressor).

Regions where the mean is high represent opportunities for exploitation and regions where the variance is high represent opportunities for exploration. An optimistic acquisition function, the upper confidence bound (UCB), can be constructed from these two quantities:

\[L_{\text{UCB}}(x) = \mu(x) + \kappa \cdot \sigma(x)\]

The unevaluated hyperparameter configurations that maximize the acquisition function are chosen for the next batch of evaluations.

Note that the choice of the variance weighting parameter \(\kappa\) controls the degree of exploration in the hyperparameter search, with zero indicating pure exploitation (only unseen configurations where the predicted accuracy is highest will be sampled).
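
The effect of \(\kappa\) can be seen on a tiny made-up example (the surrogate predictions below are purely illustrative):

import numpy as np

# Hypothetical surrogate predictions at three candidate configurations.
mu = np.array([0.80, 0.75, 0.60])     # predicted mean validation accuracy
sigma = np.array([0.01, 0.05, 0.20])  # predicted standard deviation

for kappa in [0.0, 1.96]:
    ucb = mu + kappa * sigma
    print(kappa, ucb.argmax())
# kappa=0.0 exploits the highest predicted mean (index 0),
# while kappa=1.96 favors the most uncertain candidate (index 2).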

The top s configurations are selected for the new batch. The following schematic demonstrates this process:

(Figure: schematic of selecting the next batch of configurations by maximizing the acquisition function.)

The process of obtaining s configurations relies on the “constant-liar” strategy where a sampled configuration is mapped to a dummy output given by a bulk metric of all the evaluated configurations thus far (such as the maximum, mean or median validation accuracy).

Prior to sampling the next configuration by acquisition function maximization, the surrogate is retrained with the dummy output as a data point. As the true validation accuracy becomes available for one of the sampled configurations, the dummy output is replaced and the surrogate is updated.

This allows for scalable asynchronous (or batch synchronous) sampling of new hyperparameter configurations.
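
A minimal sketch of this batch-selection loop is shown below. It is purely illustrative (a 1D toy search space, a scikit-learn Random Forest surrogate, and random candidate sampling), not DeepHyper's actual implementation:

import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
X = list(rng.uniform(0.0, 1.0, (5, 1)))  # already-evaluated configurations (toy, 1D)
y = list(rng.uniform(0.5, 0.9, 5))       # their observed objectives
kappa, batch_size = 1.96, 3
lie = max(y)                             # "constant liar": dummy output for pending points

batch = []
for _ in range(batch_size):
    # Retrain the surrogate on the true observations plus the lies added so far.
    surrogate = RandomForestRegressor(n_estimators=50).fit(np.vstack(X), np.array(y))
    candidates = rng.uniform(0.0, 1.0, (100, 1))
    per_tree = np.stack([tree.predict(candidates) for tree in surrogate.estimators_])
    mu, sigma = per_tree.mean(axis=0), per_tree.std(axis=0)
    x_next = candidates[np.argmax(mu + kappa * sigma)]  # maximize the UCB acquisition
    batch.append(x_next)
    X.append(x_next)  # record the lie so the next proposal moves elsewhere
    y.append(lie)

print(np.round(np.array(batch).ravel(), 3))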

2.8.1. Choice of surrogate model#

Users should note that our choice of surrogate is the Random Forest regressor, due to its ability to handle non-ordinal data (hyperparameter configurations may not be purely continuous or even numerical). Evidence that it can outperform other methods (such as Gaussian processes) is available in [1].


2.8.1.1. Setup CBO#

We create the CBO using the problem and evaluator defined above.

[11]:
from deephyper.search.hps import CBO
# Uncomment the following line to show the arguments of CBO.
# help(CBO)
[12]:
# Instantiate the search with the problem and the evaluator that we created before

search = CBO(problem, evaluator_1, initial_points=[problem.default_configuration])

Note

All DeepHyper’s search algorithms have two stopping criteria:

  • max_evals (int): Defines the maximum number of evaluations that we want to perform. Defaults to -1 for an infinite number.

  • timeout (int): Defines a time budget (in seconds) before stopping the search. Defaults to None for an infinite time budget.

[13]:
results = search.search(max_evals=10)

Warning

The search call does not output any information about the current status of the search. However, a results.csv file is created in the local directory and can be opened to check the finished tasks.

The returned results is a Pandas DataFrame in which columns starting with "p:" are hyperparameters and columns starting with "m:" are additional metadata (from the user or from the Evaluator); it also contains the objective value and the job_id (a small usage sketch follows the list below):

  • job_id is a unique identifier corresponding to the order of creation of tasks.

  • objective is the value returned by the run-function.

  • m:timestamp_submit is the time (in seconds), counted from the creation of the evaluator, at which the task was submitted to the evaluator.

  • m:timestamp_gather is the time (in seconds), counted from the creation of the evaluator, at which the finished task was collected by the evaluator.
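
For example, a minimal sketch of extracting the best row from such a DataFrame (using the results returned above):

# Select the row with the highest objective and keep only the hyperparameter columns.
best = results.loc[results["objective"].idxmax()]
print(best.filter(like="p:").to_dict())
print("objective:", best["objective"])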

[14]:
results
[14]:
p:activation p:batch_size p:dropout_rate p:learning_rate p:num_epochs p:units objective job_id m:timestamp_submit m:timestamp_gather
0 relu 32 0.500000 0.001000 50 32 0.819672 0 1.673296 5.423881
1 elu 196 0.442914 0.000019 33 122 0.540984 1 5.472432 6.810891
2 elu 65 0.304934 0.002111 19 8 0.803279 2 6.840086 8.020196
3 relu 14 0.574456 0.005709 55 42 0.770492 3 8.049471 10.171254
4 softplus 50 0.027865 0.000235 51 87 0.803279 4 10.302291 11.826399
5 linear 220 0.196513 0.000031 38 84 0.704918 5 11.855185 13.540711
6 hard_sigmoid 93 0.567920 0.000735 14 73 0.770492 6 13.569894 14.862006
7 tanh 127 0.233563 0.000239 74 32 0.803279 7 14.890817 16.443914
8 linear 21 0.569421 0.004058 47 84 0.819672 8 16.472753 18.266434
9 relu 61 0.383737 0.000045 18 14 0.557377 9 18.295845 19.519711

The search can be continued without any issue.

[15]:
results = search.search(max_evals=5)

results
[15]:
p:activation p:batch_size p:dropout_rate p:learning_rate p:num_epochs p:units objective job_id m:timestamp_submit m:timestamp_gather
0 relu 32 0.500000 0.001000 50 32 0.819672 0 1.673296 5.423881
1 elu 196 0.442914 0.000019 33 122 0.540984 1 5.472432 6.810891
2 elu 65 0.304934 0.002111 19 8 0.803279 2 6.840086 8.020196
3 relu 14 0.574456 0.005709 55 42 0.770492 3 8.049471 10.171254
4 softplus 50 0.027865 0.000235 51 87 0.803279 4 10.302291 11.826399
5 linear 220 0.196513 0.000031 38 84 0.704918 5 11.855185 13.540711
6 hard_sigmoid 93 0.567920 0.000735 14 73 0.770492 6 13.569894 14.862006
7 tanh 127 0.233563 0.000239 74 32 0.803279 7 14.890817 16.443914
8 linear 21 0.569421 0.004058 47 84 0.819672 8 16.472753 18.266434
9 relu 61 0.383737 0.000045 18 14 0.557377 9 18.295845 19.519711
10 softsign 211 0.467029 0.000039 28 50 0.229508 10 61.149681 62.913303
11 elu 27 0.571352 0.001114 48 12 0.836066 11 63.053085 64.710794
12 elu 11 0.446366 0.001394 53 11 0.803279 12 64.924019 67.180369
13 elu 28 0.583510 0.000013 44 17 0.393443 13 67.321455 69.016971
14 elu 18 0.594079 0.006243 67 14 0.786885 14 69.156168 71.279930

Now that the search is over, let us print the best configuration found during this run.

[17]:
i_max = results.objective.argmax()
best_config = results.iloc[i_max][:-3].to_dict()


print(f"The default configuration has an accuracy of {objective_default:.3f}. \n"
      f"The best configuration found by DeepHyper has an accuracy of {results['objective'].iloc[i_max]:.3f}, \n"
      f"discovered after {results['m:timestamp_gather'].iloc[i_max]:.2f} seconds of search.\n")

best_config
The default configuration has an accuracy of 0.820.
The best configuration found by DeepHyper has an accuracy of 0.836,
discovered after 64.71 seconds of search.

[17]:
{'p:activation': 'elu',
 'p:batch_size': 27,
 'p:dropout_rate': 0.5713520400329467,
 'p:learning_rate': 0.0011143178363802,
 'p:num_epochs': 48,
 'p:units': 12,
 'objective': 0.8360655903816223}

2.9. Restart from a checkpoint#

It can often be useful to continue the search from previous results, for example if the requested allocation was not long enough or if an unexpected crash happened. The CBO search provides the fit_surrogate(dataframe_of_results) method for this use case.

To simulate this we create a second evaluator evaluator_2 and start a fresh CBO search with strong exploitation (kappa=0.001).

[18]:
# Create a new evaluator
evaluator_2 = get_evaluator(run)

# Create a new CBO search with strong exploitation (i.e., small kappa)
search_from_checkpoint = CBO(problem, evaluator_2, kappa=0.001)

# Initialize the surrogate model of the Bayesian optimization (in CBO)
# with the results of the previous search
search_from_checkpoint.fit_surrogate(results)
Created new evaluator with 1 worker and config: {'num_cpus': 1, 'num_cpus_per_task': 1, 'callbacks': [<deephyper.evaluator.callback.TqdmCallback object at 0x16c69cd00>]}
[19]:
results_from_checkpoint = search_from_checkpoint.search(max_evals=10)
E0130 12:20:53.838909000 10922094592 chttp2_transport.cc:1167]         Received a GOAWAY with error code ENHANCE_YOUR_CALM and debug data equal to "too_many_pings"
[20]:
results_from_checkpoint
[20]:
p:activation p:batch_size p:dropout_rate p:learning_rate p:num_epochs p:units objective job_id m:timestamp_submit m:timestamp_gather
0 elu 9 0.563147 0.002584 50 8 0.786885 0 4.881641 8.339313
1 elu 27 0.586640 0.001353 64 10 0.819672 1 8.489422 10.275765
2 gelu 28 0.534305 0.001184 48 19 0.836066 2 10.421549 12.151974
3 gelu 9 0.520566 0.002797 48 14 0.770492 3 12.299520 14.649483
4 gelu 22 0.376572 0.000640 47 21 0.819672 4 14.796534 16.606915
5 tanh 43 0.571578 0.001132 18 13 0.803279 5 16.754099 17.986947
6 gelu 15 0.549082 0.001329 47 19 0.819672 6 18.228396 20.232561
7 elu 27 0.486335 0.001134 83 127 0.803279 7 20.380729 22.511694
8 gelu 47 0.534682 0.000790 47 20 0.852459 8 22.663078 24.264377
9 gelu 137 0.533913 0.000648 92 21 0.819672 9 24.420509 26.063546
[22]:
i_max = results_from_checkpoint.objective.argmax()
best_config = results_from_checkpoint.iloc[i_max][:-3].to_dict()

print(f"The default configuration has an accuracy of {objective_default:.3f}. "
      f"The best configuration found by DeepHyper has an accuracy of {results_from_checkpoint['objective'].iloc[i_max]:.3f}, "
      f"finished after {results_from_checkpoint['m:timestamp_gather'].iloc[i_max]:.2f} seconds of search.")

best_config
The default configuration has an accuracy of 0.820. The best configuration found by DeepHyper has an accuracy of 0.852, finished after 24.26 seconds of search.
[22]:
{'p:activation': 'gelu',
 'p:batch_size': 47,
 'p:dropout_rate': 0.5346818686230272,
 'p:learning_rate': 0.0007899995372786,
 'p:num_epochs': 47,
 'p:units': 20,
 'objective': 0.8524590134620667}

2.10. Add conditional hyperparameters#

Now we want to add the possibility of searching for a second fully-connected layer. We simply add two new lines:

if config.get("dense_2", False):
    x = tf.keras.layers.Dense(config["dense_2:units"], activation=config["dense_2:activation"])(x)
[23]:
def run_with_condition(config: dict):
    tf.autograph.set_verbosity(0)

    train_dataframe, val_dataframe = load_data()

    train_ds = dataframe_to_dataset(train_dataframe)
    val_ds = dataframe_to_dataset(val_dataframe)

    train_ds = train_ds.batch(config["batch_size"])
    val_ds = val_ds.batch(config["batch_size"])

    # Categorical features encoded as integers
    sex = tf.keras.Input(shape=(1,), name="sex", dtype="int64")
    cp = tf.keras.Input(shape=(1,), name="cp", dtype="int64")
    fbs = tf.keras.Input(shape=(1,), name="fbs", dtype="int64")
    restecg = tf.keras.Input(shape=(1,), name="restecg", dtype="int64")
    exang = tf.keras.Input(shape=(1,), name="exang", dtype="int64")
    ca = tf.keras.Input(shape=(1,), name="ca", dtype="int64")

    # Categorical feature encoded as string
    thal = tf.keras.Input(shape=(1,), name="thal", dtype="string")

    # Numerical features
    age = tf.keras.Input(shape=(1,), name="age")
    trestbps = tf.keras.Input(shape=(1,), name="trestbps")
    chol = tf.keras.Input(shape=(1,), name="chol")
    thalach = tf.keras.Input(shape=(1,), name="thalach")
    oldpeak = tf.keras.Input(shape=(1,), name="oldpeak")
    slope = tf.keras.Input(shape=(1,), name="slope")

    all_inputs = [
        sex,
        cp,
        fbs,
        restecg,
        exang,
        ca,
        thal,
        age,
        trestbps,
        chol,
        thalach,
        oldpeak,
        slope,
    ]

    # Integer categorical features
    sex_encoded = encode_categorical_feature(sex, "sex", train_ds, False)
    cp_encoded = encode_categorical_feature(cp, "cp", train_ds, False)
    fbs_encoded = encode_categorical_feature(fbs, "fbs", train_ds, False)
    restecg_encoded = encode_categorical_feature(restecg, "restecg", train_ds, False)
    exang_encoded = encode_categorical_feature(exang, "exang", train_ds, False)
    ca_encoded = encode_categorical_feature(ca, "ca", train_ds, False)

    # String categorical features
    thal_encoded = encode_categorical_feature(thal, "thal", train_ds, True)

    # Numerical features
    age_encoded = encode_numerical_feature(age, "age", train_ds)
    trestbps_encoded = encode_numerical_feature(trestbps, "trestbps", train_ds)
    chol_encoded = encode_numerical_feature(chol, "chol", train_ds)
    thalach_encoded = encode_numerical_feature(thalach, "thalach", train_ds)
    oldpeak_encoded = encode_numerical_feature(oldpeak, "oldpeak", train_ds)
    slope_encoded = encode_numerical_feature(slope, "slope", train_ds)

    all_features = tf.keras.layers.concatenate(
        [
            sex_encoded,
            cp_encoded,
            fbs_encoded,
            restecg_encoded,
            exang_encoded,
            slope_encoded,
            ca_encoded,
            thal_encoded,
            age_encoded,
            trestbps_encoded,
            chol_encoded,
            thalach_encoded,
            oldpeak_encoded,
        ]
    )
    x = tf.keras.layers.Dense(config["units"], activation=config["activation"])(
        all_features
    )

    ### START - NEW LINES
    if config.get("dense_2", False):
        x = tf.keras.layers.Dense(config["dense_2:units"], activation=config["dense_2:activation"])(x)
    ### END - NEW LINES

    x = tf.keras.layers.Dropout(config["dropout_rate"])(x)
    output = tf.keras.layers.Dense(1, activation="sigmoid")(x)
    model = tf.keras.Model(all_inputs, output)

    optimizer = tf.keras.optimizers.Adam(learning_rate=config["learning_rate"])
    model.compile(optimizer, "binary_crossentropy", metrics=["accuracy"])

    history = model.fit(
        train_ds, epochs=config["num_epochs"], validation_data=val_ds, verbose=0
    )

    return history.history["val_accuracy"][-1]

To define conditional hyperparameters we use ConfigSpace. We define dense_2:units and dense_2:activation as active hyperparameters only when dense_2 == True. The EqualsCondition class helps us do that. Then we call

problem_with_condition.add_condition(condition)

to register each new condition with the HpProblem.

[24]:
from deephyper.problem import EqualsCondition

# Define the hyperparameter problem
problem_with_condition = HpProblem()


# Define the same hyperparameters as before
problem_with_condition.add_hyperparameter((8, 128), "units")
problem_with_condition.add_hyperparameter(ACTIVATIONS, "activation")
problem_with_condition.add_hyperparameter((0.0, 0.6), "dropout_rate")
problem_with_condition.add_hyperparameter((10, 100), "num_epochs")
problem_with_condition.add_hyperparameter((8, 256, "log-uniform"), "batch_size")
problem_with_condition.add_hyperparameter((1e-5, 1e-2, "log-uniform"), "learning_rate")


# Add a new hyperparameter "dense_2 (bool)" to decide if a second fully-connected layer should be created
hp_dense_2 = problem_with_condition.add_hyperparameter([True, False], "dense_2")
hp_dense_2_units = problem_with_condition.add_hyperparameter((8, 128), "dense_2:units")
hp_dense_2_activation = problem_with_condition.add_hyperparameter(ACTIVATIONS, "dense_2:activation")

problem_with_condition.add_condition(EqualsCondition(hp_dense_2_units, hp_dense_2, True))
problem_with_condition.add_condition(EqualsCondition(hp_dense_2_activation, hp_dense_2, True))


problem_with_condition
[24]:
Configuration space object:
  Hyperparameters:
    activation, Type: Categorical, Choices: {elu, gelu, hard_sigmoid, linear, relu, selu, sigmoid, softplus, softsign, swish, tanh}, Default: elu
    batch_size, Type: UniformInteger, Range: [8, 256], Default: 45, on log-scale
    dense_2, Type: Categorical, Choices: {True, False}, Default: True
    dense_2:activation, Type: Categorical, Choices: {elu, gelu, hard_sigmoid, linear, relu, selu, sigmoid, softplus, softsign, swish, tanh}, Default: elu
    dense_2:units, Type: UniformInteger, Range: [8, 128], Default: 68
    dropout_rate, Type: UniformFloat, Range: [0.0, 0.6], Default: 0.3
    learning_rate, Type: UniformFloat, Range: [1e-05, 0.01], Default: 0.0003162278, on log-scale
    num_epochs, Type: UniformInteger, Range: [10, 100], Default: 55
    units, Type: UniformInteger, Range: [8, 128], Default: 68
  Conditions:
    dense_2:activation | dense_2 == True
    dense_2:units | dense_2 == True

We create a new evaluator evaluator_3 and start a fresh CBO search with this new problem problem_with_condition.

[25]:
evaluator_3 = get_evaluator(run_with_condition)

search_with_condition = CBO(problem_with_condition, evaluator_3)
Created new evaluator with 1 worker and config: {'num_cpus': 1, 'num_cpus_per_task': 1, 'callbacks': [<deephyper.evaluator.callback.TqdmCallback object at 0x17e3c8250>]}
[26]:
results_with_condition = search_with_condition.search(max_evals=10)
[27]:
results_with_condition
[27]:
p:activation p:batch_size p:dense_2 p:dropout_rate p:learning_rate p:num_epochs p:units p:dense_2:activation p:dense_2:units objective job_id m:timestamp_submit m:timestamp_gather
0 gelu 24 False 0.522131 0.001057 44 8 NaN NaN 0.852459 0 1.576017 3.662161
1 softsign 47 True 0.377181 0.000019 47 127 elu 75.0 0.770492 1 3.888727 5.539755
2 hard_sigmoid 25 False 0.333931 0.000461 12 33 NaN NaN 0.770492 2 5.759022 7.123170
3 softsign 8 True 0.203617 0.000153 77 24 softplus 20.0 0.819672 3 7.439673 11.098140
4 tanh 232 False 0.544339 0.000236 17 15 NaN NaN 0.377049 4 11.318734 12.497572
5 linear 31 True 0.431446 0.002055 27 114 tanh 69.0 0.803279 5 12.721716 14.184543
6 swish 185 True 0.143728 0.000066 36 126 relu 88.0 0.786885 6 14.402191 15.902531
7 selu 85 True 0.513058 0.000975 20 24 elu 77.0 0.836066 7 16.216778 17.592724
8 softplus 256 False 0.541869 0.000032 44 123 NaN NaN 0.229508 8 17.817622 19.283510
9 hard_sigmoid 57 True 0.221024 0.000014 77 101 selu 56.0 0.770492 9 19.503178 21.415980

Finally, let us print out the best configuration found from this conditioned search space.

[29]:
i_max = results_with_condition.objective.argmax()
best_config = results_with_condition.iloc[i_max][:-3].to_dict()

print(f"The default configuration has an accuracy of {objective_default:.3f}. "
      f"The best configuration found by DeepHyper has an accuracy of {results_with_condition['objective'].iloc[i_max]:.3f}, "
      f"finished after {results_with_condition['m:timestamp_gather'].iloc[i_max]:.2f} seconds of search.")

best_config
The default configuration has an accuracy of 0.820. The best configuration found by DeepHyper has an accuracy of 0.852, finished after 3.66 seconds of search.
[29]:
{'p:activation': 'gelu',
 'p:batch_size': 24,
 'p:dense_2': False,
 'p:dropout_rate': 0.5221310827728567,
 'p:learning_rate': 0.0010573740607372,
 'p:num_epochs': 44,
 'p:units': 8,
 'p:dense_2:activation': nan,
 'p:dense_2:units': nan,
 'objective': 0.8524590134620667}