Notify Failures in Hyperparameter Optimization

Author(s): Romain Egele.

This example demonstrates how to handle failures of the objective in hyperparameter search. In many settings, such as software auto-tuning (where we minimize the run-time of a software application), some configurations cause run-time errors, so no scalar objective is returned. One option is to have the run-function itself return the worst-case objective, when that value is known. Other options are to ignore the failed configurations or to replace their objective with the running mean or minimum of the collected objectives. To illustrate this use case, we define an artificial run-function that fails when its input parameter y is greater than 0.5. To signal a failure, the run-function returns a string starting with "F", for example:

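from typing import Union


def run(config: dict) -> Union[float, str]:
    # Configurations with y > 0.5 fail: return a string prefixed with "F"
    if config["y"] > 0.5:
        return "F_postfix"
    # Otherwise, the objective to maximize is simply x
    return config["x"]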
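The failure convention above, and the worst-case alternative mentioned in the introduction, can be sketched in plain Python. The names is_failure, run_worst_case and WORST_OBJECTIVE are hypothetical and not part of DeepHyper's API. Choosing 0.0 as the worst case assumes, as in this example, that every successful objective is at least 1.

```python
from typing import Union

# Hypothetical constant: strictly worse than any successful objective here,
# because x is maximized and its smallest admissible value is 1.
WORST_OBJECTIVE = 0.0


def is_failure(objective: Union[float, str]) -> bool:
    """Return True if an objective follows the "F"-prefixed failure convention."""
    return isinstance(objective, str) and objective.startswith("F")


def run_worst_case(config: dict) -> float:
    """Variant of run that returns a worst-case value instead of a failure string."""
    if config["y"] > 0.5:
        return WORST_OBJECTIVE
    return float(config["x"])


print(is_failure("F_postfix"))  # True
print(is_failure(16))  # False
print(run_worst_case({"x": 32, "y": 0.9}))  # 0.0
```

The worst-case variant never produces a failure, so the search needs no special handling. The trade-off is that it only works when a meaningful worst value is known in advance.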

Then, we define the corresponding hyperparameter problem, where x is the value to maximize and y controls whether a failure occurs.

from deephyper.problem import HpProblem

problem = HpProblem()
problem.add_hyperparameter([1, 2, 4, 8, 16, 32], "x")
problem.add_hyperparameter((0.0, 1.0), "y")

print(problem)

Out:

Configuration space object:
  Hyperparameters:
    x, Type: Ordinal, Sequence: {1, 2, 4, 8, 16, 32}, Default: 1
    y, Type: UniformFloat, Range: [0.0, 1.0], Default: 0.5
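Because y is uniform on [0.0, 1.0] and the run-function fails whenever y > 0.5, a configuration drawn uniformly at random fails with probability 0.5, whatever the value of x. The quick Monte Carlo check below uses only the standard library. It assumes uniform random sampling and does not call DeepHyper's own sampler.

```python
import random

# Assumption: configurations are drawn uniformly at random from the problem
# defined above; the search's own sampler may behave differently.
rng = random.Random(42)
sizes = [1, 2, 4, 8, 16, 32]

n_samples = 10_000
n_failures = 0
for _ in range(n_samples):
    config = {"x": rng.choice(sizes), "y": rng.uniform(0.0, 1.0)}
    # Same failure condition as in run: y > 0.5 fails, independently of x
    if config["y"] > 0.5:
        n_failures += 1

failure_rate = n_failures / n_samples
print(f"Estimated failure rate: {failure_rate:.3f}")  # close to 0.5
```

A search that learns where failures occur should therefore fail noticeably less often than half the time over many evaluations.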

Then, we define a centralized Bayesian optimization (CBO) search (i.e., a master-worker architecture), which uses a random-forest regressor as its default surrogate model. We compare three values of the filter_failures parameter: "ignore", which filters out failed configurations; "mean", which replaces each failure with the running mean of the collected objectives; and "min", which replaces each failure with the running minimum of the collected objectives.
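To make the three strategies concrete, the sketch below shows how each one could turn a search history containing failures into training targets for the surrogate model. prepare_training_data is a hypothetical, simplified helper, not DeepHyper's implementation. It assumes the running mean and minimum are computed over the successful objectives collected so far.

```python
from statistics import mean
from typing import List, Tuple, Union

Objective = Union[float, str]


def is_failure(objective: Objective) -> bool:
    """True if the objective follows the "F"-prefixed failure convention."""
    return isinstance(objective, str) and objective.startswith("F")


def prepare_training_data(
    configs: List[dict], objectives: List[Objective], strategy: str
) -> Tuple[List[dict], List[float]]:
    """Hypothetical helper: build surrogate training data from the search history."""
    ok = [i for i, o in enumerate(objectives) if not is_failure(o)]
    successes = [float(objectives[i]) for i in ok]
    if strategy == "ignore" or (strategy in ("mean", "min") and not successes):
        # Drop failed configurations (also the fallback when nothing succeeded yet)
        return [configs[i] for i in ok], successes
    if strategy in ("mean", "min"):
        # Replace each failure by a statistic of the successful objectives so far
        fill = mean(successes) if strategy == "mean" else min(successes)
        targets = [fill if is_failure(o) else float(o) for o in objectives]
        return list(configs), targets
    raise ValueError(f"Unknown strategy: {strategy!r}")


configs = [{"x": 16, "y": 0.2}, {"x": 4, "y": 0.9}, {"x": 32, "y": 0.4}]
objectives = [16.0, "F_postfix", 32.0]
for strategy in ["ignore", "mean", "min"]:
    _, targets = prepare_training_data(configs, objectives, strategy)
    print(strategy, targets)
# ignore [16.0, 32.0]
# mean [16.0, 24.0, 32.0]
# min [16.0, 16.0, 32.0]
```

Since x is maximized here, imputing the minimum labels a failed configuration as the worst observed result, which may steer the surrogate away from the failing region. With "ignore", the surrogate receives no information about that region and may keep proposing configurations there.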

from deephyper.search.hps import CBO
from deephyper.evaluator import Evaluator
from deephyper.evaluator.callback import TqdmCallback

results = {}
max_evals = 30
for failure_strategy in ["ignore", "mean", "min"]:
    print(f"Executing failure strategy: {failure_strategy}")
    # Evaluate configurations one at a time, with a tqdm progress bar
    evaluator = Evaluator.create(
        run, method="serial", method_kwargs={"callbacks": [TqdmCallback()]}
    )
    # filter_failures selects how failed evaluations are treated by the surrogate
    search = CBO(
        problem,
        evaluator,
        filter_failures=failure_strategy,
        log_dir=f"search_{failure_strategy}",
        random_state=42,  # fixed seed so the strategies are compared fairly
    )
    results[failure_strategy] = search.search(max_evals)

Out:

Executing failure strategy: ignore
100%|##########| 30/30 [00:00<00:00, 34.91it/s, failures=20, objective=32]
Executing failure strategy: mean
100%|##########| 30/30 [00:04<00:00,  3.99it/s, failures=8, objective=32]
Executing failure strategy: min
100%|##########| 30/30 [00:04<00:00,  4.52it/s, failures=7, objective=32]

Finally, we plot the collected results. Failed evaluations are marked with red triangles at an objective value of 0.

import matplotlib.pyplot as plt
import numpy as np

plt.figure()

for i, (failure_strategy, df) in enumerate(results.items()):
    plt.subplot(3, 1, i + 1)
    x = np.arange(len(df))
    # Failed evaluations are stored as strings starting with "F";
    # converting to str also handles a purely numeric column (no failures)
    objective = df["objective"].astype(str)
    failed = objective.str.startswith("F").to_numpy()
    x_success, x_failed = x[~failed], x[failed]
    y_success = objective[~failed].astype(float)
    plt.scatter(x_success, y_success, label=failure_strategy)
    # Mark failures with red triangles at objective = 0
    plt.scatter(x_failed, np.zeros(x_failed.shape), marker="v", color="red")

    plt.xlabel("Iterations")
    plt.ylabel("Objective")
    plt.legend()
plt.show()
[Figure: objective versus iteration for the ignore, mean and min failure strategies]
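Beyond the plot, the same failure mask gives a compact per-strategy summary. The sketch below assumes, like the plotting code, that failures appear in the objective column as strings starting with "F". summarize is a hypothetical helper, and toy_results holds small stand-in DataFrames, not the outputs of the run above. In practice you would pass the results dictionary built by the search loop.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> dict:
    """Hypothetical helper: count failures and find the best successful objective."""
    # Convert to strings so numeric and "F..." entries are handled uniformly
    objective = df["objective"].astype(str)
    failed = objective.str.startswith("F")
    successes = objective[~failed].astype(float)
    return {
        "evaluations": len(df),
        "failures": int(failed.sum()),
        # The objective is maximized in this example, so the best value is the max
        "best": float(successes.max()) if len(successes) else None,
    }


# Toy stand-ins for the results dictionary built by the search loop
toy_results = {
    "ignore": pd.DataFrame({"objective": ["16", "F_postfix", "32", "F_postfix"]}),
    "mean": pd.DataFrame({"objective": [16.0, 32.0, 8.0, 4.0]}),
}

summary = pd.DataFrame.from_dict(
    {name: summarize(df) for name, df in toy_results.items()}, orient="index"
)
print(summary)
```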

Total running time of the script: ( 0 minutes 12.565 seconds)

Gallery generated by Sphinx-Gallery