Notify Failures in Hyperparameter optimization

Author(s): Romain Egele.

This example demonstrates how to handle failed objective evaluations in hyperparameter search. In many cases, such as software auto-tuning (where we minimize the run-time of a software application), some configurations trigger run-time errors, so no scalar objective is returned. A default choice in this case is to return the worst-case objective, if it is known, which can be done inside the run-function. Other possibilities are to ignore these configurations or to replace them with the running mean or minimum objective.

To illustrate such a use case, we define an artificial run-function that fails when its input parameter y is greater than 0.5. To signal a failure, the run-function returns a string value prefixed with "F", such as:

import matplotlib.pyplot as plt
import numpy as np

from deephyper.hpo import HpProblem
from deephyper.hpo import CBO
from deephyper.evaluator import Evaluator
from deephyper.evaluator.callback import TqdmCallback


def run(config: dict) -> float:
    if config["y"] > 0.5:
        return "F_postfix"
    else:
        return config["x"]
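The worst-case alternative mentioned in the introduction can be handled entirely inside the run-function. The following is a minimal sketch and not part of the original example: the names `run_with_worst_case` and `WORST_OBJECTIVE` are hypothetical, and it assumes that the objective is maximized and that 0.0 is a valid lower bound for it.

```python
# Hypothetical worst-case value: assumes the objective is maximized and
# can never fall below 0.0.
WORST_OBJECTIVE = 0.0


def run_with_worst_case(config: dict) -> float:
    """Return a worst-case objective instead of signalling a failure."""
    if config["y"] > 0.5:
        # The configuration "fails": penalize it with the worst known value.
        return WORST_OBJECTIVE
    return float(config["x"])
```

With this variant the search only ever sees numeric objectives, at the cost of requiring a meaningful worst-case value to be known in advance.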

Then, we define the corresponding hyperparameter problem, where x is the value to maximize and y is a value that controls the occurrence of failures.

problem = HpProblem()
problem.add_hyperparameter([1, 2, 4, 8, 16, 32], "x")
problem.add_hyperparameter((0.0, 1.0), "y")

print(problem)
Configuration space object:
  Hyperparameters:
    x, Type: Ordinal, Sequence: {1, 2, 4, 8, 16, 32}, Default: 1
    y, Type: UniformFloat, Range: [0.0, 1.0], Default: 0.5

Then, we define a centralized Bayesian optimization (CBO) search (i.e., master-worker architecture), which uses a Random-Forest regressor as its default surrogate model. We compare three strategies: ignore, which filters out failed configurations; mean, which replaces a failure with the running mean of the collected objectives; and min, which replaces a failure with the running minimum of the collected objectives.

results = {}
max_evals = 50
for failure_strategy in ["ignore", "mean", "min"]:
    print(f"Executing failure strategy: {failure_strategy}")
    evaluator = Evaluator.create(
        run, method="thread", method_kwargs={"callbacks": [TqdmCallback()]}
    )
    search = CBO(
        problem,
        evaluator,
        filter_failures=failure_strategy,
        log_dir=f"search_{failure_strategy}",
        random_state=42,
    )
    results[failure_strategy] = search.search(max_evals)
Executing failure strategy: ignore
100%|██████████| 50/50 [00:00<00:00, 75.18it/s, failures=39, objective=32]
Executing failure strategy: mean
100%|██████████| 50/50 [00:02<00:00, 12.97it/s, failures=20, objective=32]
Executing failure strategy: min
100%|██████████| 50/50 [00:02<00:00, 15.17it/s, failures=17, objective=32]
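The three strategies compared here can be illustrated in plain Python. This is only a sketch of the idea, not DeepHyper's implementation: the helper names `is_failure` and `apply_failure_strategy` are hypothetical, and it assumes that the running statistic is computed over successful evaluations only and that a failure observed before any success is simply skipped.

```python
from statistics import mean


def is_failure(objective) -> bool:
    """A failure is any string objective starting with "F"."""
    return isinstance(objective, str) and objective.startswith("F")


def apply_failure_strategy(objectives: list, strategy: str) -> list:
    """Illustrative handling of failed objectives (not DeepHyper's code)."""
    if strategy not in ("ignore", "mean", "min"):
        raise ValueError(f"Unknown strategy: {strategy}")

    if strategy == "ignore":
        # Drop failed evaluations entirely.
        return [float(o) for o in objectives if not is_failure(o)]

    successes, out = [], []
    for o in objectives:
        if not is_failure(o):
            successes.append(float(o))
            out.append(float(o))
        elif successes:
            # Replace the failure by a statistic of the successes seen so far.
            out.append(mean(successes) if strategy == "mean" else min(successes))
        # Assumption: a failure seen before any success is skipped.
    return out
```

For the sequence `["F_postfix", 4, "F_postfix", 8, "F_postfix"]`, the "ignore" strategy keeps only the two successes, while "mean" and "min" fill in each later failure with a value derived from the successes collected up to that point.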

Finally, we plot the collected results:

plt.figure()

for i, (failure_strategy, df) in enumerate(results.items()):
    plt.subplot(3, 1, i + 1)
    # Failed evaluations are stored as strings prefixed with "F". Casting to
    # str also covers the case where every evaluation succeeded (float column).
    objective = df["objective"].astype(str)
    failed = objective.str.startswith("F").to_numpy()
    x = np.arange(len(df))
    x_success, x_failed = x[~failed], x[failed]
    y_success = objective[~failed].astype(float)
    plt.scatter(x_success, y_success, label=failure_strategy)
    # Failures have no objective value, so they are drawn at y=0 as red triangles.
    plt.scatter(x_failed, np.zeros(x_failed.shape), marker="v", color="red")

    plt.xlabel(r"Iterations")
    plt.ylabel(r"Objective")
    plt.legend()
plt.show()
[Figure: plot notify failures hpo]
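The `failures` counter shown in the progress bars can also be recomputed from the collected objectives. The helper below is a small sketch with a hypothetical name, `count_failures`; it relies only on the convention used in this example, namely that a failed objective is a string starting with "F".

```python
def count_failures(objectives) -> int:
    """Count objectives that follow the "F"-prefixed failure convention."""
    return sum(isinstance(o, str) and o.startswith("F") for o in objectives)
```

Applied to this example, it could be called as `count_failures(df["objective"])` for each data frame stored in `results`.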

Total running time of the script: (0 minutes 5.948 seconds)
