Notify Failures in Hyperparameter Optimization
Author(s): Romain Egele.
This example demonstrates how to handle failures of the objective function in
hyperparameter search. In many cases, such as software auto-tuning (where we
minimize the run-time of a software application), some configurations can
create run-time errors, and therefore no scalar objective is returned. A
default choice in this case could be to return the worst-case objective, if it
is known, which can be done inside the run-function. Other possibilities are
to ignore these configurations or to replace them with the running mean/min
objective. To illustrate such a use case, we define an artificial run-function
which fails when one of its input parameters is greater than 0.5. To define a
failure, the run-function can return a string value with "F" as prefix, such as:
import matplotlib.pyplot as plt
import numpy as np
from deephyper.hpo import HpProblem
from deephyper.hpo import CBO
from deephyper.evaluator import Evaluator
from deephyper.evaluator.callback import TqdmCallback


def run(config: dict) -> float:
    if config["y"] > 0.5:
        return "F_postfix"
    else:
        return config["x"]
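In practice, failures often surface as exceptions rather than as an explicit check on the inputs. As a minimal sketch, a run-function can be wrapped so that any exception is converted into a string objective prefixed with "F". The decorator name `notify_failure` and the toy `run_time` function are hypothetical and used only for illustration; they are not part of DeepHyper.

```python
import functools


def notify_failure(run_function):
    """Hypothetical helper: turn exceptions into "F"-prefixed objectives."""

    @functools.wraps(run_function)
    def wrapper(config: dict):
        try:
            return run_function(config)
        except Exception as exc:
            # The "F" prefix marks the evaluation as failed; the suffix
            # keeps the exception type for later inspection.
            return f"F_{type(exc).__name__}"

    return wrapper


@notify_failure
def run_time(config: dict):
    # Toy objective (assumption): raises ZeroDivisionError when y == 0.
    return config["x"] / config["y"]


print(run_time({"x": 8, "y": 2}))
print(run_time({"x": 8, "y": 0}))
```

The first call returns a regular numeric objective, while the second returns a string starting with "F", which the search can then treat according to the chosen failure strategy.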
Then, we define the corresponding hyperparameter problem, where x is the
value to maximize and y is a value that affects the occurrence of failures.
problem = HpProblem()
problem.add_hyperparameter([1, 2, 4, 8, 16, 32], "x")
problem.add_hyperparameter((0.0, 1.0), "y")
print(problem)
Configuration space object:
Hyperparameters:
x, Type: Ordinal, Sequence: {1, 2, 4, 8, 16, 32}, Default: 1
y, Type: UniformFloat, Range: [0.0, 1.0], Default: 0.5
Then, we define a centralized Bayesian optimization (CBO) search
(i.e., master-worker architecture) which uses the Random-Forest regressor
as default surrogate model. We will compare the "ignore" strategy, which
filters out failed configurations, the "mean" strategy, which replaces a
failure with the running mean of collected objectives, and the "min"
strategy, which replaces a failure with the running min of collected objectives.
results = {}
max_evals = 50
for failure_strategy in ["ignore", "mean", "min"]:
    # for failure_strategy in ["min"]:
    print(f"Executing failure strategy: {failure_strategy}")
    evaluator = Evaluator.create(
        run, method="thread", method_kwargs={"callbacks": [TqdmCallback()]}
    )
    search = CBO(
        problem,
        evaluator,
        filter_failures=failure_strategy,
        log_dir=f"search_{failure_strategy}",
        random_state=42,
    )
    results[failure_strategy] = search.search(max_evals)
Executing failure strategy: ignore
100%|██████████| 50/50 [00:00<00:00, 75.18it/s, failures=39, objective=32]
Executing failure strategy: mean
100%|██████████| 50/50 [00:02<00:00, 12.97it/s, failures=20, objective=32]
Executing failure strategy: min
100%|██████████| 50/50 [00:02<00:00, 15.17it/s, failures=17, objective=32]
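The three strategies compared above can be sketched on a plain list of objectives. This is an illustration of the behavior described in the text, not DeepHyper's internal implementation. The function names `is_failure` and `handle_failures` and the fallback used when no success has been collected yet are assumptions.

```python
from statistics import mean


def is_failure(objective) -> bool:
    """A failure is a string objective starting with "F"."""
    return isinstance(objective, str) and objective.startswith("F")


def handle_failures(objectives: list, strategy: str) -> list:
    """Illustrative only: apply a failure strategy to the objectives collected so far."""
    successes = [obj for obj in objectives if not is_failure(obj)]
    if strategy == "ignore" or not successes:
        # Drop failures. Also used as a fallback (assumption) when there
        # is no successful objective to impute from.
        return successes
    if strategy == "mean":
        fill = mean(successes)
    elif strategy == "min":
        fill = min(successes)
    else:
        raise ValueError(f"Unknown strategy: {strategy}")
    # Replace each failure with the statistic of the collected successes.
    return [fill if is_failure(obj) else obj for obj in objectives]


history = ["F_postfix", 32, "F_postfix", 8, 2]
for strategy in ["ignore", "mean", "min"]:
    print(strategy, handle_failures(history, strategy))
```

With "ignore", the surrogate model never sees the failed configurations, so it has no information steering it away from the failing region. With "mean" or "min", failed configurations receive a mediocre or poor objective. This is consistent with the lower failure counts reported above for these two strategies (20 and 17 versus 39).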
Finally, we plot the collected results. Failed evaluations are shown as red triangles at an objective of zero.
plt.figure()
for i, (failure_strategy, df) in enumerate(results.items()):
    plt.subplot(3, 1, i + 1)
    if df.objective.dtype != np.float64:
        # The column contains "F"-prefixed strings: split failures from successes.
        x = np.arange(len(df))
        mask_failed = np.where(df.objective.str.startswith("F"))[0]
        mask_success = np.where(~df.objective.str.startswith("F"))[0]
        x_success, x_failed = x[mask_success], x[mask_failed]
        y_success = df["objective"][mask_success].astype(float)
    else:
        # No failure was recorded: every objective is already numeric.
        x_success, x_failed = np.arange(len(df)), np.array([], dtype=int)
        y_success = df["objective"]
    plt.scatter(x_success, y_success, label=failure_strategy)
    plt.scatter(x_failed, np.zeros(x_failed.shape), marker="v", color="red")
    plt.xlabel(r"Iterations")
    plt.ylabel(r"Objective")
    plt.legend()
plt.show()
Total running time of the script: (0 minutes 5.948 seconds)