2. Introduction to Asynchronous Distributed Bayesian Optimization (DBO)#

Author(s): Joceran Gouneau & Romain Egele.

In this tutorial we show how to use the Distributed Bayesian Optimization (DBO) search algorithm to perform hyperparameter optimization on the Ackley function.

2.1. Definition of the problem: the Ackley function#

Ackley Function in 2D
\[f(x) = -a \exp \left( -b \sqrt {\frac 1 d \sum_{i=1}^d x_i^2} \right) - \exp \left( \frac 1 d \sum_{i=1}^d \cos(c x_i) \right) + a + \exp(1)\]

Note

We are using this function to emulate a realistic problem while keeping the definition of the hyperparameter search space and run function as simple as possible; if you are looking for neural network use cases, we redirect you to our Colab and Notebook tutorials.

First we have to define the hyperparameter search space as well as the run function, which, given a certain configuration of hyperparameters, returns the objective we want to maximize. We compute the 10-dimensional (\(d = 10\)) Ackley function with \(a = 20\), \(b = 0.2\) and \(c = 2\pi\), and we want to find its minimum \(f(x=(0, \dots , 0)) = 0\) on the domain \([-32.768, 32.768]^{10}\). Thus we define the hyperparameter problem as \(x_i \in [-32.768, 32.768] ~ \forall i \in \{0, \dots, 9\}\) and the objective returned by the run function as \(-f(x)\).

file: ackley.py#
import numpy as np

from deephyper.problem import HpProblem

# Define the 10-dimensional search space: x0, ..., x9, each in [-32.768, 32.768].
d = 10
domain = (-32.768, 32.768)
hp_problem = HpProblem()
for i in range(d):
    hp_problem.add_hyperparameter(domain, f"x{i}")


def ackley(x, a=20, b=0.2, c=2 * np.pi):
    d = len(x)
    s1 = np.sum(x ** 2)
    s2 = np.sum(np.cos(c * x))
    term1 = -a * np.exp(-b * np.sqrt(s1 / d))
    term2 = -np.exp(s2 / d)
    y = term1 + term2 + a + np.exp(1)
    return y


def run(config):
    # Build the input vector from the sampled hyperparameters.
    x = np.array([config[f"x{i}"] for i in range(d)])
    x = np.asarray_chkfinite(x)  # ValueError if any NaN or Inf
    # DeepHyper maximizes the objective, so we return the negated Ackley value.
    return -ackley(x)
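
As a quick sanity check, we can evaluate the run function at the known minimum \(x = (0, \dots, 0)\) and verify that the returned objective is close to 0 (up to floating-point error). The small script below is only illustrative and is not part of the search itself:

from ackley import run, d

# Evaluate the run function at the global minimum of the Ackley function.
config = {f"x{i}": 0.0 for i in range(d)}
print(run(config))  # approximately 0.0, since -f(0, ..., 0) = 0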

2.2. Definition of the distributed Bayesian optimization search (DBO)#

Centralized (left) and Distributed (right) architectures

DBO (right) is very similar to Centralized Bayesian Optimization (CBO) (left): in both cases we iteratively generate new configurations with an optimizer \(O\), evaluate them on workers \(W\) by calling the black-box function \(f\), which takes \(t_{eff}\) time to compute, and fit the optimizer on the history of the search (the configuration/objective pairs) to generate better configurations. The only difference is that with CBO the fitting of the optimizer and the generation of new configurations are centralized on a manager \(M\), whereas with DBO each worker has its own optimizer and these operations are parallelized. This makes DBO the preferable choice when the run function is relatively fast and the number of workers grows: with enough workers, the fit of the optimizer (which has to be performed each time a new configuration is generated) starts to take longer than the evaluation of the run function itself. At that point the manager becomes congested and workers sit idle, waiting for a new configuration to evaluate. DBO avoids this congestion by giving each worker its own optimizer and performing asynchronous communication between optimizers.
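
To see roughly when this congestion appears, write \(t_{fit}\) for the time the optimizer needs to fit and suggest a new configuration (a notation not used above). With \(W\) workers each finishing an evaluation every \(t_{eff}\), the centralized manager must supply configurations at rate \(W / t_{eff}\) but can only produce them at rate \(1 / t_{fit}\), so it saturates approximately when

\[W \cdot t_{fit} \gtrsim t_{eff},\]

that is, when the run function is fast or the number of workers is large; DBO sidesteps this bound since each worker fits its own optimizer.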

DBO can be formally described in the following algorithm:

DBO algorithm

2.2.1. Running DBO#

Unlike CBO, DBO does not use any evaluator instance: everything is contained within the DBO search instance, which is only compatible with MPI for now:

file: search.py#
from deephyper.search.hps import DBO  # assumed import path; it may differ across DeepHyper versions

from ackley import hp_problem, run

search = DBO(
    hp_problem,
    run,
    sync_communication=False,
    log_dir=".",
    checkpoint_file="results.csv",
    checkpoint_freq=1,
)

Note

The checkpoint_file and checkpoint_freq parameters can be used to regularly save the state of the search while it is running; this is equivalent to saving the results dataframe returned by the search in the file named checkpoint_file within the directory log_dir, at a frequency of checkpoint_freq. It is good practice to enable this checkpointing so that, if an issue occurs during the search, results are still available even though the search does not terminate successfully. By default the results are saved at each iteration in the results.csv file of the current directory.
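
Since the checkpoint is a plain CSV file, the partial results can be inspected at any time, for example with pandas (assuming the default log_dir="." and checkpoint_file="results.csv" shown above):

import pandas as pd

# Load the partial results written by the checkpoint during the search.
results = pd.read_csv("results.csv")
print(results.tail())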

Note

The sync_communication parameter, when set to True, forces the workers to broadcast their new evaluations synchronously; the frequency at which this broadcast is performed can then be specified with the sync_communication_freq parameter. For example, with sync_communication=True and sync_communication_freq=10, each worker performs 10 evaluations, waits for every other worker to do the same, broadcasts these new evaluations, and then repeats this process until the end of the search.
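
For instance, the search defined earlier could be instantiated with synchronous communication as follows (a sketch reusing the same hp_problem and run as above):

search = DBO(
    hp_problem,
    run,
    sync_communication=True,       # broadcast evaluations synchronously
    sync_communication_freq=10,    # every 10 evaluations per worker
    log_dir=".",
    checkpoint_file="results.csv",
    checkpoint_freq=1,
)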

2.3. Execution of the search: using MPI#

The backend of DBO uses MPI, so if we want, for example, to retrieve the final results and save them at a specific location, we need to do so from a single rank while executing the search for the same amount of time on every rank:

file: search.py#
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

timeout = 10
if rank == 0:
    # Only rank 0 collects the final results dataframe and saves it.
    results = search.search(timeout=timeout)
    results.to_csv("results.csv")
else:
    search.search(timeout=timeout)

Finally, we can run the search.py script with MPI:

$ mpirun -np 10 python search.py
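
Here -np 10 launches 10 MPI ranks, that is 10 workers, each running its own optimizer as described above.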