deephyper.search.hps.DBO

class deephyper.search.hps.DBO(problem, run_function, random_state: Optional[int] = None, log_dir: str = '.', verbose: int = 0, comm=None, run_function_kwargs: Optional[dict] = None, n_jobs: int = 1, surrogate_model: str = 'RF', surrogate_model_kwargs: Optional[dict] = None, n_initial_points: int = 10, lazy_socket_allocation: bool = False, communication_batch_size=2048, sync_communication: bool = False, sync_communication_freq: int = 10, checkpoint_file: str = 'results.csv', checkpoint_freq: int = 1, acq_func: str = 'UCB', acq_optimizer: str = 'auto', kappa: float = 1.96, xi: float = 0.001, sample_max_size: int = -1, sample_strategy: str = 'quantile')

Bases: object

Distributed Bayesian Optimization Search.

Parameters
  • problem (HpProblem) – Hyperparameter problem describing the search space to explore.

  • run_function (callable) – A callable instance which represents the black-box function we want to evaluate.

  • random_state (int, optional) – Random seed. Defaults to None.

  • log_dir (str, optional) – Log directory where search’s results are saved. Defaults to ".".

  • verbose (int, optional) – Indicate the verbosity level of the search. Defaults to 0.

  • comm (optional) – The MPI communicator to use. Defaults to None.

  • run_function_kwargs (dict, optional) – Keyword arguments to pass to the run-function. Defaults to None.

  • n_jobs (int, optional) – Parallel processes per rank to use for optimization updates (e.g., model re-fitting). Defaults to 1.

  • surrogate_model (str, optional) – Type of the surrogate model to use. "DUMMY" can be used for random search, "GP" for a Gaussian process (efficient for a small number of iterations, such as a hundred sequential evaluations, but a bottleneck when scaling because of its cubic complexity w.r.t. the number of evaluations), "RF" for the random-forest regressor (log-linear complexity w.r.t. the number of evaluations). Defaults to "RF".

  • lazy_socket_allocation (bool, optional) – If True then MPI communication sockets are initialized only when used for the first time, otherwise the initialization is forced when creating the instance. Defaults to False.

  • sync_communication (bool, optional) – If True workers communicate synchronously, otherwise workers communicate asynchronously. Defaults to False.

  • sync_communication_freq (int, optional) – Manage the frequency at which workers should communicate their results in the case of synchronous communication. Defaults to 10.

  • checkpoint_file (str) – Name of the file in log_dir where results are checkpointed. Defaults to "results.csv".

  • checkpoint_freq (int) – Frequency at which results are checkpointed. Defaults to 1.

  • acq_func (str) – Acquisition function to use. If "UCB" then the upper confidence bound is used, if "EI" then the expected improvement is used, if "PI" then the probability of improvement is used, if "gp_hedge" then one of the above is chosen probabilistically. Defaults to "UCB".

  • acq_optimizer (str) – Method used to optimize the acquisition function. If "sampling" then random samples are drawn and inferred for optimization, if "lbfgs" then gradient-based optimization is used. Defaults to "auto".

  • kappa (float) – Exploration/exploitation value for UCB-acquisition function, the higher the more exploration, the smaller the more exploitation. Defaults to 1.96 which corresponds to a 95% confidence interval.

  • xi (float) – Exploration/exploitation value for EI and PI-acquisition functions, the higher the more exploration, the smaller the more exploitation. Defaults to 0.001.

  • sample_max_size (int) – Maximum number of samples used to re-fit the surrogate model. Defaults to -1, which means no limit on the sample size.

  • sample_strategy (str) – Sub-sampling strategy to re-fit the surrogate model. If "quantile" then sub-sampling is performed based on the quantile of the collected objective values. Defaults to "quantile".
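
To make the role of kappa concrete, here is a minimal sketch of an upper-confidence-bound score in its textbook form, mean + kappa * std. It is an illustration only and not DeepHyper's internal code: the function name ucb_score and the two candidate points are hypothetical. The sketch also checks that 1.96 is approximately the two-sided 95% quantile of a standard normal distribution.

```python
from statistics import NormalDist


def ucb_score(mean: float, std: float, kappa: float = 1.96) -> float:
    """Textbook upper confidence bound for a maximization problem."""
    return mean + kappa * std


# kappa = 1.96 is approximately the 97.5th percentile of a standard normal,
# i.e. the half-width of a two-sided 95% confidence interval in units of std.
z_95 = NormalDist().inv_cdf(0.975)

# Two hypothetical candidates: (predicted mean, predicted std).
safe = (1.0, 0.1)       # good prediction, low uncertainty
uncertain = (0.8, 0.5)  # worse prediction, high uncertainty

# A small kappa favors exploitation, a large kappa favors exploration.
pick_low_kappa = max([safe, uncertain], key=lambda c: ucb_score(*c, kappa=0.1))
pick_high_kappa = max([safe, uncertain], key=lambda c: ucb_score(*c, kappa=1.96))
```

With kappa=0.1 the low-uncertainty candidate scores higher, while with kappa=1.96 the high-uncertainty candidate is preferred, which is the exploration/exploitation trade-off described for kappa.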

Methods

broadcast

broadcast_to_root

checkpoint

Dump evaluations to a CSV file.

dump_context

Dumps the context in the log folder.

fit_surrogate

Fit the surrogate model of the search from a checkpointed Dataframe.

gather_results

recv_any

search

Execute the search algorithm.

send_all

terminate

Terminate the search.

to_dict

Transform a list of hyperparameter values to a dict where keys are hyperparameter names and values are hyperparameter values.

to_json

Returns a json version of the search object.
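
The conversion described for to_dict amounts to pairing each hyperparameter name with its value. The following is a minimal sketch of that idea, not the method's actual implementation: values_to_dict and the hyperparameter names are hypothetical, and the real method presumably takes the names from the problem definition.

```python
def values_to_dict(names, values):
    """Pair each hyperparameter name with its value (hypothetical helper)."""
    if len(names) != len(values):
        raise ValueError("names and values must have the same length")
    return dict(zip(names, values))


# Hypothetical hyperparameter names and one list of sampled values.
config = values_to_dict(["learning_rate", "batch_size"], [0.01, 32])
```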

checkpoint()

Dump evaluations to a CSV file.
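
Because the checkpoint is a plain CSV file, it can be inspected with the standard library. The sketch below reads a toy in-memory stand-in for such a file and picks the best evaluation. The column names "x" and "objective", the toy values, and the convention that larger objectives are better are assumptions made for this illustration.

```python
import csv
import io

# Toy stand-in for the contents of a checkpoint file; the column names
# ("x", "objective") and the values are assumptions for this illustration.
toy_checkpoint = io.StringIO(
    "x,objective\n"
    "0.5,-0.25\n"
    "2.0,-4.0\n"
    "-0.1,-0.01\n"
)

rows = list(csv.DictReader(toy_checkpoint))

# Assuming larger objective values are better, pick the best evaluation.
best = max(rows, key=lambda row: float(row["objective"]))
best_x = float(best["x"])
```

To read an actual file, the StringIO object can be replaced with open(path, newline="").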

dump_context()

Dumps the context in the log folder.

fit_surrogate(df)

Fit the surrogate model of the search from a checkpointed Dataframe.

Parameters

df (str|DataFrame) – a checkpoint from a previous search.

Example Usage:

>>> search = DBO(problem, run_function)
>>> search.fit_surrogate("results.csv")

search(max_evals: int = -1, timeout: Optional[int] = None)

Execute the search algorithm.

Parameters
  • max_evals (int, optional) – The maximum number of evaluations of the run function to perform before stopping the search. Defaults to -1, in which case the search runs indefinitely.

  • timeout (int, optional) – The time budget (in seconds) of the search before stopping. Defaults to None, in which case no time budget is imposed.

Returns

A pandas DataFrame containing the evaluations performed.

Return type

DataFrame
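
A minimal usage sketch of the constructor and search() is given below, based only on the signatures documented on this page. Several points are assumptions: that the run function receives a dict of hyperparameter values and returns the objective, that HpProblem is importable from deephyper.problem in the installed version, and that some MPI ranks may receive None instead of the DataFrame. The toy objective and the hyperparameter name "x" are hypothetical.

```python
def run(config: dict) -> float:
    """Toy black-box function; expects a hyperparameter named "x"."""
    return -(config["x"] ** 2)


def main():
    # Imports are kept inside main() so that `run` can be exercised
    # without DeepHyper or MPI installed.
    from deephyper.problem import HpProblem
    from deephyper.search.hps import DBO

    problem = HpProblem()
    problem.add_hyperparameter((-10.0, 10.0), "x")  # continuous range

    search = DBO(problem, run, random_state=42)
    results = search.search(timeout=60)  # stop after roughly 60 seconds

    # Assumption: not every MPI rank necessarily receives the DataFrame.
    if results is not None:
        print(results)
```

Such a script would call main() at the bottom and would typically be started through an MPI launcher, for example mpirun -np 4 python script.py, so that each rank runs the search loop.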

terminate()

Terminate the search.

Raises

SearchTerminationError – raised when the search is terminated with SIGALRM.

to_json()

Returns a json version of the search object.