Queued Evaluator with MPI#
Author(s): Romain Egele, Brett Eiffert.
In this example, you will learn how to use a queued LokyEvaluator with an MPI run function.
Code (Import statements)
import subprocess
from deephyper.evaluator import (
    LokyEvaluator,
    parse_subprocess_result,
    profile,
    queued,
)
from deephyper.evaluator.callback import TqdmCallback
from deephyper.hpo import CBO, HpProblem
Run function using MPI#
The evaluator calls this function for each job. It launches the MPI processes with mpirun in a subprocess and parses the objective from the captured output.
@profile
def run_mpi_exe(job, dequed=None):
    # dequed holds the resources (here, node ids) popped from the queue for this job
    x = job.parameters["x"]
    # The format of the print "DH-OUTPUT:(.+)\n" is strict if you use parse_subprocess_result
    command = f"mpirun -np {len(dequed)} echo DH-OUTPUT:{x}\n"
    completed_process = subprocess.run(command.split(), capture_output=True)
    objective = parse_subprocess_result(completed_process)
    # In the results.csv a new `m:dequed` will show the passed dequed values.
    return {"objective": objective, "metadata": {}}
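The strict output format mentioned in the comment above can be illustrated with a small stand-alone sketch. It assumes that parsing amounts to a regular-expression search for DH-OUTPUT:(.+)\n in the captured stdout and that the matched text is converted to a float. The helper extract_objective is hypothetical and is not DeepHyper's parse_subprocess_result, and the child process is a plain Python one-liner standing in for the mpirun command.

```python
import re
import subprocess
import sys


def extract_objective(stdout: bytes) -> float:
    """Hypothetical helper: pull the value printed after 'DH-OUTPUT:'."""
    match = re.search(r"DH-OUTPUT:(.+)\n", stdout.decode())
    if match is None:
        raise ValueError("no 'DH-OUTPUT:<value>' line found in stdout")
    # Assumption: the objective is a single number.
    return float(match.group(1))


# Stand-in for the mpirun call: a child process that prints one tagged line.
completed = subprocess.run(
    [sys.executable, "-c", "print('DH-OUTPUT:4.31')"],
    capture_output=True,
)
objective = extract_objective(completed.stdout)
print(objective)
```

Note that command.split() in the run function drops the trailing "\n" of the f-string; the newline that the pattern relies on is the one echo appends to its output.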
Setup#
The problem is defined with basic hyperparameters for a straightforward example.
problem = HpProblem()
problem.add_hyperparameter((0.0, 10.0), "x")
UniformFloatHyperparameter(name='x', default_value=5.0, meta=None, size=inf, lower=0.0, upper=10.0, log=False)
The variables below set the number of workers that run the MPI tasks. The values are illustrative: the example can run on a multi-node setup, on a single node, or on a local machine. The number of concurrent workers is num_nodes // num_nodes_per_task.
# Local machine or single node
num_nodes = 1
num_nodes_per_task = 1
# Multi-node setup
# num_nodes = 10
# num_nodes_per_task = 2
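As a rough illustration of the arithmetic above, the sketch below computes the worker count for both configurations and shows one way the node ids could be grouped per task. The helper plan_workers is hypothetical, and grouping consecutive ids is only an assumption made for readability; at runtime the ids a task receives depend on what is in the queue at that moment.

```python
def plan_workers(num_nodes: int, num_nodes_per_task: int):
    """Hypothetical helper: worker count and an example grouping of node ids."""
    # Integer division: leftover nodes that cannot form a full group stay idle.
    num_workers = num_nodes // num_nodes_per_task
    node_ids = list(range(num_nodes))
    groups = [
        node_ids[i * num_nodes_per_task:(i + 1) * num_nodes_per_task]
        for i in range(num_workers)
    ]
    return num_workers, groups


print(plan_workers(1, 1))   # (1, [[0]])
print(plan_workers(10, 2))  # (5, [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]])
```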
Parallel Processing#
We define a main function that sets up an MPI-enabled evaluator to run evaluations in parallel. Tasks are spawned by the run_mpi_exe function defined earlier and queued in a LokyEvaluator.
Using the evaluator (LokyEvaluator), the search is performed for a user-defined number of iterations (50).
LokyEvaluator was chosen over the other DeepHyper evaluators, ProcessPoolEvaluator and ThreadPoolEvaluator, because it is preferred for running MPI processes and provides the argument-based process spawning that notebook-style runtimes require.
To read more about the evaluator backend options and how to choose the best one for a specific use case, go to (coming soon).
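The queueing idea can be pictured with a conceptual sketch in plain Python. It is not DeepHyper's implementation: the class ResourceQueue and the functions run_with_resources and fake_run are hypothetical, and a thread pool stands in for the evaluator backend. Each task pops a fixed number of node ids before it runs and pushes them back when it finishes, so no two concurrent tasks share a node.

```python
import threading
from collections import deque
from concurrent.futures import ThreadPoolExecutor


class ResourceQueue:
    """Hypothetical thread-safe pool of node ids."""

    def __init__(self, resources, pop_per_task):
        self._free = deque(resources)
        self._pop_per_task = pop_per_task
        self._lock = threading.Lock()

    def acquire(self):
        # Pop all ids for one task atomically.
        with self._lock:
            return [self._free.popleft() for _ in range(self._pop_per_task)]

    def release(self, resources):
        with self._lock:
            self._free.extend(resources)


def run_with_resources(queue, run_function, job):
    dequed = queue.acquire()
    try:
        return run_function(job, dequed=dequed)
    finally:
        # Always hand the nodes back, even if the run function fails.
        queue.release(dequed)


def fake_run(job, dequed=None):
    # Stand-in for an MPI run function: report which nodes the job received.
    return {"job": job, "nodes": list(dequed)}


num_nodes, num_nodes_per_task = 4, 2
queue = ResourceQueue(range(num_nodes), num_nodes_per_task)
# With num_nodes // num_nodes_per_task workers the queue can never run empty.
with ThreadPoolExecutor(max_workers=num_nodes // num_nodes_per_task) as pool:
    results = list(
        pool.map(lambda job: run_with_resources(queue, fake_run, job), range(6))
    )
print(results)
```

In the example itself, the same roles are played by the queue and queue_pop_per_task arguments and by the dequed argument of the run function.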
def main():
    evaluator = queued(LokyEvaluator)(
        run_function=run_mpi_exe,
        num_workers=num_nodes // num_nodes_per_task,
        callbacks=[TqdmCallback()],
        queue=[node_id for node_id in range(num_nodes)],
        queue_pop_per_task=num_nodes_per_task,
    )
    print(f"Evaluator uses {evaluator.num_workers} workers")
    search = CBO(problem, evaluator, log_dir="log_queued_evaluator")
    search.search(max_evals=50)


if __name__ == "__main__":
    main()
Evaluator uses 1 workers
WARNING:root:Results file already exists, it will be renamed to /Users/35e/Projects/DeepHyper/deephyper/examples/examples_parallelism/log_queued_evaluator/results_20250515-170740.csv
0%| | 0/50 [00:00<?, ?it/s]
2%|▏ | 1/50 [00:00<00:00, 10131.17it/s, failures=0, objective=4.31]
4%|▍ | 2/50 [00:00<00:03, 14.54it/s, failures=0, objective=4.31]
4%|▍ | 2/50 [00:00<00:03, 14.54it/s, failures=0, objective=4.8]
6%|▌ | 3/50 [00:00<00:03, 14.54it/s, failures=0, objective=4.8]
8%|▊ | 4/50 [00:00<00:12, 3.55it/s, failures=0, objective=4.8]
8%|▊ | 4/50 [00:00<00:12, 3.55it/s, failures=0, objective=8.9]
10%|█ | 5/50 [00:01<00:18, 2.44it/s, failures=0, objective=8.9]
10%|█ | 5/50 [00:01<00:18, 2.44it/s, failures=0, objective=8.9]
12%|█▏ | 6/50 [00:02<00:24, 1.80it/s, failures=0, objective=8.9]
12%|█▏ | 6/50 [00:02<00:24, 1.80it/s, failures=0, objective=8.9]
14%|█▍ | 7/50 [00:03<00:26, 1.65it/s, failures=0, objective=8.9]
14%|█▍ | 7/50 [00:03<00:26, 1.65it/s, failures=0, objective=9.26]
16%|█▌ | 8/50 [00:04<00:27, 1.55it/s, failures=0, objective=9.26]
16%|█▌ | 8/50 [00:04<00:27, 1.55it/s, failures=0, objective=9.61]
18%|█▊ | 9/50 [00:04<00:28, 1.44it/s, failures=0, objective=9.61]
18%|█▊ | 9/50 [00:04<00:28, 1.44it/s, failures=0, objective=9.75]
20%|██ | 10/50 [00:05<00:29, 1.35it/s, failures=0, objective=9.75]
20%|██ | 10/50 [00:05<00:29, 1.35it/s, failures=0, objective=9.75]
22%|██▏ | 11/50 [00:06<00:32, 1.20it/s, failures=0, objective=9.75]
22%|██▏ | 11/50 [00:06<00:32, 1.20it/s, failures=0, objective=9.75]
24%|██▍ | 12/50 [00:07<00:33, 1.12it/s, failures=0, objective=9.75]
24%|██▍ | 12/50 [00:07<00:33, 1.12it/s, failures=0, objective=9.75]
26%|██▌ | 13/50 [00:08<00:31, 1.17it/s, failures=0, objective=9.75]
26%|██▌ | 13/50 [00:08<00:31, 1.17it/s, failures=0, objective=9.75]
28%|██▊ | 14/50 [00:09<00:31, 1.13it/s, failures=0, objective=9.75]
28%|██▊ | 14/50 [00:09<00:31, 1.13it/s, failures=0, objective=9.75]
30%|███ | 15/50 [00:10<00:30, 1.14it/s, failures=0, objective=9.75]
30%|███ | 15/50 [00:10<00:30, 1.14it/s, failures=0, objective=9.75]
32%|███▏ | 16/50 [00:11<00:29, 1.14it/s, failures=0, objective=9.75]
32%|███▏ | 16/50 [00:11<00:29, 1.14it/s, failures=0, objective=9.75]
34%|███▍ | 17/50 [00:12<00:30, 1.07it/s, failures=0, objective=9.75]
34%|███▍ | 17/50 [00:12<00:30, 1.07it/s, failures=0, objective=9.76]
36%|███▌ | 18/50 [00:13<00:30, 1.06it/s, failures=0, objective=9.76]
36%|███▌ | 18/50 [00:13<00:30, 1.06it/s, failures=0, objective=9.76]
38%|███▊ | 19/50 [00:14<00:27, 1.13it/s, failures=0, objective=9.76]
38%|███▊ | 19/50 [00:14<00:27, 1.13it/s, failures=0, objective=9.91]
40%|████ | 20/50 [00:14<00:25, 1.19it/s, failures=0, objective=9.91]
40%|████ | 20/50 [00:14<00:25, 1.19it/s, failures=0, objective=9.91]
42%|████▏ | 21/50 [00:15<00:23, 1.24it/s, failures=0, objective=9.91]
42%|████▏ | 21/50 [00:15<00:23, 1.24it/s, failures=0, objective=9.91]
44%|████▍ | 22/50 [00:16<00:21, 1.28it/s, failures=0, objective=9.91]
44%|████▍ | 22/50 [00:16<00:21, 1.28it/s, failures=0, objective=9.91]
46%|████▌ | 23/50 [00:17<00:21, 1.27it/s, failures=0, objective=9.91]
46%|████▌ | 23/50 [00:17<00:21, 1.27it/s, failures=0, objective=9.91]
48%|████▊ | 24/50 [00:18<00:22, 1.16it/s, failures=0, objective=9.91]
48%|████▊ | 24/50 [00:18<00:22, 1.16it/s, failures=0, objective=9.91]
50%|█████ | 25/50 [00:18<00:20, 1.19it/s, failures=0, objective=9.91]
50%|█████ | 25/50 [00:18<00:20, 1.19it/s, failures=0, objective=9.91]
52%|█████▏ | 26/50 [00:19<00:19, 1.24it/s, failures=0, objective=9.91]
52%|█████▏ | 26/50 [00:19<00:19, 1.24it/s, failures=0, objective=9.91]
54%|█████▍ | 27/50 [00:20<00:18, 1.27it/s, failures=0, objective=9.91]
54%|█████▍ | 27/50 [00:20<00:18, 1.27it/s, failures=0, objective=9.91]
56%|█████▌ | 28/50 [00:21<00:16, 1.30it/s, failures=0, objective=9.91]
56%|█████▌ | 28/50 [00:21<00:16, 1.30it/s, failures=0, objective=9.91]
58%|█████▊ | 29/50 [00:21<00:15, 1.33it/s, failures=0, objective=9.91]
58%|█████▊ | 29/50 [00:21<00:15, 1.33it/s, failures=0, objective=9.91]
60%|██████ | 30/50 [00:22<00:15, 1.28it/s, failures=0, objective=9.91]
60%|██████ | 30/50 [00:22<00:15, 1.28it/s, failures=0, objective=9.99]
62%|██████▏ | 31/50 [00:23<00:14, 1.30it/s, failures=0, objective=9.99]
62%|██████▏ | 31/50 [00:23<00:14, 1.30it/s, failures=0, objective=9.99]
64%|██████▍ | 32/50 [00:24<00:13, 1.32it/s, failures=0, objective=9.99]
64%|██████▍ | 32/50 [00:24<00:13, 1.32it/s, failures=0, objective=10]
66%|██████▌ | 33/50 [00:24<00:12, 1.35it/s, failures=0, objective=10]
66%|██████▌ | 33/50 [00:24<00:12, 1.35it/s, failures=0, objective=10]
68%|██████▊ | 34/50 [00:25<00:11, 1.37it/s, failures=0, objective=10]
68%|██████▊ | 34/50 [00:25<00:11, 1.37it/s, failures=0, objective=10]
70%|███████ | 35/50 [00:26<00:10, 1.38it/s, failures=0, objective=10]
70%|███████ | 35/50 [00:26<00:10, 1.38it/s, failures=0, objective=10]
72%|███████▏ | 36/50 [00:26<00:10, 1.38it/s, failures=0, objective=10]
72%|███████▏ | 36/50 [00:26<00:10, 1.38it/s, failures=0, objective=10]
74%|███████▍ | 37/50 [00:27<00:09, 1.31it/s, failures=0, objective=10]
74%|███████▍ | 37/50 [00:27<00:09, 1.31it/s, failures=0, objective=10]
76%|███████▌ | 38/50 [00:28<00:09, 1.32it/s, failures=0, objective=10]
76%|███████▌ | 38/50 [00:28<00:09, 1.32it/s, failures=0, objective=10]
78%|███████▊ | 39/50 [00:29<00:08, 1.34it/s, failures=0, objective=10]
78%|███████▊ | 39/50 [00:29<00:08, 1.34it/s, failures=0, objective=10]
80%|████████ | 40/50 [00:29<00:07, 1.35it/s, failures=0, objective=10]
80%|████████ | 40/50 [00:29<00:07, 1.35it/s, failures=0, objective=10]
82%|████████▏ | 41/50 [00:30<00:06, 1.36it/s, failures=0, objective=10]
82%|████████▏ | 41/50 [00:30<00:06, 1.36it/s, failures=0, objective=10]
84%|████████▍ | 42/50 [00:31<00:06, 1.30it/s, failures=0, objective=10]
84%|████████▍ | 42/50 [00:31<00:06, 1.30it/s, failures=0, objective=10]
86%|████████▌ | 43/50 [00:32<00:05, 1.33it/s, failures=0, objective=10]
86%|████████▌ | 43/50 [00:32<00:05, 1.33it/s, failures=0, objective=10]
88%|████████▊ | 44/50 [00:32<00:04, 1.34it/s, failures=0, objective=10]
88%|████████▊ | 44/50 [00:32<00:04, 1.34it/s, failures=0, objective=10]
90%|█████████ | 45/50 [00:33<00:03, 1.36it/s, failures=0, objective=10]
90%|█████████ | 45/50 [00:33<00:03, 1.36it/s, failures=0, objective=10]
92%|█████████▏| 46/50 [00:34<00:02, 1.36it/s, failures=0, objective=10]
92%|█████████▏| 46/50 [00:34<00:02, 1.36it/s, failures=0, objective=10]
94%|█████████▍| 47/50 [00:35<00:02, 1.37it/s, failures=0, objective=10]
94%|█████████▍| 47/50 [00:35<00:02, 1.37it/s, failures=0, objective=10]
96%|█████████▌| 48/50 [00:35<00:01, 1.31it/s, failures=0, objective=10]
96%|█████████▌| 48/50 [00:35<00:01, 1.31it/s, failures=0, objective=10]
98%|█████████▊| 49/50 [00:36<00:00, 1.33it/s, failures=0, objective=10]
98%|█████████▊| 49/50 [00:36<00:00, 1.33it/s, failures=0, objective=10]
100%|██████████| 50/50 [00:37<00:00, 1.35it/s, failures=0, objective=10]
100%|██████████| 50/50 [00:37<00:00, 1.35it/s, failures=0, objective=10]
100%|██████████| 50/50 [00:37<00:00, 1.34it/s, failures=0, objective=10]
Total running time of the script: (0 minutes 44.342 seconds)