Configuring model execution with Balsam

When using the Balsam evaluator or the balsam-submit shortcut to launch DeepHyper HPS jobs, the default assumption is that the --run argument refers to a callable that trains your model and returns the objective to maximize (a minimal sketch follows the list below). This is a convenient abstraction for many simple use cases, because DeepHyper fully automates:

  • wrapping your model in an executable “runner” script

  • passing hyperparameters from DeepHyper to your model

  • returning the objective from a trained model to DeepHyper
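
A minimal run callable matching this default contract might look like the following sketch; the toy quadratic objective simply stands in for a real model's validation score:

def run(point):
    # point is the dict of hyperparameters chosen by the search,
    # e.g. {"x": 1.5} for a one-dimensional problem
    x = point["x"]
    # Return the scalar objective that DeepHyper will maximize
    return -(x ** 2)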

Unfortunately, there are cases where the default execution model does not apply. For instance, you may wish to launch a Singularity container that performs data-parallel model training on several nodes. You may also need to vary the number of MPI ranks according to a local batch-size hyperparameter in your search. In another scenario, you might have a model implemented in Fortran that isn’t trivially importable by DeepHyper.

The underlying Balsam evaluator is sufficiently flexible to handle these complex use cases. The price to pay for this flexibility is that your code becomes responsible for addressing the preceding bullet points. We illustrate how to control Balsam model evaluation tasks below. Let’s modify the template HPS benchmark generated by new-problem. First, set up the default workspace:

$ deephyper start-project demo
$ cd demo/demo/
$ deephyper new-problem hps polynome2
$ cd polynome2

Creating an execution wrapper

First, we will modify model_run.py to parse hyperparameters passed as a JSON string on the command line. After passing them into the run function, we signal the return value back to DeepHyper using a print statement. This could be done in a separate script, running a different version of Python, or even with a different programming language altogether! For brevity in this example, we just put this logic at the bottom of the model_run.py file, under the if __name__ == "__main__" block.

polynome2/model_run.py

    HISTORY = history.history

    return history.history["val_r2"][-1]


if __name__ == "__main__":
    import json
    import sys

    # Hyperparameters arrive as a single JSON string in argv[1]
    point = json.loads(sys.argv[1])
    objective = run(point)
    print("DH-OUTPUT:", objective)

In the next section, you’ll see how we pass the hyperparameters as a single JSON-string argument. The json.loads() call in the snippet above converts that command-line string back into a dictionary.
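
Concretely, the hand-off looks like this (the hyperparameter names are only illustrative):

import json

# DeepHyper will invoke something like:
#   python model_run.py '{"activation": "relu", "lr": 0.01, "units": 10}'
# so sys.argv[1] holds the JSON string:
point = json.loads('{"activation": "relu", "lr": 0.01, "units": 10}')
assert point["units"] == 10  # back to a plain Python dict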

The print statement beginning with DH-OUTPUT (case-sensitive) signals to DeepHyper that this line contains the optimization objective. DeepHyper will scan the standard output of this model_run.py script, find the DH-OUTPUT line, and pass the objective back into its search algorithm.
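
Conceptually, the parsing on DeepHyper’s side amounts to something like this simplified sketch (not DeepHyper’s actual implementation):

def parse_objective(stdout):
    # Scan the captured stdout for the DH-OUTPUT marker line
    for line in stdout.splitlines():
        if line.startswith("DH-OUTPUT:"):
            return float(line.split("DH-OUTPUT:")[1])
    return None  # the task reported no objective

assert parse_objective("epoch 10/10 done\nDH-OUTPUT: 0.87\n") == 0.87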

Overriding BalsamJob creation

Second, we will create a module called addtask.py with an add_task(point) function that tells DeepHyper how to create Balsam tasks for each evaluation. This function has a similar signature to the run(point) function itself: it takes the dictionary of hyperparameters as its sole argument. But instead of training a model and returning the objective value, add_task(point) simply creates a BalsamJob instance and returns it.

polynome2/addtask.py
import json
import os
import sys
import shlex

from balsam.core.models import BalsamJob, ApplicationDefinition
from deephyper.benchmark import balsamjob_spec, JSONEncoder

app_name = "run_poly2"
here = os.path.dirname(os.path.abspath(__file__))
script_path = os.path.join(here, "model_run.py")
app_cmd = f'{sys.executable} {script_path}'

app, created = ApplicationDefinition.objects.get_or_create(
    name=app_name,
    defaults={'executable': app_cmd},
)
if not created:
    app.executable = app_cmd
    app.save()

@balsamjob_spec
def add_task(point):
    job = BalsamJob(
        application=app_name,
        args=shlex.quote(json.dumps(point, cls=JSONEncoder)),
        num_nodes=1,
        ranks_per_node=1,
    )
    return job

In this script, we set up the Balsam app run_poly2 to run the model code. Notice the call to ApplicationDefinition.objects.get_or_create, which runs at import time: it ensures that the application registered in the Balsam database always stays up-to-date with this script. Most importantly, the add_task(point) function is decorated with deephyper.benchmark.balsamjob_spec, which signals to DeepHyper that add_task returns a BalsamJob object and is NOT an ordinary run function.
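
To double-check what got registered, you can query the Balsam database directly; a quick sketch, assuming addtask.py has been imported at least once:

from balsam.core.models import ApplicationDefinition

# The stored executable reflects the most recent import of addtask.py
app = ApplicationDefinition.objects.get(name="run_poly2")
print(app.executable)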

The body of add_task creates an instance of the run_poly2 app with the necessary parameters. Here you can exercise complete control over how the task is launched; refer to the Balsam documentation for the full set of task options.
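
For example, recalling the batch-size scenario from the introduction, a hypothetical add_task could scale the MPI ranks with a hyperparameter; the batch_size key and the scaling rule below are purely illustrative:

import json
import shlex

from balsam.core.models import BalsamJob
from deephyper.benchmark import balsamjob_spec, JSONEncoder

app_name = "run_poly2"  # registered as in addtask.py above

@balsamjob_spec
def add_task(point):
    # Illustrative rule: give larger batch sizes more MPI ranks per node
    ranks = 4 if point.get("batch_size", 0) >= 512 else 1
    return BalsamJob(
        application=app_name,
        args=shlex.quote(json.dumps(point, cls=JSONEncoder)),
        num_nodes=1,
        ranks_per_node=ranks,
    )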

In this example, we convert the point dictionary into a JSON string using json.dumps() and the DeepHyper-aware JSONEncoder class. To ensure the shell parses the entire JSON string as a single command-line argument, we quote it with the standard-library function shlex.quote().
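
To see the quoting in action, here is the round trip for a hypothetical point (plain json.dumps suffices in this snippet because the values are ordinary Python types; the DeepHyper-aware JSONEncoder additionally handles numpy types):

import json
import shlex

point = {"activation": "relu", "lr": 0.01, "units": 10}
print(shlex.quote(json.dumps(point)))
# '{"activation": "relu", "lr": 0.01, "units": 10}'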

Run as usual!

We can now run HPS as before, passing the add_task() function to DeepHyper in place of the run() function.

$ deephyper balsam-submit hps test_hps3 -p polynome2.problem.Problem -r polynome2.addtask.add_task \
  -t 30 -q debug-cache-quad -n 4 -A datascience -j mpi

Communicating through the Balsam database

The preceding example deliberately avoided loading Balsam in model_run.py, to illustrate how DeepHyper can be linked to a model running in a completely different programming environment. In that case, the most portable way to communicate between DeepHyper and the model is via command-line arguments and file I/O.

If your model runs under Python 3.7, it is straightforward to install Balsam and connect to the job database directly from the model execution code. Models can then read parameters from, and write results to, the Balsam database using the data attribute of the current job (balsam.launcher.dag.current_job). addtask.py would change as follows:

polynome2/addtask_balsam.py
import os
import sys

from balsam.core.models import BalsamJob, ApplicationDefinition
from deephyper.benchmark import balsamjob_spec, to_encodable

app_name = "run_poly2"
here = os.path.dirname(os.path.abspath(__file__))
script_path = os.path.join(here, "model_run.py")
app_cmd = f'{sys.executable} {script_path}'

app, created = ApplicationDefinition.objects.get_or_create(
    name=app_name,
    defaults={'executable': app_cmd},
)
if not created:
    app.executable = app_cmd
    app.save()

@balsamjob_spec
def add_task(point):
    job = BalsamJob(
        application=app_name,
        data={'point': to_encodable(point)},
        num_nodes=1,
        ranks_per_node=1,
    )
    return job

Notice the import of deephyper.benchmark.to_encodable, which converts numpy arrays and other DeepHyper data types in the point dictionary into a JSON-encodable format. The corresponding if __name__ == "__main__" block of model_run.py then reads the hyperparameters directly out of the database and writes the objective back, as follows:

polynome2/model_run_balsam.py

    HISTORY = history.history

    return history.history["val_r2"][-1]


if __name__ == "__main__":
    from balsam.launcher.dag import current_job

    point = current_job.data["point"]
    objective = run(point)
    current_job.data["dh_objective"] = objective
    current_job.save()

Notice the dh_objective key set in the job’s data. This takes precedence over scanning standard output for DH-OUTPUT, and lets DeepHyper retrieve the objective directly from the task database.
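
Once evaluations have run, you can also inspect these objectives yourself by querying the database; a minimal sketch, assuming the jobs were created by the add_task above:

from balsam.core.models import BalsamJob

# List the objective recorded by each finished run_poly2 task
for job in BalsamJob.objects.filter(application="run_poly2", state="JOB_FINISHED"):
    print(job.name, job.data.get("dh_objective"))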