Hyperparameter Search for Deep Learning (Basic)

Every DeepHyper search requires at least 2 Python objects as input:

  • run: your “black-box” function returning the objective value to be maximized

  • Problem: an instance of deephyper.problem.BaseProblem which defines the search space of input parameters to run

These objects are required for both HPS and NAS, but take on a slightly different meaning in the context of NAS.
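
As a minimal sketch (assuming the HpProblem API used later in this tutorial, with purely illustrative names and values), the two objects look roughly like this:

from deephyper.problem import HpProblem

# Search space: a single tunable hyperparameter (illustrative only).
Problem = HpProblem()
Problem.add_hyperparameter((8, 128), "units")


def run(point):
    """Black-box function: evaluate one configuration and return the
    objective value to be maximized. A real run() would train a model
    here and return, e.g., its validation R^2; this stand-in just
    returns a dummy score."""
    units = point["units"]
    return -(units - 64) ** 2

The polynome2 example below builds both objects step by step.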

We will illustrate DeepHyper HPS using a regression example. We generate synthetic data according to \(y = - \mathbf{x}^{T} \mathbf{x}\) for random \(N\)-dimensional input vectors \(\mathbf{x}\). Our regression model is a multilayer perceptron with 1 hidden layer, implemented in Keras. Using HPS, we will then tune the model hyperparameters to optimize the validation \(R^{2}\) metric.

Setting up the problem

Note

Be sure to work in a virtual environment where you can easily pip install new packages. This typically entails using either Anaconda, virtualenv, or Pipenv.

Let’s start by creating a new DeepHyper project workspace. This is a directory where you will create search problem instances that are automatically installed and importable across your Python environment.

bash
$ deephyper start-project hps_demo

A new hps_demo directory is created, containing the following files:

hps_demo/
      hps_demo/
          __init__.py
      setup.py

We can now define DeepHyper search problems inside this directory, using either deephyper new-problem nas {name} or deephyper new-problem hps {name} for NAS or HPS, respectively.

Let’s set up an HPS problem called polynome2 as follows:

bash
$ cd hps_demo/hps_demo/
$ deephyper new-problem hps polynome2

A new HPS problem subdirectory should be in place. This is a Python subpackage containing sample code in the files __init__.py, load_data.py, model_run.py, and problem.py. Overall, your project directory should look like:

hps_demo/
      hps_demo/
          __init__.py
          polynome2/
              __init__.py
              load_data.py
              model_run.py
              problem.py
      setup.py

Generating data

The sample load_data.py will generate the training and validation data for our demo regression problem. While not required by the DeepHyper HPS API, it is helpful to encapsulate data loading and preparation in a separate module. This sample generates data from the function \(f(X) = -\sum_{i=0}^{n-1} x_i^{2}\) for inputs \(X \in [a, b]^n\):

polynome2/load_data.py
import os
import numpy as np

np.random.seed(2018)


def load_data(dim=10, a=-50, b=50, prop=0.80, size=10000):
    """Generate a random distribution of data for the polynome_2 function:
    -SUM(X**2), where "**" is an element-wise operator, in the continuous range [a, b].

    Args:
        dim (int): size of the input vector for the polynome_2 function.
        a (int): minimum bound for all X dimensions.
        b (int): maximum bound for all X dimensions.
        prop (float): a value in [0., 1.] indicating how to split the data between the
            training and validation sets. `prop` is the ratio of data in the training set;
            `1.-prop` is the ratio of data in the validation set.
        size (int): amount of data to generate. It is equal to
            `len(training_data) + len(validation_data)`.

    Returns:
        tuple(tuple(ndarray, ndarray), tuple(ndarray, ndarray)): Numpy arrays:
            `(train_X, train_y), (valid_X, valid_y)`.
    """

    def polynome_2(x):
        return -sum([x_i ** 2 for x_i in x])

    d = b - a
    x = np.array([a + np.random.random(dim) * d for i in range(size)])
    y = np.array([[polynome_2(v)] for v in x])

    sep_index = int(prop * size)
    train_X = x[:sep_index]
    train_y = y[:sep_index]

    valid_X = x[sep_index:]
    valid_y = y[sep_index:]

    print(f"train_X shape: {np.shape(train_X)}")
    print(f"train_y shape: {np.shape(train_y)}")
    print(f"valid_X shape: {np.shape(valid_X)}")
    print(f"valid_y shape: {np.shape(valid_y)}")
    return (train_X, train_y), (valid_X, valid_y)


if __name__ == "__main__":
    load_data()

You can test the load_data function:

bash
python load_data.py

The expected output is:

[Out]
train_X shape: (8000, 10)
train_y shape: (8000, 1)
valid_X shape: (2000, 10)
valid_y shape: (2000, 1)

The Keras model

model_run.py contains the code for the neural network that we will train.

The model is implemented in the run() function below. We will provide this function to DeepHyper, which will call it to evaluate various hyperparameter settings. This function takes a point argument, which is a dictionary of tunable hyperparameters. In this case, we will tune:

  • The number of units of the Dense hidden layer (point['units'])

  • The activation function of the Dense layer (point['activation'])

  • The learning rate of the RMSprop optimizer (point['lr']).

After training, the validation \(R^{2}\) is returned by the run() function. This return value is the objective that the DeepHyper HPS search maximizes.
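
For reference, the r2 metric implemented below computes the coefficient of determination

\[ R^{2} = 1 - \frac{\sum_{i} (y_i - \hat{y}_i)^{2}}{\sum_{i} (y_i - \bar{y})^{2} + \epsilon}, \]

where \(\hat{y}_i\) are the model predictions, \(\bar{y}\) is the mean of the true targets, \(\epsilon\) is the Keras backend epsilon guarding against division by zero, and the result is averaged over the output dimensions.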

Step 1: polynome2/model_run.py
import numpy as np
import keras.backend as K
import keras
from keras.callbacks import EarlyStopping
from keras.layers import Dense
from keras.models import Sequential
from keras.optimizers import RMSprop

import os
import sys

here = os.path.dirname(os.path.abspath(__file__))
sys.path.insert(0, here)
from load_data import load_data


def r2(y_true, y_pred):
    SS_res = keras.backend.sum(keras.backend.square(y_true - y_pred), axis=0)
    SS_tot = keras.backend.sum(
        keras.backend.square(y_true - keras.backend.mean(y_true, axis=0)), axis=0
    )
    output_scores = 1 - SS_res / (SS_tot + keras.backend.epsilon())
    r2 = keras.backend.mean(output_scores)
    return r2


HISTORY = None


def run(point):
    global HISTORY
    (x_train, y_train), (x_valid, y_valid) = load_data()

    if point["activation"] == "identity":
        point["activation"] = None

    model = Sequential()
    model.add(
        Dense(
            point["units"],
            activation=point["activation"],
            input_shape=tuple(np.shape(x_train)[1:]),
        )
    )
    model.add(Dense(1))

    model.summary()

    model.compile(loss="mse", optimizer=RMSprop(lr=point["lr"]), metrics=[r2])

    history = model.fit(
        x_train,
        y_train,
        batch_size=64,
        epochs=1000,
        verbose=1,
        callbacks=[EarlyStopping(monitor="val_r2", mode="max", verbose=1, patience=10)],
        validation_data=(x_valid, y_valid),
    )

    HISTORY = history.history

    return history.history["val_r2"][-1]


if __name__ == "__main__":
    point = {"units": 10, "activation": "relu", "lr": 0.01}
    objective = run(point)
    print("objective: ", objective)
    import matplotlib.pyplot as plt

    plt.plot(HISTORY["val_r2"])
    plt.xlabel("Epochs")
    plt.ylabel("Objective: $R^2$")
    plt.grid()
    plt.show()

Note

Adding an EarlyStopping(...) callback is a good idea to stop the training of your model as soon as it stops improving.

...
callbacks=[EarlyStopping(
                    monitor='val_r2',
                    mode='max',
                    verbose=1,
                    patience=10
                )]
...

We can first train this model to evaluate the baseline accuracy:

bash
python model_run.py
[Out]
objective: -0.00040728187561035154
[Figure: validation R^2 per epoch for the baseline model]

Defining the HPS Problem space

The run function in model_run.py expects a hyperparameter dictionary with three keys: units, activation, and lr. We define the acceptable ranges for these hyperparameters with the Problem object inside problem.py. Hyperparameter ranges are defined using the following syntax:

  • Discrete integer ranges are generated from a tuple: (lower: int, upper: int)

  • Continuous parameters are generated from a tuple: (lower: float, upper: float)

  • Categorical or nonordinal hyperparameters ranges can be given as a list of possible values: [val1, val2, ...]

You probably have one or more “reference” sets of hyperparameters that are either hand-crafted or chosen by intuition. To bootstrap the search with these so-called starting points, use the add_starting_point(...) method.

Note

Several starting points can be defined with Problem.add_starting_point(**dims). All starting points will be evaluated before generating other evaluations.
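
For instance, a second hand-picked configuration could be registered alongside the one used in problem.py below (the second point here is purely illustrative):

...
# Both points are evaluated before the search proposes new configurations.
Problem.add_starting_point(units=10, activation="identity", lr=0.01)
Problem.add_starting_point(units=50, activation="relu", lr=0.001)
...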

polynome2/problem.py
from deephyper.problem import HpProblem

Problem = HpProblem()

Problem.add_hyperparameter((1, 100), "units")
Problem.add_hyperparameter(["identity", "relu", "sigmoid", "tanh"], "activation")
Problem.add_hyperparameter((0.0001, 1.0), "lr")

Problem.add_starting_point(units=10, activation="identity", lr=0.01)

if __name__ == "__main__":
    print(Problem)

You can look at the representation of your problem:

bash
python problem.py

The expected output is:

[Out]
Problem
{ 'activation': ['identity', 'relu', 'sigmoid', 'tanh'],
  'lr': (0.0001, 1.0),
  'units': (1, 100)}

Starting Point
{0: {'activation': 'identity', 'lr': 0.01, 'units': 10}}

Running the search locally

Everything is ready to run. Recall the Python files defining our experiment:

polynome2/
      __init__.py
      load_data.py
      model_run.py
      problem.py

We have tested the syntax in all of these by running them individually. Now, let’s put it all together by tuning the 3 hyperparameters with asynchronous model-based search (AMBS).

bash
deephyper hps ambs --problem hps_demo.polynome2.problem.Problem --run hps_demo.polynome2.model_run.run

Note

The above command may take a long time to complete. If you want a shorter run, append --max-evals 100 to the end of the command to limit the number of evaluations.

Note

To run DeepHyper locally and on other systems, we use deephyper.evaluator. For local evaluations, we use the deephyper.evaluator.SubprocessEvaluator.

Note

As an alternative to the command line above, paths to the problem.py and model_run.py files can be passed as arguments. DeepHyper requires that these modules contain an importable Problem instance and run callable, respectively. It is your responsibility to ensure that any other modules imported in problem.py or model_run.py are in the Python import search path.

We strongly recommend using a virtual environment with the start-project and new-problem command line tools. This ensures that any helper modules are easily accessible using the syntax import problem_name.helper_module.
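
For example, once the project created by start-project has been installed, the data loader from this tutorial can be imported from any directory (a small illustrative check):

# Works from anywhere because start-project pip-installs the hps_demo package.
from hps_demo.polynome2.load_data import load_data

(train_X, train_y), (valid_X, valid_y) = load_data()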

After the search is over, you will find the following files in your working directory:

deephyper.log
results.csv
results.json

DeepHyper analytics

We will use the deephyper-analytics command line tool to investigate the results.

Note

See the Analytics installation instructions of deephyper-analytics.

Run:

bash
deephyper-analytics notebook --type hps --output dh-analytics-hps.ipynb results.csv

Then start jupyter:

bash
jupyter notebook

Open the dh-analytics-hps notebook and run it:

path to data file: polynome2/results.csv

for customization please see: https://matplotlib.org/api/matplotlib_configuration_api.html

Setup & data loading

path_to_data_file = 'polynome2/results.csv'
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from pprint import pprint
from datetime import datetime
from tqdm import tqdm
from IPython.display import display, Markdown

width = 21
height = 13

matplotlib.rcParams.update({
    'font.size': 21,
    'figure.figsize': (width, height),
    'figure.facecolor': 'white',
    'savefig.dpi': 72,
    'figure.subplot.bottom': 0.125,
    'figure.edgecolor': 'white',
    'xtick.labelsize': 21,
    'ytick.labelsize': 21})

df = pd.read_csv(path_to_data_file)

display(Markdown(f'The search did _{df.count()[0]}_ **evaluations**.'))

df.head()

The search did 88 evaluations.

activation lr units objective elapsed_sec
0 NaN 0.010000 10 -67.720345 4.683628
1 sigmoid 0.210479 78 -47.973845 7.850657
2 sigmoid 0.849683 18 -7.910984 11.379633
3 tanh 0.951716 19 -2.596602 16.031375
4 sigmoid 0.898754 74 -21.409714 19.312386

Statistical summary

df.describe()
lr units objective elapsed_sec
count 100.000000 100.00000 100.000000 100.000000
mean 0.861301 13.12000 -3.468272 188.652953
std 0.112005 10.78746 11.586969 116.032871
min 0.010000 1.00000 -74.376173 4.683628
25% 0.861376 7.75000 -2.011465 87.576996
50% 0.871134 11.50000 -0.092576 178.604464
75% 0.876806 15.00000 0.494384 288.718287
max 0.997793 78.00000 0.746590 399.764441

Search trajectory

plt.plot(df.elapsed_sec, df.objective)
plt.ylabel('Objective')
plt.xlabel('Time (s.)')
plt.xlim(0)
plt.grid()
plt.show()
[Figure: search trajectory, objective vs. elapsed time (s)]

Pairplots

not_include = ['elapsed_sec']
sns.pairplot(df.loc[:, filter(lambda n: n not in not_include, df.columns)],
                diag_kind="kde", markers="+",
                plot_kws=dict(s=50, edgecolor="b", linewidth=1),
                diag_kws=dict(shade=True))
plt.show()
[Figure: pairplot of activation, lr, units, and objective]
corr = df.loc[:, filter(lambda n: n not in not_include, df.columns)].corr()
sns.heatmap(corr, xticklabels=corr.columns, yticklabels=corr.columns, cmap=sns.diverging_palette(220, 10, as_cmap=True))
plt.show()
[Figure: correlation heatmap of the hyperparameters and objective]

Best objective

i_max = df.objective.idxmax()
df.iloc[i_max]
activation         relu
lr             0.882041
units                21
objective       0.74659
elapsed_sec     394.818
Name: 98, dtype: object
dict(df.iloc[i_max])
{'activation': 'relu',
 'lr': 0.8820413612862609,
 'units': 21,
 'objective': 0.7465898108482361,
 'elapsed_sec': 394.81818103790283}

The best point the search found:

point = {
    'activation': 'relu',
    'lr': 0.8820413612862609,
    'units': 21
}

Just pass this point to your run function:

Step 2: polynome2/model_run.py
import numpy as np
import keras.backend as K
import keras
from keras.callbacks import EarlyStopping
from keras.layers import Dense
from keras.models import Sequential
from keras.optimizers import RMSprop

import os
import sys
here = os.path.dirname(os.path.abspath(__file__))
sys.path.insert(0, here)
from load_data import load_data


def r2(y_true, y_pred):
    SS_res = keras.backend.sum(keras.backend.square(y_true - y_pred), axis=0)
    SS_tot = keras.backend.sum(
        keras.backend.square(y_true - keras.backend.mean(y_true, axis=0)), axis=0
    )
    output_scores = 1 - SS_res / (SS_tot + keras.backend.epsilon())
    r2 = keras.backend.mean(output_scores)
    return r2


HISTORY = None


def run(point):
    global HISTORY
    (x_train, y_train), (x_valid, y_valid) = load_data()

    model = Sequential()
    model.add(Dense(
        point['units'],
        activation=point['activation'],
        input_shape=tuple(np.shape(x_train)[1:])))
    model.add(Dense(1))

    model.summary()

    model.compile(loss='mse', optimizer=RMSprop(lr=point['lr']), metrics=[r2])

    history = model.fit(x_train, y_train,
                        batch_size=64,
                        epochs=1000,
                        verbose=1,
                        callbacks=[EarlyStopping(
                            monitor='val_r2',
                            mode='max',
                            verbose=1,
                            patience=10
                        )],
                        validation_data=(x_valid, y_valid))

    HISTORY = history.history

    return history.history['val_r2'][-1]


if __name__ == '__main__':
    point = {
        'activation': 'relu',
        'lr': 0.8820413612862609,
        'units': 21
    }
    objective = run(point)
    print('objective: ', objective)
    import matplotlib.pyplot as plt
    plt.plot(HISTORY['val_r2'])
    plt.xlabel('Epochs')
    plt.ylabel('Objective: $R^2$')
    plt.grid()
    plt.show()

And run the script:

bash
python model_run.py
[Out]
objective:  0.47821942329406736
[Figure: validation R^2 per epoch with the best hyperparameters]

Running the search on ALCF’s Theta and Cooley

Now let’s run the same search, but scale out to run parallel model evaluations across the nodes of an HPC system such as Theta or Cooley. First create a Balsam database:

bash
$ balsam init polydb

Start and connect to the polydb database:

bash
$ source balsamactivate polydb

Set up the demo polynome2 problem, as before:

bash
$ deephyper start-project hps_demo
$ cd hps_demo/hps_demo/
$ deephyper new-problem hps polynome2

Use the balsam-submit command to set up and dispatch an AMBS job to the local scheduler:

bash
$ deephyper balsam-submit hps polynome2_demo -p hps_demo.polynome2.problem.Problem -r hps_demo.polynome2.model_run.run  \
   -t 30 -q debug-cache-quad -n 4 -A datascience -j mpi
[Out]
Validating Problem...OK
Validating run...OK
Bootstrapping apps...OK
Creating HPS(AMBS) BalsamJob...OK
Performing job submission...
Submit OK: Qlaunch {   'command': '/lus/theta-fs0/projects/datascience/msalim/deephyper/deephyper/db/qsubmit/qlaunch12.sh',
    'from_balsam': True,
    'id': 12,
    'job_mode': 'mpi',
    'nodes': 4,
    'prescheduled_only': False,
    'project': 'datascience',
    'queue': 'debug-cache-quad',
    'scheduler_id': 370907,
    'state': 'submitted',
    'wall_minutes': 30,
    'wf_filter': 'test_hps'}
**************************************************************************************************************************************
Success. The search will run at: /myprojects/deephyper/deephyper/db/data/test_hps/test_hps_2ef063ce
**************************************************************************************************************************************

Above, balsam-submit takes the following arguments:

  1. The first positional argument mode is either hps or nas

  2. The second positional argument workflow must be a unique identifier for the run. An error will be raised if this workflow already exists.

  3. -p Problem and -r Run arguments define the search, as before

  4. -t 30 indicates the walltime (minutes) of the scheduled job

  5. -n 4 requests four nodes on which to run the search. DeepHyper will automatically scale the search out across available nodes.

  6. -q Queue and -A Project pass the name of the job queue and project allocation to the HPC scheduler

  7. -j or --job-mode must be either mpi or serial. This controls how Balsam launches your model_runs.

Once the search is done, you will find results in the directory shown in the banner: /myprojects/deephyper/deephyper/db/data/test_hps/test_hps_2ef063ce.

Note

The examples so far assume that your DeepHyper models run in the same Python environment as DeepHyper and each model runs on a single node. If you need more control over model execution, say, to run containerized models, or to run data-parallel model training with Horovod, you can hook into the Balsam job controller. See Configuring model execution with Balsam for a detailed example.