Hyperparameter Search for Deep Learning (Basic)

Every DeepHyper search requires at least 2 Python objects as input:

  • run: your “black-box” function returning the objective value to be maximized

  • Problem: an instance of deephyper.problem.BaseProblem, which defines the search space of input parameters passed to run

These objects are required for both HPS and NAS, but take on a slightly different meaning in the context of NAS.
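
To make this concrete, here is a minimal sketch of the two objects for a toy problem (the hyperparameter name x and the quadratic objective are purely illustrative; the real versions used in this tutorial are developed below):

from deephyper.problem import HpProblem

# Toy search space: a single continuous hyperparameter named "x".
Problem = HpProblem()
Problem.add_dim("x", (0.0, 10.0))


def run(point):
    # `point` is a dict sampled from the space above, e.g. {"x": 3.7}.
    # Return the scalar objective that DeepHyper will maximize.
    return -point["x"] ** 2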

We will illustrate DeepHyper HPS using a regression example. We generate synthetic data according to \(y = - \mathbf{x}^{T} \mathbf{x}\) for random \(N\)-dimensional input vectors \(\mathbf{x}\). Our regression model is a multilayer perceptron with 1 hidden layer, implemented in Keras. Using HPS, we will then tune the model hyperparameters to optimize the validation \(R^{2}\) metric.

Setting up the problem

Note

Be sure to work in a virtual environment where you can easily pip install new packages. This typically entails using either Anaconda, virtualenv, or Pipenv.

Let’s start by creating a new DeepHyper project workspace. This is a directory where you will create search problem instances that are automatically installed and importable across your Python environment.

bash
$ deephyper start-project hps_demo

A new hps_demo directory is created, containing the following files:

hps_demo/
      hps_demo/
          __init__.py
      setup.py

We can now define DeepHyper search problems inside this directory, using either deephyper new-problem nas {name} or deephyper new-problem hps {name} for NAS or HPS, respectively.

Let’s set up an HPS problem called polynome2 as follows:

bash
$ cd hps_demo/hps_demo/
$ deephyper new-problem hps polynome2

A new HPS problem subdirectory should be in place. This is a Python subpackage containing sample code in the files __init__.py, load_data.py, model_run.py, and problem.py. Overall, your project directory should look like:

hps_demo/
      hps_demo/
          __init__.py
          polynome2/
              __init__.py
              load_data.py
              model_run.py
              problem.py
      setup.py

Generating data

The sample load_data.py will generate the training and validation data for our demo regression problem. While not required by the DeepHyper HPS API, it is helpful to encapsulate data loading and preparation in a separate module. This sample generates data from the function \(f(X) = -\sum_{i=0}^{n-1} x_i^2\), where \(X \in [a, b]^n\):

polynome2/load_data.py
import os
import numpy as np

np.random.seed(2018)


def load_data(dim=10, a=-50, b=50, prop=0.80, size=10000):
    """Generate a random distribution of data for polynome_2 function: -SUM(X**2) where "**" is an element wise operator in the continuous range [a, b].

    Args:
        dim (int): size of input vector for the polynome_2 function.
        a (int): minimum bound for all X dimensions.
        b (int): maximum bound for all X dimensions.
        prop (float): a value between [0., 1.] indicating how to split data between training set and validation set. `prop` corresponds to the ratio of data in training set. `1.-prop` corresponds to the amount of data in validation set.
        size (int): amount of data to generate. It is equal to `len(training_data) + len(validation_data)`.

    Returns:
        tuple(tuple(ndarray, ndarray), tuple(ndarray, ndarray)): of Numpy arrays: `(train_X, train_y), (valid_X, valid_y)`.
    """

    def polynome_2(x):
        return -sum([x_i ** 2 for x_i in x])

    d = b - a
    x = np.array([a + np.random.random(dim) * d for i in range(size)])
    y = np.array([[polynome_2(v)] for v in x])

    sep_index = int(prop * size)
    train_X = x[:sep_index]
    train_y = y[:sep_index]

    valid_X = x[sep_index:]
    valid_y = y[sep_index:]

    print(f"train_X shape: {np.shape(train_X)}")
    print(f"train_y shape: {np.shape(train_y)}")
    print(f"valid_X shape: {np.shape(valid_X)}")
    print(f"valid_y shape: {np.shape(valid_y)}")
    return (train_X, train_y), (valid_X, valid_y)


if __name__ == "__main__":
    load_data()

You can test the load_data function:

bash
python load_data.py

The expected output is:

[Out]
train_X shape: (8000, 10)
train_y shape: (8000, 1)
valid_X shape: (2000, 10)
valid_y shape: (2000, 1)

The Keras model

model_run.py contains the code for the neural network that we will train.

The model is implemented in the run() function below. We will provide this function to DeepHyper, which will call it to evaluate various hyperparameter settings. This function takes a point argument, which is a dictionary of tunable hyperparameters. In this case, we will tune:

  • The number of units of the Dense hidden layer (point['units'])

  • The activation function of the Dense layer (point['activation'])

  • The learning rate of the RMSprop optimizer (point['lr']).

After training, the validation \(R^{2}\) is returned by the run() function. This return value is the objective for maximization by the DeepHyper HPS search algorithm.

polynome2/model_run.py
import numpy as np
import keras.backend as K
import keras
from keras.callbacks import EarlyStopping
from keras.layers import Dense
from keras.models import Sequential
from keras.optimizers import RMSprop

import os
import sys

here = os.path.dirname(os.path.abspath(__file__))
sys.path.insert(0, here)
from load_data import load_data


def r2(y_true, y_pred):
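    # Coefficient of determination: R^2 = 1 - SS_res / SS_tot,
    # computed per output and averaged; epsilon() guards against division by zero.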
    SS_res = keras.backend.sum(keras.backend.square(y_true - y_pred), axis=0)
    SS_tot = keras.backend.sum(
        keras.backend.square(y_true - keras.backend.mean(y_true, axis=0)), axis=0
    )
    output_scores = 1 - SS_res / (SS_tot + keras.backend.epsilon())
    r2 = keras.backend.mean(output_scores)
    return r2


HISTORY = None


def run(point):
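    # `point` is the dict of hyperparameters chosen by the search,
    # e.g. {"units": 10, "activation": "relu", "lr": 0.01}.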
    global HISTORY
    (x_train, y_train), (x_valid, y_valid) = load_data()

    model = Sequential()
    model.add(
        Dense(
            point["units"],
            activation=point["activation"],
            input_shape=tuple(np.shape(x_train)[1:]),
        )
    )
    model.add(Dense(1))

    model.summary()

    model.compile(loss="mse", optimizer=RMSprop(lr=point["lr"]), metrics=[r2])

    history = model.fit(
        x_train,
        y_train,
        batch_size=64,
        epochs=1000,
        verbose=1,
        callbacks=[EarlyStopping(monitor="val_r2", mode="max", verbose=1, patience=10)],
        validation_data=(x_valid, y_valid),
    )

    HISTORY = history.history

    return history.history["val_r2"][-1]


if __name__ == "__main__":
    point = {"units": 10, "activation": "relu", "lr": 0.01}
    objective = run(point)
    print("objective: ", objective)
    import matplotlib.pyplot as plt

    plt.plot(HISTORY["val_r2"])
    plt.xlabel("Epochs")
    plt.ylabel("Objective: $R^2$")
    plt.grid()
    plt.show()

Note

Adding an EarlyStopping(...) callback is a good idea to stop the training of your model as soon as it stops improving.

...
callbacks=[EarlyStopping(
                    monitor='val_r2',
                    mode='max',
                    verbose=1,
                    patience=10
                )]
...

We can first train this model to evaluate the baseline performance:

bash
python model_run.py
[Out]
objective: -0.00040728187561035154
[Figure: baseline validation R^2 per epoch (model_step_0_val_r2.png)]

Defining the HPS Problem space

The run function in model_run.py expects a hyperparameter dictionary with three keys: units, activation, and lr. We define the acceptable ranges for these hyperparameters with the Problem object inside problem.py. Hyperparameter ranges are defined using the following syntax:

  • Discrete integer ranges are generated from a tuple: (lower: int, upper: int)

  • Continuous parameters are generated from a tuple: (lower: float, upper: float)

  • Categorical or non-ordinal hyperparameter ranges can be given as a list of possible values: [val1, val2, ...]

You probably have one or more “reference” sets of hyperparameters that are either hand-crafted or chosen by intuition. To bootstrap the search with these so-called starting points, use the add_starting_point(...) method.

Note

Several starting points can be defined with Problem.add_starting_point(**dims). All starting points will be evaluated before any other points are generated by the search.

polynome2/problem.py
from deephyper.problem import HpProblem

Problem = HpProblem()

Problem.add_dim("units", (1, 100))
Problem.add_dim("activation", [None, "relu", "sigmoid", "tanh"])
Problem.add_dim("lr", (0.0001, 1.0))

Problem.add_starting_point(units=10, activation=None, lr=0.01)

if __name__ == "__main__":
    print(Problem)

You can look at the representation of your problem:

bash
python problem.py

The expected output is:

[Out]
Problem
{ 'activation': [None, 'relu', 'sigmoid', 'tanh'],
'lr': (0.0001, 1.0),
'units': (1, 100)}

Starting Point
{0: {'activation': None, 'lr': 0.01, 'units': 10}}

Running the search locally

Everything is ready to run. Recall the Python files defining our experiment:

polynome2/
      __init__.py
      load_data.py
      model_run.py
      problem.py

We have tested the syntax in all of these by running them individually. Now, let’s put it all together by tuning the 3 hyperparameters with asynchronous model-based search (AMBS).

bash
deephyper hps ambs --problem hps_demo.polynome2.problem.Problem --run hps_demo.polynome2.model_run.run

Note

To run DeepHyper locally and on other systems, we use the Evaluator interface. For local evaluations, the SubprocessEvaluator is used.

Note

As an alternative to the command line above, paths to the problem.py and model_run.py files can be passed as arguments. DeepHyper requires that these modules contain an importable Problem instance and run callable, respectively. It is your responsibility to ensure that any other modules imported in problem.py or model_run.py are in the Python import search path.

We strongly recommend using a virtual environment with the start-project and new-problem command line tools. This ensures that any helper modules are easily accessible using the syntax import problem_name.helper_module.
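
For example, with the project created in this tutorial, the data-loading helper is importable from anywhere in the environment once start-project has installed the package (a minimal usage sketch):

from hps_demo.polynome2.load_data import load_data

(train_X, train_y), (valid_X, valid_y) = load_data()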

After the search is over, you will find the following files in your working directory:

deephyper.log
results.csv
results.json
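
results.csv contains one row per evaluated configuration, with the hyperparameters, the objective, and the elapsed time. For a quick look at the best configuration without opening the analytics notebook, a short pandas snippet is enough (a minimal sketch, assuming the column layout shown in the analytics section below):

import pandas as pd

df = pd.read_csv("results.csv")

# Row with the highest objective (validation R^2).
print(df.loc[df.objective.idxmax()])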

DeepHyper analytics

We will use the deephyper-analytics command line tool to investigate the results.

Note

See the Analytics installation instructions for deephyper-analytics.

Run:

bash
deephyper-analytics hps -p results.csv

Then start jupyter:

bash
jupyter notebook

Open the dh-analytics-hps notebook and run it:

path to data file: polynome2/results.csv

for customization please see: https://matplotlib.org/api/matplotlib_configuration_api.html

Setup & data loading

path_to_data_file = 'polynome2/results.csv'
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from pprint import pprint
from datetime import datetime
from tqdm import tqdm
from IPython.display import display, Markdown

width = 21
height = 13

matplotlib.rcParams.update({
    'font.size': 21,
    'figure.figsize': (width, height),
    'figure.facecolor': 'white',
    'savefig.dpi': 72,
    'figure.subplot.bottom': 0.125,
    'figure.edgecolor': 'white',
    'xtick.labelsize': 21,
    'ytick.labelsize': 21})

df = pd.read_csv(path_to_data_file)

display(Markdown(f'The search did _{df.count()[0]}_ **evaluations**.'))

df.head()

The search did 88 evaluations.

activation lr units objective elapsed_sec
0 NaN 0.010000 10 -67.720345 4.683628
1 sigmoid 0.210479 78 -47.973845 7.850657
2 sigmoid 0.849683 18 -7.910984 11.379633
3 tanh 0.951716 19 -2.596602 16.031375
4 sigmoid 0.898754 74 -21.409714 19.312386

Statistical summary

df.describe()
lr units objective elapsed_sec
count 100.000000 100.00000 100.000000 100.000000
mean 0.861301 13.12000 -3.468272 188.652953
std 0.112005 10.78746 11.586969 116.032871
min 0.010000 1.00000 -74.376173 4.683628
25% 0.861376 7.75000 -2.011465 87.576996
50% 0.871134 11.50000 -0.092576 178.604464
75% 0.876806 15.00000 0.494384 288.718287
max 0.997793 78.00000 0.746590 399.764441

Search trajectory

plt.plot(df.elapsed_sec, df.objective)
plt.ylabel('Objective')
plt.xlabel('Time (s.)')
plt.xlim(0)
plt.grid()
plt.show()
[Figure: search trajectory, objective vs. elapsed time (output_6_0.png)]

Pairplots

not_include = ['elapsed_sec']
sns.pairplot(df.loc[:, [n for n in df.columns if n not in not_include]],
                diag_kind="kde", markers="+",
                plot_kws=dict(s=50, edgecolor="b", linewidth=1),
                diag_kws=dict(shade=True))
plt.show()
[Figure: pairplot of hyperparameters and objective (output_8_0.png)]
corr = df.loc[:, [n for n in df.columns if n not in not_include]].corr()
sns.heatmap(corr, xticklabels=corr.columns, yticklabels=corr.columns, cmap=sns.diverging_palette(220, 10, as_cmap=True))
plt.show()
[Figure: correlation heatmap of hyperparameters and objective (output_9_0.png)]

Best objective

i_max = df.objective.idxmax()
df.iloc[i_max]
activation         relu
lr             0.882041
units                21
objective       0.74659
elapsed_sec     394.818
Name: 98, dtype: object
dict(df.iloc[i_max])
{'activation': 'relu',
 'lr': 0.8820413612862609,
 'units': 21,
 'objective': 0.7465898108482361,
 'elapsed_sec': 394.81818103790283}

The best point the search found:

point = {
    'activation': 'relu',
    'lr': 0.8820413612862609,
    'units': 21
}

Just pass this point to your run function:

Step 1: polynome2/model_run.py

import numpy as np
import keras.backend as K
import keras
from keras.callbacks import EarlyStopping
from keras.layers import Dense
from keras.models import Sequential
from keras.optimizers import RMSprop

import os
import sys
here = os.path.dirname(os.path.abspath(__file__))
sys.path.insert(0, here)
from load_data import load_data


def r2(y_true, y_pred):
    SS_res = keras.backend.sum(keras.backend.square(y_true - y_pred), axis=0)
    SS_tot = keras.backend.sum(
        keras.backend.square(y_true - keras.backend.mean(y_true, axis=0)), axis=0
    )
    output_scores = 1 - SS_res / (SS_tot + keras.backend.epsilon())
    r2 = keras.backend.mean(output_scores)
    return r2


HISTORY = None


def run(point):
    global HISTORY
    (x_train, y_train), (x_valid, y_valid) = load_data()

    model = Sequential()
    model.add(Dense(
        point['units'],
        activation=point['activation'],
        input_shape=tuple(np.shape(x_train)[1:])))
    model.add(Dense(1))

    model.summary()

    model.compile(loss='mse', optimizer=RMSprop(lr=point['lr']), metrics=[r2])

    history = model.fit(x_train, y_train,
                        batch_size=64,
                        epochs=1000,
                        verbose=1,
                        callbacks=[EarlyStopping(
                            monitor='val_r2',
                            mode='max',
                            verbose=1,
                            patience=10
                        )],
                        validation_data=(x_valid, y_valid))

    HISTORY = history.history

    return history.history['val_r2'][-1]


if __name__ == '__main__':
    point = {
        'activation': 'relu',
        'lr': 0.8820413612862609,
        'units': 21
    }
    objective = run(point)
    print('objective: ', objective)
    import matplotlib.pyplot as plt
    plt.plot(HISTORY['val_r2'])
    plt.xlabel('Epochs')
    plt.ylabel('Objective: $R^2$')
    plt.grid()
    plt.show()

And run the script:

bash
python model_run.py
[Out]
objective:  0.47821942329406736
[Figure: validation R^2 per epoch with the best hyperparameters (model_step_1_val_r2.png)]

Running the search on ALCF’s Theta and Cooley

Now let’s run the same search, but scale out to run parallel model evaluations across the nodes of an HPC system such as Theta or Cooley. First create a Balsam database:

bash
$ balsam init polydb

Start and connect to the polydb database:

bash
$ source balsamactivate polydb

Set up the demo polynome2 problem, as before:

bash
$ deephyper start-project hps_demo
$ cd hps_demo/hps_demo/
$ deephyper new-problem hps polynome2

Use the balsam-submit command to set up and dispatch an AMBS job to the local scheduler:

bash
$ deephyper balsam-submit hps polynome2_demo -p hps_demo.polynome2.problem.Problem -r hps_demo.polynome2.model_run.run  \
   -t 30 -q debug-cache-quad -n 4 -A datascience -j mpi
[Out]
Validating Problem...OK
Validating run...OK
Bootstrapping apps...OK
Creating HPS(AMBS) BalsamJob...OK
Performing job submission...
Submit OK: Qlaunch {   'command': '/lus/theta-fs0/projects/datascience/msalim/deephyper/deephyper/db/qsubmit/qlaunch12.sh',
    'from_balsam': True,
    'id': 12,
    'job_mode': 'mpi',
    'nodes': 4,
    'prescheduled_only': False,
    'project': 'datascience',
    'queue': 'debug-cache-quad',
    'scheduler_id': 370907,
    'state': 'submitted',
    'wall_minutes': 30,
    'wf_filter': 'test_hps'}
**************************************************************************************************************************************
Success. The search will run at: /myprojects/deephyper/deephyper/db/data/test_hps/test_hps_2ef063ce
**************************************************************************************************************************************

Above, balsam-submit takes the following arguments:

  1. The first positional argument mode is either hps or nas

  2. The second positional argument workflow must be a unique identifier for the run. An error will be raised if this workflow already exists.

  3. -p Problem and -r Run arguments define the search, as before

  4. -t 30 indicates the walltime (in minutes) of the scheduled job

  5. -n 4 requests four nodes on which to run the search. DeepHyper will automatically scale the search out across available nodes.

  6. -q Queue and -A Project pass the name of the job queue and project allocation to the HPC scheduler

  7. -j or --job-mode must be either mpi or serial. This controls how Balsam launches your model_runs.

Once the search is done, you will find results in the directory shown in the banner: /myprojects/deephyper/deephyper/db/data/test_hps/test_hps_2ef063ce.

Note

The examples so far assume that your DeepHyper models run in the same Python environment as DeepHyper and each model runs on a single node. If you need more control over model execution, say, to run containerized models, or to run data-parallel model training with Horovod, you can hook into the Balsam job controller. See Configuring model execution with Balsam for a detailed example.