Neural Architecture Search for LSTM Neural Networks#

Warning

Be sure to work in a virtual environment where you can easily pip install new packages. This typically entails using either Anaconda, virtualenv, or Pipenv.

In this tutorial example, we will recreate results from our recent paper on LSTM search for surrogate modeling of geophysical flows (DOI:10.1109/SC41405.2020.00012).

Setting up the problem#

Let’s start by creating a new DeepHyper project workspace. This is a directory where you will create search problem instances that are automatically installed and importable across your Python environment.

bash#
deephyper start-project dh_project

A new dh_project directory is created, containing the following files:

dh_project/
    dh_project/
        __init__.py
    setup.py

We can now define our neural architecture search problem inside this directory. Let’s set up a NAS problem called lstm_search as follows:

bash#
cd dh_project/dh_project/
deephyper new-problem nas lstm_search

A new NAS problem subdirectory should be in place. This is a Python subpackage containing sample code in the files __init__.py, load_data.py, search_space.py, and problem.py. Overall, your project directory should look like:

dh_project/
    dh_project/
        __init__.py
        lstm_search/
            __init__.py
            load_data.py
            search_space.py
            problem.py
    setup.py

Load the data#

Todo

explain how the data were created.

Download the data:

lstm_search/download_data.sh#
#!/bin/bash

wget "https://github.com/deephyper/tutorials/blob/main/tutorials/12_NAS_LSTM/dh_project/dh_project/lstm_search/True_Train.npy?raw=true" -O True_Train.npy
wget "https://github.com/deephyper/tutorials/blob/main/tutorials/12_NAS_LSTM/dh_project/dh_project/lstm_search/True_Test.npy?raw=true" -O True_Test.npy

Todo

  • why are the test data used as validation data?

  • why is the preprocessing fit on both? Should it be fit on the training data only?

Transform and preprocess the data:

lstm_search/load_data.py#
import os
import numpy as np

from sklearn.preprocessing import MinMaxScaler

# useful to locate data files with respect to this file
HERE = os.path.dirname(os.path.abspath(__file__))


def load_data(verbose=0):
    """
    Generate data for linear function -sum(x_i).

    Return:
        Tuple of Numpy arrays: ``(train_X, train_y), (valid_X, valid_y)``.
    """

    rs = np.random.RandomState(2018)

    train_data = np.load(os.path.join(HERE, "True_Train.npy"))
    valid_data = np.load(os.path.join(HERE, "True_Test.npy"))

    features_train = np.transpose(train_data)
    features_valid = np.transpose(valid_data)[:700, :]
    features = np.concatenate((features_train, features_valid), axis=0)

    states = np.copy(features[:, :])  # Rows are time, Columns are state values

    scaler = MinMaxScaler()
    states = scaler.fit_transform(states)

    seq_num = 8

    # Build input/output pairs: each sample maps seq_num time steps to the next seq_num time steps
    total_size = np.shape(features)[0] - 2 * seq_num
    input_seq = np.zeros(shape=(total_size, seq_num, np.shape(states)[1]))
    output_seq = np.zeros(shape=(total_size, seq_num, np.shape(states)[1]))

    for t in range(total_size):
        input_seq[t, :, :] = states[None, t : t + seq_num, :]
        output_seq[t, :, :] = states[None, t + seq_num : t + 2 * seq_num, :]

    idx = np.arange(total_size)
    rs.shuffle(idx)

    input_seq = input_seq[idx, :, :]
    output_seq = output_seq[idx, :, :]

    # Train/validation split (after shuffling)
    num_samples_train = 900

    input_seq_train = input_seq[:num_samples_train, :, :]
    output_seq_train = output_seq[:num_samples_train, :, :]

    input_seq_valid = input_seq[num_samples_train:, :, :]
    output_seq_valid = output_seq[num_samples_train:, :, :]

    if verbose:
        print("Train Shapes: ", np.shape(input_seq_train), np.shape(output_seq_train))
        print("Valid Shapes: ", np.shape(input_seq_valid), np.shape(output_seq_valid))

    # The (train, valid) return interface expected by DeepHyper must be respected
    return (input_seq_train, output_seq_train), (input_seq_valid, output_seq_valid)


if __name__ == "__main__":
    load_data(verbose=1)

Test the function with python load_data.py; you should see the following output:

Train Shapes:  (900, 8, 5) (900, 8, 5)
Valid Shapes:  (211, 8, 5) (211, 8, 5)

Performance of a baseline LSTM#

Todo

show the performance of a baseline lstm
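
As a reference point, the sketch below trains a fixed two-layer stacked LSTM (an arbitrary hand-picked configuration, not taken from the paper) on the data returned by load_data and reports its validation mean squared error and R2, so that the architectures found by the search can be compared against it.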

lstm_search/baseline_lstm.py#
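"""Minimal hand-designed baseline: a fixed stacked LSTM trained on the same data.

This is only a reference point; the architecture below is an arbitrary choice.
"""
import tensorflow as tf

from dh_project.lstm_search.load_data import load_data


def r2(y_true, y_pred):
    # Simple global coefficient of determination used as a Keras metric.
    ss_res = tf.reduce_sum(tf.square(y_true - y_pred))
    ss_tot = tf.reduce_sum(tf.square(y_true - tf.reduce_mean(y_true)))
    return 1.0 - ss_res / (ss_tot + tf.keras.backend.epsilon())


def build_baseline(input_shape=(8, 5), units=64):
    # Two stacked LSTM layers followed by an LSTM head that maps back to the
    # 5 state variables at each of the 8 output time steps (as in the search space).
    model = tf.keras.Sequential(
        [
            tf.keras.layers.Input(shape=input_shape),
            tf.keras.layers.LSTM(units, return_sequences=True),
            tf.keras.layers.LSTM(units, return_sequences=True),
            tf.keras.layers.LSTM(input_shape[1], return_sequences=True),
        ]
    )
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss="mse", metrics=[r2])
    return model


if __name__ == "__main__":
    (X_train, y_train), (X_valid, y_valid) = load_data(verbose=1)
    model = build_baseline()
    model.fit(
        X_train,
        y_train,
        validation_data=(X_valid, y_valid),
        batch_size=32,
        epochs=100,
        callbacks=[tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5)],
        verbose=2,
    )
    val_mse, val_r2 = model.evaluate(X_valid, y_valid, verbose=0)
    print(f"Baseline validation MSE: {val_mse:.4f}  R2: {val_r2:.4f}")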

Define a neural architecture search space#

lstm_search/search_space.py#
import collections

import tensorflow as tf

from deephyper.nas.space import KSearchSpace, SpaceFactory
from deephyper.nas.space.node import ConstantNode, VariableNode
from deephyper.nas.space.op.basic import Zero
from deephyper.nas.space.op.connect import Connect
from deephyper.nas.space.op.merge import AddByProjecting
from deephyper.nas.space.op.op1d import Identity
from deephyper.nas.space.op import operation

# Convert a Keras layer to a DeepHyper operation
Dense = operation(tf.keras.layers.Dense)
LSTM = operation(tf.keras.layers.LSTM)

class StackedLSTMFactory(SpaceFactory):
    def build(
        self,
        input_shape,
        output_shape,
        num_layers=5,
        **kwargs,
    ):

        self.ss = KSearchSpace(input_shape, output_shape)
        output_dim = output_shape[1]
        source = prev_input = self.ss.input_nodes[0]

        # keep the 2 previous nodes as anchor points for possible skip connections
        anchor_points = collections.deque([source], maxlen=2)

        for _ in range(num_layers):
            lstm = VariableNode()
            self.add_lstm_seq_(lstm)
            self.ss.connect(prev_input, lstm)

            cmerge = ConstantNode()
            cmerge.set_op(AddByProjecting(self.ss, [lstm], activation="relu"))

            for anchor in anchor_points:
                skipco = VariableNode()
                skipco.add_op(Zero())
                skipco.add_op(Connect(self.ss, anchor))
                self.ss.connect(skipco, cmerge)

            # ! for next iter
            prev_input = cmerge
            anchor_points.append(prev_input)

        y = ConstantNode(LSTM(output_dim, return_sequences=True))
        self.ss.connect(prev_input, y)

        return self.ss

    def add_lstm_seq_(self, node):
        node.add_op(Identity())  # we do not want to create a layer in this case
        for units in range(16, 97, 16):
            node.add_op(LSTM(units=units, return_sequences=True))


def create_search_space(
    input_shape=(
        8,
        5,
    ),
    output_shape=(
        8,
        5,
    ),
    num_layers=10,
    **kwargs,
):
    return StackedLSTMFactory()(
        input_shape, output_shape, num_layers=num_layers, **kwargs
    )


if __name__ == "__main__":
    shapes = dict(input_shape=(8, 5,), output_shape=(8, 5,))
    factory = StackedLSTMFactory()
    factory.plot_model(**shapes)

An example of a randomly generated architecture from this search space:

random model from stacked lstm search space

Create a problem instance#

lstm_search/problem.py#
from deephyper.problem import NaProblem
from dh_project.lstm_search.load_data import load_data
from dh_project.lstm_search.search_space import create_search_space

Problem = NaProblem(seed=2019)

Problem.load_data(load_data)

Problem.search_space(create_search_space, num_layers=5)

Problem.hyperparameters(
    batch_size=32,
    learning_rate=0.001,
    optimizer='adam',
    num_epochs=100,
    callbacks=dict(
        EarlyStopping=dict(
            monitor='val_r2',
            mode='max',
            verbose=0,
            patience=5
        )
    )
)

Problem.loss('mse')

Problem.metrics(['r2'])

Problem.objective('val_r2')


# Just to print your problem, to test its definition and imports in the current python environment.
if __name__ == '__main__':
    print(Problem)

Execute the search locally#

Everything is ready to run. Let’s remember the structure of our experiment:

lstm_search/
    __init__.py
    load_data.py
    problem.py
    search_space.py

Each of these files can also be tested one by one on the local machine (see tutorial-04 for details). Next, we will run a random search (RDM).

bash#
deephyper nas random --evaluator ray --problem dh_project.lstm_search.problem.Problem --max-evals 10 --num-workers 2

Note

To run DeepHyper locally and on other systems we use deephyper.evaluator. For local evaluations we can use either deephyper.evaluator.RayEvaluator or deephyper.evaluator.SubProcessEvaluator.

After the search is over, you will find the following files in your current folder:

deephyper.log
init_infos.json
results.csv
save/

Todo

  • analyse the results of the search (plot + topk)

  • show the best model found
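
As a quick first look at the search, the sketch below (a hypothetical lstm_search/plot_results.py; it assumes results.csv contains an objective column, which here is the validation R2) plots the objective of every evaluation together with the best value found so far:

lstm_search/plot_results.py#
import matplotlib.pyplot as plt
import pandas as pd

# Edit the path if necessary
df = pd.read_csv("results.csv")

# Objective of each evaluated architecture and the running best.
plt.figure(figsize=(6, 4))
plt.scatter(df.index, df["objective"], s=10, label="evaluations")
plt.plot(df.index, df["objective"].cummax(), color="red", label="best so far")
plt.xlabel("Evaluation")
plt.ylabel("Objective (val_r2)")
plt.legend()
plt.tight_layout()
plt.savefig("search_trajectory.png")

The top 3 architectures can also be extracted from results.csv with deephyper-analytics: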

bash#
deephyper-analytics topk results.csv -k 3 -o topk.json

To visualize or use the best architecture, you can recreate the Keras model this way:

lstm_search/best_model.py#
import json
import tensorflow as tf
from dh_project.lstm_search.problem import Problem

# Edit the path if necessary
path_to_topk_json = "topk.json"

# Load the json file
with open(path_to_topk_json, "r") as f:
    topk = json.load(f)

# Convert the arch_seq (a str now) to a list
arch_seq = json.loads(topk["0"]["arch_seq"])

# Create the Keras model using the problem
model = Problem.get_keras_model(arch_seq)
tf.keras.utils.plot_model(model, "best_model.png")

Then, execute the script with python best_model.py and visualize the produced image:

best model found from stacked lstm search space