2. Neural Architecture Search with Multiple Input Tensors

In this tutorial we build on the previous basic NAS tutorial to allow for a varying number of input tensors. This calls for the construction of a new search space in which the different input tensors can be connected to any of the variable node operations. The data used for this tutorial is provided in this repository and is a multifidelity surrogate modeling dataset obtained from the Branin function. In addition to the independent variables of this modeling task, low- and medium-fidelity estimates of the output variable are used as additional inputs to the eventual high-fidelity surrogate. This requires multiple input tensors, which may interact with the neural architecture independently or jointly.

Warning

By design, asyncio does not allow nested event loops. Jupyter uses Tornado, which already starts an event loop, so the following patch is required to run this tutorial.

Some parts of this tutorial require pydot (pip install pydot) and Graphviz (see installation instructions at https://graphviz.gitlab.io/download/).

[1]:
!pip install nest_asyncio

import nest_asyncio
nest_asyncio.apply()
Requirement already satisfied: nest_asyncio in /Users/romainegele/opt/anaconda3/envs/dh-dev/lib/python3.8/site-packages (1.5.1)

2.1. Data from the Branin Function

First, we will look at the load_data function, which loads and returns the training and validation data from the multifidelity Branin function. This dataset is provided in the deephyper/tutorials repository.

The output interface of the load_data function is important when you have several inputs or outputs. In this case, the inputs are a list of 3 NumPy arrays.

[2]:
import os
import numpy as np


def load_data():
    """Here we are loading data that has multiple inputs for the same output.
    Our goal is to not make ONE tensor with all inputs but have separate input tensors.
    """

    data = np.load("data.npz") # Data from the Brannin function

    xtrain = data["xtrain"]  # Independent variables (input tensor 1)
    ytrain_lf = data["ytrain_lf"]  # Low fidelity variable (input tensor 2)
    ytrain_mf = data["ytrain_mf"]  # Medium fidelity variable (input tensor 3)
    ytrain_hf = data["ytrain_hf"]  # High fidelity variable (output tensor)

    xtest = data["xtest"]
    ytest_lf = data["ytest_lf"]
    ytest_mf = data["ytest_mf"]
    ytest_hf = data["ytest_hf"]

    return ([xtrain, ytrain_lf, ytrain_mf], ytrain_hf), ([xtest, ytest_lf, ytest_mf], ytest_hf)
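
The data.npz file loaded above is provided with the tutorial, so it does not need to be regenerated. For reference, a minimal sketch of how such a multifidelity dataset could be produced is given below; the Branin definition is standard, but the low- and medium-fidelity surrogates (a biased copy and a noisy copy of the high-fidelity output) are illustrative assumptions, not the exact recipe used to build the provided file:

import numpy as np


def branin(x1, x2):
    # Standard Branin function, defined on x1 in [-5, 10] and x2 in [0, 15]
    a, b, c = 1.0, 5.1 / (4 * np.pi**2), 5.0 / np.pi
    r, s, t = 6.0, 10.0, 1.0 / (8 * np.pi)
    return a * (x2 - b * x1**2 + c * x1 - r) ** 2 + s * (1 - t) * np.cos(x1) + s


rng = np.random.default_rng(42)
x = np.stack([rng.uniform(-5, 10, size=500), rng.uniform(0, 15, size=500)], axis=-1)

y_hf = branin(x[:, 0], x[:, 1])                             # high-fidelity target
y_mf = y_hf + rng.normal(0.0, 5.0, size=500)                # medium fidelity: noisy estimate (assumption)
y_lf = 0.5 * y_hf + 10.0 + rng.normal(0.0, 10.0, size=500)  # low fidelity: biased and noisy (assumption)

n_train = 400
np.savez(
    "data.npz",
    xtrain=x[:n_train], ytrain_lf=y_lf[:n_train, None],
    ytrain_mf=y_mf[:n_train, None], ytrain_hf=y_hf[:n_train, None],
    xtest=x[n_train:], ytest_lf=y_lf[n_train:, None],
    ytest_mf=y_mf[n_train:, None], ytest_hf=y_hf[n_train:, None],
)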

2.2. Neural Architecture Search Space

Now we define a neural architecture search space with multiple inputs. In the build(self) method we can see that 3 input nodes are automatically created from the defined input_shape:

...
self = KSearchSpace(input_shape, output_shape)

# Three input tensors are automatically created based on the `input_shape`
input_0, input_1, input_2 = self.input_nodes
...
[3]:
import collections

import tensorflow as tf

from deephyper.nas import KSearchSpace
from deephyper.nas.node import ConstantNode, VariableNode
from deephyper.nas.operation import operation, Zero, Connect, AddByProjecting, Identity

Dense = operation(tf.keras.layers.Dense)
Concatenate = operation(tf.keras.layers.Concatenate)


class MultiInputsResNetMLPFactory(KSearchSpace):

    def __init__(self, input_shape, output_shape, seed=None, num_layers=3, mode="regression"):
        super().__init__(input_shape, output_shape, seed=seed)

        self.num_layers = num_layers
        assert mode in ["regression", "classification"]
        self.mode = mode

    def build(self):

        assert len(self.input_shape) == 3

        # Three input tensors are automatically created based on the `input_shape`
        input_0, input_1, input_2 = self.input_nodes

        concat = ConstantNode(Concatenate())
        self.connect(input_0, concat)
        self.connect(input_1, concat)
        self.connect(input_2, concat)

        # Input anchors (recorded so they can be connected anywhere
        # in the architecture)
        input_anchors = [input_1, input_2]

        # Queue (maxlen=3) storing the outputs of the most recent layers,
        # used to create potential residual connections
        skip_anchors = collections.deque([input_0], maxlen=3)

        prev_input = concat
        for _ in range(self.num_layers):
            dense = VariableNode()
            self.add_dense_to_(dense)
            self.connect(prev_input, dense)

            # ConstantNode to merge possible residual connections from the
            # input tensor anchors (input_1, input_2)
            merge_0 = ConstantNode()
            merge_0.set_op(AddByProjecting(self, [dense], activation="relu"))

            # Creates potential connections to the various input tensors
            for anchor in input_anchors:
                skipco = VariableNode()
                skipco.add_op(Zero())
                skipco.add_op(Connect(self, anchor))
                self.connect(skipco, merge_0)

            # ConstantNode to merge possible skip connections from previous layers
            merge_1 = ConstantNode()
            merge_1.set_op(AddByProjecting(self, [merge_0], activation="relu"))

            # Create potential skip connections to the outputs of previous layers
            for anchor in skip_anchors:
                skipco = VariableNode()
                skipco.add_op(Zero())
                skipco.add_op(Connect(self, anchor))
                self.connect(skipco, merge_1)

            # Update anchors for the next iteration
            prev_input = merge_1
            skip_anchors.append(prev_input)

        if self.mode == "regression":
            output_node = ConstantNode(Dense(self.output_shape[0]))
            self.connect(prev_input, output_node)
        else:
            output_node = ConstantNode(Dense(self.output_shape[0], activation="softmax"))
            self.connect(prev_input, output_node)

        return self

    def add_dense_to_(self, node):
        node.add_op(Identity())  # we do not want to create a layer in this case

        activations = [
            tf.keras.activations.linear,
            tf.keras.activations.relu,
            tf.keras.activations.tanh,
            tf.keras.activations.sigmoid,
        ]
        for units in range(16, 97, 16):
            for activation in activations:
                node.add_op(Dense(units=units, activation=activation))
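
Before visualizing, it helps to count the choices encoded by this space: each dense VariableNode created by add_dense_to_ offers 1 Identity operation plus 6 unit choices times 4 activations, i.e. 25 operations, while each skip-connection VariableNode offers only 2 (Zero or Connect). The integer vectors (arch_seq) reported later by the search correspond to one choice per variable node:

# Choices per dense VariableNode defined in add_dense_to_:
# 1 Identity + (unit choices) x (activations) Dense operations
num_units_choices = len(range(16, 97, 16))  # 16, 32, 48, 64, 80, 96 -> 6 choices
num_activations = 4                         # linear, relu, tanh, sigmoid
print(1 + num_units_choices * num_activations)  # 25

# Each skip-connection VariableNode only has 2 choices: Zero() or Connect(...)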

Visualize a randomly generated neural network from this search space:

[4]:
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from tensorflow.keras.utils import plot_model

shapes = dict(input_shape=[(2,), (1,), (1,)], output_shape=(1,))
space = MultiInputsResNetMLPFactory(**shapes, num_layers=3).build()
model = space.sample()
plot_model(model, show_shapes=False, show_layer_names=False)
2021-10-18 10:47:33.795028: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
[4]:
(Figure: plot_model output showing a randomly sampled architecture from the search space.)

2.3. Neural Architecture Search Problem

Now, let us define the neural architecture search problem:

[5]:
from deephyper.problem import NaProblem

problem = NaProblem()

problem.load_data(load_data)

problem.search_space(MultiInputsResNetMLPFactory, num_layers=3)

problem.hyperparameters(
    batch_size=64,
    learning_rate=0.001,
    optimizer="adam",
    num_epochs=200,
    callbacks=dict(
        EarlyStopping=dict(
            monitor="val_r2", mode="max", verbose=0, patience=5  # or 'val_acc' ?
        ),
        ModelCheckpoint=dict(
            monitor="val_loss",
            mode="min",
            save_best_only=True,
            verbose=0,
            filepath="model.h5",
            save_weights_only=False,
        ),
    ),
)

problem.loss("mse")

problem.metrics(["r2"])

problem.objective("val_r2")

problem
[5]:
Problem is:
 * SEED = 2019 *
    - search space   : __main__.MultiInputsResNetMLPFactory
    - data loading   : __main__.load_data
    - preprocessing  : None
    - hyperparameters:
        * verbose: 0
        * batch_size: 64
        * learning_rate: 0.001
        * optimizer: adam
        * num_epochs: 200
        * callbacks: {'EarlyStopping': {'monitor': 'val_r2', 'mode': 'max', 'verbose': 0, 'patience': 5}, 'ModelCheckpoint': {'monitor': 'val_loss', 'mode': 'min', 'save_best_only': True, 'verbose': 0, 'filepath': 'model.h5', 'save_weights_only': False}}
    - loss           : mse
    - metrics        :
        * r2
    - objective      : val_r2
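
The cells that actually launch the search are not shown in this section. As a rough sketch, assuming the Evaluator, run_base_trainer, and RegularizedEvolution APIs of this DeepHyper release (none of which appear in the excerpt above), the results DataFrame analyzed below could be produced along these lines:

from deephyper.evaluator import Evaluator
from deephyper.nas.run import run_base_trainer
from deephyper.search.nas import RegularizedEvolution

# Evaluator that trains each sampled architecture with the settings declared in `problem`
evaluator = Evaluator.create(run_base_trainer, method="thread")

# Aging-evolution NAS over the search space attached to `problem`
search = RegularizedEvolution(problem, evaluator)

# Returns a pandas DataFrame with columns arch_seq, id, objective, elapsed_sec, duration
results = search.search(max_evals=10)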

2.5. Analyse the Results

[9]:
results
[9]:
                                           arch_seq  id   objective  elapsed_sec   duration
0   [17, 1, 0, 1, 8, 1, 1, 1, 0, 11, 0, 0, 0, 0, 1]   1   -6.042844     9.502147   7.243829
1    [2, 0, 1, 1, 20, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1]   2   -0.177686    15.757653  13.499307
2  [24, 1, 1, 0, 16, 0, 0, 1, 1, 11, 0, 1, 0, 0, 0]   3  -10.429945    22.361082  12.857737
3  [16, 0, 1, 1, 24, 0, 1, 1, 1, 19, 0, 0, 0, 1, 0]   4    0.666291    23.130446   7.371762
4  [17, 1, 1, 0, 15, 0, 1, 1, 1, 21, 0, 0, 0, 0, 0]   6   -8.231862    24.167113   1.035446
5   [17, 1, 1, 1, 0, 1, 1, 1, 0, 14, 1, 0, 0, 0, 0]   5    0.868289    31.644777   9.282904
6  [24, 0, 1, 1, 21, 0, 0, 0, 1, 15, 1, 1, 1, 0, 0]   7    0.145526    37.514402  13.346285
7    [21, 0, 1, 0, 3, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0]   8    0.568356    39.630536   7.984823
8    [8, 1, 1, 1, 3, 1, 0, 1, 1, 11, 0, 0, 0, 1, 1]  10    0.084703    44.283168   4.651748
9  [18, 0, 0, 0, 21, 1, 0, 1, 0, 14, 1, 0, 0, 0, 1]  11    0.674340    49.046729   4.762594

The deephyper-analytics command line is a way of analyzing this type of file. For example, to output the best configurations we can use the topk functionality.

[10]:
!deephyper-analytics topk results.csv -k 3
'0':
  arch_seq: '[17, 1, 1, 1, 0, 1, 1, 1, 0, 14, 1, 0, 0, 0, 0]'
  duration: 9.2829041481
  elapsed_sec: 31.6447768211
  id: 5
  objective: 0.8682892919
'1':
  arch_seq: '[18, 0, 0, 0, 21, 1, 0, 1, 0, 14, 1, 0, 0, 0, 1]'
  duration: 4.7625939846
  elapsed_sec: 49.0467288494
  id: 11
  objective: 0.6743404865
'2':
  arch_seq: '[16, 0, 1, 1, 24, 0, 1, 1, 1, 19, 0, 0, 0, 1, 0]'
  duration: 7.3717617989
  elapsed_sec: 23.1304457188
  id: 4
  objective: 0.6662912965

Output the best neural network architecture:

[11]:
import json

best_config = results.iloc[results.objective.argmax()][:-3].to_dict()
arch_seq = json.loads(best_config["arch_seq"])
model = space.sample(arch_seq)
plot_model(model, show_shapes=False, show_layer_names=False)
[11]:
(Figure: plot_model output showing the best architecture found by the search.)
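
From here, the sampled model can be trained and evaluated like any multi-input Keras model by passing the list of three input arrays returned by load_data. A minimal sketch, reusing the hyperparameters declared in the problem above (this retraining step is an illustration, not part of the original notebook):

(X_train, y_train), (X_valid, y_valid) = load_data()

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), loss="mse")

# X_train is the list [xtrain, ytrain_lf, ytrain_mf]; Keras maps each array
# onto one of the three input tensors of the sampled architecture.
history = model.fit(
    X_train,
    y_train,
    validation_data=(X_valid, y_valid),
    batch_size=64,
    epochs=200,
    verbose=0,
)
print("best val_loss:", min(history.history["val_loss"]))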