Neural Architecture Search for Graph Neural Networks#

In this tutorial, we will design neural architecture search spaces for graph neural networks, specifically for message passing neural networks.

For related papers, please check

The search space have multiple input tensors, including node features, edge features, edge pairs (source and target node indices) and node masks (number of nodes before zero-padding). The output of this search space is customizable. In this tutorial, we use an example from the QM7 dataset, which has a scalar output. The QM7 dataset is from the Deepchem library.

There are two main variable nodes, namely mpnn_cell and gather_cell: * mpnn_cell is a message passing layer with a varierty of activation, aggregation, update functions, etc. * gather_cell is global graph gather layer with a variety of global pooling functions.

We also adopted skip-connection in the search space; that is, the output of the n-1 layer is the direct input to the n+1 layer. Users can modify the number of mpnn_cell and maximum skip-connection distance to control the flexibility of skip-connection.

We used random search and aging evolution (regularized evolution) to conduct architecture search. In the paper, we found aging evolution has good scalability. We also showed that the best architecture from the search outperforms the moleculenet benchmarks. Users are more than welcome to furthur modify the search space to boost the performance.

Install Deepchem and RDKit#

We need DeepChem for the benchmark datasets. RDKit is also required to convert molecule smile string to a graph representation.

!pip install deepchem
!conda install -c rdkit rdkit -y
Requirement already satisfied: deepchem in /Users/romainegele/miniforge3/envs/dh-arm/lib/python3.9/site-packages (2.5.0)
Requirement already satisfied: joblib in /Users/romainegele/miniforge3/envs/dh-arm/lib/python3.9/site-packages (from deepchem) (1.1.0)
Requirement already satisfied: scipy in /Users/romainegele/miniforge3/envs/dh-arm/lib/python3.9/site-packages (from deepchem) (1.7.2)
Requirement already satisfied: pandas in /Users/romainegele/miniforge3/envs/dh-arm/lib/python3.9/site-packages (from deepchem) (1.3.4)
Requirement already satisfied: numpy in /Users/romainegele/miniforge3/envs/dh-arm/lib/python3.9/site-packages (from deepchem) (1.21.4)
Requirement already satisfied: scikit-learn in /Users/romainegele/miniforge3/envs/dh-arm/lib/python3.9/site-packages (from deepchem) (1.0.1)
Requirement already satisfied: python-dateutil>=2.7.3 in /Users/romainegele/miniforge3/envs/dh-arm/lib/python3.9/site-packages (from pandas->deepchem) (2.8.2)
Requirement already satisfied: pytz>=2017.3 in /Users/romainegele/miniforge3/envs/dh-arm/lib/python3.9/site-packages (from pandas->deepchem) (2021.3)
Requirement already satisfied: threadpoolctl>=2.0.0 in /Users/romainegele/miniforge3/envs/dh-arm/lib/python3.9/site-packages (from scikit-learn->deepchem) (3.0.0)
Requirement already satisfied: six>=1.5 in /Users/romainegele/miniforge3/envs/dh-arm/lib/python3.9/site-packages (from python-dateutil>=2.7.3->pandas->deepchem) (1.15.0)
Collecting package metadata (current_repodata.json): done
Solving environment: done

==> WARNING: A newer version of conda exists. <==
  current version: 4.10.3
  latest version: 4.11.0

Please update conda by running

    $ conda update -n base conda

# All requested packages already installed.

Imports and GPU Detection#


By design asyncio does not allow nested event loops. Jupyter is using Tornado which already starts an event loop. Therefore the following patch is required to run this tutorial.

!pip install nest_asyncio

import nest_asyncio
Requirement already satisfied: nest_asyncio in /Users/romainegele/miniforge3/envs/dh-arm/lib/python3.9/site-packages (1.5.1)
import json
import os
import pathlib
import shutil

!export TF_XLA_FLAGS=--tf_xla_enable_xla_devices

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from tqdm import tqdm


The TF_CPP_MIN_LOG_LEVEL can be used to avoid the logging of Tensorflow DEBUG, INFO and WARNING statements.


The following can be used to detect if GPU devices are available on the current host.

import tensorflow as tf

available_gpus = tf.config.list_physical_devices("GPU")
n_gpus = len(available_gpus)

if n_gpus > 1:
    n_gpus -= 1
is_gpu_available = n_gpus > 0

if is_gpu_available:
    print(f"{n_gpus} GPU{'s are' if n_gpus > 1 else ' is'} available.")
    print("No GPU available")
No GPU available

Start Ray#

We launch the Ray run-time depending on the detected local ressources. If GPU(s) is(are) detected then 1 worker is started for each GPU. If not, then only 1 worker is started. You can start more workers by setting num_cpus=1 to a value greater than 1.


In the case of GPUs it is important to follow this scheme to avoid multiple processes (Ray workers vs current process) to lock the same GPU.

import ray

log_to_driver = False

if not(ray.is_initialized()):
    if is_gpu_available:
        ray.init(num_cpus=n_gpus, num_gpus=n_gpus, log_to_driver=log_to_driver)
        ray.init(num_cpus=1, log_to_driver=log_to_driver)
2021-12-13 11:47:14,280 INFO -- View the Ray dashboard at

Reformatting Graph Dataset#

Now, we will start by reformatting a benchmark QM7 from molecule-net. We will first generate data for a training set (used for estimation) and a testing set (used to evaluate the final performance). Then the training set will be sub-divided in a new training set (used to estimate the neural network weights) and validation set (used to estimate the neural network hyperparameters and architecture).

The data is converted from a deepchem weave object to a list containing node features, edge features, edge pairs, node masks and GCN attention coefficients. You may check the details here

from deepchem.molnet import load_qm7
from deephyper.contrib.mpnn import get_all_mol_feat

def load_data(test_only=0, verbose=0):
    _, (train_data, valid_data, test_data), _ = load_qm7(featurizer='Weave',

    max_node, max_edge = [], [] # used to zero-pad the node and edge features to maximum dimension

    for data in [train_data, valid_data, test_data]:
        x = data.X
        size = len(x) # number of molecules in a dataset
        max_node.append(np.max([x[i].nodes.shape[0] for i in range(size)]))
        max_edge.append(np.max([x[i].pairs.shape[0] for i in range(size)]))
    max_node = np.max(max_node)
    max_edge = np.max(max_edge)

    x_train, y_train = get_all_mol_feat(train_data, max_node, max_edge)
    x_valid, y_valid = get_all_mol_feat(valid_data, max_node, max_edge)
    x_test, y_test = get_all_mol_feat(test_data, max_node, max_edge)

    if verbose:
        print(f'x_train shape: {[x.shape for x in x_train]}')
        print(f'y_train shape: {y_train.shape}')
        print(f'x_valid shape: {[x.shape for x in x_valid]}')
        print(f'y_valid shape: {y_valid.shape}')
        print(f'x_test shape: {[x.shape for x in x_test]}')
        print(f'y_test shape: {y_test.shape}')

    if test_only:
        return (x_test, y_test)
        return (x_train, y_train), (x_valid, y_valid)

Then the code to split the training data in a new training set and a validation set corresponds to:

# ignore warning/errors when downloading the dataset for the first time
(x, y), (vx, vy) = load_data(verbose=1)
(tx , ty) = load_data(test_only=1)
x_train shape: [(5470, 22, 75), (5470, 506, 2), (5470, 506, 14), (5470, 22), (5470, 506)]
y_train shape: (5470, 1)
x_valid shape: [(684, 22, 75), (684, 506, 2), (684, 506, 14), (684, 22), (684, 506)]
y_valid shape: (684, 1)
x_test shape: [(684, 22, 75), (684, 506, 2), (684, 506, 14), (684, 22), (684, 506)]
y_test shape: (684, 1)


When it is possible to factorize the two previous function into one, DeepHyper interface requires a function which returns (train_inputs, train_outputs), (valid_inputs, valid_outputs).

Baseline Neural Network#

Let us define a baseline neural network based on a regular multi-layer perceptron architecture which learn the mean estimate and minimise the mean squared error.

import deephyper.layers as dhl
from tensorflow.keras.callbacks import ModelCheckpoint
def gnn_default_model(shape):
    node_ = tf.keras.layers.Input(shape[0])
    adj_ = tf.keras.layers.Input(shape[1], dtype=tf.int32)
    edge_ = tf.keras.layers.Input(shape[2])
    mask_ = tf.keras.layers.Input(shape[3])
    degree_ = tf.keras.layers.Input(shape[4])

    input_ = [node_, adj_, edge_, mask_, degree_]

    node = dhl.SparseMPNN(state_dim=32,
                       activation='elu')([node_, adj_, edge_, mask_, degree_])

    node = dhl.GlobalAttentionPool(128)(node)
    node = tf.keras.layers.Flatten()(node)
    node = tf.keras.layers.Dense(32, activation='relu')(node)

    output = tf.keras.layers.Dense(1, activation='linear')(node)

    model = tf.keras.Model(input_, output)

    return model

shape = [item.shape[1:] for item in x]
model = gnn_default_model(shape)

model.compile(optimizer="adam", loss='mse')
mc = ModelCheckpoint('gnn_model.h5', monitor='val_loss', mode='min', save_weights_only=True)

history =, y,
                    validation_data=(vx, vy),
2021-12-13 11:47:19.988355: I tensorflow/compiler/mlir/] None of the MLIR Optimization Passes are enabled (registered 2)
2021-12-13 11:47:19.988498: W tensorflow/core/platform/profile_utils/] Failed to get CPU frequency: 0 Hz
Epoch 1/5
43/43 [==============================] - 5s 106ms/step - loss: 1.0011 - val_loss: 1.0239
Epoch 2/5
43/43 [==============================] - 4s 102ms/step - loss: 0.9441 - val_loss: 0.9451
Epoch 3/5
43/43 [==============================] - 4s 105ms/step - loss: 0.9161 - val_loss: 0.9458
Epoch 4/5
43/43 [==============================] - 5s 105ms/step - loss: 0.8961 - val_loss: 0.9464
Epoch 5/5
43/43 [==============================] - 5s 108ms/step - loss: 0.9018 - val_loss: 0.9554

We can do a vizualisation of our learning curves to make sure the training and validation loss decrease correctly.

width = 8
height = width/1.618
plt.figure(figsize=(width, height))

plt.plot(history["loss"], label="training")
plt.plot(history["val_loss"], label="validation")



Also, let us look at the prediction on the test set.

pred_ty = model.predict(tx)

width = 4.5
height = width
plt.figure(figsize=(width, height))

vmin, vmax = -4, 4
plt.scatter(ty, pred_ty)
plt.plot([vmin, vmax], [vmin, vmax], 'k--')
plt.xlim([vmin, vmax])
plt.ylim([vmin, vmax])

Define the Neural Architecture Search Space#

The neural architecture search space is composed of discrete decision variables. For each decision variable we choose among a list of possible operation to perform (e.g., fully connected, ReLU). To define this search space, it is necessary to use two classes:

  • KSearchSpace (for Keras Search Space): represents a directed acyclic graph (DAG) in which each node represents a chosen operation. It represents the possible neural networks that can be created.

  • SpaceFactory: is a utilitiy class used to pack the logic of a search space definition and share it with others.

Then, inside a KSearchSpace we will have two types of nodes: * VariableNode: corresponds to discrete decision variables and are used to define a list of possible operation. * ConstantNode: corresponds to fixed operation in the search space (e.g., input/outputs)

Finally, it is possible to reuse any tf.keras.layers to define a KSearchSpace. However, it is important to wrap each layer in an operation to perform a lazy memory allocation of tensors.

We implement the constructor __init__ and build method of the RegressionSpace a subclass of KSearchSpace. The __init__ method interface is:

def __init__(self, input_shape, output_shape, **kwargs):

for the build method the interface is:

def build(self):
    return self

where: * input_shape corresponds to a tuple or a list of tuple indicating the shapes of inputs tensors. * output_shape corresponds to the same but of output_tensors. * **kwargs denotes that any other key word argument can be defined by the user.

import collections, itertools

from deephyper.nas import KSearchSpace
from deephyper.nas.node import ConstantNode, VariableNode
from deephyper.nas.operation import operation, Zero, Connect, AddByProjecting, Identity

# define operations
Flatten = operation(tf.keras.layers.Flatten)
Dense = operation(tf.keras.layers.Dense)

SparseMPNN = operation(dhl.SparseMPNN)
GlobalAttentionPool = operation(dhl.GlobalAttentionPool)
GlobalAttentionSumPool = operation(dhl.GlobalAttentionSumPool)
GlobalAvgPool = operation(dhl.GlobalAvgPool)
GlobalMaxPool = operation(dhl.GlobalMaxPool)
GlobalSumPool = operation(dhl.GlobalSumPool)

class MPNNSpace(KSearchSpace):

    def __init__(self, input_shape, output_shape, seed=None, num_layers=3):
        super().__init__(input_shape, output_shape, seed=seed)
        self.num_layers = 3

    def build(self):

        source = prev_input = self.input_nodes[0]
        prev_input1 = self.input_nodes[1]
        prev_input2 = self.input_nodes[2]
        prev_input3 = self.input_nodes[3]
        prev_input4 = self.input_nodes[4]

        anchor_points = collections.deque([source], maxlen=3)

        for _ in range(self.num_layers):
            graph_attn_cell = VariableNode()
            self.connect(prev_input, graph_attn_cell)
            self.connect(prev_input1, graph_attn_cell)
            self.connect(prev_input2, graph_attn_cell)
            self.connect(prev_input3, graph_attn_cell)
            self.connect(prev_input4, graph_attn_cell)

            merge = ConstantNode()
            merge.set_op(AddByProjecting(self, [graph_attn_cell], activation="relu"))

            for node in anchor_points:
                skipco = VariableNode()
                skipco.add_op(Connect(self, node))
                self.connect(skipco, merge)

            prev_input = merge


        global_pooling_node = VariableNode()
        self.connect(prev_input, global_pooling_node)
        prev_input = global_pooling_node

        flatten_node = ConstantNode(Flatten())
        self.connect(prev_input, flatten_node)

        # Output
        output = ConstantNode(Dense(self.output_shape[0]))
        self.connect(flatten_node, output)

        return self

    def mpnn_cell(self, node):
        state_dims = [4, 8, 16, 32]
        Ts = [1, 2, 3, 4]
        attn_methods = ["const", "gat", "sym-gat", "linear", "gen-linear", "cos"]
        attn_heads = [1, 2, 4, 6]
        aggr_methods = ["max", "mean", "sum"]
        update_methods = ["gru", "mlp"]
        activations = [tf.keras.activations.sigmoid,

        for hp in itertools.product(state_dims,
            (state_dim, T, attn_method, attn_head, aggr_method, update_method, activation) = hp


    def gather_cell(self, node):
        for functions in [GlobalSumPool, GlobalMaxPool, GlobalAvgPool]:
            for axis in [-1, -2]:

        for state_dim in [16, 32, 64]:

Let us visualize a few randomly sampled neural architecture from this search space.

import matplotlib.image as mpimg
from tensorflow.keras.utils import plot_model

input_shape = [item.shape[1:] for item in x]
output_shape = y.shape[1:]
shapes = dict(input_shape=input_shape, output_shape=output_shape)
space = MPNNSpace(**shapes).build()


images = []
for i in range(4):

    model = space.sample()
    plot_model(model, "random_model.png", show_shapes=False, show_layer_names=False)
    image = mpimg.imread("random_model.png")
[(0, 18431), (0, 1), (0, 18431), (0, 1), (0, 1), (0, 18431), (0, 1), (0, 1), (0, 1), (0, 10)]

Define the Neural Architecture Optimization Problem#

In order to define a neural architecture search problem we have to use the NaProblem class. This class gives access to different method for the user to customize the training settings of neural networks.

from deephyper.problem import NaProblem

problem = NaProblem()

# Bind a function which returns (train_input, train_output), (valid_input, valid_output)

# Bind a function which returns a search space and give some arguments for the `build` method
problem.search_space(MPNNSpace, num_layers=2)

# Define a set of fixed hyperparameters for all trained neural networks
    num_epochs=1, # lower fidelity

# Define the loss to minimize

# Define complementary metrics

# Define the maximized objective. Here we take the negative of the validation loss.

Problem is:
    - search space   : __main__.MPNNSpace
    - data loading   : __main__.load_data
    - preprocessing  : None
    - hyperparameters:
        * verbose: 0
        * batch_size: 128
        * learning_rate: 0.001
        * optimizer: adam
        * num_epochs: 1
    - loss           : mse
    - metrics        :
    - objective      : -val_loss


Adding an EarlyStopping(...) callback is a good idea to stop the training of your model as soon as it stops to improve.

EarlyStopping=dict(monitor="val_loss", mode="min", verbose=0, patience=3)

Define the Evaluator Object#

The Evaluator object is responsible of defining the backend used to distribute the function evaluation in DeepHyper.

from deephyper.evaluator import Evaluator
from deephyper.evaluator.callback import LoggerCallback

def get_evaluator(run_function):

    # Default arguments for Ray: 1 worker and 1 worker per evaluation
    method_kwargs = {
        "num_cpus": 1,
        "num_cpus_per_task": 1,
        "callbacks": [LoggerCallback()] # To interactively follow the finished evaluations,

    # If GPU devices are detected then it will create 'n_gpus' workers
    # and use 1 worker for each evaluation
    if is_gpu_available:
        method_kwargs["num_cpus"] = n_gpus
        method_kwargs["num_gpus"] = n_gpus
        method_kwargs["num_cpus_per_task"] = 1
        method_kwargs["num_gpus_per_task"] = 1

    evaluator = Evaluator.create(
    print(f"Created new evaluator with {evaluator.num_workers} worker{'s' if evaluator.num_workers > 1 else ''} and config: {method_kwargs}", )

    return evaluator

For neural architecture search a standard training pipeline is provided by the run_base_trainer function from the module.

from import run_base_trainer