1. Hyperparameter search for text classification (PyTorch)

In this tutorial we show how to apply hyperparameter optimization to the text classification example from the PyTorch documentation.

Reference: This tutorial is based on materials from the PyTorch documentation: Text classification with the torchtext library.


By design, asyncio does not allow nested event loops. Jupyter uses Tornado, which already starts an event loop. The following patch is therefore required to run this tutorial.

!pip install nest_asyncio

import nest_asyncio
nest_asyncio.apply()
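
To see why the patch is needed, the restriction can be reproduced in a plain Python script. This is a minimal sketch using only the standard library; the coroutine names inner and outer are illustrative and not part of the tutorial.

```python
import asyncio


async def inner():
    return 42


async def outer():
    # We are already inside a running event loop here, much like code
    # executed in a Jupyter cell.
    coro = inner()
    try:
        asyncio.run(coro)  # tries to start a second, nested event loop
    except RuntimeError as exc:
        coro.close()  # avoid a "never awaited" warning for the unused coroutine
        return type(exc).__name__
    return "no error"


nested_result = asyncio.run(outer())
print(nested_result)
```

Without a patch such as nest_asyncio, the inner call to asyncio.run is rejected with a RuntimeError because a loop is already running.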

1.1. Imports

import ray
import json
import pandas as pd
from functools import partial

import torch

from torchtext.data.utils import get_tokenizer
from torchtext.data.functional import to_map_style_dataset
from torchtext.vocab import build_vocab_from_iterator

from torch.utils.data import DataLoader
from torch.utils.data.dataset import random_split

from torch import nn


The following detects whether CUDA devices are available on the current host, so that this notebook automatically adapts the parallel execution to the resources available locally. This automatic adaptation does not apply when several compute nodes are requested.

is_gpu_available = torch.cuda.is_available()
n_gpus = torch.cuda.device_count()

1.2. The dataset

The torchtext library provides a few raw dataset iterators, which yield the raw text strings. For example, the AG_NEWS dataset iterators yield the raw data as a tuple of label and text. It has four labels (1: World, 2: Sports, 3: Business, 4: Sci/Tec).

from torchtext.datasets import AG_NEWS

def load_data(train_ratio):
    train_iter, test_iter = AG_NEWS()
    train_dataset = to_map_style_dataset(train_iter)
    test_dataset = to_map_style_dataset(test_iter)
    num_train = int(len(train_dataset) * train_ratio)
    split_train, split_valid = \
        random_split(train_dataset, [num_train, len(train_dataset) - num_train])

    return split_train, split_valid, test_dataset
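
The effect of train_ratio on the split sizes can be checked without downloading the data. The sketch below is a pure-Python stand-in for random_split (it shuffles indices with the standard random module). The figure of 120,000 is the size of the AG_NEWS training set, and split_indices is a hypothetical helper, not a torchtext or PyTorch function.

```python
import random


def split_indices(n_samples, train_ratio, seed=0):
    """Shuffle sample indices and cut them into train/validation parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    num_train = int(n_samples * train_ratio)
    return indices[:num_train], indices[num_train:]


n_samples = 120_000  # size of the AG_NEWS training set
for ratio in (0.3, 0.95):
    train_idx, valid_idx = split_indices(n_samples, ratio)
    print(f"train_ratio={ratio}: {len(train_idx)} train / {len(valid_idx)} validation")
```

A smaller train_ratio shrinks the training split and enlarges the validation split, since both are drawn from the same pool of samples.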

1.3. Preprocessing pipelines and Batch generation

Here is an example of typical NLP data processing with a tokenizer and a vocabulary. The first step is to build a vocabulary from the raw training dataset. Here we use the built-in factory function build_vocab_from_iterator, which accepts an iterator that yields lists or iterators of tokens. Users can also pass any special symbols to be added to the vocabulary.

The vocabulary block converts a list of tokens into integers.

vocab(['here', 'is', 'an', 'example'])
>>> [475, 21, 30, 5286]

The text pipeline converts a text string into a list of integers based on the lookup table defined in the vocabulary. The label pipeline converts the label into integers. For example,

text_pipeline('here is the an example')
>>> [475, 21, 2, 30, 5286]
label_pipeline('10')
>>> 9

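
The lookups above can be mimicked with a small pure-Python vocabulary. This is only a sketch of the idea: the toy corpus, the whitespace tokenizer and the helper build_toy_vocab are assumptions made for illustration, and the ordering rule (specials first, then tokens by decreasing frequency) is a simplification rather than torchtext's exact implementation.

```python
from collections import Counter


def build_toy_vocab(texts, specials=("<unk>",)):
    """Map each token to an integer: specials first, then by frequency."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    ordered = sorted(counts, key=lambda tok: (-counts[tok], tok))
    return {tok: idx for idx, tok in enumerate(list(specials) + ordered)}


corpus = ["here is an example", "here is another example", "here we go"]
toy_vocab = build_toy_vocab(corpus)


def toy_text_pipeline(text):
    # Unknown tokens fall back to the index of "<unk>".
    return [toy_vocab.get(tok, toy_vocab["<unk>"]) for tok in text.lower().split()]


def toy_label_pipeline(label):
    # Labels 1..4 become class indices 0..3, as in label_pipeline.
    return int(label) - 1


print(toy_text_pipeline("here is an unseen example"))  # [1, 3, 4, 0, 2]
print(toy_label_pipeline("4"))                         # 3
```
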
train_iter = AG_NEWS(split='train')
num_class = 4

tokenizer = get_tokenizer('basic_english')

def yield_tokens(data_iter):
    for _, text in data_iter:
        yield tokenizer(text)

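vocab = build_vocab_from_iterator(yield_tokens(train_iter), specials=["<unk>"])
# Map out-of-vocabulary tokens to "<unk>" instead of raising an error
vocab.set_default_index(vocab["<unk>"])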
vocab_size = len(vocab)

text_pipeline = lambda x: vocab(tokenizer(x))
label_pipeline = lambda x: int(x) - 1

def collate_batch(batch, device):
    label_list, text_list, offsets = [], [], [0]
    for (_label, _text) in batch:
        label_list.append(label_pipeline(_label))
        processed_text = torch.tensor(text_pipeline(_text), dtype=torch.int64)
        text_list.append(processed_text)
        # Record the length of each text; the cumulative sum gives its start index
        offsets.append(processed_text.size(0))
    label_list = torch.tensor(label_list, dtype=torch.int64)
    offsets = torch.tensor(offsets[:-1]).cumsum(dim=0)
    text_list = torch.cat(text_list)
    return label_list.to(device), text_list.to(device), offsets.to(device)

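The collate_batch function is passed to DataLoader as collate_fn and works on a batch of samples generated by the DataLoader. Its input is a batch of data whose size is the batch size of the DataLoader, and it processes the samples according to the data processing pipelines declared previously. The texts of a batch are concatenated into a single tensor, and offsets records the index at which each text starts.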
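
The role of offsets can be illustrated without tensors. The sketch below uses plain Python lists and hypothetical token ids; flatten_with_offsets is an illustrative helper that mirrors what collate_batch does with torch.cat and cumsum.

```python
from itertools import accumulate


def flatten_with_offsets(token_id_lists):
    """Concatenate variable-length sequences and record where each one starts."""
    flat = [tok for seq in token_id_lists for tok in seq]
    lengths = [len(seq) for seq in token_id_lists]
    # Start positions: cumulative sum of [0, len_0, len_1, ...] without the last length.
    offsets = list(accumulate([0] + lengths[:-1]))
    return flat, offsets


batch_ids = [[475, 21, 30], [5286, 2], [21]]
flat, offsets = flatten_with_offsets(batch_ids)
print(flat)     # [475, 21, 30, 5286, 2, 21]
print(offsets)  # [0, 3, 5]
```
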

1.4. Define the model

The model is composed of the nn.EmbeddingBag layer plus a linear layer for the classification purpose.

class TextClassificationModel(nn.Module):

    def __init__(self, vocab_size, embed_dim, num_class):
        super(TextClassificationModel, self).__init__()
        self.embedding = nn.EmbeddingBag(vocab_size, embed_dim, sparse=True)
        self.fc = nn.Linear(embed_dim, num_class)
        # Apply the custom initialization defined below
        self.init_weights()

    def init_weights(self):
        initrange = 0.5
        self.embedding.weight.data.uniform_(-initrange, initrange)
        self.fc.weight.data.uniform_(-initrange, initrange)
        self.fc.bias.data.zero_()

    def forward(self, text, offsets):
        embedded = self.embedding(text, offsets)
        return self.fc(embedded)
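
With its default mode, nn.EmbeddingBag averages the embeddings of the tokens in each bag, where the bags are delimited by offsets. The following pure-Python sketch reproduces that computation on a tiny hand-written embedding table; the table values and the helper embedding_bag_mean are illustrative assumptions, not trained weights or a PyTorch function.

```python
def embedding_bag_mean(table, flat_ids, offsets):
    """Average the embedding rows of each bag; bag i spans offsets[i]:offsets[i+1]."""
    bounds = list(offsets) + [len(flat_ids)]
    dim = len(table[0])
    bags = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        rows = [table[idx] for idx in flat_ids[start:end]]
        bags.append([sum(row[d] for row in rows) / len(rows) for d in range(dim)])
    return bags


table = [[0.0, 2.0], [2.0, 4.0], [4.0, 0.0]]  # 3 tokens, embed_dim = 2
bag_means = embedding_bag_mean(table, flat_ids=[0, 1, 2], offsets=[0, 2])
print(bag_means)  # [[1.0, 3.0], [4.0, 0.0]]
```

Each bag yields one fixed-size vector regardless of the text length, which is what allows a single linear layer to classify it. The sketch assumes every bag contains at least one token.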

1.5. Define functions to train the model and evaluate results.

def train(model, criterion, optimizer, dataloader):
    model.train()

    for _, (label, text, offsets) in enumerate(dataloader):
        optimizer.zero_grad()
        predicted_label = model(text, offsets)
        loss = criterion(predicted_label, label)
        loss.backward()
        # Clip the gradient norm before the update to stabilize training
        torch.nn.utils.clip_grad_norm_(model.parameters(), 0.1)
        optimizer.step()

def evaluate(model, dataloader):
    model.eval()
    total_acc, total_count = 0, 0

    with torch.no_grad():
        for _, (label, text, offsets) in enumerate(dataloader):
            predicted_label = model(text, offsets)
            total_acc += (predicted_label.argmax(1) == label).sum().item()
            total_count += label.size(0)
    return total_acc/total_count
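
The call to clip_grad_norm_ in train rescales the gradients so that their global L2 norm does not exceed 0.1. A minimal pure-Python sketch of that rescaling, assuming a flat list of gradient values (the helper clip_by_global_norm is hypothetical):

```python
import math


def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Scale all gradients by the same factor if their L2 norm exceeds max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    scale = min(1.0, max_norm / (total_norm + eps))
    return [g * scale for g in grads], total_norm


clipped, norm_before = clip_by_global_norm([3.0, 4.0], max_norm=0.1)
norm_after = math.sqrt(sum(g * g for g in clipped))
print(norm_before, norm_after)
```

Gradients whose norm is already below the threshold are left unchanged, so clipping only affects unusually large updates.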

1.6. Define the run-function

The run-function defines how the objective that we want to maximize is computed. It takes a config dictionary as input and returns a scalar value that we want to maximize. The config contains one sampled value for each hyperparameter that we want to tune. In this example we will search for:

  • num_epochs (default value: 10)

  • batch_size (default value: 64)

  • learning_rate (default value: 5)

A hyperparameter value can be accessed easily in the dictionary through the corresponding key, for example config["learning_rate"].
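
The contract of a run-function can be tried out with a cheap stand-in objective. The sketch below does not train a model: toy_run and its formula are invented for illustration only, to show how hyperparameters are read from config and how a single score is returned.

```python
def toy_run(config: dict) -> float:
    """Hypothetical run-function: reads hyperparameters, returns a score to maximize."""
    num_epochs = int(config["num_epochs"])
    batch_size = int(config["batch_size"])
    learning_rate = float(config["learning_rate"])
    # Made-up score that peaks at learning_rate == 1 and favours more epochs.
    return -(learning_rate - 1.0) ** 2 + 0.01 * num_epochs - 0.001 * batch_size


default_score = toy_run({"num_epochs": 10, "batch_size": 64, "learning_rate": 5})
better_score = toy_run({"num_epochs": 10, "batch_size": 64, "learning_rate": 1})
print(default_score, better_score)
```
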

def get_run(train_ratio=0.95):
  def run(config: dict):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    embed_dim = 64

    collate_fn = partial(collate_batch, device=device)
    split_train, split_valid, _ = load_data(train_ratio)
    train_dataloader = DataLoader(split_train, batch_size=int(config["batch_size"]),
                                shuffle=True, collate_fn=collate_fn)
    valid_dataloader = DataLoader(split_valid, batch_size=int(config["batch_size"]),
                                shuffle=True, collate_fn=collate_fn)

    model = TextClassificationModel(vocab_size, int(embed_dim), num_class).to(device)

    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=config["learning_rate"])

    for _ in range(1, int(config["num_epochs"]) + 1):
        train(model, criterion, optimizer, train_dataloader)

    accu_test = evaluate(model, valid_dataloader)
    return accu_test
  return run

We create two versions of run: a quicker one for the search, which uses a small training dataset, and another one for performance evaluation, which uses a normal training/validation ratio.

quick_run = get_run(train_ratio=0.3)
perf_run = get_run(train_ratio=0.95)

The objective maximised by DeepHyper is the scalar value returned by the run-function.

In this tutorial it corresponds to the validation accuracy of the model after training.

1.7. Evaluate a default configuration

We evaluate the performance of the default set of hyperparameters provided in the PyTorch tutorial.

# We define a dictionary for the default values
default_config = {
    "num_epochs": 10,
    "batch_size": 64,
    "learning_rate": 5,
}

# We launch the Ray run-time and execute the `run` function
# with the default configuration

if is_gpu_available:
    if not(ray.is_initialized()):
        ray.init(num_cpus=n_gpus, num_gpus=n_gpus, log_to_driver=False)

    run_default = ray.remote(num_cpus=1, num_gpus=1)(perf_run)
    objective_default = ray.get(run_default.remote(default_config))
else:
    if not(ray.is_initialized()):
        ray.init(num_cpus=1, log_to_driver=False)
    run_default = perf_run
    objective_default = run_default(default_config)

print(f"Accuracy Default Configuration:  {objective_default:.3f}")
Accuracy Default Configuration:  0.906

1.8. Define the Hyperparameter optimization problem

Hyperparameter ranges are defined using the following syntax:

  • Discrete integer ranges are generated from a tuple (lower: int, upper: int)

  • Continuous parameters are generated from a tuple (lower: float, upper: float)

  • Categorical or nonordinal hyperparameter ranges can be given as a list of possible values [val1, val2, ...]

We provide the default configuration of hyperparameters as a starting point of the problem.

from deephyper.problem import HpProblem

problem = HpProblem()
# Discrete hyperparameter (sampled with uniform prior)
problem.add_hyperparameter((5, 20), "num_epochs")
# Discrete and Real hyperparameters (sampled with log-uniform)
problem.add_hyperparameter((8, 256, "log-uniform"), "batch_size")
problem.add_hyperparameter((0.5, 5, "log-uniform"), "learning_rate")

# Add a starting point to try first
problem.add_starting_point(**default_config)
problem
Configuration space object:
    batch_size, Type: UniformInteger, Range: [8, 256], Default: 45, on log-scale
    learning_rate, Type: UniformFloat, Range: [0.5, 5.0], Default: 1.5811388301, on log-scale
    num_epochs, Type: UniformInteger, Range: [5, 20], Default: 12

  Starting Point:
{0: {'batch_size': 64, 'learning_rate': 5, 'num_epochs': 10}}
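
For the two hyperparameters on a log scale, the Default values printed above are consistent with the geometric mean of the bounds. The sketch below checks this and draws a few log-uniform samples with the standard library. It is an illustration of the prior, not the sampler that DeepHyper uses internally, and the helper names are hypothetical.

```python
import math
import random


def log_midpoint(low, high):
    """Midpoint of [low, high] on a log scale, i.e. the geometric mean."""
    return math.exp((math.log(low) + math.log(high)) / 2)


def sample_log_uniform(low, high, rng):
    """Draw a value whose logarithm is uniformly distributed on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


print(round(log_midpoint(0.5, 5), 10))  # learning_rate default: 1.5811388301
print(round(log_midpoint(8, 256)))      # batch_size default: 45

rng = random.Random(0)
samples = [sample_log_uniform(0.5, 5, rng) for _ in range(5)]
print(samples)
```

A log-uniform prior spends as many samples between 0.5 and 1 as between 2.5 and 5, which suits parameters such as the learning rate whose effect is multiplicative.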

1.9. Define the evaluator object

The Evaluator object lets you change the parallelization backend used by DeepHyper.
It is a standalone object which schedules the execution of remote tasks. All evaluators need a run_function to be instantiated.
Then a method keyword defines the backend (e.g., "ray") and method_kwargs corresponds to the keyword arguments of the chosen method.

evaluator = Evaluator.create(run_function, method, method_kwargs)

Once created, evaluator.num_workers gives access to the number of available parallel workers.

Finally, to submit tasks to the evaluator and collect them, one just needs to use the following interface:

configs = [...]
evaluator.submit(configs)
tasks_done = evaluator.get("BATCH", size=1) # For asynchronous
tasks_done = evaluator.get("ALL") # For batch synchronous


Each Evaluator saves its own state, therefore it is crucial to create a new evaluator when launching a fresh search.
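
The submit-then-collect pattern resembles the one offered by concurrent.futures in the standard library. The sketch below is an analogy only and does not use DeepHyper: a thread pool stands in for the evaluator, and toy_objective is a hypothetical run-function.

```python
from concurrent.futures import ALL_COMPLETED, FIRST_COMPLETED, ThreadPoolExecutor, wait


def toy_objective(config):
    return -(config["learning_rate"] - 1.0) ** 2


configs = [{"learning_rate": lr} for lr in (0.5, 1.0, 2.0, 5.0)]

with ThreadPoolExecutor(max_workers=2) as pool:
    # Submit every configuration without blocking.
    pending = {pool.submit(toy_objective, cfg) for cfg in configs}

    # Asynchronous style: return as soon as at least one task is done.
    done, pending = wait(pending, return_when=FIRST_COMPLETED)
    first_results = [task.result() for task in done]

    # Batch synchronous style: block until all remaining tasks are done.
    rest, pending = wait(pending, return_when=ALL_COMPLETED)
    all_results = first_results + [task.result() for task in rest]

print(sorted(all_results))
```

The asynchronous style lets a search propose new configurations as soon as a worker is free, instead of waiting for the slowest task of a batch.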

from deephyper.evaluator import Evaluator
from deephyper.evaluator.callback import LoggerCallback

def get_evaluator(run_function):
    # Default arguments for Ray: 1 worker and 1 worker per evaluation
    method_kwargs = {
        "num_cpus": 1,
        "num_cpus_per_task": 1,
        "callbacks": [LoggerCallback()]
    }

    # If GPU devices are detected then it will create 'n_gpus' workers
    # and use 1 worker for each evaluation
    if is_gpu_available:
        method_kwargs["num_cpus"] = n_gpus
        method_kwargs["num_gpus"] = n_gpus
        method_kwargs["num_cpus_per_task"] = 1
        method_kwargs["num_gpus_per_task"] = 1

    evaluator = Evaluator.create(
        run_function,
        method="ray",
        method_kwargs=method_kwargs
    )
    print(f"Created new evaluator with {evaluator.num_workers} worker{'s' if evaluator.num_workers > 1 else ''} and config: {method_kwargs}")

    return evaluator

evaluator_1 = get_evaluator(quick_run)
Created new evaluator with 1 worker and config: {'num_cpus': 1, 'num_cpus_per_task': 1, 'callbacks': [<deephyper.evaluator.callback.LoggerCallback object at 0x7fdc677efaf0>]}

1.10. Define and run the asynchronous model-based search (AMBS)

We create the AMBS using the problem and evaluator defined above.

from deephyper.search.hps import AMBS
# Uncomment the following line to show the arguments of AMBS.
# help(AMBS)

# Instantiate the search with the problem and a specific evaluator
search = AMBS(problem, evaluator_1)


All of DeepHyper's search algorithms have two stopping criteria:

  • max_evals (int): Defines the maximum number of evaluations that we want to perform. Defaults to -1 for an infinite number.

  • timeout (int): Defines a time budget (in seconds) before stopping the search. Defaults to None for an infinite time budget.

results = search.search(max_evals=10)
[00001] -- best objective: 0.88673 -- received objective: 0.88673
[00002] -- best objective: 0.88673 -- received objective: 0.88354
[00003] -- best objective: 0.88796 -- received objective: 0.88796
[00004] -- best objective: 0.89012 -- received objective: 0.89012
[00005] -- best objective: 0.89294 -- received objective: 0.89294
[00006] -- best objective: 0.89294 -- received objective: 0.88995
[00007] -- best objective: 0.89693 -- received objective: 0.89693
[00008] -- best objective: 0.89693 -- received objective: 0.89171
[00009] -- best objective: 0.89693 -- received objective: 0.89379
[00010] -- best objective: 0.89693 -- received objective: 0.88664
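
How the two stopping criteria interact can be sketched with a simple sequential loop. This is not the AMBS implementation, which is model-based and asynchronous: the random proposals, toy_objective and run_search are placeholders that only illustrate max_evals and timeout.

```python
import random
import time


def toy_objective(config):
    return -(config["learning_rate"] - 1.0) ** 2


def run_search(max_evals=-1, timeout=None, seed=0):
    """Evaluate random configurations until max_evals or timeout is reached."""
    rng = random.Random(seed)
    start = time.monotonic()
    history = []
    while max_evals < 0 or len(history) < max_evals:
        if timeout is not None and time.monotonic() - start >= timeout:
            break
        config = {"learning_rate": rng.uniform(0.5, 5)}
        history.append((config, toy_objective(config)))
    return history


history = run_search(max_evals=10)
best_config, best_objective = max(history, key=lambda item: item[1])
print(len(history), best_config, best_objective)
```

Whichever criterion is met first ends the loop, so setting both gives a bound on the number of evaluations and on the wall-clock time.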


By itself, the search call does not output any information about the current status of the search; the log lines above are printed by the LoggerCallback attached to the evaluator. In addition, a results.csv file is created in the local directory and can be inspected to see finished tasks.

The returned results is a pandas DataFrame whose columns are the hyperparameters and the information stored by the evaluator:

  • id is a unique identifier corresponding to the order of creation of tasks

  • objective is the value returned by the run-function

  • elapsed_sec is the time (in seconds) elapsed between the creation of the evaluator and the completion of the task.

  • duration is the time (in seconds) taken to compute the task.

   batch_size  learning_rate  num_epochs  id  objective  elapsed_sec    duration
0          64       5.000000          10   1   0.886726    57.968063   57.503992
1          70       3.307178           6   2   0.883536    93.855454   35.702444
2         102       4.980502           6   3   0.887964   128.688401   34.650392
3         157       4.174305           8   4   0.890119   170.045705   41.182382
4          16       3.834167           8   5   0.892940   236.750850   66.523159
5          26       3.565162          15   6   0.889952   334.821765   97.850210
6          14       0.762862          17   7   0.896929   473.556122  138.550948
7          42       1.415844          19   8   0.891714   579.014818  105.281558
8          12       0.640161          16   9   0.893786   720.144839  140.955280
9          13       0.939023           7  10   0.886643   784.013679   63.674891
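
The same table can be inspected without pandas. As a sketch, the rows below are copied from the output above (the two timing columns are left out) and the best evaluation is selected with plain Python; in the notebook itself, results is a DataFrame.

```python
columns = ("batch_size", "learning_rate", "num_epochs", "id", "objective")
values = [
    (64, 5.000000, 10, 1, 0.886726),
    (70, 3.307178, 6, 2, 0.883536),
    (102, 4.980502, 6, 3, 0.887964),
    (157, 4.174305, 8, 4, 0.890119),
    (16, 3.834167, 8, 5, 0.892940),
    (26, 3.565162, 15, 6, 0.889952),
    (14, 0.762862, 17, 7, 0.896929),
    (42, 1.415844, 19, 8, 0.891714),
    (12, 0.640161, 16, 9, 0.893786),
    (13, 0.939023, 7, 10, 0.886643),
]
rows = [dict(zip(columns, row)) for row in values]

# The best evaluation is the row with the highest objective.
best = max(rows, key=lambda row: row["objective"])
print(best)
```
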

1.11. Evaluate the best configuration

Now that the search is over, let us print the best configuration found during this run and evaluate it on the full training dataset.

i_max = results.objective.argmax()
best_config = results.iloc[i_max][:-3].to_dict()

print(f"The default configuration has an accuracy of {objective_default:.3f}. \n"
      f"The best configuration found by DeepHyper has an accuracy of {results['objective'].iloc[i_max]:.3f}, \n"
      f"trained in {results['duration'].iloc[i_max]:.2f} seconds and \n"
      f"finished after {results['elapsed_sec'].iloc[i_max]:.2f} seconds of search.\n")

print(json.dumps(best_config, indent=4))
The default configuration has an accuracy of 0.906.
The best configuration found by DeepHyper has an accuracy of 0.897,
trained in 138.55 seconds and
finished after 473.56 seconds of search.

{
    "batch_size": 14.0,
    "learning_rate": 0.7628621249544358,
    "num_epochs": 17.0,
    "id": 7.0
}

objective_best = perf_run(best_config)
print(f"Accuracy Best Configuration:  {objective_best:.3f}")
Accuracy Best Configuration:  0.913