Hyperparameter search for text classification#

Author(s): Romain Egele, Brett Eiffert.

In this tutorial we show how to apply hyperparameter optimization to a text classification example from the PyTorch documentation.

Reference: This tutorial is based on materials from the PyTorch documentation: Text classification with the torchtext library

%%bash
pip install deephyper
pip install ray
pip install torch torchtext torchdata

Imports#

import ray
import json
import pandas as pd
from functools import partial

import torch

from torchtext.data.utils import get_tokenizer
from torchtext.data.functional import to_map_style_dataset
from torchtext.vocab import build_vocab_from_iterator

from torch.utils.data import DataLoader
from torch.utils.data.dataset import random_split

from torch import nn

Note

The following cell can be used to detect whether CUDA devices are available on the current host, so that the notebook automatically adapts its parallel execution to the resources available locally. Note that this detection is local only: it does not account for additional compute nodes requested from a cluster.
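A minimal detection cell (later cells rely on the n_gpus and is_gpu_available variables defined here):

n_gpus = torch.cuda.device_count()
is_gpu_available = n_gpus > 0

if is_gpu_available:
    print(f"{n_gpus} GPU{'s are' if n_gpus > 1 else ' is'} available.")
else:
    print("No GPU available")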

The dataset#

The torchtext library provides a few raw dataset iterators, which yield the raw text strings. For example, the AG_NEWS dataset iterators yield the raw data as a tuple of label and text. It has four labels (1: World, 2: Sports, 3: Business, 4: Sci/Tech).

from torchtext.datasets import AG_NEWS

def load_data(train_ratio, fast=False):
    train_iter, test_iter = AG_NEWS()
    train_dataset = to_map_style_dataset(train_iter)
    test_dataset = to_map_style_dataset(test_iter)
    num_train = int(len(train_dataset) * train_ratio)
    split_train, split_valid = \
        random_split(train_dataset, [num_train, len(train_dataset) - num_train])

    # Downsample each split to 5% of its size for a quick demonstration run.
    if fast:
        n_train = int(len(split_train) * 0.05)
        split_train, _ = random_split(split_train, [n_train, len(split_train) - n_train])
        n_valid = int(len(split_valid) * 0.05)
        split_valid, _ = random_split(split_valid, [n_valid, len(split_valid) - n_valid])
        n_test = int(len(test_dataset) * 0.05)
        test_dataset, _ = random_split(test_dataset, [n_test, len(test_dataset) - n_test])

    return split_train, split_valid, test_dataset
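For example, a quick sanity check of the resulting splits (a sketch; the exact sizes depend on the fast flag and the 5% downsampling):

split_train, split_valid, test_dataset = load_data(train_ratio=0.95, fast=True)
print(len(split_train), len(split_valid), len(test_dataset))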

Preprocessing pipelines and Batch generation#

Here is an example of typical NLP data processing with a tokenizer and vocabulary. The first step is to build a vocabulary from the raw training dataset using the built-in factory function build_vocab_from_iterator, which accepts an iterator that yields lists or iterators of tokens. Users can also pass any special symbols to be added to the vocabulary.

The vocabulary block converts a list of tokens into integers.

vocab(['here', 'is', 'an', 'example'])
>>> [475, 21, 30, 5286]

The text pipeline converts a text string into a list of integers based on the lookup table defined in the vocabulary. The label pipeline converts the label into integers. For example,

text_pipeline('here is the an example')
>>> [475, 21, 2, 30, 5286]
label_pipeline('10')
>>> 9

train_iter = AG_NEWS(split='train')
num_class = 4

tokenizer = get_tokenizer('basic_english')

def yield_tokens(data_iter):
    for _, text in data_iter:
        yield tokenizer(text)

vocab = build_vocab_from_iterator(yield_tokens(train_iter), specials=["<unk>"])
vocab.set_default_index(vocab["<unk>"])
vocab_size = len(vocab)

text_pipeline = lambda x: vocab(tokenizer(x))
label_pipeline = lambda x: int(x) - 1


def collate_batch(batch, device):
    label_list, text_list, offsets = [], [], [0]
    for (_label, _text) in batch:
        label_list.append(label_pipeline(_label))
        processed_text = torch.tensor(text_pipeline(_text), dtype=torch.int64)
        text_list.append(processed_text)
        offsets.append(processed_text.size(0))
    label_list = torch.tensor(label_list, dtype=torch.int64)
    offsets = torch.tensor(offsets[:-1]).cumsum(dim=0)
    text_list = torch.cat(text_list)
    return label_list.to(device), text_list.to(device), offsets.to(device)

Note

The collate_fn function works on a batch of samples generated by DataLoader. Its input is a list of batch_size samples, which it processes according to the data-processing pipelines declared previously to build the batch tensors.
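A minimal sketch of how collate_batch is plugged into a DataLoader, assuming the split_train dataset from the load_data example above (the run-function below does the same via functools.partial):

# Bind the target device, then let DataLoader call collate_batch on each batch.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
loader = DataLoader(split_train, batch_size=8, shuffle=False,
                    collate_fn=partial(collate_batch, device=device))
labels, texts, offsets = next(iter(loader))
print(labels.shape, texts.shape, offsets.shape)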

Define the model#

The model is composed of an nn.EmbeddingBag layer plus a linear layer for classification. nn.EmbeddingBag, with its default mode "mean", computes the mean of a "bag" of embeddings, so no padding is needed even though the text entries have different lengths.

class TextClassificationModel(nn.Module):

    def __init__(self, vocab_size, embed_dim, num_class):
        super(TextClassificationModel, self).__init__()
        self.embedding = nn.EmbeddingBag(vocab_size, embed_dim, sparse=False)
        self.fc = nn.Linear(embed_dim, num_class)
        self.init_weights()

    def init_weights(self):
        initrange = 0.5
        self.embedding.weight.data.uniform_(-initrange, initrange)
        self.fc.weight.data.uniform_(-initrange, initrange)
        self.fc.bias.data.zero_()

    def forward(self, text, offsets):
        embedded = self.embedding(text, offsets)
        return self.fc(embedded)
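A quick sanity check of the forward pass (a sketch, reusing the vocab, num_class, and text_pipeline built above):

# One "bag" starting at offset 0 produces one row of logits, one per class.
model = TextClassificationModel(vocab_size, embed_dim=64, num_class=num_class)
text = torch.tensor(text_pipeline("here is an example"), dtype=torch.int64)
offsets = torch.tensor([0])
print(model(text, offsets).shape)  # torch.Size([1, 4])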

Define functions to train the model and evaluate results.#

def train(model, criterion, optimizer, dataloader):
    model.train()

    for label, text, offsets in dataloader:
        optimizer.zero_grad()
        predicted_label = model(text, offsets)
        loss = criterion(predicted_label, label)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 0.1)
        optimizer.step()

def evaluate(model, dataloader):
    model.eval()
    total_acc, total_count = 0, 0

    with torch.no_grad():
        for label, text, offsets in dataloader:
            predicted_label = model(text, offsets)
            total_acc += (predicted_label.argmax(1) == label).sum().item()
            total_count += label.size(0)
    return total_acc/total_count

Define the run-function#

The run-function defines how the objective that we want to maximize is computed. It takes a config dictionary as input and often returns a scalar value that we want to maximize. The config contains a sampled value for each hyperparameter that we want to tune. In this example we will search for:

  • num_epochs (default value: 10)

  • batch_size (default value: 64)

  • learning_rate (default value: 5)

A hyperparameter value can be accessed easily in the dictionary through the corresponding key, for example config["batch_size"].

def get_run(train_ratio=0.95):
  def run(config: dict):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    embed_dim = 64

    collate_fn = partial(collate_batch, device=device)
    split_train, split_valid, _ = load_data(train_ratio, fast=True)  # set fast=False for a longer-running, more accurate example
    train_dataloader = DataLoader(split_train, batch_size=int(config["batch_size"]),
                                shuffle=True, collate_fn=collate_fn)
    valid_dataloader = DataLoader(split_valid, batch_size=int(config["batch_size"]),
                                shuffle=True, collate_fn=collate_fn)

    model = TextClassificationModel(vocab_size, int(embed_dim), num_class).to(device)

    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=config["learning_rate"])

    for _ in range(1, int(config["num_epochs"]) + 1):
        train(model, criterion, optimizer, train_dataloader)

    accu_valid = evaluate(model, valid_dataloader)
    return accu_valid
  return run

We create two versions of run: one that is quicker to evaluate, for the search, using a smaller training dataset, and another for performance evaluation, which uses a normal training/validation ratio.

quick_run = get_run(train_ratio=0.3)
perf_run = get_run(train_ratio=0.95)

Note

The objective maximised by DeepHyper is the scalar value returned by the run-function.

In this tutorial it corresponds to the validation accuracy of the model after training.

Define the Hyperparameter optimization problem#

Hyperparameter ranges are defined using the following syntax:

  • Discrete integer ranges are generated from a tuple (lower: int, upper: int)

  • Continuous parameters are generated from a tuple (lower: float, upper: float)

  • Categorical or non-ordinal hyperparameter ranges can be given as a list of possible values [val1, val2, ...]

We provide the default configuration of hyperparameters as a starting point of the problem.

from deephyper.hpo import HpProblem

problem = HpProblem()

# Discrete hyperparameter (sampled with uniform prior)
problem.add_hyperparameter((5, 20), "num_epochs", default_value=10)

# Discrete and Real hyperparameters (sampled with log-uniform)
problem.add_hyperparameter((8, 512, "log-uniform"), "batch_size", default_value=64)
problem.add_hyperparameter((0.1, 10, "log-uniform"), "learning_rate", default_value=5)

problem
Configuration space object:
  Hyperparameters:
    batch_size, Type: UniformInteger, Range: [8, 512], Default: 64, on log-scale
    learning_rate, Type: UniformFloat, Range: [0.1, 10.0], Default: 5.0, on log-scale
    num_epochs, Type: UniformInteger, Range: [5, 20], Default: 10

Evaluate a default configuration#

We evaluate the performance of the default set of hyperparameters provided in the PyTorch tutorial.

# We launch the Ray run-time and execute the `run` function
# with the default configuration
if is_gpu_available:
    if not ray.is_initialized():
        ray.init(num_cpus=n_gpus, num_gpus=n_gpus, log_to_driver=False)

    run_default = ray.remote(num_cpus=1, num_gpus=1)(perf_run)
    objective_default = ray.get(run_default.remote(problem.default_configuration))
else:
    if not ray.is_initialized():
        ray.init(num_cpus=1, log_to_driver=False)
    run_default = perf_run
    objective_default = run_default(problem.default_configuration)

print(f"Accuracy Default Configuration:  {objective_default:.3f}")
2025-03-18 00:44:17,179 INFO worker.py:1832 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265
Accuracy Default Configuration:  0.867

Define the evaluator object#

The Evaluator object allows you to change the parallelization backend used by DeepHyper. It is a standalone object that schedules the execution of remote tasks. All evaluators need a run_function to be instantiated. Then the method keyword defines the backend (e.g., "ray") and method_kwargs corresponds to the keyword arguments of this chosen method:

evaluator = Evaluator.create(run_function, method="ray", method_kwargs={...})

Once created, the evaluator.num_workers attribute gives access to the number of available parallel workers.

Finally, to submit tasks to the evaluator and collect the completed ones, one just needs to use the following interface:

configs = [...]
evaluator.submit(configs)
...
tasks_done = evaluator.gather("BATCH", size=1)  # For asynchronous gathering
tasks_done = evaluator.gather("ALL")  # For batch-synchronous gathering

Warning

Each Evaluator saves its own state; therefore it is crucial to create a new evaluator when launching a fresh search.

from deephyper.evaluator import Evaluator
from deephyper.evaluator.callback import TqdmCallback

def get_evaluator(run_function):
    # Default arguments for Ray: 1 CPU in total and 1 CPU per task (one evaluation at a time)
    method_kwargs = {
        "num_cpus": 1,
        "num_cpus_per_task": 1,
        "callbacks": [TqdmCallback()]
    }

    # If GPU devices are detected then it will create 'n_gpus' workers
    # and use 1 worker for each evaluation
    if is_gpu_available:
        method_kwargs["num_cpus"] = n_gpus
        method_kwargs["num_gpus"] = n_gpus
        method_kwargs["num_cpus_per_task"] = 1
        method_kwargs["num_gpus_per_task"] = 1

    evaluator = Evaluator.create(
        run_function,
        method="ray",
        method_kwargs=method_kwargs
    )
    print(f"Created new evaluator with {evaluator.num_workers} worker{'s' if evaluator.num_workers > 1 else ''} and config: {method_kwargs}", )

    return evaluator

evaluator_1 = get_evaluator(quick_run)
Created new evaluator with 1 worker and config: {'num_cpus': 1, 'num_cpus_per_task': 1, 'callbacks': [<deephyper.evaluator.callback.TqdmCallback object at 0x1339d1e20>]}

Define and run the Centralized Bayesian Optimization search (CBO)#

We create the CBO using the problem and evaluator defined above.

from deephyper.hpo import CBO

We instantiate the search with the problem and a specific evaluator:
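# A minimal instantiation of the CBO search; options such as random_state
# (for reproducibility) are also available, see the CBO documentation.
search = CBO(problem, evaluator_1)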

WARNING:root:Results file already exists, it will be renamed to /Users/35e/Projects/DeepHyper/deephyper/examples/examples_hpo/results_20250318-004434.csv

Note

All DeepHyper search algorithms have two stopping criteria, which can also be combined as shown below:
  • max_evals (int): Defines the maximum number of evaluations to perform. Defaults to -1 for an infinite number.

  • timeout (int): Defines a time budget (in seconds) before stopping the search. Defaults to None for an infinite time budget.
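For example (a sketch), both criteria can be passed together so that the search stops after 30 evaluations or 10 minutes, whichever comes first; the search below uses max_evals only:

results = search.search(max_evals=30, timeout=600)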

results = search.search(max_evals=30)
  0%|          | 0/30 [00:00<?, ?it/s]
  3%|▎         | 1/30 [00:00<00:00, 1901.32it/s, failures=0, objective=0.357]
  7%|▋         | 2/30 [00:04<01:08,  2.46s/it, failures=0, objective=0.357]
 10%|█         | 3/30 [00:09<01:29,  3.30s/it, failures=0, objective=0.357]
 13%|█▎        | 4/30 [00:13<01:29,  3.45s/it, failures=0, objective=0.576]
 17%|█▋        | 5/30 [00:48<06:05, 14.63s/it, failures=0, objective=0.586]
 20%|██        | 6/30 [01:02<05:41, 14.24s/it, failures=0, objective=0.586]
 23%|██▎       | 7/30 [01:04<04:03, 10.57s/it, failures=0, objective=0.586]
 27%|██▋       | 8/30 [01:08<03:06,  8.48s/it, failures=0, objective=0.586]
 30%|███       | 9/30 [01:12<02:26,  6.97s/it, failures=0, objective=0.586]
 33%|███▎      | 10/30 [01:20<02:25,  7.28s/it, failures=0, objective=0.776]
 37%|███▋      | 11/30 [01:25<02:07,  6.71s/it, failures=0, objective=0.776]
 40%|████      | 12/30 [01:32<02:01,  6.78s/it, failures=0, objective=0.776]
 43%|████▎     | 13/30 [01:41<02:03,  7.27s/it, failures=0, objective=0.776]
 47%|████▋     | 14/30 [01:50<02:05,  7.83s/it, failures=0, objective=0.794]
 50%|█████     | 15/30 [01:56<01:51,  7.44s/it, failures=0, objective=0.797]
 53%|█████▎    | 16/30 [02:03<01:38,  7.04s/it, failures=0, objective=0.805]
 57%|█████▋    | 17/30 [02:08<01:24,  6.48s/it, failures=0, objective=0.805]
 60%|██████    | 18/30 [02:10<01:03,  5.27s/it, failures=0, objective=0.805]
 63%|██████▎   | 19/30 [02:16<00:59,  5.43s/it, failures=0, objective=0.805]
 67%|██████▋   | 20/30 [02:23<00:58,  5.84s/it, failures=0, objective=0.805]
 70%|███████   | 21/30 [02:28<00:50,  5.65s/it, failures=0, objective=0.805]
 73%|███████▎  | 22/30 [02:33<00:44,  5.58s/it, failures=0, objective=0.805]
 77%|███████▋  | 23/30 [02:39<00:40,  5.73s/it, failures=0, objective=0.805]
 80%|████████  | 24/30 [03:01<01:03, 10.59s/it, failures=0, objective=0.814]
 83%|████████▎ | 25/30 [03:27<01:15, 15.03s/it, failures=0, objective=0.814]
 87%|████████▋ | 26/30 [03:50<01:09, 17.36s/it, failures=0, objective=0.814]
 90%|█████████ | 27/30 [04:13<00:57, 19.06s/it, failures=0, objective=0.814]
 93%|█████████▎| 28/30 [04:31<00:37, 18.84s/it, failures=0, objective=0.814]
 97%|█████████▋| 29/30 [05:12<00:25, 25.55s/it, failures=0, objective=0.814]
100%|██████████| 30/30 [05:32<00:00, 23.86s/it, failures=0, objective=0.814]

The returned results is a pandas DataFrame where the columns are the hyperparameters (prefixed with p:) and information stored by the evaluator (metadata columns are prefixed with m:):

  • job_id is a unique identifier corresponding to the order of creation of tasks

  • objective is the value returned by the run-function

  • timestamp_submit is the time (in seconds) when the hyperparameter configuration was submitted by the Evaluator relative to the creation of the evaluator.

  • timestamp_gather is the time (in seconds) when the hyperparameter configuration was collected by the Evaluator relative to the creation of the evaluator.
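Displaying results gives the table of all evaluated configurations:

results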

p:batch_size p:learning_rate p:num_epochs objective job_id job_status m:timestamp_submit m:timestamp_gather
0 218 1.603682 5 0.356667 0 DONE 1.201744 10.083410
1 154 0.105166 16 0.301429 1 DONE 10.165432 15.010249
2 200 0.144150 17 0.294762 2 DONE 15.043195 19.502163
3 286 3.508177 15 0.576190 3 DONE 19.533200 23.204575
4 10 0.374523 16 0.586190 4 DONE 23.234220 58.793532
5 31 0.165805 16 0.398095 5 DONE 58.824440 72.247476
6 356 0.135121 11 0.265238 6 DONE 72.277940 75.083312
7 234 0.219208 15 0.308095 7 DONE 75.113756 79.015994
8 239 0.949777 14 0.439048 8 DONE 79.045637 82.634237
9 53 3.226746 14 0.776190 9 DONE 82.664993 90.600132
10 42 3.250431 7 0.664048 10 DONE 90.804079 96.020963
11 67 2.493013 14 0.702143 11 DONE 96.205311 102.946039
12 52 1.969489 14 0.706429 12 DONE 103.129317 111.346218
13 53 5.363429 16 0.793810 13 DONE 111.526737 120.479798
14 96 6.516995 17 0.796667 14 DONE 120.659979 127.026014
15 135 6.515920 19 0.804762 15 DONE 127.209640 133.142229
16 223 6.444707 20 0.737619 16 DONE 133.326585 138.320355
17 143 8.122533 5 0.534286 17 DONE 138.506371 140.751131
18 137 8.871360 18 0.766429 18 DONE 140.939695 146.575524
19 109 4.227949 19 0.780952 19 DONE 146.764230 153.347868
20 123 5.800259 15 0.755000 20 DONE 153.535532 158.566730
21 164 5.829206 19 0.762143 21 DONE 158.759242 163.985452
22 135 3.215795 19 0.731667 22 DONE 164.173174 170.059973
23 22 6.480428 19 0.814286 23 DONE 170.249493 192.001965
24 17 3.658221 18 0.804048 24 DONE 192.192851 217.386762
25 20 2.774962 19 0.808810 25 DONE 217.577718 240.172801
26 20 1.502854 19 0.781429 26 DONE 240.363467 263.214262
27 28 3.161690 19 0.802143 27 DONE 263.547650 281.546914
28 11 8.864483 20 0.802381 28 DONE 281.738626 322.728173
29 21 4.391751 17 0.807143 29 DONE 322.925621 342.655973


Evaluate the best configuration#

Now that the search is over, let us print the best configuration found during this run and evaluate it with perf_run, which uses the larger training ratio.

i_max = results.objective.argmax()
best_config = results.iloc[i_max].to_dict()
best_config = {k[2:]: v for k, v in best_config.items() if k.startswith("p:")}

print(f"The default configuration has an accuracy of {objective_default:.3f}. \n"
      f"The best configuration found by DeepHyper has an accuracy {results['objective'].iloc[i_max]:.3f}, \n"
      f"finished after {results['m:timestamp_gather'].iloc[i_max]:.2f} secondes of search.\n")

print(json.dumps(best_config, indent=4))
The default configuration has an accuracy of 0.867.
The best configuration found by DeepHyper has an accuracy 0.814,
finished after 192.00 seconds of search.

{
    "batch_size": 22,
    "learning_rate": 6.480428119634669,
    "num_epochs": 19
}
objective_best = perf_run(best_config)
print(f"Accuracy Best Configuration:  {objective_best:.3f}")
Accuracy Best Configuration:  0.903

Total running time of the script: (7 minutes 12.166 seconds)
