Hyperparameter search for text classification#

Author(s): Romain Egele, Brett Eiffert.

In this tutorial, we show how to apply hyperparameter optimization to a text classification example from the PyTorch documentation.

Reference: This tutorial is based on material from the PyTorch documentation: "Text classification with the torchtext library".

%%bash
pip install deephyper ray numpy==1.26.4 torch torchtext==0.17.2 torchdata==0.7.1 'portalocker>=2.0.0'

Imports#

All imports used in the tutorial are declared at the top of the file.

Note

The following code detects whether CUDA devices are available on the current host, so this notebook automatically adapts its parallel execution to the resources available locally. This automatic adaptation does not apply when several compute nodes are requested.

If a GPU is available, this code enables the tutorial to use it for PyTorch operations.
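The detection cell itself is not reproduced on this page. A minimal sketch of it, assuming the standard torch.cuda calls, could look as follows. The names is_gpu_available and n_gpus match the ones used later in this tutorial; the device variable and the ImportError fallback are additions made here for illustration.

```python
try:
    import torch

    # True when at least one CUDA device is visible to PyTorch.
    is_gpu_available = torch.cuda.is_available()
    # Number of visible CUDA devices (0 when CUDA is unavailable).
    n_gpus = torch.cuda.device_count()
except ImportError:
    # Fallback (an assumption of this sketch) so it still runs without PyTorch.
    is_gpu_available = False
    n_gpus = 0

# Device name to pass to torch.device(...) when creating tensors and models.
device = "cuda" if is_gpu_available else "cpu"
print(f"GPU available: {is_gpu_available} ({n_gpus} device(s)), using device: {device}")
```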

The dataset#

The torchtext library provides a few raw dataset iterators, which yield the raw text strings. For example, the AG_NEWS dataset iterators yield the raw data as a tuple of label and text. It has four labels (1: World, 2: Sports, 3: Business, 4: Sci/Tec).

Preprocessing pipelines and Batch generation#

Here is an example of typical NLP data processing with a tokenizer and a vocabulary. The first step is to build a vocabulary from the raw training dataset. Here we use the built-in factory function build_vocab_from_iterator, which accepts an iterator that yields lists or iterators of tokens. Users can also pass any special symbols to be added to the vocabulary.

The vocabulary block converts a list of tokens into integers.

vocab(['here', 'is', 'an', 'example'])
>>> [475, 21, 30, 5286]

The text pipeline converts a text string into a list of integers based on the lookup table defined in the vocabulary. The label pipeline converts the label into integers. For example,

text_pipeline('here is the an example')
>>> [475, 21, 2, 30, 5286]
label_pipeline('10')
>>> 9
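To make the mechanics concrete, here is a minimal sketch of the same two pipelines written with the standard library only, not with torchtext. The build_vocab helper, the whitespace tokenizer, and the two-sentence corpus are assumptions made for illustration, so the integer ids differ from the AG_NEWS ids shown above.

```python
from collections import Counter


def tokenize(text):
    # Hypothetical stand-in for a real tokenizer: lowercase, split on whitespace.
    return text.lower().split()


def build_vocab(texts, specials=("<unk>",)):
    # Special symbols get the first ids, then tokens by decreasing frequency
    # (ties broken alphabetically so the mapping is deterministic).
    counts = Counter(token for text in texts for token in tokenize(text))
    ordered = sorted(counts, key=lambda tok: (-counts[tok], tok))
    return {token: idx for idx, token in enumerate(list(specials) + ordered)}


corpus = ["here is an example", "here is another example"]
vocab = build_vocab(corpus)


def text_pipeline(text):
    # Unknown tokens fall back to the id of "<unk>".
    return [vocab.get(token, vocab["<unk>"]) for token in tokenize(text)]


def label_pipeline(label):
    # Labels arrive as 1-based strings; shift them to 0-based integers.
    return int(label) - 1


print(text_pipeline("here is an unseen example"))
print(label_pipeline("10"))
```

The fallback to "<unk>" is what lets the pipeline handle words that never appeared in the training data.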

Note

The collate_fn function works on a batch of samples generated by the DataLoader. Its input is a list of samples whose length is the DataLoader batch size, and it processes them according to the data processing pipelines declared previously.
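The collate function used in this tutorial is not reproduced on this page. The sketch below shows the idea with plain Python lists, assuming the layout that nn.EmbeddingBag accepts: all token ids concatenated into one flat sequence, plus the offset at which each sample starts. The toy text_pipeline (word lengths instead of vocabulary ids) is hypothetical and only keeps the example self-contained.

```python
from itertools import accumulate


def text_pipeline(text):
    # Hypothetical toy pipeline: map each word to its length instead of a vocabulary id.
    return [len(word) for word in text.split()]


def label_pipeline(label):
    # Labels arrive as 1-based strings; shift them to 0-based integers.
    return int(label) - 1


def collate_batch(batch):
    labels, flat_tokens, lengths = [], [], []
    for label, text in batch:
        tokens = text_pipeline(text)
        labels.append(label_pipeline(label))
        flat_tokens.extend(tokens)
        lengths.append(len(tokens))
    # offsets[i] is the index in flat_tokens where sample i starts.
    offsets = [0] + list(accumulate(lengths))[:-1]
    return labels, flat_tokens, offsets


batch = [("3", "stocks rally"), ("2", "team wins the cup")]
labels, flat_tokens, offsets = collate_batch(batch)
print(labels, flat_tokens, offsets)
```

In a PyTorch version the three lists would be converted to tensors before being returned, and the function would be passed to DataLoader through its collate_fn argument.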

Define the model#

The model is composed of an nn.EmbeddingBag layer followed by a linear layer for classification.
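The model code is not reproduced on this page. To illustrate what such a model computes, the sketch below reimplements the forward pass with plain Python lists, assuming mean pooling (the default mode of nn.EmbeddingBag). The function names and the tiny hand-written weights are hypothetical; a real model would use the PyTorch layers and learn its weights.

```python
def embedding_bag_mean(table, tokens, offsets):
    # Average the embedding vectors of the tokens in each bag (one bag per sample).
    # Assumes every bag contains at least one token.
    bounds = offsets + [len(tokens)]
    dim = len(table[0])
    pooled = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        bag = [table[token] for token in tokens[start:end]]
        pooled.append([sum(vec[d] for vec in bag) / len(bag) for d in range(dim)])
    return pooled


def linear(inputs, weight, bias):
    # weight holds one row per class; the result has one score per class and input.
    return [
        [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weight, bias)]
        for vec in inputs
    ]


table = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 4 tokens, embed_dim = 2
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 classes
bias = [0.0, 0.0, 0.5]

tokens, offsets = [1, 2, 3, 3], [0, 2]  # two samples: [1, 2] and [3, 3]
pooled = embedding_bag_mean(table, tokens, offsets)
scores = linear(pooled, weight, bias)
predictions = [row.index(max(row)) for row in scores]
print(pooled, scores, predictions)
```

Because each bag is averaged into a single vector, samples of different lengths can share a batch without padding.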

Define functions to train the model and evaluate results.#

Define the run-function#

The run-function defines how the objective that we want to maximize is computed. It takes a config dictionary as input and usually returns a scalar value. The config contains one sampled value for each hyperparameter that we want to tune. In this example we will search for:

num_epochs (default value: 10)
batch_size (default value: 64)
learning_rate (default value: 5)

A hyperparameter value can be accessed in the dictionary through the corresponding key, for example config["learning_rate"].

We create two versions of run: a quicker one for the search, which uses a small training dataset, and another one for performance evaluation, which uses a normal training/validation ratio.

quick_run = get_run(train_ratio=0.3)
perf_run = get_run(train_ratio=0.95)
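The body of get_run is not reproduced on this page. The sketch below shows only its shape: a factory that returns a run-function bound to a training ratio. The scoring formula is a made-up stand-in for "train the model, then return the validation accuracy", and the toy_ names are hypothetical.

```python
import math


def get_run(train_ratio):
    # Factory returning a run-function bound to the given training ratio.
    def run(config):
        # Hyperparameters are read from the config dictionary by key.
        num_epochs = config["num_epochs"]
        batch_size = config["batch_size"]
        learning_rate = config["learning_rate"]
        # Made-up score standing in for the validation accuracy after training.
        score = train_ratio * (1 - math.exp(-num_epochs / 10))
        score -= 0.01 * abs(math.log10(learning_rate)) + 1e-4 * batch_size
        return score

    return run


toy_quick_run = get_run(train_ratio=0.3)
toy_perf_run = get_run(train_ratio=0.95)

default_config = {"num_epochs": 10, "batch_size": 64, "learning_rate": 5}
print(toy_quick_run(default_config), toy_perf_run(default_config))
```

Using a closure keeps the signature that the search expects, a single config argument, while still letting each variant carry its own training ratio.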

Note

The objective maximised by DeepHyper is the scalar value returned by the run-function.

In this tutorial it corresponds to the validation accuracy of the model after training.

Define the Hyperparameter optimization problem#

Hyperparameter ranges are defined using the following syntax:

Discrete integer ranges are generated from a tuple (lower: int, upper: int)
Continuous parameters are generated from a tuple (lower: float, upper: float)
Categorical or nonordinal hyperparameter ranges can be given as a list of possible values [val1, val2, ...]

We provide the default configuration of hyperparameters as a starting point of the problem.

from deephyper.hpo import HpProblem

problem = HpProblem()

# Discrete hyperparameter (sampled with uniform prior)
problem.add_hyperparameter((5, 20), "num_epochs", default_value=10)

# Discrete and Real hyperparameters (sampled with log-uniform)
problem.add_hyperparameter((8, 512, "log-uniform"), "batch_size", default_value=64)
problem.add_hyperparameter((0.1, 10, "log-uniform"), "learning_rate", default_value=5)

problem

Configuration space object:
  Hyperparameters:
    batch_size, Type: UniformInteger, Range: [8, 512], Default: 64, on log-scale
    learning_rate, Type: UniformFloat, Range: [0.1, 10.0], Default: 5.0, on log-scale
    num_epochs, Type: UniformInteger, Range: [5, 20], Default: 10
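To see what the "log-uniform" prior means in practice, the sketch below draws samples uniformly in log space with the standard library. It illustrates the prior only and is not how DeepHyper samples internally; in particular, rounding to obtain integer batch sizes is an assumption of this sketch.

```python
import math
import random


def sample_log_uniform(low, high, rng):
    # Draw uniformly in log space, then map back with exp.
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)
learning_rates = [sample_log_uniform(0.1, 10, rng) for _ in range(1000)]
batch_sizes = [round(sample_log_uniform(8, 512, rng)) for _ in range(1000)]

# On [0.1, 10] a log-uniform prior puts about as many samples below 1 as above it,
# whereas a plain uniform prior would put roughly 9% of them below 1.
share_below_one = sum(lr < 1 for lr in learning_rates) / len(learning_rates)
print(f"share of learning rates below 1: {share_below_one:.2f}")
```

This is why a log scale suits hyperparameters such as the learning rate or the batch size, whose plausible values span several orders of magnitude.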

Evaluate a default configuration#

We evaluate the performance of the default set of hyperparameters provided in the PyTorch tutorial.

# Launch the Ray runtime and execute the `run` function
# with the default configuration
if is_gpu_available:
    if not ray.is_initialized():
        ray.init(num_cpus=n_gpus, num_gpus=n_gpus, log_to_driver=False)

    run_default = ray.remote(num_cpus=1, num_gpus=1)(perf_run)
    objective_default = ray.get(run_default.remote(problem.default_configuration))
else:
    if not ray.is_initialized():
        ray.init(num_cpus=1, log_to_driver=False)
    run_default = perf_run
    objective_default = run_default(problem.default_configuration)

print(f"Accuracy Default Configuration:  {objective_default:.3f}")

2025-08-18 14:44:20,921 INFO worker.py:1843 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265
Accuracy Default Configuration:  0.863

Define the evaluator object#

The Evaluator object lets you change the parallelization backend used by DeepHyper. It is a standalone object that schedules the execution of remote tasks. Every evaluator needs a run_function to be instantiated. The method keyword then selects the backend (e.g., "ray"), and method_kwargs holds the keyword arguments for the chosen method.

evaluator = Evaluator.create(run_function, method, method_kwargs)

Once the evaluator is created, evaluator.num_workers gives the number of available parallel workers.

Finally, tasks are submitted to the evaluator and collected from it through the following interface:

configs = [...]
evaluator.submit(configs)
...
tasks_done = evaluator.gather("BATCH", size=1) # For asynchronous
tasks_done = evaluator.gather("ALL") # For batch synchronous
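The difference between the two collection modes can be illustrated with the standard library alone. The sketch below is an analogy built on concurrent.futures, not DeepHyper's implementation; the run function and its "delay" key are hypothetical.

```python
import time
from concurrent.futures import ALL_COMPLETED, FIRST_COMPLETED, ThreadPoolExecutor, wait


def run(config):
    # Hypothetical task: sleep for the requested delay, then return it.
    time.sleep(config["delay"])
    return config["delay"]


configs = [{"delay": 0.3}, {"delay": 0.05}, {"delay": 0.15}]

with ThreadPoolExecutor(max_workers=3) as pool:
    pending = {pool.submit(run, config) for config in configs}

    # Asynchronous style: return as soon as at least one task is done.
    done, pending = wait(pending, return_when=FIRST_COMPLETED)
    first_results = [future.result() for future in done]

    # Batch-synchronous style: block until every remaining task is done.
    done, pending = wait(pending, return_when=ALL_COMPLETED)
    remaining_results = sorted(future.result() for future in done)

print(first_results, remaining_results)
```

Collecting results as soon as one is ready lets a search propose a new configuration immediately and keeps workers busy; waiting for all results makes every worker idle until the slowest task finishes.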

Warning

Each Evaluator saves its own state, therefore it is crucial to create a new evaluator when launching a fresh search.

from deephyper.evaluator import Evaluator
from deephyper.evaluator.callback import TqdmCallback

def get_evaluator(run_function):
    # Default arguments for Ray: 1 CPU in total and 1 CPU per evaluation
    method_kwargs = {
        "num_cpus": 1,
        "num_cpus_per_task": 1,
        "callbacks": [TqdmCallback()]
    }

    # If GPU devices are detected, create 'n_gpus' workers
    # and use 1 CPU and 1 GPU for each evaluation
    if is_gpu_available:
        method_kwargs["num_cpus"] = n_gpus
        method_kwargs["num_gpus"] = n_gpus
        method_kwargs["num_cpus_per_task"] = 1
        method_kwargs["num_gpus_per_task"] = 1

    evaluator = Evaluator.create(
        run_function,
        method="ray",
        method_kwargs=method_kwargs
    )
    print(f"Created new evaluator with {evaluator.num_workers} worker{'s' if evaluator.num_workers > 1 else ''} and config: {method_kwargs}")

    return evaluator

evaluator = get_evaluator(quick_run)

Created new evaluator with 1 worker and config: {'num_cpus': 1, 'num_cpus_per_task': 1, 'callbacks': [<deephyper.evaluator.callback.TqdmCallback object at 0x3a52a0b00>]}

Define and run the Centralized Bayesian Optimization search (CBO)#

We create the CBO search from the problem defined above; the evaluator is passed when the search is launched.

from deephyper.hpo import CBO

Instantiate the search with the problem:

search = CBO(problem)

Results file already exists, it will be renamed to /Users/rp5/Documents/DeepHyper/deephyper/examples/examples_hpo/results_20250818-144426.csv

Note

All of DeepHyper's search algorithms have two stopping criteria:

max_evals (int): Defines the maximum number of evaluations to perform. Defaults to -1 for an unlimited number of evaluations.
timeout (int): Defines a time budget (in seconds) before stopping the search. Defaults to None for an unlimited time budget.
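The way the two criteria interact can be sketched with a simple loop. The code below is a plain random search written for illustration, not Bayesian optimization and not DeepHyper's code; the random_search, sample_config, and objective names are hypothetical. The search stops at whichever limit is reached first.

```python
import random
import time


def random_search(run, sample_config, max_evals=-1, timeout=None):
    # Stand-in search loop: evaluate random configurations until a limit is hit.
    start = time.monotonic()
    history = []
    # A negative max_evals means "no limit on the number of evaluations".
    while max_evals < 0 or len(history) < max_evals:
        # A timeout of None means "no time limit".
        if timeout is not None and time.monotonic() - start >= timeout:
            break
        config = sample_config()
        history.append((config, run(config)))
    return history


rng = random.Random(0)


def sample_config():
    return {"x": rng.uniform(0, 2)}


def objective(config):
    # Toy objective with its maximum at x = 1.
    return -abs(config["x"] - 1)


by_count = random_search(objective, sample_config, max_evals=5)
by_time = random_search(objective, sample_config, timeout=0.05)
both = random_search(objective, sample_config, max_evals=5, timeout=10)
print(len(by_count), len(both), len(by_time) >= 1)
```

Leaving both criteria at their defaults would make this loop run forever, which is why at least one of them should be set.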

results = search.search(evaluator, max_evals=30)

100%|██████████| 30/30 [02:56<00:00,  5.88s/it, failures=0, objective=0.82]

The returned results object is a pandas DataFrame whose columns hold the hyperparameters and the information stored by the evaluator. In the table below, hyperparameter columns carry the prefix p: and metadata columns the prefix m:.

job_id is a unique identifier corresponding to the order in which tasks were created.
objective is the value returned by the run-function.
timestamp_submit is the time (in seconds), relative to the creation of the evaluator, at which the hyperparameter configuration was submitted by the Evaluator.
timestamp_gather is the time (in seconds), relative to the creation of the evaluator, at which the hyperparameter configuration was collected by the Evaluator.

results

	p:batch_size	p:learning_rate	p:num_epochs	objective	job_id	job_status	m:timestamp_submit	m:timestamp_gather
0	245	1.631048	9	0.385000	0	DONE	0.806100	3.504088
1	56	1.112981	16	0.601667	1	DONE	3.526653	5.828167
2	23	0.143010	16	0.394524	2	DONE	5.839790	10.173068
3	264	5.976338	17	0.690000	3	DONE	10.184379	11.913512
4	136	1.760105	8	0.425714	4	DONE	11.925383	13.314167
5	82	2.249590	20	0.739524	5	DONE	13.325365	15.923063
6	53	0.913211	14	0.563095	6	DONE	15.934397	18.385043
7	283	3.305364	19	0.590714	7	DONE	18.615469	20.076355
8	17	2.536505	20	0.816429	8	DONE	20.309002	26.365440
9	11	2.086957	20	0.815714	9	DONE	26.609885	35.309791
10	9	8.421344	20	0.820238	10	DONE	35.544787	45.785644
11	16	0.911853	20	0.746905	11	DONE	46.023974	52.367163
12	9	4.070022	20	0.803571	12	DONE	52.705372	63.077662
13	11	2.122347	20	0.809286	13	DONE	63.321248	71.979538
14	16	6.979859	20	0.807143	14	DONE	72.221648	78.707475
15	15	1.276222	20	0.770476	15	DONE	78.946452	85.686414
16	9	1.361241	20	0.800476	16	DONE	85.925403	96.158468
17	11	2.886584	20	0.807619	17	DONE	96.397284	105.057524
18	36	0.523676	20	0.582857	18	DONE	105.291716	108.786150
19	10	0.765039	20	0.746905	19	DONE	109.025325	118.416570
20	22	7.267154	20	0.818095	20	DONE	118.738970	123.678590
21	20	9.786512	20	0.806190	21	DONE	123.911904	129.223925
22	19	8.426698	20	0.805714	22	DONE	129.460490	135.063884
23	17	6.601954	20	0.816190	23	DONE	135.300321	141.337653
24	14	8.959608	20	0.798571	24	DONE	141.570983	148.691888
25	9	8.727873	20	0.815952	25	DONE	148.930730	159.216285
26	19	6.263893	20	0.799762	26	DONE	159.453678	165.011176
27	21	8.539551	20	0.814524	27	DONE	165.337578	170.484569
28	22	1.172272	20	0.751429	28	DONE	170.722989	175.696918
29	30	5.805169	20	0.796190	29	DONE	175.931295	179.869023
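The two timestamps also give an estimate of how long each evaluation took, including scheduling overhead. The sketch below uses three rows copied from the table above and plain dictionaries instead of pandas; the duration field is an addition made here for illustration.

```python
rows = [
    {"job_id": 8, "p:batch_size": 17, "objective": 0.816429,
     "m:timestamp_submit": 20.309002, "m:timestamp_gather": 26.365440},
    {"job_id": 10, "p:batch_size": 9, "objective": 0.820238,
     "m:timestamp_submit": 35.544787, "m:timestamp_gather": 45.785644},
    {"job_id": 20, "p:batch_size": 22, "objective": 0.818095,
     "m:timestamp_submit": 118.738970, "m:timestamp_gather": 123.678590},
]

for row in rows:
    # Elapsed time between submission and collection of the configuration.
    row["duration"] = row["m:timestamp_gather"] - row["m:timestamp_submit"]

# Row with the highest objective among the three.
best = max(rows, key=lambda row: row["objective"])
# Keep only the hyperparameters and strip their "p:" prefix.
best_hyperparameters = {k[2:]: v for k, v in best.items() if k.startswith("p:")}

for row in rows:
    print(row["job_id"], row["p:batch_size"], round(row["duration"], 2))
print(best["job_id"], best_hyperparameters)
```

In these three rows the evaluation with the smallest batch size takes the longest, which is worth keeping in mind when two configurations reach a similar objective.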

Evaluate the best configuration#

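Now that the search is over, let us print the best configuration found during this run and evaluate it with perf_run, which uses the larger training split. The objective values recorded during the search come from quick_run, so they are not directly comparable with the accuracy of the default configuration, which was measured with perf_run.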

i_max = results.objective.argmax()
best_config = results.iloc[i_max][:-3].to_dict()
best_config = {k[2:]: v for k, v in best_config.items() if k.startswith("p:")}

print(f"The default configuration has an accuracy of {objective_default:.3f}. \n"
      f"The best configuration found by DeepHyper has an accuracy of {results['objective'].iloc[i_max]:.3f}, \n"
      f"finished after {results['m:timestamp_gather'].iloc[i_max]:.2f} seconds of search.\n")

print(json.dumps(best_config, indent=4))

The default configuration has an accuracy of 0.863.
The best configuration found by DeepHyper has an accuracy of 0.820,
finished after 45.79 seconds of search.

{
    "batch_size": 9,
    "learning_rate": 8.421343891942513,
    "num_epochs": 20
}

objective_best = perf_run(best_config)
print(f"Accuracy Best Configuration:  {objective_best:.3f}")

Accuracy Best Configuration:  0.807

Total running time of the script: (3 minutes 47.673 seconds)
