Hyperparameter Search for Machine Learning (Basic)

In this tutorial, we will show how to tune the hyperparameters of the Random Forest (RF) classifier in scikit-learn for the Airlines data set.

Let us start by creating a DeepHyper project and a problem for our application:

$ deephyper start-project dhproj
$ cd dhproj/dhproj/
$ deephyper new-problem hps rf_tuning
$ cd rf_tuning/

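Create a script named test_config.py that trains an RF classifier with a given configuration and reports its training, validation, and test accuracy. We use it first to test the baseline model: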

def test_config(config):
    import numpy as np
    from sklearn.utils import check_random_state
    from sklearn.ensemble import RandomForestClassifier
    from deephyper.benchmark.datasets import airlines as dataset

    rs_data = np.random.RandomState(seed=42)

    ratio_test = 0.33
    ratio_valid = (1 - ratio_test) * 0.33

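    # Pass the seed and split ratios defined above; the 4th result is ignored
    # (keyword names assumed, check them against your DeepHyper version).
    train, valid, test, _ = dataset.load_data(
        random_state=rs_data, test_size=ratio_test, valid_size=ratio_valid
    )

    rs_classifier = check_random_state(42)

    classifier = RandomForestClassifier(n_jobs=8, random_state=rs_classifier, **config)
    classifier.fit(*train)
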
    acc_train = classifier.score(*train)
    acc_valid = classifier.score(*valid)
    acc_test = classifier.score(*test)

    print(f"Accuracy on Training: {acc_train:.3f}")
    print(f"Accuracy on Validation: {acc_valid:.3f}")
    print(f"Accuracy on Testing: {acc_test:.3f}")

Run the script and record the training, validation, and test accuracy as follows:

$ python -i test_config.py
>>> test_config({})

Running the script gives the following output:

Accuracy on Training: 0.879
Accuracy on Validation: 0.621
Accuracy on Testing: 0.620

The accuracy values show that the RandomForest classifier with default hyperparameters overfits: it reaches high accuracy on the training data (0.879) but much lower accuracy on the validation and test data (0.621 and 0.620), so it generalizes poorly.

Next, we optimize the hyperparameters of the RandomForest classifier to reduce overfitting and improve the accuracy on the validation and test data. Create a load_data.py file that loads and returns the training and validation data:

import numpy as np

from sklearn.utils import resample
from deephyper.benchmark.datasets import airlines as dataset

def load_data():

    # In this case passing a random state is critical to make sure
    # that the same data are loaded all the time and that the test set
    # is not mixed with either the training or validation set.
    # Avoid setting a global seed (np.random.seed): it would also change the
    # random state seen by any other code using NumPy's global generator.
    random_state = np.random.RandomState(seed=42)

    # Proportion of the test set on the full dataset
    ratio_test = 0.33

    # Proportion of the valid set on "dataset \ test set": 33% of the non-test
    # data, i.e. (1 - 0.33) * 0.33, about 22% of the full dataset
    ratio_valid = (1 - ratio_test) * 0.33

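    # The last two results are ignored with "_": the 3rd one is the test set,
    # which we do not need during the search.
    # (Keyword names assumed; check them against your DeepHyper version.)
    (X_train, y_train), (X_valid, y_valid), _, _ = dataset.load_data(
        random_state=random_state, test_size=ratio_test, valid_size=ratio_valid
    )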

    # Uncomment the next line to sub-sample the training data and speed up
    # the search; "n_samples" controls the size of the new training data
    # X_train, y_train = resample(X_train, y_train, n_samples=int(1e4))

    print(f"X_train shape: {np.shape(X_train)}")
    print(f"y_train shape: {np.shape(y_train)}")
    print(f"X_valid shape: {np.shape(X_valid)}")
    print(f"y_valid shape: {np.shape(y_valid)}")
    return (X_train, y_train), (X_valid, y_valid)

if __name__ == "__main__":
    load_data()

Subsampling the training data with X_train, y_train = resample(X_train, y_train, n_samples=int(1e4)) speeds up the search, because each model evaluation then trains on fewer samples.
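Note that sklearn.utils.resample samples with replacement by default (a bootstrap sample), so the subsampled rows can contain duplicates, and without random_state the subsample changes on every call. A minimal sketch on synthetic arrays (the sizes are arbitrary, not the Airlines data) comparing the default with replace=False:

```python
import numpy as np
from sklearn.utils import resample

# Synthetic stand-in for the training data: 50,000 rows, 7 features.
rng = np.random.RandomState(0)
X_train = rng.rand(50_000, 7)
y_train = rng.randint(0, 2, size=50_000)

# Default behavior: bootstrap sample (with replacement), so rows may repeat.
X_boot, y_boot = resample(X_train, y_train, n_samples=int(1e4), random_state=42)

# Subsample without replacement: every selected row is distinct.
X_sub, y_sub = resample(
    X_train, y_train, n_samples=int(1e4), replace=False, random_state=42
)

print("bootstrap:", X_boot.shape, "distinct rows:", len(np.unique(X_boot, axis=0)))
print("subsample:", X_sub.shape, "distinct rows:", len(np.unique(X_sub, axis=0)))
```

Since load_data() is called once per evaluation, passing random_state keeps the subsample identical across evaluations, so configurations are compared on the same data; replace=False avoids training on duplicated rows.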

To test load_data.py, run:

$ python load_data.py

The expected output, with the resample line uncommented (hence 10,000 training samples), is:

X_train shape: (10000, 7)
y_train shape: (10000,)
X_valid shape: (119258, 7)
y_valid shape: (119258,)
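The ratios in load_data.py are fractions of the full dataset: ratio_valid = (1 - 0.33) * 0.33 reserves 33% of the non-test data for validation. The sketch below reproduces that arithmetic with scikit-learn's train_test_split on a toy array; three_way_split is a hypothetical helper, and we assume (without having checked the loader's source) that DeepHyper's Airlines loader splits in an equivalent way.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def three_way_split(X, y, ratio_test, ratio_valid, random_state):
    """Hypothetical helper: split (X, y) into train/valid/test sets.

    Both ratios are fractions of the FULL dataset.
    """
    rs = np.random.RandomState(random_state)
    # First carve out the test set.
    X_rest, X_test, y_rest, y_test = train_test_split(
        X, y, test_size=ratio_test, random_state=rs
    )
    # ratio_valid is relative to the full dataset, so rescale it to the
    # remaining (non-test) data before the second split.
    X_train, X_valid, y_train, y_valid = train_test_split(
        X_rest, y_rest, test_size=ratio_valid / (1 - ratio_test), random_state=rs
    )
    return (X_train, y_train), (X_valid, y_valid), (X_test, y_test)


ratio_test = 0.33
ratio_valid = (1 - ratio_test) * 0.33  # 33% of the non-test data, about 0.22 of the total

X = np.arange(2000).reshape(1000, 2)
y = np.arange(1000) % 2
train, valid, test = three_way_split(X, y, ratio_test, ratio_valid, random_state=42)
print(len(train[0]), len(valid[0]), len(test[0]))
```

With 1,000 rows this yields roughly 45% / 22% / 33% for training, validation, and test; scikit-learn rounds the size of each held-out side up to a whole row.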

Create a model_run.py file with a run function that trains and evaluates the RF model. The function must return a scalar value (typically the validation accuracy), which the search algorithm will maximize.

from sklearn.utils import check_random_state
from sklearn.ensemble import RandomForestClassifier

from dhproj.rf_tuning.load_data import load_data

def run(config):

    rs = check_random_state(42)

    (X, y), (vX, vy) = load_data()

    classifier = RandomForestClassifier(n_jobs=8, random_state=rs, **config)
    classifier.fit(X, y)

    mean_accuracy = classifier.score(vX, vy)

    return mean_accuracy
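Since run only needs a configuration dictionary and returns a float, it can be sanity-checked locally before launching a search. A minimal sketch, assuming a synthetic dataset from make_classification in place of the Airlines data; load_synthetic_data is a hypothetical stand-in for load_data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.utils import check_random_state


def load_synthetic_data():
    """Hypothetical stand-in for load_data(): returns (train, valid) pairs."""
    X, y = make_classification(n_samples=2000, n_features=7, random_state=0)
    X_train, X_valid, y_train, y_valid = train_test_split(
        X, y, test_size=0.33, random_state=0
    )
    return (X_train, y_train), (X_valid, y_valid)


def run(config):
    """Same structure as model_run.run: fit with config, return validation accuracy."""
    rs = check_random_state(42)
    (X, y), (vX, vy) = load_synthetic_data()
    classifier = RandomForestClassifier(random_state=rs, **config)
    classifier.fit(X, y)
    return classifier.score(vX, vy)


# An empty dict uses scikit-learn's defaults; any key from the search space can be set.
baseline = run({})
candidate = run(
    {"n_estimators": 50, "criterion": "entropy", "max_depth": 8, "min_samples_split": 4}
)
print(f"baseline: {baseline:.3f}, candidate: {candidate:.3f}")
```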

Create a problem.py file that defines the hyperparameter search space for the RF model:

from deephyper.problem import HpProblem

Problem = HpProblem()

Problem.add_hyperparameter((10, 300), "n_estimators")
Problem.add_hyperparameter(["gini", "entropy"], "criterion")
Problem.add_hyperparameter((1, 50), "max_depth")
Problem.add_hyperparameter((2, 10), "min_samples_split")

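# We define a starting point with the default hyperparameters from scikit-learn,
# which we consider good on average (max_depth=50 stands in for scikit-learn's
# unbounded default, max_depth=None, which lies outside the search range).
Problem.add_starting_point(
    n_estimators=100, criterion="gini", max_depth=50, min_samples_split=2
)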

if __name__ == "__main__":
    print(Problem)

Run problem.py from your shell:

$ python problem.py

The output will be:

Configuration space object:
        criterion, Type: Categorical, Choices: {gini, entropy}, Default: gini
        max_depth, Type: UniformInteger, Range: [1, 50], Default: 26
        min_samples_split, Type: UniformInteger, Range: [2, 10], Default: 6
        n_estimators, Type: UniformInteger, Range: [10, 300], Default: 155

    Starting Point:
    {0: {'criterion': 'gini',
        'max_depth': 50,
        'min_samples_split': 2,
        'n_estimators': 100}}

Run the search for 20 model evaluations using the following command line:

$ deephyper hps ambs --problem dhproj.rf_tuning.problem.Problem --run dhproj.rf_tuning.model_run.run --max-evals 20 --evaluator subprocess --n-jobs 4
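Here --max-evals 20 caps the search at 20 evaluations of run. To illustrate what such a budget means, the sketch below runs plain random search, a simpler strategy than the model-based search selected by ambs, over the same space; toy_objective is a hypothetical, instant stand-in for run in model_run.py.

```python
import random

import pandas as pd

# Same ranges as problem.py: integer tuples are inclusive ranges, lists are choices.
SEARCH_SPACE = {
    "n_estimators": (10, 300),
    "criterion": ["gini", "entropy"],
    "max_depth": (1, 50),
    "min_samples_split": (2, 10),
}


def sample_config(space, rng):
    """Draw one configuration uniformly at random from the space."""
    config = {}
    for name, domain in space.items():
        if isinstance(domain, tuple):
            low, high = domain
            config[name] = rng.randint(low, high)  # randint includes both bounds
        else:
            config[name] = rng.choice(domain)
    return config


def toy_objective(config):
    """Hypothetical stand-in for run(): peaks at max_depth=12, mildly rewards more trees."""
    return 1.0 - abs(config["max_depth"] - 12) / 50 + 1e-4 * config["n_estimators"]


def random_search(objective, space, max_evals, seed=0):
    """Evaluate max_evals random configurations and record each objective value."""
    rng = random.Random(seed)
    rows = []
    for _ in range(max_evals):
        config = sample_config(space, rng)
        rows.append({**config, "objective": objective(config)})
    return pd.DataFrame(rows)


results = random_search(toy_objective, SEARCH_SPACE, max_evals=20)
best = results.loc[results["objective"].idxmax()]
print(results.head())
print("Best configuration:", best.drop("objective").to_dict())
```

Unlike this loop, a model-based search such as AMBS roughly speaking fits a surrogate model to the configurations evaluated so far and uses it to propose the next ones, so later evaluations concentrate on promising regions of the space.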

Once the search is over, the results.csv file contains the hyperparameter configurations evaluated during the search and their corresponding objective values (validation accuracy). Create test_best_config.py as given below. It extracts the best configuration from results.csv and runs the RF classifier with it through test_config.

import pandas as pd
from test_config import test_config

df = pd.read_csv("results.csv")
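# Row with the highest objective; .iloc[:-2] drops the last two columns
# (the objective and an elapsed-time column), keeping only the hyperparameters.
best_config = df.iloc[df.objective.argmax()].iloc[:-2].to_dict()

test_config(best_config)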
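Positional slicing with [:-2] relies on the hyperparameters being the leading columns of results.csv. A slightly more robust variant, sketched here on a hypothetical in-memory table with made-up values (the elapsed_sec column name is an assumption), selects the hyperparameter columns by name and converts NumPy scalars to plain Python values, which is handy for printing or saving the configuration:

```python
import pandas as pd

# Hypothetical results table shaped like results.csv (values are made up).
df = pd.DataFrame(
    {
        "criterion": ["gini", "entropy", "gini"],
        "max_depth": [50, 12, 8],
        "min_samples_split": [2, 7, 4],
        "n_estimators": [100, 240, 60],
        "objective": [0.60, 0.70, 0.65],
        "elapsed_sec": [30.0, 55.0, 13.0],
    }
)

HYPERPARAMETERS = ["n_estimators", "criterion", "max_depth", "min_samples_split"]


def extract_best_config(results, names):
    """Return the row with the highest objective as a dict of plain Python values."""
    best_row = results.loc[results["objective"].idxmax(), names]
    # .item() turns NumPy scalars (e.g. numpy.int64) into Python int/float.
    return {k: (v.item() if hasattr(v, "item") else v) for k, v in best_row.items()}


best_config = extract_best_config(df, HYPERPARAMETERS)
print(best_config)
```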

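Run it from your shell:

$ python test_best_config.py

Compared to the default configuration, the accuracy improves on the validation and test data sets, and the gap between training and validation accuracy is much smaller: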

Accuracy on Training: 0.748
Accuracy on Validation: 0.666
Accuracy on Testing: 0.666
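The improvement can be quantified from the accuracies reported in this tutorial. A small worked example computing the training/validation gap, a rough indicator of overfitting, for both configurations:

```python
# Accuracies reported above: default configuration vs best configuration found by the search.
default = {"train": 0.879, "valid": 0.621, "test": 0.620}
tuned = {"train": 0.748, "valid": 0.666, "test": 0.666}


def generalization_gap(acc):
    """Training minus validation accuracy; a large gap suggests overfitting."""
    return acc["train"] - acc["valid"]


print(f"Default gap: {generalization_gap(default):.3f}")
print(f"Tuned gap:   {generalization_gap(tuned):.3f}")
print(f"Validation gain: {tuned['valid'] - default['valid']:+.3f}")
print(f"Test gain:       {tuned['test'] - default['test']:+.3f}")
```

The gap drops from 0.258 to 0.082, while validation and test accuracy rise by 0.045 and 0.046 respectively: the tuned model trades some training accuracy for better generalization.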