deephyper.skopt.utils.RandomForestRegressor

class deephyper.skopt.utils.RandomForestRegressor(*args: Any, **kwargs: Any)

Bases: ForestRegressor

RandomForestRegressor that supports conditional standard deviation computation.

Parameters:
  • n_estimators (int, optional) – The number of trees in the forest. Defaults to 100.

  • criterion (str, optional) –

    The function to measure the quality of a split. Defaults to "mse". Supported criteria are:

    • "mse": mean squared error (variance reduction).

    • "mae": mean absolute error.

  • max_features (int | float | str | None, optional) –

    The number of features to consider when looking for the best split. Defaults to 1.0.

    • If int, consider max_features features at each split.

    • If float, treat as a fraction: consider int(max_features * n_features) features at each split.

    • If "sqrt", use sqrt(n_features).

    • If "log2", use log2(n_features).

    • If None, use all features.

    Note

    The search for a split does not stop until at least one valid partition of the node samples is found, even if this requires inspecting more than max_features features.

  • max_depth (int | None, optional) – Maximum depth of each tree. If None, nodes expand until all leaves are pure or contain fewer than min_samples_split samples. Defaults to None.

  • min_samples_split (int | float, optional) –

    Minimum number of samples required to split an internal node. Defaults to 2.

    • If int, use the exact number.

    • If float, interpret as a fraction: ceil(min_samples_split * n_samples).

  • min_samples_leaf (int | float, optional) –

    Minimum number of samples required at a leaf node. Defaults to 1.

    • If int, use the exact number.

    • If float, interpret as a fraction: ceil(min_samples_leaf * n_samples).

  • min_weight_fraction_leaf (float, optional) – Minimum weighted fraction of the total sample weight required at a leaf node. Defaults to 0.0.

  • max_leaf_nodes (int | None, optional) – Grow trees with at most max_leaf_nodes in best-first fashion. If None, unlimited. Defaults to None.

  • min_impurity_decrease (float, optional) –

    A node will be split if the impurity decrease is greater than or equal to this value. Defaults to 0.0.

    Weighted impurity decrease:

    N_t / N * (impurity - N_t_R / N_t * right_impurity
                    - N_t_L / N_t * left_impurity)
    

    where N is the total weighted number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child.

  • bootstrap (bool, optional) – Whether bootstrap samples are used when building trees. Defaults to True.

  • oob_score (bool, optional) – Whether to use out-of-bag samples to estimate R² on unseen data. Defaults to False.

  • n_jobs (int, optional) – Number of parallel jobs for fit and predict. If -1, use all cores. Defaults to 1.

  • random_state (int | RandomState | None, optional) – Seed or random number generator. Defaults to None.

  • verbose (int, optional) – Verbosity level of the tree-building process. Defaults to 0.

  • warm_start (bool, optional) – If True, reuse the solution of the previous call to fit and add more estimators to the forest. Otherwise, fit a new forest. Defaults to False.

  • splitter (str, optional) – The splitter strategy, either "random" or "best". Defaults to "best".
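The arithmetic behind the fractional parameters and the weighted impurity decrease can be sketched in plain Python. This is a minimal illustration of the formulas stated above, not code taken from the library: the function names are hypothetical, and rounding sqrt and log2 down to an integer with a floor of one feature is an assumption.

```python
import math


def resolve_max_features(max_features, n_features):
    """Number of features examined per split (hypothetical helper)."""
    if max_features is None:
        return n_features
    if max_features == "sqrt":
        # Assumption: round down, but always keep at least one feature.
        return max(1, int(math.sqrt(n_features)))
    if max_features == "log2":
        return max(1, int(math.log2(n_features)))
    if isinstance(max_features, int):
        return max_features
    if isinstance(max_features, float):
        # A float is a fraction of the available features.
        return max(1, int(max_features * n_features))
    raise ValueError(f"unsupported max_features: {max_features!r}")


def resolve_min_samples(value, n_samples):
    """Absolute count for min_samples_split or min_samples_leaf."""
    if isinstance(value, int):
        return value
    # A float is a fraction of the training set, rounded up.
    return math.ceil(value * n_samples)


def weighted_impurity_decrease(n, n_t, n_t_l, n_t_r,
                               impurity, left_impurity, right_impurity):
    """Quantity compared against min_impurity_decrease for one candidate split."""
    return n_t / n * (impurity
                      - n_t_r / n_t * right_impurity
                      - n_t_l / n_t * left_impurity)


print(resolve_max_features(0.5, 10))
print(resolve_min_samples(0.1, 25))
print(weighted_impurity_decrease(100, 40, 10, 30, 1.0, 0.5, 0.2))
```

A candidate split would be accepted when the last value is greater than or equal to min_impurity_decrease.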

estimators_

Fitted sub-estimators.

Type: list[DecisionTreeRegressor]

feature_importances_

Feature importances, shape (n_features,).

Type: ndarray

n_features_

Number of features at fit time.

Type: int

n_outputs_

Number of outputs at fit time.

Type: int

oob_score_

Out-of-bag R² score.

Type: float

oob_prediction_

Out-of-bag predictions, shape (n_samples,).

Type: ndarray

Notes

The default hyperparameters (e.g., max_depth, min_samples_leaf) result in fully grown, unpruned trees, which may become large in memory. Consider adjusting these values to reduce complexity.

Features are always randomly permuted at each split. Therefore, the best split may vary even with identical training data, max_features=n_features, and bootstrap=False. To ensure deterministic behavior, set random_state.

References

Breiman, L. (2001). Random Forests. Machine Learning, 45(1), 5-32.

Methods

predict

Predict continuous output for X.

__call__(*args: Any, **kwargs: Any) → Any

Call self as a function.

predict(X, return_std=False, disentangled_std=False)

Predict continuous output for X.

Parameters:
  • X (array of shape (n_samples, n_features)) – Input data.

  • return_std (bool, optional) – Whether or not to return the standard deviation. Defaults to False.

  • disentangled_std (bool, optional) – Whether the standard deviation is returned disentangled into its aleatoric and epistemic components. Defaults to False.

Returns:
  • predictions (array-like of shape (n_samples,)) – Predicted values for X. If criterion is set to "mse", then predictions[i] ~= mean(y | X[i]).

  • std (array-like of shape (n_samples,)) – Standard deviation of y at X. If criterion is set to "mse", then std[i] ~= std(y | X[i]).
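To make the aleatoric and epistemic split concrete, the sketch below combines per-tree leaf statistics for a single input point using the law of total variance. It is an illustration of one common decomposition for tree ensembles, under the assumption that each tree reports the mean and variance of the training targets in the leaf reached by the point. The function name and its inputs are hypothetical, and the library's own computation may differ.

```python
import math
from statistics import fmean, pvariance


def forest_mean_and_std(tree_means, tree_variances, disentangled=False):
    """Combine per-tree leaf statistics for one input point (hypothetical helper).

    tree_means[i] and tree_variances[i] are the mean and variance of the
    training targets in the leaf that tree i assigns the point to.
    """
    mean = fmean(tree_means)
    # Aleatoric part: average noise observed inside the leaves.
    aleatoric_var = fmean(tree_variances)
    # Epistemic part: disagreement between the trees' predictions.
    epistemic_var = pvariance(tree_means, mu=mean)
    if disentangled:
        return mean, math.sqrt(aleatoric_var), math.sqrt(epistemic_var)
    # Law of total variance: the two parts add up to the total variance.
    return mean, math.sqrt(aleatoric_var + epistemic_var)


mean, std = forest_mean_and_std([1.0, 3.0], [0.5, 1.5])
mean_d, std_aleatoric, std_epistemic = forest_mean_and_std(
    [1.0, 3.0], [0.5, 1.5], disentangled=True
)
print(mean, std)
print(mean_d, std_aleatoric, std_epistemic)
```

In this sketch the squared aleatoric and epistemic standard deviations sum to the squared total standard deviation, so the disentangled form carries the same information as the combined one.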