deephyper.skopt.utils.RandomForestRegressor
- class deephyper.skopt.utils.RandomForestRegressor(*args: Any, **kwargs: Any)
Bases: ForestRegressor
RandomForestRegressor that supports conditional standard deviation computation.
- Parameters:
n_estimators (int, optional) – The number of trees in the forest. Defaults to 100.
criterion (str, optional) – The function used to measure the quality of a split. Defaults to "mse". Supported criteria are:
- "mse": mean squared error (variance reduction).
- "mae": mean absolute error.
max_features (int | float | str | None, optional) – The number of features to consider when looking for the best split. Defaults to "1.0".
- If int, consider max_features features at each split.
- If float, treat it as a fraction of the features: int(max_features * n_features).
- If "sqrt", use sqrt(n_features).
- If "log2", use log2(n_features).
- If None, use all features.
Note
The search for a split does not stop until at least one valid partition of the node samples is found, even if this requires inspecting more than max_features features.
max_depth (int | None, optional) – Maximum depth of the tree. If None, nodes are expanded until all leaves are pure or contain fewer than min_samples_split samples. Defaults to None.
min_samples_split (int | float, optional) – Minimum number of samples required to split an internal node. Defaults to 2.
- If int, use the exact number.
- If float, interpret it as a fraction of the samples: ceil(min_samples_split * n_samples).
min_samples_leaf (int | float, optional) – Minimum number of samples required at a leaf node. Defaults to 1.
- If int, use the exact number.
- If float, interpret it as a fraction of the samples: ceil(min_samples_leaf * n_samples).
min_weight_fraction_leaf (float, optional) – Minimum weighted fraction of the total sample weight required at a leaf node. Defaults to 0.0.
max_leaf_nodes (int | None, optional) – Grow trees with at most max_leaf_nodes leaf nodes in best-first fashion. If None, the number of leaf nodes is unlimited. Defaults to None.
min_impurity_decrease (float, optional) – A node will be split if the impurity decrease is greater than or equal to this value. Defaults to 0.0.
The weighted impurity decrease is:
N_t / N * (impurity - N_t_R / N_t * right_impurity - N_t_L / N_t * left_impurity)
where N is the total number of weighted samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child.
bootstrap (bool, optional) – Whether bootstrap samples are used when building trees. Defaults to True.
oob_score (bool, optional) – Whether to use out-of-bag samples to estimate R² on unseen data. Defaults to False.
n_jobs (int, optional) – Number of parallel jobs for fit and predict. If -1, use all cores. Defaults to 1.
random_state (int | RandomState | None, optional) – Seed or random number generator. Defaults to None.
verbose (int, optional) – Verbosity level of the tree-building process. Defaults to 0.
warm_start (bool, optional) – If True, reuse the solution from the previous call to fit and add more estimators. Otherwise, fit a new forest. Defaults to False.
splitter (str) – The splitter strategy, one of ["random", "best"]. Defaults to "best".
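As a rough illustration of the size rules and the impurity formula in the parameter list, the sketch below resolves max_features, min_samples_split and min_samples_leaf to concrete counts and evaluates the weighted impurity decrease. The helper names (resolve_max_features, resolve_min_samples, weighted_impurity_decrease) are hypothetical and not part of deephyper. Truncating the "sqrt", "log2" and float cases to an integer and keeping at least one feature are assumptions, not confirmed behavior of the library.

```python
import math


def resolve_max_features(max_features, n_features):
    """Hypothetical helper: number of features inspected per split."""
    if max_features is None:
        return n_features  # None means all features
    if isinstance(max_features, str):
        if max_features == "sqrt":
            # Assumption: truncate to an integer, keep at least one feature.
            return max(1, int(math.sqrt(n_features)))
        if max_features == "log2":
            return max(1, int(math.log2(n_features)))
        raise ValueError(f"unsupported max_features: {max_features!r}")
    if isinstance(max_features, int):
        return max_features  # exact count
    # Float: fraction of the available features.
    return max(1, int(max_features * n_features))


def resolve_min_samples(value, n_samples):
    """Hypothetical helper for min_samples_split / min_samples_leaf."""
    if isinstance(value, int):
        return value  # exact count
    return math.ceil(value * n_samples)  # fraction of the samples


def weighted_impurity_decrease(N, N_t, N_t_L, N_t_R,
                               impurity, left_impurity, right_impurity):
    """Direct transcription of the formula given for min_impurity_decrease."""
    return N_t / N * (
        impurity
        - N_t_R / N_t * right_impurity
        - N_t_L / N_t * left_impurity
    )


# A node holding 50 of 100 samples, split into children of 20 and 30 samples.
decrease = weighted_impurity_decrease(100, 50, 20, 30, 0.5, 0.2, 0.4)
print(resolve_max_features(0.5, 10), resolve_min_samples(0.1, 25), decrease)
```

Under these assumptions, a split is accepted when the computed decrease is greater than or equal to min_impurity_decrease.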
- feature_importances_
Feature importances, shape (n_features,).
- Type:
ndarray
- oob_prediction_
Out-of-bag predictions, shape (n_samples,).
- Type:
ndarray
Notes
The default hyperparameters (e.g., max_depth, min_samples_leaf) result in fully grown, unpruned trees, which may become large in memory. Consider adjusting these values to reduce complexity.
Features are always randomly permuted at each split. Therefore, the best split may vary even with identical training data, max_features=n_features, and bootstrap=False. To ensure deterministic behavior, set random_state to a fixed value.
References
Breiman, L. (2001). Random Forests. Machine Learning, 45(1), 5-32.
Methods
predict(X, return_std=False, disentangled_std=False) – Predict continuous output for X.
- predict(X, return_std=False, disentangled_std=False)
Predict continuous output for X.
Args:
- X : array of shape (n_samples, n_features)
Input data.
- return_std : boolean
Whether or not to return the standard deviation.
- disentangled_std : boolean
Whether the returned standard deviation is disentangled into its aleatoric and epistemic components.
Returns:
- predictions : array-like of shape (n_samples,)
Predicted values for X. If criterion is set to "mse", then predictions[i] ~= mean(y | X[i]).
- std : array-like of shape (n_samples,)
Standard deviation of y at X. If criterion is set to "mse", then std[i] ~= std(y | X[i]).
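To make the aleatoric/epistemic split more concrete, here is a minimal sketch of one common way to disentangle a forest's predictive standard deviation at a single input, using the law of total variance. It is an illustration under stated assumptions, not necessarily how deephyper computes it: decompose_std is a hypothetical helper, and the per-tree means and leaf variances are made-up numbers.

```python
import math


def decompose_std(tree_means, tree_variances):
    """Hypothetical helper: split predictive uncertainty at one input point.

    tree_means[k] is the prediction of tree k, tree_variances[k] is the
    variance of the training targets in the leaf that tree k assigns to
    the point.
    """
    n_trees = len(tree_means)
    forest_mean = sum(tree_means) / n_trees
    # Aleatoric part: average noise observed inside the leaves.
    aleatoric_var = sum(tree_variances) / n_trees
    # Epistemic part: disagreement between the trees' predictions.
    epistemic_var = sum((m - forest_mean) ** 2 for m in tree_means) / n_trees
    return forest_mean, math.sqrt(aleatoric_var), math.sqrt(epistemic_var)


# Made-up statistics for a forest of three trees at one input point.
tree_means = [1.0, 2.0, 3.0]
tree_variances = [0.25, 0.25, 0.25]

mean, std_al, std_ep = decompose_std(tree_means, tree_variances)
# By the law of total variance, the two parts add up in variance, not in std.
total_std = math.sqrt(std_al ** 2 + std_ep ** 2)
print(mean, std_al, std_ep, total_std)
```

In this decomposition the aleatoric term reflects noise in the targets that more trees cannot remove, while the epistemic term shrinks as the trees agree with one another.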