deephyper.skopt.utils.RandomForestRegressor#

class deephyper.skopt.utils.RandomForestRegressor(*args: Any, **kwargs: Any)[source]#

Bases: ForestRegressor

RandomForestRegressor that supports conditional std computation.

Parameters:
  • n_estimators (integer, optional (default=10)) – The number of trees in the forest.

  • criterion (string, optional (default="mse")) – The function to measure the quality of a split. Supported criteria are “mse” for the mean squared error, which is equal to variance reduction as feature selection criterion, and “mae” for the mean absolute error.

  • max_features (int, float, string or None, optional (default=1.0)) –

    The number of features to consider when looking for the best split:

    • If int, then consider max_features features at each split.

    • If float, then max_features is a percentage and int(max_features * n_features) features are considered at each split.

    • If “sqrt”, then max_features=sqrt(n_features).

    • If “log2”, then max_features=log2(n_features).

    • If None, then max_features=n_features.

    Note

    The search for a split does not stop until at least one valid partition of the node samples is found, even if it requires effectively inspecting more than max_features features.

  • max_depth (integer or None, optional (default=None)) – The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.

  • min_samples_split (int, float, optional (default=2)) –

    The minimum number of samples required to split an internal node:

    • If int, then consider min_samples_split as the minimum number.

    • If float, then min_samples_split is a percentage and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.

  • min_samples_leaf (int, float, optional (default=1)) –

    The minimum number of samples required to be at a leaf node:

    • If int, then consider min_samples_leaf as the minimum number.

    • If float, then min_samples_leaf is a percentage and ceil(min_samples_leaf * n_samples) are the minimum number of samples for each node.

  • min_weight_fraction_leaf (float, optional (default=0.)) – The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.

  • max_leaf_nodes (int or None, optional (default=None)) – Grow trees with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.

  • min_impurity_decrease (float, optional (default=0.)) –

    A node will be split if this split induces a decrease of the impurity greater than or equal to this value. The weighted impurity decrease equation is the following:

    N_t / N * (impurity - N_t_R / N_t * right_impurity
                        - N_t_L / N_t * left_impurity)
    

    where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child. N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.

  • bootstrap (boolean, optional (default=True)) – Whether bootstrap samples are used when building trees.

  • oob_score (bool, optional (default=False)) – Whether to use out-of-bag samples to estimate the R^2 on unseen data.

  • n_jobs (integer, optional (default=1)) – The number of jobs to run in parallel for both fit and predict. If -1, then the number of jobs is set to the number of cores.

  • random_state (int, RandomState instance or None, optional (default=None)) – If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.

  • verbose (int, optional (default=0)) – Controls the verbosity of the tree building process.

  • warm_start (bool, optional (default=False)) – When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new forest.

Attributes:
  • estimators_ (list of DecisionTreeRegressor) – The collection of fitted sub-estimators.

  • feature_importances_ (array of shape = [n_features]) – The feature importances (the higher, the more important the feature).

  • n_features_ (int) – The number of features when fit is performed.

  • n_outputs_ (int) – The number of outputs when fit is performed.

  • oob_score_ (float) – Score of the training dataset obtained using an out-of-bag estimate.

  • oob_prediction_ (array of shape = [n_samples]) – Prediction computed with out-of-bag estimate on the training set.

Notes

The default values for the parameters controlling the size of the trees (e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and unpruned trees which can potentially be very large on some data sets. To reduce memory consumption, the complexity and size of the trees should be controlled by setting those parameter values.

The features are always randomly permuted at each split. Therefore, the best found split may vary, even with the same training data, max_features=n_features and bootstrap=False, if the improvement of the criterion is identical for several splits enumerated during the search of the best split. To obtain a deterministic behaviour during fitting, random_state has to be fixed.

References

[1] L. Breiman, “Random Forests”, Machine Learning, 45(1), 5-32, 2001.
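The "conditional std" this class exposes comes from aggregating the predictions of the individual trees in the ensemble. A minimal sketch of that idea using a plain scikit-learn forest (this illustrates the principle, not DeepHyper's exact implementation; the synthetic data and hyperparameters are arbitrary):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(0)
X = rng.uniform(-2, 2, size=(200, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.randn(200)

# Plain scikit-learn forest; DeepHyper's class wraps the same kind of estimator.
forest = RandomForestRegressor(n_estimators=50, random_state=0)
forest.fit(X, y)

X_test = np.linspace(-2, 2, 5).reshape(-1, 1)
# Per-tree predictions, shape (n_estimators, n_samples)
per_tree = np.stack([tree.predict(X_test) for tree in forest.estimators_])
mean = per_tree.mean(axis=0)  # matches forest.predict(X_test)
std = per_tree.std(axis=0)    # spread across trees: an uncertainty estimate
```

The spread across trees shrinks where the training data constrains all trees to agree and grows in sparsely sampled regions, which is what makes forest-based surrogates usable for Bayesian-optimization acquisition functions.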

Methods

predict

Predict continuous output for X.

__call__(*args: Any, **kwargs: Any) Any#

Call self as a function.

predict(X, return_std=False, disentangled_std=False)[source]#

Predict continuous output for X.

Args:
  • X (array of shape = (n_samples, n_features)) – Input data.

  • return_std (boolean, optional (default=False)) – Whether or not to return the standard deviation.

  • disentangled_std (boolean, optional (default=False)) – If True, the returned standard deviation is disentangled into its aleatoric and epistemic components.

Returns:
  • predictions (array-like of shape = (n_samples,)) – Predicted values for X. If criterion is set to “mse”, then predictions[i] ~= mean(y | X[i]).

  • std (array-like of shape = (n_samples,)) – Standard deviation of y at X, returned only if return_std is True. If criterion is set to “mse”, then std[i] ~= std(y | X[i]).
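The aleatoric/epistemic split behind disentangled_std can be sketched with the law of total variance: average the within-leaf variance over trees (aleatoric, irreducible noise) and take the variance of the per-tree means (epistemic, model disagreement). This is a hand-rolled illustration with plain scikit-learn, not necessarily the exact formula DeepHyper uses:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(42)
X = rng.uniform(-2, 2, size=(300, 1))
y = np.sin(3 * X[:, 0]) + 0.2 * rng.randn(300)

# min_samples_leaf > 1 so leaves retain a nonzero training-target variance.
forest = RandomForestRegressor(
    n_estimators=25, min_samples_leaf=5, random_state=0
).fit(X, y)
X_test = np.linspace(-2, 2, 7).reshape(-1, 1)

# Law of total variance over the ensemble:
#   Var[y | x] ~= mean over trees of leaf variance   (aleatoric)
#              +  variance over trees of leaf mean   (epistemic)
tree_means = np.stack([t.predict(X_test) for t in forest.estimators_])
# With the squared-error criterion, a node's impurity is the variance of
# the training targets it holds; apply() maps each test point to its leaf.
leaf_vars = np.stack(
    [t.tree_.impurity[t.apply(X_test)] for t in forest.estimators_]
)

aleatoric_var = np.clip(leaf_vars.mean(axis=0), 0.0, None)  # guard float error
epistemic_var = tree_means.var(axis=0)
total_std = np.sqrt(aleatoric_var + epistemic_var)
```

Disentangling the two is useful in surrogate-based search: only the epistemic part can be reduced by sampling more points, so acquisition functions typically target it.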