deephyper.ensemble
The ensemble module provides a way to build ensembles of checkpointed deep neural networks from tensorflow.keras, saved in .h5 format, to regularize and boost predictive performance as well as to better estimate uncertainties.
- class deephyper.ensemble.BaggingEnsembleClassifier(model_dir, loss=<function mse>, size=5, verbose=True, ray_address='', num_cpus=1, num_gpus=None, selection='topk')
Bases: deephyper.ensemble._bagging_ensemble.BaggingEnsemble
Ensemble for classification based on uniform averaging of the predictions of each member.
- Parameters
model_dir (str) – Path to directory containing saved Keras models in .h5 format.
loss (callable) – a callable taking (y_true, y_pred) as input.
size (int, optional) – Number of unique models used in the ensemble. Defaults to 5.
verbose (bool, optional) – Verbose mode. Defaults to True.
ray_address (str, optional) – Address of the Ray cluster. If “auto” it will try to connect to an existing cluster. If “” it will start a local Ray cluster. Defaults to “”.
num_cpus (int, optional) – Number of CPUs allocated to load one model and predict. Defaults to 1.
num_gpus (int, optional) – Number of GPUs allocated to load one model and predict. Defaults to None.
batch_size (int, optional) – Batch size used to batch the inference of loaded models. Defaults to 32.
selection (str, optional) – Selection strategy to build the ensemble. Value in ["topk"]. Defaults to “topk”.
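As a rough illustration of what uniform averaging means for classification, the sketch below averages per-class probabilities from several members and picks the class with the largest averaged probability. It is plain Python and does not call DeepHyper; `average_probabilities`, the nested-list layout and the sample values are hypothetical.

```python
def average_probabilities(member_probs):
    """Uniformly average class probabilities over ensemble members.

    member_probs[m][i][c] is the probability that member m assigns
    to class c for sample i (hypothetical layout for illustration).
    """
    n_members = len(member_probs)
    n_samples = len(member_probs[0])
    n_classes = len(member_probs[0][0])
    return [
        [
            sum(member_probs[m][i][c] for m in range(n_members)) / n_members
            for c in range(n_classes)
        ]
        for i in range(n_samples)
    ]


# Three members, two samples, two classes (made-up numbers).
member_probs = [
    [[0.9, 0.1], [0.4, 0.6]],
    [[0.7, 0.3], [0.2, 0.8]],
    [[0.8, 0.2], [0.6, 0.4]],
]
ensemble_probs = average_probabilities(member_probs)
# Predicted label: index of the largest averaged probability.
labels = [max(range(len(p)), key=p.__getitem__) for p in ensemble_probs]
```

Every member carries the same weight 1/M, which is what distinguishes uniform averaging from weighted schemes.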
- evaluate(X, y, metrics=None)
Compute metrics based on the provided data.
- Parameters
X (array) – An array of input data.
y (array) – An array of true output data.
metrics (callable, optional) – A metric. Defaults to None.
- fit(X, y)
Fit the current algorithm to the provided data.
- Parameters
X (array) – The input data.
y (array) – The output data.
- Returns
The current fitted instance.
- Return type
- load(file: str) → None
Load an ensemble from a saved file.
- Parameters
file (str) – Path to the saved ensemble.
- load_members_files(file: str = 'ensemble.json') → None
Load the members composing an ensemble.
- Parameters
file (str, optional) – Path of the JSON file containing the ensemble members. All members need to be accessible in model_dir. Defaults to “ensemble.json”.
- predict(X) → numpy.ndarray
Execute an inference of the ensemble for the provided data.
- Parameters
X (array) – An array of input data.
- Returns
The prediction.
- Return type
array
- class deephyper.ensemble.BaggingEnsembleRegressor(model_dir, loss=<function mse>, size=5, verbose=True, ray_address='', num_cpus=1, num_gpus=None, selection='topk')
Bases: deephyper.ensemble._bagging_ensemble.BaggingEnsemble
Ensemble for regression based on uniform averaging of the predictions of each member.
- Parameters
model_dir (str) – Path to directory containing saved Keras models in .h5 format.
loss (callable) – a callable taking (y_true, y_pred) as input.
size (int, optional) – Number of unique models used in the ensemble. Defaults to 5.
verbose (bool, optional) – Verbose mode. Defaults to True.
ray_address (str, optional) – Address of the Ray cluster. If “auto” it will try to connect to an existing cluster. If “” it will start a local Ray cluster. Defaults to “”.
num_cpus (int, optional) – Number of CPUs allocated to load one model and predict. Defaults to 1.
num_gpus (int, optional) – Number of GPUs allocated to load one model and predict. Defaults to None.
batch_size (int, optional) – Batch size used to batch the inference of loaded models. Defaults to 32.
selection (str, optional) – Selection strategy to build the ensemble. Value in ["topk"]. Defaults to “topk”.
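The “topk” strategy is not spelled out in this reference. One plausible reading, sketched below under that assumption, is to score each saved model with `loss` on held-out data and keep the `size` models with the lowest loss. The helper `select_topk`, the file names and the loss values are all made up for illustration.

```python
def select_topk(losses, size=5):
    """Return the names of the `size` models with the lowest loss.

    `losses` maps a model file name to its validation loss
    (hypothetical values; lower is assumed to be better).
    """
    ranked = sorted(losses, key=losses.get)
    return ranked[:size]


losses = {
    "model_0.h5": 0.42,
    "model_1.h5": 0.31,
    "model_2.h5": 0.58,
    "model_3.h5": 0.29,
}
members = select_topk(losses, size=2)
```

If fewer than `size` models are available, the slice simply returns all of them in order of increasing loss.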
- evaluate(X, y, metrics=None)
Compute metrics based on the provided data.
- Parameters
X (array) – An array of input data.
y (array) – An array of true output data.
metrics (callable, optional) – A metric. Defaults to None.
- fit(X, y)
Fit the current algorithm to the provided data.
- Parameters
X (array) – The input data.
y (array) – The output data.
- Returns
The current fitted instance.
- Return type
- load(file: str) → None
Load an ensemble from a saved file.
- Parameters
file (str) – Path to the saved ensemble.
- load_members_files(file: str = 'ensemble.json') → None
Load the members composing an ensemble.
- Parameters
file (str, optional) – Path of the JSON file containing the ensemble members. All members need to be accessible in model_dir. Defaults to “ensemble.json”.
- predict(X) → numpy.ndarray
Execute an inference of the ensemble for the provided data.
- Parameters
X (array) – An array of input data.
- Returns
The prediction.
- Return type
array
- class deephyper.ensemble.BaseEnsemble(model_dir, loss, size=5, verbose=True, ray_address='', num_cpus=1, num_gpus=None, batch_size=32)
Bases: abc.ABC
Base class for ensembles. Every new ensemble algorithm needs to extend this class.
- Parameters
model_dir (str) – Path to directory containing saved Keras models in .h5 format.
loss (callable) – a callable taking (y_true, y_pred) as input.
size (int, optional) – Number of unique models used in the ensemble. Defaults to 5.
verbose (bool, optional) – Verbose mode. Defaults to True.
ray_address (str, optional) – Address of the Ray cluster. If “auto” it will try to connect to an existing cluster. If “” it will start a local Ray cluster. Defaults to “”.
num_cpus (int, optional) – Number of CPUs allocated to load one model and predict. Defaults to 1.
num_gpus (int, optional) – Number of GPUs allocated to load one model and predict. Defaults to None.
batch_size (int, optional) – Batch size used to batch the inference of loaded models. Defaults to 32.
- abstract evaluate(X, y, metrics=None)
Compute metrics based on the provided data.
- Parameters
X (array) – An array of input data.
y (array) – An array of true output data.
metrics (callable, optional) – A metric. Defaults to None.
- abstract fit(X, y)
Fit the current algorithm to the provided data.
- Parameters
X (array) – The input data.
y (array) – The output data.
- Returns
The current fitted instance.
- Return type
- load(file: str) → None
Load an ensemble from a saved file.
- Parameters
file (str) – Path to the saved ensemble.
- load_members_files(file: str = 'ensemble.json') → None
Load the members composing an ensemble.
- Parameters
file (str, optional) – Path of the JSON file containing the ensemble members. All members need to be accessible in model_dir. Defaults to “ensemble.json”.
- abstract predict(X)
Execute an inference of the ensemble for the provided data.
- Parameters
X (array) – An array of input data.
- Returns
The prediction.
- Return type
array
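The pattern of extending an abstract base class can be sketched with a stand-in. `ToyBaseEnsemble` below only mirrors the three abstract method names documented for BaseEnsemble; it is not DeepHyper's class, and `MeanEnsemble`, its callable members and the `mse` helper are hypothetical.

```python
import abc


class ToyBaseEnsemble(abc.ABC):
    """Stand-in base class exposing the documented abstract methods."""

    @abc.abstractmethod
    def fit(self, X, y):
        ...

    @abc.abstractmethod
    def predict(self, X):
        ...

    @abc.abstractmethod
    def evaluate(self, X, y, metrics=None):
        ...


class MeanEnsemble(ToyBaseEnsemble):
    """Averages the outputs of plain Python callables used as members."""

    def __init__(self, members):
        self.members = members

    def fit(self, X, y):
        # Nothing to select in this toy case; return the fitted instance.
        return self

    def predict(self, X):
        return [sum(m(x) for m in self.members) / len(self.members) for x in X]

    def evaluate(self, X, y, metrics=None):
        # Assumption: `metrics` is a callable taking (y_true, y_pred).
        return None if metrics is None else metrics(y, self.predict(X))


def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


ensemble = MeanEnsemble([lambda x: 2 * x, lambda x: 2 * x + 1]).fit([], [])
predictions = ensemble.predict([0.0, 1.0])
score = ensemble.evaluate([0.0, 1.0], [0.5, 2.5], metrics=mse)
```

Because the three methods are declared with `abc.abstractmethod`, instantiating the base class directly, or a subclass that leaves one of them undefined, raises `TypeError`.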
- class deephyper.ensemble.UQBaggingEnsembleClassifier(model_dir, loss=<function cce>, size=5, verbose=True, ray_address='', num_cpus=1, num_gpus=None, batch_size=32, selection='topk')
Bases: deephyper.ensemble._uq_bagging_ensemble.UQBaggingEnsemble
Ensemble with uncertainty quantification for classification based on uniform averaging of the predictions of each member.
- Parameters
model_dir (str) – Path to directory containing saved Keras models in .h5 format.
loss (callable) – a callable taking (y_true, y_pred) as input.
size (int, optional) – Number of unique models used in the ensemble. Defaults to 5.
verbose (bool, optional) – Verbose mode. Defaults to True.
ray_address (str, optional) – Address of the Ray cluster. If “auto” it will try to connect to an existing cluster. If “” it will start a local Ray cluster. Defaults to “”.
num_cpus (int, optional) – Number of CPUs allocated to load one model and predict. Defaults to 1.
num_gpus (int, optional) – Number of GPUs allocated to load one model and predict. Defaults to None.
batch_size (int, optional) – Batch size used to batch the inference of loaded models. Defaults to 32.
selection (str, optional) – Selection strategy to build the ensemble. Value in ["topk", "caruana", "friedman"]. Defaults to “topk”.
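This reference names the “caruana” strategy without describing it. Assuming it refers to greedy forward selection with replacement, in the style usually attributed to Caruana et al., the idea can be sketched as follows. This is not DeepHyper's implementation: `greedy_selection`, the stopping rule, the model names and the predictions are all assumptions made for illustration.

```python
def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def greedy_selection(preds, y_true, loss, size=5):
    """Greedy forward selection with replacement (illustrative only).

    preds[name] holds the validation predictions of one model.
    At each step the model whose addition gives the lowest loss of
    the averaged prediction is appended. Selection stops when no
    addition improves the loss or `size` unique models are selected.
    """
    selected = []
    best_loss = float("inf")
    n = len(y_true)
    while len(set(selected)) < size:
        candidates = {}
        for name in preds:
            trial = selected + [name]
            avg = [sum(preds[m][i] for m in trial) / len(trial) for i in range(n)]
            candidates[name] = loss(y_true, avg)
        name = min(candidates, key=candidates.get)
        if candidates[name] >= best_loss:
            break
        selected.append(name)
        best_loss = candidates[name]
    return selected, best_loss


y_true = [1.0, 2.0, 3.0]
preds = {
    "a": [1.5, 2.5, 3.5],  # biased upward
    "b": [0.5, 1.5, 2.5],  # biased downward
    "c": [3.0, 3.0, 3.0],  # constant prediction
}
selected, best_loss = greedy_selection(preds, y_true, mse, size=5)
```

In this toy case the two models with opposite biases cancel out, so the greedy search keeps both and ignores the third, whereas ranking models one by one cannot account for such complementarity.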
- evaluate(X, y, metrics=None, scaler_y=None)
Compute metrics based on the provided data.
- Parameters
X (array) – An array of input data.
y (array) – An array of true output data.
metrics (callable, optional) – A metric. Defaults to None.
- fit(X, y)
Fit the current algorithm to the provided data.
- Parameters
X (array) – The input data.
y (array) – The output data.
- Returns
The current fitted instance.
- Return type
- load(file: str) → None
Load an ensemble from a saved file.
- Parameters
file (str) – Path to the saved ensemble.
- load_members_files(file: str = 'ensemble.json') → None
Load the members composing an ensemble.
- Parameters
file (str, optional) – Path of the JSON file containing the ensemble members. All members need to be accessible in model_dir. Defaults to “ensemble.json”.
- predict(X) → numpy.ndarray
Execute an inference of the ensemble for the provided data.
- Parameters
X (array) – An array of input data.
- Returns
The prediction.
- Return type
array
- class deephyper.ensemble.UQBaggingEnsembleRegressor(model_dir, loss=<function nll>, size=5, verbose=True, ray_address='', num_cpus=1, num_gpus=None, batch_size=32, selection='topk')
Bases: deephyper.ensemble._uq_bagging_ensemble.UQBaggingEnsemble
Ensemble with uncertainty quantification for regression based on uniform averaging of the predictions of each member.
- Parameters
model_dir (str) – Path to directory containing saved Keras models in .h5 format.
loss (callable) – a callable taking (y_true, y_pred) as input.
size (int, optional) – Number of unique models used in the ensemble. Defaults to 5.
verbose (bool, optional) – Verbose mode. Defaults to True.
ray_address (str, optional) – Address of the Ray cluster. If “auto” it will try to connect to an existing cluster. If “” it will start a local Ray cluster. Defaults to “”.
num_cpus (int, optional) – Number of CPUs allocated to load one model and predict. Defaults to 1.
num_gpus (int, optional) – Number of GPUs allocated to load one model and predict. Defaults to None.
batch_size (int, optional) – Batch size used to batch the inference of loaded models. Defaults to 32.
selection (str, optional) – Selection strategy to build the ensemble. Value in ["topk", "caruana", "friedman"]. Defaults to “topk”.
- evaluate(X, y, metrics=None, scaler_y=None)
Compute metrics based on the provided data.
- Parameters
X (array) – An array of input data.
y (array) – An array of true output data.
metrics (callable, optional) – A metric. Defaults to None.
- fit(X, y)
Fit the current algorithm to the provided data.
- Parameters
X (array) – The input data.
y (array) – The output data.
- Returns
The current fitted instance.
- Return type
- load(file: str) → None
Load an ensemble from a saved file.
- Parameters
file (str) – Path to the saved ensemble.
- load_members_files(file: str = 'ensemble.json') → None
Load the members composing an ensemble.
- Parameters
file (str, optional) – Path of the JSON file containing the ensemble members. All members need to be accessible in model_dir. Defaults to “ensemble.json”.
- predict(X) → numpy.ndarray
Execute an inference of the ensemble for the provided data.
- Parameters
X (array) – An array of input data.
- Returns
The prediction.
- Return type
array
- predict_var_decomposition(X)
Execute an inference of the ensemble for the provided data with uncertainty quantification estimates. The aleatoric uncertainty is the expected value, over the models composing the ensemble, of the learned variance \(\mathbf{E}[\sigma_\theta^2(\mathbf{x})]\). The epistemic uncertainty is the variance, over the same models, of the learned mean \(\mathbf{V}[\mu_\theta(\mathbf{x})]\).
- Parameters
X (array) – An array of input data.
- Returns
where y is the mixture distribution, u1 is the aleatoric component of the variance of y and u2 is the epistemic component of the variance of y.
- Return type
y, u1, u2
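The decomposition described for predict_var_decomposition can be sketched in plain Python. Each member is assumed to output a mean and a variance per input. The function `var_decomposition`, the nested-list layout and the numbers are hypothetical, and the sketch returns the mean of the equally weighted mixture rather than a distribution object.

```python
from statistics import fmean, pvariance


def var_decomposition(means, variances):
    """Split the predictive variance of an equally weighted ensemble.

    means[m][i] and variances[m][i] are the mean and variance that
    member m predicts for input i (hypothetical layout).
    Returns (mixture mean, aleatoric, epistemic), one value per input.
    """
    n_inputs = len(means[0])
    mu, aleatoric, epistemic = [], [], []
    for i in range(n_inputs):
        member_mu = [m[i] for m in means]
        member_var = [v[i] for v in variances]
        mu.append(fmean(member_mu))
        aleatoric.append(fmean(member_var))     # E[sigma^2(x)]
        epistemic.append(pvariance(member_mu))  # V[mu(x)]
    return mu, aleatoric, epistemic


# Two members, two inputs (made-up numbers).
means = [[1.0, 0.0], [3.0, 0.0]]
variances = [[0.5, 0.2], [1.5, 0.4]]
mu, u1, u2 = var_decomposition(means, variances)
```

By the law of total variance, the variance of the equally weighted mixture is `u1 + u2`. For the second input the members agree on the mean, so the epistemic term vanishes and only aleatoric uncertainty remains.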