ErrorUtils
- class mastml.error_analysis.ErrorUtils
Bases: object
Collection of functions for conducting error analysis (uncertainty quantification) on certain types of models, such as ensembles of weak learners, preparing residual and model error data for plotting, and recalibrating model errors with various methods.
- Args:
None
- Methods:
- _collect_error_data: method to collect all residuals, model errors, and dataset standard deviation over many data splits
- Args:
savepath: (str), string denoting the path to save output to
data_type: (str), string denoting the data type analyzed, e.g. train, test, leftout
- Returns:
model_errors: (pd.Series), series containing the predicted model errors
residuals: (pd.Series), series containing the true model errors (residuals)
dataset_stdev: (float), standard deviation of the data set
- _recalibrate_errors: method to recalibrate the model errors by minimizing a negative log-likelihood function, following the work of Palmer et al.
- Args:
model_errors: (pd.Series), series containing the predicted (uncalibrated) model errors
residuals: (pd.Series), series containing the true model errors (residuals)
- Returns:
model_errors: (pd.Series), series containing the predicted (calibrated) model errors
a: (float), the slope of the recalibration linear fit
b: (float), the intercept of the recalibration linear fit
- _parse_error_data: method to prepare the provided residuals and model errors for plotting the binned RvE (residual vs error) plots
- Args:
model_errors: (pd.Series), series containing the predicted model errors
residuals: (pd.Series), series containing the true model errors (residuals)
dataset_stdev: (float), standard deviation of the data set
number_of_bins: (int), the number of bins into which the data are digitized for the RvE (residual vs. error) plot
- Returns:
bin_values: (np.array), the x-axis of the RvE plot: reduced model error values digitized into bins
rms_residual_values: (np.array), the y-axis of the RvE plot: the RMS of the residual values digitized into bins
num_values_per_bin: (np.array), the number of data samples in each bin
number_of_bins: (int), the number of bins to put the model error and residual data into.
- _get_model_errors: method for generating the model error values using either the standard deviation of the weak learner predictions or the jackknife-after-bootstrap method of Wager et al.
- Args:
model: (mastml.models object), a MAST-ML model, e.g. SklearnModel or EnsembleModel
X: (pd.DataFrame), dataframe of the X feature matrix
X_train: (pd.DataFrame), dataframe of the X training data feature matrix
X_test: (pd.DataFrame), dataframe of the X test data feature matrix
error_method: (str), string denoting the UQ error method to use. Viable options are ‘stdev_weak_learners’ and ‘jackknife_after_bootstrap’
remove_outlier_learners: (bool), whether to remove weak learners whose predictions for a given data point deviate by more than 3 sigma from the average prediction (Default False)
- Returns:
model_errors: (pd.Series), series containing the predicted model errors
num_removed_learners: (list), list of the number of weak learners removed for each data point
- _remove_outlier_preds: method to flag and remove outlier weak learner predictions
- Args:
preds: (list), list of predicted values of a given data point from an ensemble of weak learners
- Returns:
preds_cleaned: (list), amended list of predicted values of a given data point from an ensemble of weak learners, with predictions from outlier learners removed
num_outliers: (int), the number of removed weak learners for the data point evaluated
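The pooling step described for _collect_error_data can be sketched as follows. This is a hypothetical in-memory variant, not the MAST-ML method: rather than working from savepath and data_type, it takes the per-split series directly. The function name collect_error_data, the splits argument, and the use of the population standard deviation (ddof=0) for dataset_stdev are all assumptions.

```python
import numpy as np
import pandas as pd


def collect_error_data(splits, y):
    """Pool per-split model errors and residuals (hypothetical in-memory variant).

    splits: list of (model_errors, residuals) pd.Series pairs, one per data split
    y: target values used to compute the dataset standard deviation
    """
    # Stack the per-split series end to end with a fresh 0..N-1 index
    model_errors = pd.concat([errs for errs, _ in splits], ignore_index=True)
    residuals = pd.concat([res for _, res in splits], ignore_index=True)
    # Assumption: population standard deviation (ddof=0) of the targets
    dataset_stdev = float(np.std(np.asarray(y, dtype=float)))
    return model_errors, residuals, dataset_stdev
```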
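The recalibration in _recalibrate_errors returns a slope a and an intercept b, which suggests calibrated errors of the form sigma_cal = a * sigma + b. A minimal sketch of fitting a and b by minimizing the Gaussian negative log-likelihood of the residuals is shown below. The Gaussian likelihood, the Nelder-Mead optimizer, the starting point (a, b) = (1, 0), and the function name recalibrate_errors are assumptions for illustration, not a reproduction of MAST-ML's implementation. The sketch assumes strictly positive model errors.

```python
import numpy as np
import pandas as pd
from scipy.optimize import minimize


def recalibrate_errors(model_errors, residuals):
    """Fit sigma_cal = a * sigma + b by minimizing a Gaussian NLL (sketch)."""
    sigma = np.asarray(model_errors, dtype=float)
    r = np.asarray(residuals, dtype=float)

    def nll(params):
        a, b = params
        s = a * sigma + b
        if np.any(s <= 0):
            return np.inf  # calibrated errors must stay positive
        # Negative log-likelihood of residuals r_i ~ N(0, s_i^2)
        return 0.5 * np.sum(np.log(2 * np.pi * s**2) + (r / s) ** 2)

    # Start from the identity calibration (a=1, b=0)
    result = minimize(nll, x0=[1.0, 0.0], method="Nelder-Mead",
                      options={"maxiter": 2000, "xatol": 1e-6, "fatol": 1e-6})
    a, b = result.x
    calibrated = pd.Series(a * sigma + b, index=getattr(model_errors, "index", None))
    return calibrated, float(a), float(b)
```

Nelder-Mead is used here only because it tolerates the infinite penalty returned for non-positive errors; any bounded optimizer would serve equally well.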
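The binning performed by _parse_error_data can be sketched with NumPy. This sketch assumes equal-width bins spanning the range of the reduced model errors, uses bin centers as bin_values, skips empty bins, and divides the residuals by the same dataset_stdev so that a well-calibrated model falls on the line y = x. These choices, and the function name parse_error_data, are assumptions rather than MAST-ML's exact behavior.

```python
import numpy as np


def parse_error_data(model_errors, residuals, dataset_stdev, number_of_bins=15):
    """Bin reduced model errors and compute the RMS residual per bin (sketch)."""
    # Reduce (normalize) both quantities by the dataset standard deviation
    errors = np.asarray(model_errors, dtype=float) / dataset_stdev
    resids = np.asarray(residuals, dtype=float) / dataset_stdev
    # Equal-width bin edges spanning the reduced model errors
    edges = np.linspace(errors.min(), errors.max(), number_of_bins + 1)
    # np.digitize returns 1..number_of_bins+1; clip so the maximum lands in the last bin
    idx = np.clip(np.digitize(errors, edges), 1, number_of_bins) - 1
    bin_values, rms_residual_values, num_values_per_bin = [], [], []
    for i in range(number_of_bins):
        mask = idx == i
        n = int(mask.sum())
        if n == 0:
            continue  # skip empty bins
        bin_values.append(0.5 * (edges[i] + edges[i + 1]))  # bin center
        rms_residual_values.append(np.sqrt(np.mean(resids[mask] ** 2)))
        num_values_per_bin.append(n)
    return (np.array(bin_values), np.array(rms_residual_values),
            np.array(num_values_per_bin), number_of_bins)
```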
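The 'stdev_weak_learners' option of _get_model_errors can be illustrated with a scikit-learn random forest standing in for a MAST-ML model, taking the model error of each data point as the standard deviation of the individual tree predictions. The helper name stdev_weak_learners and the synthetic data are hypothetical; estimators_ is scikit-learn's list of fitted trees. The jackknife-after-bootstrap option is not covered by this sketch.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def stdev_weak_learners(forest, X):
    """Model error as the spread of the individual tree predictions (sketch)."""
    # Trees inside the forest are fitted on arrays, so predict on an array too
    X_arr = X.to_numpy()
    # One row per fitted tree, one column per data point
    preds = np.array([tree.predict(X_arr) for tree in forest.estimators_])
    return pd.Series(preds.std(axis=0), index=X.index)


# Usage on synthetic data
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(50, 3)), columns=["f0", "f1", "f2"])
y = X["f0"] + 0.1 * rng.normal(size=50)
forest = RandomForestRegressor(n_estimators=20, random_state=0).fit(X, y)
model_errors = stdev_weak_learners(forest, X)
```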
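The 3-sigma filtering that _remove_outlier_preds applies to one data point's weak learner predictions might look like the following. It assumes a single pass using the mean and population standard deviation of the predictions. The function name remove_outlier_preds and the n_sigma parameter are illustrative assumptions.

```python
import numpy as np


def remove_outlier_preds(preds, n_sigma=3.0):
    """Drop weak learner predictions more than n_sigma std from the mean (sketch)."""
    arr = np.asarray(preds, dtype=float)
    mean, std = arr.mean(), arr.std()
    if std == 0:
        return list(arr), 0  # all learners agree; nothing to remove
    # Single pass: keep predictions within n_sigma standard deviations of the mean
    keep = np.abs(arr - mean) <= n_sigma * std
    return list(arr[keep]), int((~keep).sum())
```

With very few learners a single deviant prediction may never exceed 3 sigma, because the outlier itself inflates the standard deviation, so this filter is most meaningful for reasonably large ensembles.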