LearningCurve

class mastml.learning_curve.LearningCurve

Bases: object

This class constructs two kinds of learning curves: model performance vs. the amount of training data, and model performance vs. the number of features used in the fit.

Args:

None

Methods:
evaluate: Sets up a save directory and computes both the data-based and the feature-based learning curves
Args:

model: (SklearnModel or EnsembleModel), a model made in MAST-ML

X: (pd.DataFrame), dataframe containing the X feature matrix

y: (pd.Series), series containing the target y data

savepath: (str), path to the directory where the learning curve output is saved

groups: (pd.Series), series of group labels, one per data point

train_sizes: (list or np.array), list or array of floats denoting the fractions of training data to evaluate for the data learning curve

cv: (scikit-learn cross-validation object), a scikit-learn cross-validation splitter (e.g., KFold) that generates the train/test splits

scoring: (str), name of the regression metric used to evaluate the learning curves. See mastml.metrics.Metrics._metric_zoo for the full list

selector: (mastml.feature_selectors object), a feature selector instance from mastml.feature_selectors

make_plot: (bool), whether or not to make the learning curve plots
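The train_sizes argument takes fractions rather than sample counts. A minimal sketch of how such fractions could be turned into counts, assuming (as scikit-learn's learning_curve does for float sizes) that each fraction is relative to the size of one CV training fold and is truncated to an integer. fractions_to_counts is a hypothetical helper, not part of MAST-ML.

```python
import numpy as np

def fractions_to_counts(train_sizes, n_train):
    """Convert fractional train sizes into absolute sample counts.

    Assumption: each fraction is taken of the n_train samples in one CV
    training fold, truncated to an integer and clipped to at least 1.
    """
    fractions = np.asarray(train_sizes, dtype=float)
    if np.any((fractions <= 0) | (fractions > 1)):
        raise ValueError("train_sizes fractions must lie in (0, 1]")
    counts = np.floor(fractions * n_train).astype(int)
    return np.maximum(counts, 1)

# 100 samples with 5-fold CV leave 80 samples in each training fold.
n_samples, n_splits = 100, 5
n_train = n_samples - n_samples // n_splits
print(fractions_to_counts([0.25, 0.5, 1.0], n_train))  # [20 40 80]
```

Clipping to at least one sample keeps very small fractions from producing an empty training set.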

data_learning_curve: Method that calculates the model CV score as a function of the amount of training data used
Args:

model: (SklearnModel or EnsembleModel), a model made in MAST-ML

X: (pd.DataFrame), dataframe containing the X feature matrix

y: (pd.Series), series containing the target y data

savepath: (str), path to the directory where the learning curve output is saved

groups: (pd.Series), series of group labels, one per data point

train_sizes: (list or np.array), list or array of floats denoting the fractions of training data to evaluate for the data learning curve

cv: (scikit-learn cross-validation object), a scikit-learn cross-validation splitter (e.g., KFold) that generates the train/test splits

scoring: (str), name of the regression metric used to evaluate the learning curves. See mastml.metrics.Metrics._metric_zoo for the full list

make_plot: (bool), whether or not to make the learning curve plots

Returns:

None
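A minimal, numpy-only sketch of what a data learning curve computes: for each training fraction, fit on that fraction of every CV training fold and average the test-fold score. It assumes shuffled K-fold CV, ordinary least squares as the model, and RMSE as the score; all function names are hypothetical, and this is not MAST-ML's implementation, which works with MAST-ML model objects and the cv, scoring, and groups arguments above.

```python
import numpy as np

def kfold_indices(n_samples, n_splits, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled K-fold CV."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), n_splits)
    for i in range(n_splits):
        test_idx = folds[i]
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, test_idx

def fit_predict_linear(X_train, y_train, X_test):
    """Ordinary least squares with an intercept column."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ coef

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def data_learning_curve_sketch(X, y, train_sizes, n_splits=5):
    """Mean test RMSE for each fraction of the training fold that is used."""
    curve = []
    for frac in train_sizes:
        scores = []
        for train_idx, test_idx in kfold_indices(len(y), n_splits):
            n_use = max(1, int(frac * len(train_idx)))
            subset = train_idx[:n_use]  # indices are already shuffled
            y_pred = fit_predict_linear(X[subset], y[subset], X[test_idx])
            scores.append(rmse(y[test_idx], y_pred))
        curve.append(np.mean(scores))
    return np.array(curve)

# Synthetic linear data with noise of standard deviation 0.3.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.3, size=200)
print(data_learning_curve_sketch(X, y, [0.05, 0.25, 1.0]))
```

The test error should fall toward the noise level as more of each training fold is used; a curve that is still dropping at the largest fraction suggests more data would help.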

feature_learning_curve: Method that calculates the model CV score as a function of the number of features used
Args:

model: (SklearnModel or EnsembleModel), a model made in MAST-ML

X: (pd.DataFrame), dataframe containing the X feature matrix

y: (pd.Series), series containing the target y data

savepath: (str), path to the directory where the learning curve output is saved

groups: (pd.Series), series of group labels, one per data point

cv: (scikit-learn cross-validation object), a scikit-learn cross-validation splitter (e.g., KFold) that generates the train/test splits

scoring: (str), name of the regression metric used to evaluate the learning curves. See mastml.metrics.Metrics._metric_zoo for the full list

selector: (mastml.feature_selectors object), a feature selector instance from mastml.feature_selectors

make_plot: (bool), whether or not to make the learning curve plots

Returns:

None
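A comparable sketch for the feature learning curve, assuming that features are added one at a time in the order given by a ranking and the CV score is recomputed at each step. Here a simple absolute-correlation ranking stands in for the selector argument; MAST-ML's feature selectors are not reproduced, and every function name is hypothetical.

```python
import numpy as np

def cv_rmse(X, y, n_splits=5, seed=0):
    """Mean K-fold test RMSE of an ordinary least-squares fit with intercept."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_splits)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        A_train = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        coef, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)
        scores.append(np.sqrt(np.mean((y[test_idx] - A_test @ coef) ** 2)))
    return float(np.mean(scores))

def rank_features(X, y):
    """Stand-in selector: order columns by |Pearson correlation| with y."""
    corr = [abs(np.corrcoef(X[:, k], y)[0, 1]) for k in range(X.shape[1])]
    return np.argsort(corr)[::-1]

def feature_learning_curve_sketch(X, y):
    """CV RMSE using the top-1, top-2, ..., top-n ranked features."""
    order = rank_features(X, y)
    return np.array([cv_rmse(X[:, order[:k]], y) for k in range(1, X.shape[1] + 1)])

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
# Only the first two columns influence y; the last two are pure noise.
y = 3.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.2, size=200)
curve = feature_learning_curve_sketch(X, y)
print(curve)
```

In this synthetic case the error should drop sharply once both informative features are included and stay roughly flat as the noise features are added.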

_setup_savedir: Method to create the output save directory for learning curve data
Args:

savepath: (str), string denoting the base path to save the output to

Returns:

splitdir: (str), path where the learning curve data will be saved
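_setup_savedir is a private helper, and its directory naming scheme is not described here. A minimal sketch of the behaviour the description implies, creating a new output directory under savepath and returning its path; the timestamped name and the setup_savedir_sketch function are assumptions for illustration only.

```python
import os
import tempfile
from datetime import datetime

def setup_savedir_sketch(savepath, prefix="LearningCurve"):
    """Create and return a fresh, timestamped output directory under savepath.

    The naming scheme is hypothetical, chosen only for illustration.
    """
    stamp = datetime.now().strftime("%Y_%m_%d_%H_%M_%S_%f")
    splitdir = os.path.join(savepath, f"{prefix}_{stamp}")
    os.makedirs(splitdir, exist_ok=False)  # fail rather than reuse an existing directory
    return splitdir

with tempfile.TemporaryDirectory() as base:
    out = setup_savedir_sketch(base)
    print(os.path.isdir(out), os.path.dirname(out) == base)
```

A timestamped name lets repeated runs write to separate directories instead of overwriting earlier output.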

Methods Summary

data_learning_curve(model, X, y[, savepath, ...])

evaluate(model, X, y[, savepath, groups, ...])

feature_learning_curve(model, X, y[, ...])

Methods Documentation

data_learning_curve(model, X, y, savepath=None, groups=None, train_sizes=None, cv=None, scoring=None, make_plot=True)
evaluate(model, X, y, savepath=None, groups=None, train_sizes=None, cv=None, scoring=None, selector=None, make_plot=True, make_new_dir=True)
feature_learning_curve(model, X, y, savepath=None, groups=None, cv=None, scoring=None, selector=None, make_plot=True)