sample_learning_curve

mastml.learning_curve.sample_learning_curve(X, y, estimator, cv, scoring, Xgroups=None)

Calculates the data used to plot a sample learning curve, e.g. the RMSE of a cross-validation routine using a specified model and a given fraction of the total training data.

Args:

X: (numpy array), array of X data values

y: (numpy array), array of y data values

estimator: (scikit-learn model object), a scikit-learn model used for fitting

cv: (scikit-learn cross validation object), a scikit-learn cross validation object to construct train/test splits

scoring: (scikit-learn metric object), a scikit-learn metric to use as a scorer

Xgroups: (list), list of row indices corresponding to each group; defaults to None

Returns:

train_sizes: (numpy array), array of fractions of training data used in learning curve

train_mean: (numpy array), array of means of training data scores for each training data fraction

test_mean: (numpy array), array of means of testing data scores for each training data fraction

train_stdev: (numpy array), array of standard deviations of training data scores for each training data fraction

test_stdev: (numpy array), array of standard deviations of testing data scores for each training data fraction
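
Example:

The following is a minimal sketch of how the five returned arrays could be computed, not the mastml implementation itself. It assumes scikit-learn's learning_curve function as the underlying routine. The helper name sketch_sample_learning_curve, the five-point grid of training fractions, the Ridge model, the synthetic data, and the scoring string are all hypothetical choices made for illustration. Note that scikit-learn's learning_curve reports training set sizes as absolute sample counts when given fractions.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, learning_curve


def sketch_sample_learning_curve(X, y, estimator, cv, scoring, groups=None):
    """Hypothetical stand-in that mirrors the documented return values."""
    # Assumed grid of training-data fractions; the real grid may differ.
    fractions = np.linspace(0.2, 1.0, 5)
    train_sizes, train_scores, test_scores = learning_curve(
        estimator, X, y,
        groups=groups,
        train_sizes=fractions,
        cv=cv,
        scoring=scoring,
    )
    # Score arrays have shape (n_train_sizes, n_cv_splits);
    # reduce over the CV splits to get one mean and stdev per size.
    return (
        train_sizes,
        train_scores.mean(axis=1),
        test_scores.mean(axis=1),
        train_scores.std(axis=1),
        test_scores.std(axis=1),
    )


# Synthetic regression data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
train_sizes, train_mean, test_mean, train_stdev, test_stdev = (
    sketch_sample_learning_curve(
        X, y, Ridge(), cv, "neg_root_mean_squared_error"
    )
)
```

Each returned array has one entry per training set size, so train_mean and test_mean can be plotted against train_sizes directly, with train_stdev and test_stdev as error bars.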