feature_learning_curve

mastml.learning_curve.feature_learning_curve(X, y, estimator, cv, scoring, selector_name, savepath, n_features_to_select=None, Xgroups=None)

Method that calculates the data used to plot a feature learning curve, e.g. the RMSE from a cross-validation routine for a specified model as a function of the number of features used.

Args:

X: (numpy array), array of X data values

y: (numpy array), array of y data values

estimator: (scikit-learn model object), a scikit-learn model used for fitting

cv: (scikit-learn cross validation object), a scikit-learn cross validation object to construct train/test splits

scoring: (scikit-learn metric object), a scikit-learn metric to use as a scorer

selector_name: (str), name of a scikit-learn or MAST-ML feature selection routine

n_features_to_select: (int), total number of features to select, i.e. stopping criterion for number of features

Xgroups: (list), list of row indices corresponding to each group

Returns:

train_sizes: (numpy array), array of fractions of training data used in learning curve

train_mean: (numpy array), array of means of training data scores for each number of features

test_mean: (numpy array), array of means of testing data scores for each number of features

train_stdev: (numpy array), array of standard deviations of training data scores for each number of features

test_stdev: (numpy array), array of standard deviations of testing data scores for each number of features
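Example:

The idea behind a feature learning curve can be sketched with plain scikit-learn. This is not the MAST-ML implementation, only a minimal illustration under stated assumptions: the helper name sketch_feature_learning_curve, the rmse function, the synthetic data, and the choice of SelectKBest with f_regression as the selector are all hypothetical, and the scorer is assumed to be one built with make_scorer. The sketch returns the feature counts it evaluated as its first array.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import make_scorer, mean_squared_error
from sklearn.model_selection import KFold, cross_validate
from sklearn.pipeline import make_pipeline


def rmse(y_true, y_pred):
    """Root-mean-squared error between true and predicted values."""
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))


def sketch_feature_learning_curve(X, y, estimator, cv, scoring,
                                  n_features_to_select=None):
    """Hypothetical helper: CV score statistics versus number of features."""
    # Assumption: None means "go up to all available features".
    max_features = X.shape[1] if n_features_to_select is None else n_features_to_select
    n_features = np.arange(1, max_features + 1)

    train_mean, test_mean, train_stdev, test_stdev = [], [], [], []
    for k in n_features:
        # Selection lives inside the pipeline so it is refit on every
        # training fold and never sees the held-out rows.
        model = make_pipeline(SelectKBest(f_regression, k=int(k)), estimator)
        scores = cross_validate(model, X, y, cv=cv, scoring=scoring,
                                return_train_score=True)
        train_mean.append(scores["train_score"].mean())
        test_mean.append(scores["test_score"].mean())
        train_stdev.append(scores["train_score"].std())
        test_stdev.append(scores["test_score"].std())

    return (n_features, np.array(train_mean), np.array(test_mean),
            np.array(train_stdev), np.array(test_stdev))


# Synthetic data: only the first two of five features carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 5))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.1, size=80)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
# greater_is_better=False makes the scorer return negated RMSE values.
scoring = make_scorer(rmse, greater_is_better=False)

n_features, train_mean, test_mean, train_stdev, test_stdev = (
    sketch_feature_learning_curve(X, y, LinearRegression(), cv, scoring)
)
```

Because the scorer is built with greater_is_better=False, the returned means are negated RMSE values, so values closer to zero indicate a better fit. Plotting test_mean against n_features, with test_stdev as error bars, gives the kind of curve this function is meant to support.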