Code Documentation: Data splitters
mastml.legos.data_splitters Module
The data_splitters module contains a collection of classes for generating (train_indices, test_indices) pairs from a dataframe or a numpy array.
For more information and a list of scikit-learn splitter classes, see: http://scikit-learn.org/stable/modules/classes.html#module-sklearn.model_selection
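As a rough illustration of what a (train_indices, test_indices) pair looks like, the sketch below uses scikit-learn's own KFold rather than any class from this module. The array X and the parameter values are made up for the example.

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy data: 10 samples with 2 features each (illustrative values only).
X = np.arange(20).reshape(10, 2)

# Any scikit-learn splitter exposes split(), which yields index pairs.
splitter = KFold(n_splits=5, shuffle=True, random_state=0)
pairs = list(splitter.split(X))

for train_indices, test_indices in pairs:
    # Each element is a pair of integer index arrays into X.
    print("train:", train_indices, "test:", test_indices)
```

The index arrays can then be used to select rows, for example X[train_indices] for a numpy array or df.iloc[train_indices] for a dataframe.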
Classes
BaseEstimator |
Base class for all estimators in scikit-learn. |
Bootstrap (n[, n_bootstraps, train_size, …]) |
Random sampling with replacement cross-validation iterator. Provides train/test indices to split data into train and test sets while resampling the input n_bootstraps times: each time, a new random split of the data is performed, and then samples are drawn (with replacement) on each side of the split to build the training and test sets. Note: Bootstrap was taken directly from the scikit-learn GitHub repository (https://github.com/scikit-learn/scikit-learn/blob/0.11.X/sklearn/cross_validation.py), which was necessary because it was removed from more recent scikit-learn releases. |
JustEachGroup () |
Class to train the model on one group at a time and test it on the rest of the data. This class wraps scikit-learn’s LeavePGroupsOut with P set to n-1. |
LeaveCloseCompositionsOut ([dist_threshold, …]) |
Leave-P-out where you exclude materials with compositions close to those in the test set |
LeaveOutPercent ([percent_leave_out, n_repeats]) |
Class to train the model using a certain percentage of data as training data |
NearestNeighbors (*[, n_neighbors, radius, …]) |
Unsupervised learner for implementing neighbor searches. |
NoSplit () |
Class to just train the model on the training data and test it on that same data. |
SplittersUnion (splitters) |
Class to take the union of two separate splitting routines, so that many splitting routines can be performed at once |
TransformerMixin |
Mixin class for all transformers in scikit-learn. |
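The splitting ideas behind NoSplit, JustEachGroup, and Bootstrap in the table above can be sketched with numpy and scikit-learn alone. This is a minimal sketch based only on the one-line descriptions; the helper names (no_split_indices, just_each_group_indices, bootstrap_indices), their parameters, and the toy data are hypothetical and are not the mastml API or its implementation.

```python
import numpy as np
from sklearn.model_selection import LeavePGroupsOut


def no_split_indices(n_samples):
    """Hypothetical helper: one split whose train and test indices are identical."""
    indices = np.arange(n_samples)
    return [(indices, indices)]


def just_each_group_indices(X, groups):
    """Hypothetical helper: train on one group, test on all the other groups."""
    n_groups = len(np.unique(groups))
    # Leaving n-1 groups out as the test set keeps exactly one group for training.
    splitter = LeavePGroupsOut(n_groups=n_groups - 1)
    return list(splitter.split(X, groups=groups))


def bootstrap_indices(n_samples, n_bootstraps=3, train_fraction=0.5, seed=0):
    """Hypothetical helper: random split, then resample each side with replacement."""
    rng = np.random.default_rng(seed)
    n_train = int(train_fraction * n_samples)
    splits = []
    for _ in range(n_bootstraps):
        permutation = rng.permutation(n_samples)
        train_pool, test_pool = permutation[:n_train], permutation[n_train:]
        # Drawing with replacement means indices may repeat within each side.
        train = rng.choice(train_pool, size=len(train_pool), replace=True)
        test = rng.choice(test_pool, size=len(test_pool), replace=True)
        splits.append((train, test))
    return splits


# Toy data: 6 samples in 3 groups (illustrative values only).
X = np.arange(12).reshape(6, 2)
groups = np.array([0, 0, 1, 1, 2, 2])

no_split = no_split_indices(len(X))
each_group = just_each_group_indices(X, groups)
bootstraps = bootstrap_indices(len(X))
```

With three groups, just_each_group_indices yields three splits, each training on a single group. In bootstrap_indices, the train and test sides never share an index because they are drawn from disjoint halves of the permutation.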