Code Documentation: Data splitters

mastml.legos.data_splitters Module

The data_splitters module contains a collection of classes for generating (train_indices, test_indices) pairs from a dataframe or a numpy array.

For more information and a list of scikit-learn splitter classes, see:
http://scikit-learn.org/stable/modules/classes.html#module-sklearn.model_selection
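The splitter classes listed below produce (train_indices, test_indices) pairs, presumably following the scikit-learn convention linked above. The following minimal sketch illustrates that convention in plain Python. The class name HypotheticalKFold is illustrative and is not part of mastml. It mimics the split(X, y=None, groups=None) generator and get_n_splits() method that scikit-learn splitters such as KFold expose.

```python
from typing import Iterator, List, Sequence, Tuple

IndexPair = Tuple[List[int], List[int]]


class HypotheticalKFold:
    """Illustrative K-fold splitter following the scikit-learn split() convention."""

    def __init__(self, n_splits: int = 3) -> None:
        if n_splits < 2:
            raise ValueError("n_splits must be at least 2")
        self.n_splits = n_splits

    def get_n_splits(self, X=None, y=None, groups=None) -> int:
        return self.n_splits

    def split(self, X: Sequence, y=None, groups=None) -> Iterator[IndexPair]:
        n = len(X)
        if n < self.n_splits:
            raise ValueError("cannot have more folds than samples")
        indices = list(range(n))
        # Contiguous folds; the first n % n_splits folds get one extra sample
        # (the same sizing rule scikit-learn's KFold uses without shuffling).
        fold_sizes = [n // self.n_splits + (1 if i < n % self.n_splits else 0)
                      for i in range(self.n_splits)]
        start = 0
        for size in fold_sizes:
            test = indices[start:start + size]
            train = indices[:start] + indices[start + size:]
            yield train, test
            start += size


pairs = list(HypotheticalKFold(n_splits=3).split(list(range(7))))
```

Because split() is a generator, a caller can loop over the pairs lazily and index into a DataFrame or array with each train/test list.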

Classes

BaseEstimator Base class for all estimators in scikit-learn.
Bootstrap(n[, n_bootstraps, train_size, …]) Random sampling with replacement cross-validation iterator. Provides train/test indices to split data into train and test sets while resampling the input n_bootstraps times: each time, a new random split of the data is performed, and then samples are drawn (with replacement) on each side of the split to build the training and test sets. Note: Bootstrap is taken directly from the scikit-learn GitHub repository (https://github.com/scikit-learn/scikit-learn/blob/0.11.X/sklearn/cross_validation.py), which was necessary because it was removed from later scikit-learn releases.
JustEachGroup() Class to train the model on one group at a time and test it on the rest of the data. This class wraps scikit-learn’s LeavePGroupsOut with P set to n-1.
LeaveCloseCompositionsOut([dist_threshold, …]) Leave-P-out variant that excludes materials whose compositions are close to those in the test set.
LeaveOutPercent([percent_leave_out, n_repeats]) Class to train the model using a certain percentage of the data as training data and the remainder as test data.
NearestNeighbors(*[, n_neighbors, radius, …]) Unsupervised learner for implementing neighbor searches.
NoSplit() Class to train the model on the full dataset and test it on that same data (no train/test split is performed).
SplittersUnion(splitters) Class to take the union of several separate splitting routines, so that all of them can be performed at once.
TransformerMixin Mixin class for all transformers in scikit-learn.
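The Bootstrap description above (a fresh random split, then sampling with replacement on each side) can be sketched in plain Python as follows. This is an illustrative reimplementation, not the vendored scikit-learn code; the function name bootstrap_splits and the seed parameter are hypothetical, and the test side is assumed to be the complement of train_size.

```python
import random
from typing import Iterator, List, Optional, Tuple


def bootstrap_splits(n: int, n_bootstraps: int = 3, train_size: float = 0.5,
                     seed: Optional[int] = None) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield n_bootstraps (train, test) pairs of resampled sample indices."""
    n_train = int(train_size * n)
    n_test = n - n_train
    if n_train < 1 or n_test < 1:
        raise ValueError("train_size must leave at least one sample on each side")
    rng = random.Random(seed)
    for _ in range(n_bootstraps):
        # 1) A fresh random split of the indices into two disjoint pools.
        perm = list(range(n))
        rng.shuffle(perm)
        train_pool, test_pool = perm[:n_train], perm[n_train:]
        # 2) Resample each pool with replacement, keeping its original size.
        yield rng.choices(train_pool, k=n_train), rng.choices(test_pool, k=n_test)


splits = list(bootstrap_splits(10, n_bootstraps=4, train_size=0.5, seed=0))
```

Because each side is drawn only from its own pool, an index can repeat within the training (or test) list, but it never appears on both sides of the same split.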
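The JustEachGroup behaviour (train on one group, test on all the others) is what LeavePGroupsOut produces with P = n-1 groups held out. A plain-Python sketch of the resulting index pairs follows. The function name just_each_group_splits is hypothetical, and groups are visited in order of first appearance, whereas scikit-learn iterates over the sorted unique labels.

```python
from typing import Hashable, Iterator, List, Sequence, Tuple


def just_each_group_splits(groups: Sequence[Hashable]) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield one (train, test) pair per group: train on that group, test on the rest."""
    # dict.fromkeys keeps the unique labels in order of first appearance.
    for label in dict.fromkeys(groups):
        train = [i for i, g in enumerate(groups) if g == label]
        test = [i for i, g in enumerate(groups) if g != label]
        yield train, test


group_splits = list(just_each_group_splits(["a", "b", "a", "c"]))
```

With three groups this yields three splits, each training on a single group and testing on the remaining two.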
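The idea behind LeaveCloseCompositionsOut (holding out a material and also dropping training materials with similar compositions) can be sketched as below. Everything here is an assumption for illustration: compositions are represented as element-to-fraction dictionaries, closeness is the Euclidean distance between fraction vectors, and each split tests on a single material. The actual class may use a different composition representation and distance metric.

```python
import math
from typing import Dict, Iterator, List, Sequence, Tuple

Composition = Dict[str, float]  # hypothetical: element symbol -> atomic fraction


def composition_distance(a: Composition, b: Composition) -> float:
    """Euclidean distance between two compositions (missing elements count as 0)."""
    elements = set(a) | set(b)
    return math.sqrt(sum((a.get(e, 0.0) - b.get(e, 0.0)) ** 2 for e in elements))


def leave_close_compositions_out(compositions: Sequence[Composition],
                                 dist_threshold: float = 0.1
                                 ) -> Iterator[Tuple[List[int], List[int]]]:
    """Test on one material; drop training materials within dist_threshold of it."""
    for i, test_comp in enumerate(compositions):
        train = [j for j, comp in enumerate(compositions)
                 if j != i and composition_distance(comp, test_comp) > dist_threshold]
        yield train, [i]


comps = [{"Fe": 0.5, "O": 0.5}, {"Fe": 0.55, "O": 0.45}, {"Al": 0.4, "O": 0.6}]
close_splits = list(leave_close_compositions_out(comps, dist_threshold=0.1))
```

Here the two iron oxides lie about 0.07 apart, so each is excluded from the other's training set, which guards against an overly optimistic error estimate from near-duplicate materials.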
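LeaveOutPercent can be pictured as repeated random holdout. The sketch below reads percent_leave_out as the fraction of samples held out for testing in each of n_repeats independent shuffles. That reading, the fraction-in-(0, 1) scale, the function name, and the seed parameter are assumptions based on the parameter names, not the documented behaviour.

```python
import random
from typing import Iterator, List, Optional, Tuple


def leave_out_percent_splits(n: int, percent_leave_out: float = 0.2, n_repeats: int = 5,
                             seed: Optional[int] = None) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield n_repeats random (train, test) pairs, each holding out a fixed fraction."""
    if not 0.0 < percent_leave_out < 1.0:
        raise ValueError("percent_leave_out is assumed to be a fraction in (0, 1)")
    n_test = max(1, round(percent_leave_out * n))
    if n_test >= n:
        raise ValueError("no samples would be left for training")
    rng = random.Random(seed)
    for _ in range(n_repeats):
        # Each repeat uses an independent shuffle, so test sets may overlap across repeats.
        perm = list(range(n))
        rng.shuffle(perm)
        yield sorted(perm[n_test:]), sorted(perm[:n_test])


holdout_splits = list(leave_out_percent_splits(10, percent_leave_out=0.2, n_repeats=3, seed=1))
```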
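NoSplit amounts to a single pair in which the training and test indices are identical, so the resulting scores measure fit quality on the training data rather than generalization. A one-function sketch (the name no_split is hypothetical):

```python
from typing import Iterator, List, Tuple


def no_split(n: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield a single pair that trains and tests on every sample."""
    all_indices = list(range(n))
    yield all_indices, list(all_indices)


nosplit_pairs = list(no_split(4))
```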
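SplittersUnion runs several splitting routines as one. A minimal way to express that union is to concatenate the (train, test) pairs produced by each splitter in turn. In the sketch below each splitter is assumed to be a plain function of the sample count rather than an actual mastml splitter object; splitters_union, no_split, and leave_one_out are all hypothetical names.

```python
from itertools import chain
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple

IndexPair = Tuple[List[int], List[int]]
SplitFn = Callable[[int], Iterable[IndexPair]]


def splitters_union(split_fns: Sequence[SplitFn], n: int) -> Iterator[IndexPair]:
    """Yield every (train, test) pair from each splitter, in the order given."""
    return chain.from_iterable(fn(n) for fn in split_fns)


def no_split(n: int) -> Iterator[IndexPair]:
    idx = list(range(n))
    yield idx, list(idx)


def leave_one_out(n: int) -> Iterator[IndexPair]:
    for i in range(n):
        yield [j for j in range(n) if j != i], [i]


union_pairs = list(splitters_union([no_split, leave_one_out], 3))
```

Chaining keeps the evaluation lazy: pairs from the second splitter are generated only after the first splitter is exhausted.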

Class Inheritance Diagram

Inheritance diagram of mastml.legos.data_splitters.Bootstrap, mastml.legos.data_splitters.JustEachGroup, mastml.legos.data_splitters.LeaveCloseCompositionsOut, mastml.legos.data_splitters.LeaveOutPercent, mastml.legos.data_splitters.NoSplit, mastml.legos.data_splitters.SplittersUnion