Code Documentation: Data splitters

mastml.legos.data_splitters Module

The data_splitters module contains a collection of classes for generating (train_indices, test_indices) pairs from a dataframe or a numpy array.

For more information and a list of scikit-learn splitter classes, see: http://scikit-learn.org/stable/modules/classes.html#module-sklearn.model_selection
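To make the (train_indices, test_indices) contract concrete, here is a minimal sketch of a splitter that follows the scikit-learn convention of exposing `split` and `get_n_splits`. The class `ContiguousFolds` is hypothetical and is not part of mastml or scikit-learn; it only illustrates the shape of the values a splitter yields, under the assumption that the input supports `len()` (as dataframes and numpy arrays do).

```python
from typing import Iterator, List, Sequence, Tuple

IndexPair = Tuple[List[int], List[int]]


class ContiguousFolds:
    """Hypothetical splitter (not a mastml class) that follows the
    scikit-learn convention: split() yields (train_indices, test_indices)."""

    def __init__(self, n_splits: int = 2):
        self.n_splits = n_splits

    def get_n_splits(self, X=None, y=None, groups=None) -> int:
        # Number of (train, test) pairs that split() will yield.
        return self.n_splits

    def split(self, X: Sequence, y=None, groups=None) -> Iterator[IndexPair]:
        n = len(X)
        indices = list(range(n))
        fold = n // self.n_splits  # any remainder rows stay in training only
        for k in range(self.n_splits):
            test = indices[k * fold:(k + 1) * fold]
            train = indices[:k * fold] + indices[(k + 1) * fold:]
            yield train, test


# Six rows of toy data; only the row count matters to this splitter.
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6]]
pairs = list(ContiguousFolds(n_splits=3).split(X))
for train_idx, test_idx in pairs:
    print("train:", train_idx, "test:", test_idx)
```

Any object with this interface can be consumed the same way, which is what lets splitters be swapped without changing the training loop.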

`BaseEstimator`	Base class for all estimators in scikit-learn.
`Bootstrap`(n[, n_bootstraps, train_size, …])	Random sampling with replacement cross-validation iterator. Provides train/test indices to split data into train and test sets while resampling the input n_bootstraps times: each time, a new random split of the data is performed, and samples are then drawn (with replacement) on each side of the split to build the training and test sets. Note: Bootstrap is taken directly from the scikit-learn GitHub repository (https://github.com/scikit-learn/scikit-learn/blob/0.11.X/sklearn/cross_validation.py), which was necessary because it was removed from more recent scikit-learn releases.
`JustEachGroup`()	Class to train the model on one group at a time and test it on the rest of the data. This class wraps scikit-learn’s LeavePGroupsOut with P set to n-1.
`LeaveCloseCompositionsOut`([dist_threshold, …])	Leave-P-out where you exclude materials with compositions close to those in the test set.
`LeaveOutPercent`([percent_leave_out, n_repeats])	Class to train the model using a certain percentage of data as training data
`NearestNeighbors`(*[, n_neighbors, radius, …])	Unsupervised learner for implementing neighbor searches.
`NoSplit`()	Class to just train the model on the training data and test it on that same data.
`SplittersUnion`(splitters)	Class to take the union of two separate splitting routines, so that many splitting routines can be performed at once
`TransformerMixin`	Mixin class for all transformers in scikit-learn.
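
The behaviour described for `JustEachGroup` can be sketched directly with scikit-learn's `LeavePGroupsOut`, setting `n_groups` to one less than the number of distinct groups so that each split trains on a single group and tests on all the others. This is an illustration using scikit-learn only, not mastml's own code; the toy data and group labels are made up. The last lines show one possible reading of `SplittersUnion`, assuming "union" means concatenating the splits produced by several splitters; that interpretation is an assumption, not something the table above confirms.

```python
from itertools import chain

import numpy as np
from sklearn.model_selection import KFold, LeavePGroupsOut

# Toy data: six rows belonging to three groups (labels are illustrative).
X = np.arange(12).reshape(6, 2)
groups = np.array([0, 0, 1, 1, 2, 2])

# Leave out all groups but one, so each split trains on exactly one group.
n_unique = len(np.unique(groups))
splitter = LeavePGroupsOut(n_groups=n_unique - 1)

splits = [
    (train.tolist(), test.tolist())
    for train, test in splitter.split(X, groups=groups)
]
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)

# Assumed reading of a "union" of splitters: run each splitter in turn
# and concatenate the (train, test) pairs they produce.
union_splits = list(
    chain(splitter.split(X, groups=groups), KFold(n_splits=2).split(X))
)
print("total splits in union:", len(union_splits))
```

With three groups this yields three group-wise splits, and chaining a two-fold `KFold` afterwards gives five splits in total.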