Bootstrap

class mastml.legos.data_splitters.Bootstrap(n, n_bootstraps=3, train_size=0.5, test_size=None, n_train=None, n_test=None, random_state=0)

Bases: object

Note: Bootstrap is taken directly from the scikit-learn GitHub repository (https://github.com/scikit-learn/scikit-learn/blob/0.11.X/sklearn/cross_validation.py). This was necessary because the class was removed from later scikit-learn releases.

Random sampling with replacement cross-validation iterator.

Provides train/test indices to split data into train and test sets while resampling the input n_bootstraps times: each time, a new random split of the data is performed, and then samples are drawn (with replacement) on each side of the split to build the training and test sets.

Note: contrary to other cross-validation strategies, bootstrapping allows some samples to occur several times in each split. However, a sample that occurs in the train split will never occur in the test split, and vice versa. If you want each sample to occur at most once, you should probably use ShuffleSplit cross-validation instead.
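The two-step resampling scheme described above (random partition, then sampling with replacement on each side) can be sketched in NumPy as follows. This is a minimal sketch, not MAST-ML's code: the helper name `bootstrap_split` is hypothetical, and it assumes the train and test sizes have already been resolved to integer counts.

```python
import numpy as np


def bootstrap_split(n, n_train, n_test, rng):
    """One bootstrap round (hypothetical helper, not MAST-ML's implementation).

    Returns integer index arrays (train, test) of lengths n_train and n_test.
    """
    if n_train < 1 or n_test < 1 or n_train + n_test > n:
        raise ValueError("need n_train >= 1, n_test >= 1 and n_train + n_test <= n")
    # Step 1: a random permutation partitions 0..n-1 into disjoint pools.
    permutation = rng.permutation(n)
    train_pool = permutation[:n_train]
    test_pool = permutation[n_train:n_train + n_test]
    # Step 2: draw with replacement inside each pool separately, so an index
    # may repeat within one side but can never cross to the other side.
    train = train_pool[rng.randint(0, n_train, size=n_train)]
    test = test_pool[rng.randint(0, n_test, size=n_test)]
    return train, test


rng = np.random.RandomState(0)
train, test = bootstrap_split(10, 5, 5, rng)
print(np.sort(train), np.sort(test))
print(set(train.tolist()).isdisjoint(test.tolist()))  # True
```

Calling the helper n_bootstraps times with one shared generator produces the sequence of resampled splits; the disjointness check reflects the guarantee that a sample in the train split never appears in the test split.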

Args:
n : int
Total number of elements in the dataset.
n_bootstraps : int (default is 3)
Number of bootstrapping iterations.
train_size : int or float (default is 0.5)
If int, number of samples to include in the training split (should be smaller than the total number of samples passed in the dataset). If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the train split.
test_size : int or float or None (default is None)
If int, number of samples to include in the test split (should be smaller than the total number of samples passed in the dataset). If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If None, n_test is set as the complement of n_train.
random_state : int or RandomState (default is 0)
Pseudo-random number generator state used for random sampling.
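The int/float/None rules for train_size and test_size can be illustrated with a small sketch that turns them into sample counts. The helper `resolve_sizes` is hypothetical, and rounding fractional sizes up with `math.ceil` is an assumption made for illustration; check the source before relying on exact counts.

```python
import math


def resolve_sizes(n, train_size=0.5, test_size=None):
    """Turn train_size/test_size into integer sample counts (hypothetical helper).

    Floats in [0.0, 1.0] are fractions of n, rounded up with math.ceil
    (the rounding rule is an assumption); ints are used as counts directly;
    test_size=None means "the complement of the training split".
    """
    def to_count(size, name):
        if isinstance(size, float):
            if not 0.0 <= size <= 1.0:
                raise ValueError(f"{name} as a float must be in [0.0, 1.0]")
            return math.ceil(size * n)
        if isinstance(size, int):
            return size
        raise TypeError(f"{name} must be an int or a float")

    n_train = to_count(train_size, "train_size")
    n_test = n - n_train if test_size is None else to_count(test_size, "test_size")
    if n_train + n_test > n:
        raise ValueError("train and test splits together exceed n")
    return n_train, n_test


print(resolve_sizes(10))          # (5, 5)
print(resolve_sizes(8, 0.75))     # (6, 2)
print(resolve_sizes(10, 6, 3))    # (6, 3)
```

The final check mirrors the documented constraint that each size should be smaller than the total number of samples; here the two splits together may not exceed n.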

Attributes Summary

indices

Methods Summary

get_n_splits([X, y, groups])
split(X, y[, groups])

Attributes Documentation

indices = True

Methods Documentation

get_n_splits(X=None, y=None, groups=None)
split(X, y, groups=None)
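The method signatures above follow the scikit-learn cross-validator convention. The toy class below is a hypothetical stand-in (`ToyBootstrap` is not part of MAST-ML) that shows how such a pair of methods typically fits together. It assumes that get_n_splits returns n_bootstraps and that split yields (train_indices, test_indices) pairs. This is how scikit-learn splitters behave, but it is an assumption for this class.

```python
import numpy as np


class ToyBootstrap:
    """Hypothetical stand-in mimicking the documented method signatures."""

    def __init__(self, n, n_bootstraps=3, n_train=None, random_state=0):
        self.n = n
        self.n_bootstraps = n_bootstraps
        # Default to half the data for training, the rest for testing.
        self.n_train = n // 2 if n_train is None else n_train
        if not 0 < self.n_train < n:
            raise ValueError("n_train must leave at least one sample on each side")
        self.n_test = n - self.n_train
        self.random_state = random_state

    def get_n_splits(self, X=None, y=None, groups=None):
        # The number of splits is fixed by n_bootstraps, not by the data.
        return self.n_bootstraps

    def split(self, X, y, groups=None):
        # Re-seeding here makes repeated calls yield identical splits.
        rng = np.random.RandomState(self.random_state)
        for _ in range(self.n_bootstraps):
            perm = rng.permutation(self.n)
            train_pool, test_pool = perm[:self.n_train], perm[self.n_train:]
            yield (train_pool[rng.randint(0, self.n_train, size=self.n_train)],
                   test_pool[rng.randint(0, self.n_test, size=self.n_test)])


# Usage: score a trivial "predict the training mean" model on each split.
X = np.arange(20, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel()
splitter = ToyBootstrap(n=len(y), n_bootstraps=5)
errors = []
for train, test in splitter.split(X, y):
    prediction = y[train].mean()
    errors.append(np.abs(y[test] - prediction).mean())
print(splitter.get_n_splits(), len(errors))  # 5 5
```

Because the iterator works on indices, the same loop shape applies to any model: fit on X[train], y[train] and evaluate on X[test], y[test].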