LeaveCloseCompositionsOut
class mastml.legos.data_splitters.LeaveCloseCompositionsOut(dist_threshold=0.1, nn_kwargs=None)
Bases: sklearn.model_selection._split.BaseCrossValidator
Leave-P-out splitter that excludes from the training set any materials whose compositions are close to those in the test set.
Closeness is measured as the distance between element fraction vectors. For example, the \(L_2\) distance between Al and Cu is \(\sqrt{2}\), and the \(L_1\) distance between Al and Al0.9Cu0.1 is 0.2.
Consequently, this splitter requires a list of compositions as the input to split rather than the features.
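The two distances quoted above can be checked by hand. The sketch below is only an illustration: the helper names `element_fractions` and `lp_distance` are hypothetical, and representing a composition as a dict of element amounts is an assumption made for the example, not the input format of this class.

```python
import math

def element_fractions(composition, elements):
    """Return the element fraction vector of `composition` over `elements`.

    `composition` is a dict of element symbol -> amount (an assumed,
    illustrative format); amounts are normalized so the fractions sum to 1.
    """
    total = sum(composition.values())
    return [composition.get(el, 0.0) / total for el in elements]

def lp_distance(a, b, p=2):
    """L_p distance between two equal-length vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

elements = ["Al", "Cu"]
al = element_fractions({"Al": 1.0}, elements)                 # [1.0, 0.0]
cu = element_fractions({"Cu": 1.0}, elements)                 # [0.0, 1.0]
al_cu = element_fractions({"Al": 0.9, "Cu": 0.1}, elements)   # [0.9, 0.1]

d_l2 = lp_distance(al, cu, p=2)      # L2 distance between Al and Cu
d_l1 = lp_distance(al, al_cu, p=1)   # L1 distance between Al and Al0.9Cu0.1

print(math.isclose(d_l2, math.sqrt(2)), math.isclose(d_l1, 0.2))
```

Comparing with `math.isclose` rather than `==` avoids spurious failures from floating-point rounding.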
- Args:
- dist_threshold (float): Entries must be farther than this distance from the test entries to be included in the training set
- nn_kwargs (dict): Keyword arguments for the scikit-learn NearestNeighbors class used to find the nearest points
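To illustrate the effect of `dist_threshold`, here is a minimal pure-Python sketch of the splitting idea. It is not the library's implementation: it assumes one test entry per split and a brute-force \(L_2\) search in place of a nearest-neighbor index, and the function names `l2_distance` and `leave_close_out` are hypothetical.

```python
def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def leave_close_out(vectors, dist_threshold=0.1):
    """Yield (train, test) index lists, one split per entry.

    An entry joins the training set only if it lies farther than
    `dist_threshold` from the held-out test entry.
    """
    for i, test_vec in enumerate(vectors):
        train = [
            j
            for j, vec in enumerate(vectors)
            if j != i and l2_distance(vec, test_vec) > dist_threshold
        ]
        yield train, [i]

# Element fraction vectors over (Al, Cu): Al, Al0.95Cu0.05, Cu
vectors = [[1.0, 0.0], [0.95, 0.05], [0.0, 1.0]]
splits = list(leave_close_out(vectors, dist_threshold=0.1))

for train, test in splits:
    print("train:", train, "test:", test)
```

In this toy data set Al and Al0.95Cu0.05 are about 0.07 apart, so each is dropped from the training set when the other is held out, while Cu is far from both and is always retained.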
Methods Summary
- get_n_splits([X, y, groups]): Returns the number of splitting iterations in the cross-validator.
- split(X[, y, groups]): Generate indices to split data into training and test set.
Methods Documentation
get_n_splits(X=None, y=None, groups=None)
Returns the number of splitting iterations in the cross-validator.
split(X, y=None, groups=None)
Generate indices to split data into training and test set.
- Parameters:
- X : array-like of shape (n_samples, n_features)
- Training data, where n_samples is the number of samples and n_features is the number of features.
- y : array-like of shape (n_samples,)
- The target variable for supervised learning problems.
- groups : array-like of shape (n_samples,), default=None
- Group labels for the samples used while splitting the dataset into train/test set.
- Yields:
- train : ndarray
- The training set indices for that split.
- test : ndarray
- The testing set indices for that split.