Code Documentation: Feature Selectors

mastml.feature_selectors Module

This module contains a collection of routines to perform feature selection.

BaseSelector:

Base class that gives feature selectors MAST-ML-style workflow functionality. All feature selection routines should inherit from this base class.

SklearnFeatureSelector:

Class that wraps feature selectors from the scikit-learn package and gives them the functionality of BaseSelector. Any scikit-learn feature selector from sklearn.feature_selection can be used by providing the name of the selector class as a string.
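
As a rough illustration of selecting a scikit-learn selector by its class name, the sketch below looks the name up in sklearn.feature_selection and instantiates it. The helper make_sklearn_selector and the synthetic data are hypothetical and are not part of MAST-ML; the actual wrapper may resolve and configure the selector differently.

```python
import numpy as np
import sklearn.feature_selection
from sklearn.feature_selection import f_regression


def make_sklearn_selector(selector_name, **kwargs):
    """Hypothetical helper (not MAST-ML API): build a selector from its class name."""
    selector_class = getattr(sklearn.feature_selection, selector_name)
    return selector_class(**kwargs)


# Synthetic data: the target depends only on columns 0 and 3.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=100)

# The selector is chosen with a plain string, as the description above suggests.
selector = make_sklearn_selector("SelectKBest", score_func=f_regression, k=2)
X_selected = selector.fit_transform(X, y)

# Column indices of the features that were kept.
selected_columns = selector.get_support(indices=True)
```

Here selected_columns holds the indices of the retained columns and X_selected contains only those columns.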

NoSelect:

Class that performs no feature selection and just uses all features in the dataset. Needed as a placeholder when evaluating data splits in a MAST-ML run where feature selection is not performed.
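
The idea of a placeholder selector can be sketched as an identity transformer built on scikit-learn's BaseEstimator and TransformerMixin. The class name IdentitySelector is hypothetical and only illustrates the concept; it is not the NoSelect implementation.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class IdentitySelector(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for a 'no selection' step: returns the input unchanged."""

    def fit(self, X, y=None):
        # Nothing to learn; every feature is kept.
        return self

    def transform(self, X):
        return X


X = np.arange(6).reshape(3, 2)
X_out = IdentitySelector().fit_transform(X)
```

Such a step lets a workflow always call fit and transform, whether or not feature selection is actually performed.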

EnsembleModelFeatureSelector:

Class that selects features based on the feature importance scores obtained when fitting an ensemble-based model. Any model with the feature_importances_ attribute will work, e.g. sklearn’s RandomForestRegressor and GradientBoostingRegressor.
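
A minimal sketch of importance-based selection using scikit-learn directly is shown below. It assumes the top n_features_to_select features by feature_importances_ are kept; the variable names and synthetic data are illustrative and may not match how EnsembleModelFeatureSelector is implemented.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data: the target depends strongly on column 1 and weakly on column 4.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 5.0 * X[:, 1] + 2.0 * X[:, 4] + 0.1 * rng.normal(size=200)

# Any fitted model exposing feature_importances_ could be used here.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)

# Rank features from most to least important and keep the top ones.
n_features_to_select = 2
ranking = np.argsort(model.feature_importances_)[::-1]
selected = np.sort(ranking[:n_features_to_select])
X_selected = X[:, selected]
```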

PearsonSelector:

Class that selects features based on their Pearson correlation with the target data. Can also be used to assess the Pearson correlation between features in order to reduce the dimensionality of the feature space.
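
One plausible reading of this procedure is sketched below with NumPy only: features are ranked by absolute correlation with the target, and a feature is skipped if it is too strongly correlated with one already kept. The name threshold_between_features mirrors the class signature listed under Classes; threshold_with_target, the threshold values, and the greedy order are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300
x0 = rng.normal(size=n)
x1 = x0 + 0.01 * rng.normal(size=n)  # near-duplicate of x0
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)              # unrelated to the target
X = np.column_stack([x0, x1, x2, x3])
y = 2.0 * x0 + 1.0 * x2 + 0.1 * rng.normal(size=n)

# Absolute Pearson correlation of each feature with the target.
corr_with_target = np.array(
    [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
)
# Absolute Pearson correlation between every pair of features.
corr_between = np.abs(np.corrcoef(X, rowvar=False))

threshold_between_features = 0.9  # assumed value
threshold_with_target = 0.3       # hypothetical parameter and value

selected = []
for j in np.argsort(corr_with_target)[::-1]:
    if corr_with_target[j] < threshold_with_target:
        continue  # too weakly related to the target
    if any(corr_between[j, k] > threshold_between_features for k in selected):
        continue  # redundant with a feature that is already kept
    selected.append(int(j))
selected = sorted(selected)
```

In this example only one of the two near-duplicate columns survives, and the unrelated column is dropped.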

MASTMLFeatureSelector:

Class written for MAST-ML to perform more flexible forward selection than what is available in scikit-learn. Allows the user to specify a particular model and cross-validation routine for selecting features, and to force certain features to be selected at the outset.
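
Forward selection with a user-chosen model, cross-validation scheme, and forced starting features can be sketched as follows. The function forward_select and its parameters are hypothetical and written against scikit-learn only; the scoring metric (RMSE) and stopping rule are assumptions, not a description of MASTMLFeatureSelector's internals.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score


def forward_select(X, y, model, cv, n_features_to_select, forced_features=()):
    """Greedy forward selection (hypothetical helper, not the MAST-ML code)."""
    selected = list(forced_features)  # forced features are included from the start
    remaining = [j for j in range(X.shape[1]) if j not in selected]
    while len(selected) < n_features_to_select and remaining:
        scores = {}
        for j in remaining:
            candidate = selected + [j]
            cv_scores = cross_val_score(
                model, X[:, candidate], y, cv=cv,
                scoring="neg_root_mean_squared_error",
            )
            scores[j] = cv_scores.mean()
        # Higher (less negative) score means lower cross-validated RMSE.
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected


# Synthetic data: the target depends on columns 0 and 2 only.
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 5))
y = 4.0 * X[:, 0] + 2.0 * X[:, 2] + 0.1 * rng.normal(size=150)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
selected = forward_select(
    X, y, LinearRegression(), cv, n_features_to_select=3, forced_features=[4]
)
```

The returned list preserves selection order: forced features first, then each feature in the order it was added.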

ShapFeatureSelector:

Class that selects features based on how much each feature contributes to the model's prediction of the target data.

Functions

pearsonr(x, y)

Pearson correlation coefficient and p-value for testing non-correlation.

root_mean_squared_error(y_true, y_pred)

Function that calculates the root mean squared error (RMSE).
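
For reference, RMSE is the square root of the mean of the squared differences between true and predicted values. The function below is a minimal NumPy sketch of that definition; the name rmse is illustrative, and the module's own function may handle inputs differently.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y_true - y_pred) ** 2))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Errors are 0, 0 and -2, so the mean squared error is 4/3.
value = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```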

selected_features_correlation(X, savepath, ...)

Function to get the correlation between two sets of features selected by two different feature selection methods.

Classes

BaseEstimator()

Base class for all estimators in scikit-learn.

BaseSelector()

Base class that forms foundation of MAST-ML feature selectors

EnsembleModelFeatureSelector(model, ...[, ...])

Class custom-written for MAST-ML to conduct selection of features with ensemble model feature importances

KFold([n_splits, shuffle, random_state])

K-Folds cross-validator

MASTMLFeatureSelector(model, ...[, cv, ...])

Class custom-written for MAST-ML to conduct forward selection of features with flexible model and cv scheme

NoSelect()

Class for having a "null" transform where the output is the same as the input.

PearsonSelector(threshold_between_features, ...)

Class custom-written for MAST-ML to conduct selection of features based on the Pearson correlation coefficient between features and target.

ShapFeatureSelector(model, n_features_to_select)

Class custom-written for MAST-ML to conduct selection of features with SHAP

SklearnFeatureSelector(selector, **kwargs)

Class that wraps scikit-learn feature selection methods with some new MAST-ML functionality

TransformerMixin()

Mixin class for all transformers in scikit-learn.

datetime(year, month, day[, hour[, minute[, ...)

The year, month and day arguments are required.

Class Inheritance Diagram

Inheritance diagram of mastml.feature_selectors.BaseSelector, mastml.feature_selectors.EnsembleModelFeatureSelector, mastml.feature_selectors.MASTMLFeatureSelector, mastml.feature_selectors.NoSelect, mastml.feature_selectors.PearsonSelector, mastml.feature_selectors.ShapFeatureSelector, mastml.feature_selectors.SklearnFeatureSelector