BasePreprocessor

class mastml.preprocessing.BasePreprocessor(preprocessor, as_frame=False)

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Base class that extends the sklearn fit_transform interface with additional functionality, such as dataframe support and directory management.

Args:

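preprocessor: a sklearn.preprocessing object (e.g. StandardScaler) or a mastml.preprocessing object

as_frame: (bool), whether the transformed data is returned as a pd.DataFrame instead of a numpy array (default False)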

Methods:

fit_transform: method that fits the preprocessor to the data, then returns the transformed (preprocessed) data

Args:

X: (pd.DataFrame), dataframe of X features

y: (pd.Series), series of y target data

Returns:

Transformed data (pd.DataFrame or numpy array based on self.as_frame)

evaluate: main method to evaluate a preprocessor, build directory and save data output

Args:

X: (pd.DataFrame), dataframe of X features

y: (pd.Series), series of y target data

savepath: (str), string containing main savepath to construct splits for saving output

Returns:

Xnew (pd.DataFrame or numpy array), dataframe or array of the preprocessed X features

help: method to output key information on class use, e.g. methods and parameters

_setup_savedir: method to create a savedir based on the provided model, splitter, selector names and datetime

Args:

model: (mastml.models.SklearnModel or other estimator object), an estimator, e.g. KernelRidge

selector: (mastml.feature_selectors or other selector object), a selector, e.g. EnsembleModelFeatureSelector

savepath: (str), string designating the savepath

Returns:

splitdir: (str), string containing the new subdirectory to save results to
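
The directory setup described for _setup_savedir can be sketched as follows. This is a minimal sketch using only the standard library, not MAST-ML's implementation: the function name setup_savedir_sketch, the dummy classes, and the naming scheme (class names joined by underscores, followed by a timestamp) are assumptions made for illustration.

```python
import os
import tempfile
from datetime import datetime


def setup_savedir_sketch(savepath, *objects):
    """Create and return a subdirectory of savepath named after the given objects.

    Hypothetical helper, not MAST-ML code; the naming scheme is an assumption.
    """
    # Use each object's class name, e.g. "KernelRidge" for a KernelRidge instance
    names = [type(obj).__name__ for obj in objects]
    # Append a timestamp so separate runs get separate directories
    stamp = datetime.now().strftime("%Y_%m_%d_%H_%M_%S")
    splitdir = os.path.join(savepath, "_".join(names + [stamp]))
    # Two calls within the same second would collide, so add a numeric suffix
    candidate, n = splitdir, 1
    while os.path.exists(candidate):
        candidate = f"{splitdir}_{n}"
        n += 1
    os.makedirs(candidate)
    return candidate


class DummyModel:
    """Placeholder standing in for an estimator."""


class DummySelector:
    """Placeholder standing in for a feature selector."""


with tempfile.TemporaryDirectory() as root:
    splitdir = setup_savedir_sketch(root, DummyModel(), DummySelector())
    # Directory name starts with "DummyModel_DummySelector_" followed by the timestamp
    print(os.path.basename(splitdir))
```

Including the timestamp in the directory name keeps the output of repeated runs apart without requiring the caller to pick unique names.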

Methods Summary

`evaluate`(X[, y, savepath, file_name, …])
`fit`(X)
`fit_transform`(X[, y])	Fit to data, then transform it.
`help`()
`inverse_transform`(X)
`transform`(X)

Methods Documentation

evaluate(X, y=None, savepath=None, file_name='', make_new_dir=False)

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Args:

X : array-like of shape (n_samples, n_features). Input samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs), default=None. Target values (None for unsupervised transformations).

**fit_params : dict. Additional fit parameters.

Returns:

X_new : ndarray of shape (n_samples, n_features_new). Transformed array.
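
To illustrate how a wrapper of this kind might return either an array or a dataframe from fit_transform, here is a minimal sketch. The class FrameAwarePreprocessor is a hypothetical stand-in written for this example and is not MAST-ML's BasePreprocessor; it assumes the wrapped transformer produces one output column per input column, and it relies on the fit_transform inherited from sklearn's TransformerMixin, which fits and then transforms.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.preprocessing import StandardScaler


class FrameAwarePreprocessor(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for a wrapper like BasePreprocessor (not MAST-ML code)."""

    def __init__(self, preprocessor, as_frame=False):
        self.preprocessor = preprocessor
        self.as_frame = as_frame

    def fit(self, X, y=None):
        # Delegate fitting to the wrapped sklearn-style preprocessor
        self.preprocessor.fit(X, y)
        return self

    def transform(self, X):
        Xnew = self.preprocessor.transform(X)
        if self.as_frame:
            # Assumption: output columns map one-to-one onto the input columns
            return pd.DataFrame(Xnew, columns=X.columns, index=X.index)
        return Xnew


X = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})

# fit_transform comes from TransformerMixin: it calls fit, then transform
as_array = FrameAwarePreprocessor(StandardScaler()).fit_transform(X)
as_frame = FrameAwarePreprocessor(StandardScaler(), as_frame=True).fit_transform(X)

print(type(as_array).__name__)  # ndarray
print(list(as_frame.columns))   # ['a', 'b']

# Both outputs hold the same values; only the container differs
assert np.allclose(as_array, as_frame.to_numpy())
```

Returning a dataframe preserves the feature names and row index of X, which is convenient when the preprocessed features are saved or inspected later.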