DataCleaning

class mastml.data_cleaning.DataCleaning

Bases: object

Class that performs data cleaning operations, such as imputation or NaN removal.

Args:

None

Methods:
remove: Method that removes every row or column that contains a NaN or blank value
Args:

X: (pd.DataFrame), dataframe containing X data

y: (pd.Series), series containing y data

axis: (int), whether to remove rows (axis=0) or columns (axis=1)

Returns:

X: (pd.DataFrame), dataframe of cleaned X data

y: (pd.Series), series of cleaned y data

imputation: Method that fills in missing values using a statistic (e.g. the median or mean) of the data in the same column
Args:

X: (pd.DataFrame), dataframe containing X data

y: (pd.Series), series containing y data

strategy: (str), method of imputation, e.g. ‘median’ or ‘mean’

Returns:

X: (pd.DataFrame), dataframe of cleaned X data

y: (pd.Series), series of cleaned y data

ppca: Method that imputes missing values by interpolating them with principal component analysis
Args:

X: (pd.DataFrame), dataframe containing X data

y: (pd.Series), series containing y data

Returns:

X: (pd.DataFrame), dataframe of cleaned X data

y: (pd.Series), series of cleaned y data

evaluate: Main method that runs initial data analysis routines (e.g. flagging outliers), performs data cleaning, and saves the output to a folder
Args:

X: (pd.DataFrame), dataframe containing X data

y: (pd.Series), series containing y data

method: (str), data cleaning method name, must be one of ‘remove’, ‘imputation’ or ‘ppca’

savepath: (str), path of the directory in which to save the output

kwargs: additional keyword arguments passed to the chosen method, e.g. axis for remove or strategy for imputation

Returns:

X: (pd.DataFrame), dataframe of cleaned X data

y: (pd.Series), series of cleaned y data

_setup_savedir: Method that creates a save directory based on the provided model, splitter and selector names and the datetime
Args:

savepath: (str), string designating the savepath

Returns:

splitdir: (str), string containing the new subdirectory to save results to

Methods Summary

evaluate(X, y, method[, savepath, make_new_dir])

imputation(X, y, strategy)

ppca(X, y)

remove(X, y, axis)

Methods Documentation

evaluate(X, y, method, savepath=None, make_new_dir=True, **kwargs)
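
The dispatch behaviour described for evaluate (pick a cleaning method by name, forward the extra keyword arguments, save the result) can be illustrated with a small stand-alone sketch. This is not the library's implementation: the helper names, the dispatch table and the output file name data_cleaned.csv are all hypothetical, the ‘ppca’ option is left out for brevity, and make_new_dir is not modelled.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def _remove(X, y, axis=0):
    # Drop rows (axis=0) or columns (axis=1) of X that contain NaN
    X_clean = X.dropna(axis=axis)
    return X_clean, y.loc[X_clean.index]


def _imputation(X, y, strategy="mean"):
    # Fill NaN entries with a per-column statistic
    fill = {"mean": X.mean, "median": X.median}[strategy]()
    return X.fillna(fill), y


# Hypothetical dispatch table; 'ppca' is omitted to keep the sketch short
CLEANERS = {"remove": _remove, "imputation": _imputation}


def evaluate_sketch(X, y, method, savepath=None, **kwargs):
    if method not in CLEANERS:
        raise ValueError(f"method must be one of {sorted(CLEANERS)}")
    # Extra keyword arguments (axis, strategy) go to the chosen cleaner
    X_clean, y_clean = CLEANERS[method](X, y, **kwargs)
    if savepath is not None:
        os.makedirs(savepath, exist_ok=True)
        # Hypothetical output file name
        out_file = os.path.join(savepath, "data_cleaned.csv")
        pd.concat([X_clean, y_clean], axis=1).to_csv(out_file, index=False)
    return X_clean, y_clean


X = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})
y = pd.Series([10.0, 20.0, 30.0], name="target")

savepath = tempfile.mkdtemp()
X_clean, y_clean = evaluate_sketch(
    X, y, method="imputation", strategy="mean", savepath=savepath
)
```

Based on the signature above, the corresponding library call would presumably take the form DataCleaning().evaluate(X=X, y=y, method='imputation', strategy='mean', savepath=savepath).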
imputation(X, y, strategy)
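
Column-wise imputation as described for this method can be sketched with pandas alone. The helper name impute_columns is hypothetical and only the ‘mean’ and ‘median’ strategies named in the argument description are handled; how the library itself performs the fill is not stated here.

```python
import numpy as np
import pandas as pd


def impute_columns(X, strategy):
    """Fill NaN entries of each column with that column's mean or median."""
    if strategy == "mean":
        fill_values = X.mean()
    elif strategy == "median":
        fill_values = X.median()
    else:
        raise ValueError("strategy must be 'mean' or 'median' in this sketch")
    # fillna with a Series fills each column with its matching value
    return X.fillna(fill_values)


X = pd.DataFrame(
    {"a": [1.0, np.nan, 3.0, 8.0], "b": [2.0, 4.0, np.nan, 6.0]}
)

X_mean = impute_columns(X, "mean")      # NaN in "a" becomes 4.0
X_median = impute_columns(X, "median")  # NaN in "a" becomes 3.0
```

The two strategies differ only when a column is skewed: in column "a" the large value 8.0 pulls the mean above the median.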
ppca(X, y)
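
To give a feel for PCA-based imputation, the sketch below uses iterative low-rank reconstruction: missing entries start at the column mean and are repeatedly replaced by their values in a rank-k PCA reconstruction. This is a simpler relative of probabilistic PCA and is not claimed to be the variant the library uses; the function name pca_impute and its parameters are hypothetical.

```python
import numpy as np
import pandas as pd


def pca_impute(X, n_components=1, max_iter=200, tol=1e-8):
    """Iterative PCA imputation (a sketch, not the library's algorithm)."""
    values = X.to_numpy(dtype=float)
    missing = np.isnan(values)
    # Initial guess: column means of the observed entries
    col_means = np.nanmean(values, axis=0)
    filled = np.where(missing, col_means, values)
    for _ in range(max_iter):
        mu = filled.mean(axis=0)
        # Rank-k reconstruction of the centered data via SVD
        U, s, Vt = np.linalg.svd(filled - mu, full_matrices=False)
        k = n_components
        approx = (U[:, :k] * s[:k]) @ Vt[:k] + mu
        # Only missing entries are updated; observed ones are kept
        new_filled = np.where(missing, approx, values)
        change = np.max(np.abs(new_filled - filled))
        filled = new_filled
        if change < tol:
            break
    return pd.DataFrame(filled, index=X.index, columns=X.columns)


# Rank-1 toy data: every row is a multiple of (1, 2, 3)
t = np.arange(1.0, 7.0)
full = np.outer(t, [1.0, 2.0, 3.0])
X = pd.DataFrame(full.copy(), columns=["a", "b", "c"])
X.iloc[2, 1] = np.nan  # the true value is 6.0

X_filled = pca_impute(X, n_components=1)
```

Because the toy data lie exactly on a line, the imputed entry should settle close to the true value; on real data the number of components is a tuning choice.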
remove(X, y, axis)
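
The effect of the axis argument can be shown with a pandas-only sketch. The helper remove_missing is hypothetical; it assumes that only NaN values in X trigger removal and that y is kept aligned with the surviving rows, which the description above does not spell out.

```python
import numpy as np
import pandas as pd


def remove_missing(X, y, axis):
    """Drop rows (axis=0) or columns (axis=1) of X that contain NaN."""
    X_clean = X.dropna(axis=axis)
    # Keep only the y entries whose rows survived in X
    y_clean = y.loc[X_clean.index]
    return X_clean, y_clean


X = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})
y = pd.Series([10.0, 20.0, 30.0], name="target")

# axis=0 drops the second row; axis=1 drops column "a" instead
X_rows, y_rows = remove_missing(X, y, axis=0)
X_cols, y_cols = remove_missing(X, y, axis=1)
```

Dropping rows loses samples while dropping columns loses features, so the better choice depends on whether the missing values cluster in a few samples or a few features.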