DataCleaning
- class mastml.data_cleaning.DataCleaning
Bases: object
Class to perform various data cleaning operations, such as imputation or NaN removal
- Args:
None
- Methods:
- remove: Method that removes an entire row or column of data values if that row or column contains a NaN or blank value
- Args:
X: (pd.DataFrame), dataframe containing X data
y: (pd.Series), series containing y data
axis: (int), whether to remove rows (axis=0) or columns (axis=1)
- Returns:
X: (pd.DataFrame): dataframe of cleaned X data
y: (pd.Series): series of cleaned y data
- imputation: Method that fills in missing values using a statistic (median, mean, etc.) of the data in the same column
- Args:
X: (pd.DataFrame), dataframe containing X data
y: (pd.Series), series containing y data
strategy: (str), method of imputation, e.g. median, mean, etc.
- Returns:
X: (pd.DataFrame): dataframe of cleaned X data
y: (pd.Series): series of cleaned y data
- ppca: Method that imputes data using principal component analysis to interpolate missing values
- Args:
X: (pd.DataFrame), dataframe containing X data
y: (pd.Series), series containing y data
- Returns:
X: (pd.DataFrame): dataframe of cleaned X data
y: (pd.Series): series of cleaned y data
- evaluate: Main method that runs initial data analysis routines (e.g. flagging outliers), performs data cleaning, and saves the output to a folder
- Args:
X: (pd.DataFrame), dataframe containing X data
y: (pd.Series), series containing y data
method: (str), data cleaning method name, must be one of ‘remove’, ‘imputation’ or ‘ppca’
savepath: (str), string containing the savepath information
kwargs: additional keyword arguments needed for the remove, imputation or ppca methods
- Returns:
X: (pd.DataFrame): dataframe of cleaned X data
y: (pd.Series): series of cleaned y data
- _setup_savedir: method to create a savedir based on the provided model, splitter, selector names and datetime
- Args:
savepath: (str), string designating the savepath
- Returns:
splitdir: (str), string containing the new subdirectory to save results to
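The remove and imputation behaviors described above can be sketched with plain pandas. This is an illustrative sketch only, not MAST-ML's implementation: the function names `remove_missing`, `impute_missing` and `clean` are hypothetical, blank cells are assumed to have already been read in as NaN, and `y` is assumed to share its index with `X`. The `clean` helper only mimics the way the `method` argument of evaluate selects a cleaning routine; it does not flag outliers or save anything to disk.

```python
import numpy as np
import pandas as pd


def remove_missing(X, y, axis=0):
    """Drop rows (axis=0) or columns (axis=1) of X that contain any NaN."""
    X_clean = X.dropna(axis=axis, how="any")
    # When rows are dropped, keep y aligned with the surviving row index.
    y_clean = y.loc[X_clean.index]
    return X_clean, y_clean


def impute_missing(X, y, strategy="mean"):
    """Fill NaNs in each column of X with that column's mean or median."""
    if strategy == "mean":
        fill_values = X.mean()
    elif strategy == "median":
        fill_values = X.median()
    else:
        raise ValueError(f"unsupported strategy: {strategy!r}")
    # fillna with a Series matches fill values to columns by label.
    return X.fillna(fill_values), y


def clean(X, y, method, **kwargs):
    """Pick a cleaning routine by name and forward extra keyword arguments."""
    methods = {"remove": remove_missing, "imputation": impute_missing}
    if method not in methods:
        raise ValueError(f"unknown method: {method!r}")
    return methods[method](X, y, **kwargs)


# Small example: column "a" has one missing value in row 1.
X = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})
y = pd.Series([10.0, 20.0, 30.0])

X_rows, y_rows = remove_missing(X, y, axis=0)      # drops row 1 from X and y
X_cols, y_cols = remove_missing(X, y, axis=1)      # drops column "a", y unchanged
X_mean, y_mean = impute_missing(X, y, "mean")      # fills a[1] with the column mean
X_med, y_med = clean(X, y, "imputation", strategy="median")
```

Removing rows discards the matching `y` entries as well, while removing columns or imputing leaves `y` untouched, which is why every routine returns both `X` and `y`.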
Methods Summary
evaluate(X, y, method[, savepath, make_new_dir])
imputation(X, y, strategy)
ppca(X, y)
remove(X, y, axis)
Methods Documentation