DataUtilities
- class mastml.data_cleaning.DataUtilities
Bases: object
Class containing basic data analysis utilities, such as flagging columns that contain problematic string entries or flagging potential outlier values based on a threshold.
- Args:
None
- Methods:
- flag_outliers: Method that scans the values in each column of the X feature matrix and flags any value that lies more than n_stdevs standard deviations from that column's mean. The index and column of each potentially problematic point are listed and written to an output file.
- Args:
X: (pd.DataFrame), dataframe containing X data
y: (pd.Series), series containing y data
savepath: (str), string containing the save path directory
n_stdevs: (int), number of standard deviations to use as threshold value
- Returns:
None
- flag_columns_with_strings: Method that identifies which columns in the data contain string entries
- Args:
X: (pd.DataFrame), dataframe containing X data
y: (pd.Series), series containing y data
savepath: (str), string containing the save path directory
- Returns:
None
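The two checks described above can be illustrated with a minimal pandas sketch. This is not the mastml implementation: the helper names `sketch_flag_outliers` and `sketch_columns_with_strings` are hypothetical, the default of 3 for `n_stdevs` is this sketch's own choice, the sample standard deviation is assumed, and the results are returned as lists rather than written to a file under `savepath`.

```python
import pandas as pd


def sketch_flag_outliers(X: pd.DataFrame, n_stdevs: float = 3) -> list:
    """Hypothetical helper: return (index, column) pairs whose value lies
    more than n_stdevs standard deviations from the column mean."""
    # Only numeric columns have a meaningful mean and standard deviation
    numeric = X.select_dtypes(include="number")
    # Distance of each value from its column mean, in units of the
    # column's sample standard deviation (pandas default, ddof=1)
    z = (numeric - numeric.mean()) / numeric.std()
    mask = z.abs() > n_stdevs
    return [
        (idx, col)
        for col in mask.columns
        for idx in mask.index[mask[col].to_numpy()]
    ]


def sketch_columns_with_strings(X: pd.DataFrame) -> list:
    """Hypothetical helper: return names of columns holding at least one str."""
    return [
        col
        for col in X.columns
        if X[col].map(lambda v: isinstance(v, str)).any()
    ]


# Toy data (made up for illustration): one extreme value in "a",
# one stray string in "c"
X = pd.DataFrame({
    "a": [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 1.1, 0.9, 50.0],
    "b": [2, 3, 2, 3, 2, 3, 2, 3, 2, 3],
    "c": [0.1, 0.2, "n/a", 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
})

flagged = sketch_flag_outliers(X, n_stdevs=2)
string_cols = sketch_columns_with_strings(X)
print(flagged)
print(string_cols)
```

The toy call uses `n_stdevs=2` because, with only ten rows, a single value can sit at most about 2.85 sample standard deviations from the column mean, so a threshold of 3 would flag nothing in a dataset this small.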
Methods Summary
flag_columns_with_strings(X, y, savepath)
flag_outliers(X, y, savepath[, n_stdevs])
Methods Documentation