DataUtilities

class mastml.data_cleaning.DataUtilities

Bases: object

Class containing basic data analysis utilities, such as flagging columns that contain problematic string entries and flagging potential outlier values based on a standard-deviation threshold.

Args:

None

Methods:
flag_outliers: Method that scans the values in each column of the X feature matrix and flags those that lie more than n_stdevs standard deviations from that column's average. The index and column of each potentially problematic point are listed and written to an output file.
Args:

X: (pd.DataFrame), dataframe containing X data

y: (pd.Series), series containing y data

savepath: (str), string containing the save path directory

n_stdevs: (int), number of standard deviations to use as the threshold value (default 3)

Returns:

None

flag_columns_with_strings: Method that identifies which columns in the data contain string entries.
Args:

X: (pd.DataFrame), dataframe containing X data

y: (pd.Series), series containing y data

savepath: (str), string containing the save path directory

Returns:

None
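The check that flag_columns_with_strings is described as performing can be illustrated with plain pandas. This is a minimal sketch, not MAST-ML's implementation: the helper name columns_with_strings and the sample data are hypothetical, and the sketch returns the column names rather than writing them to savepath.

```python
import pandas as pd


def columns_with_strings(X: pd.DataFrame) -> list:
    """Return the names of columns that hold at least one str entry.

    Hypothetical helper for illustration only; it is not part of MAST-ML.
    """
    flagged = []
    for col in X.columns:
        # True for every cell in this column whose Python type is str
        is_string = X[col].map(lambda value: isinstance(value, str))
        if is_string.any():
            flagged.append(col)
    return flagged


# Made-up data: column "b" has one stray string, column "c" is all strings
X_demo = pd.DataFrame(
    {
        "a": [1.0, 2.0, 3.0],
        "b": [1.0, "n/a", 3.0],
        "c": ["x", "y", "z"],
    }
)
string_columns = columns_with_strings(X_demo)
```

A column with a single stray string among numbers is stored with object dtype, so testing each cell catches it, whereas a dtype-only check would not distinguish it from other object columns.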

Methods Summary

flag_columns_with_strings(X, y, savepath)

flag_outliers(X, y, savepath[, n_stdevs])

Methods Documentation

classmethod flag_columns_with_strings(X, y, savepath)
classmethod flag_outliers(X, y, savepath, n_stdevs=3)
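
The thresholding that flag_outliers is described as performing can be sketched with plain pandas. This is an illustration under stated assumptions, not MAST-ML's implementation: the helper name find_outliers and the sample data are hypothetical, the distance from the mean is assumed to be taken in absolute value, the sample standard deviation (pandas default, ddof=1) is assumed, and the results are returned instead of being written to savepath.

```python
import pandas as pd


def find_outliers(X: pd.DataFrame, n_stdevs: float = 3) -> list:
    """Return (index, column, value) for entries lying more than
    n_stdevs standard deviations from their column mean.

    Hypothetical helper for illustration only; it is not part of MAST-ML.
    """
    # Only numeric columns have a meaningful mean and standard deviation
    numeric = X.select_dtypes(include="number")
    # Absolute distance of every entry from its column mean
    deviations = (numeric - numeric.mean()).abs()
    # Assumption: sample standard deviation (pandas default, ddof=1)
    mask = deviations > n_stdevs * numeric.std()

    flagged = []
    for col in numeric.columns:
        for idx in mask.index[mask[col].to_numpy()]:
            flagged.append((idx, col, numeric.at[idx, col]))
    return flagged


# Made-up data: column "a" has one extreme value in its last row
X_demo = pd.DataFrame({"a": [1.0] * 19 + [100.0], "b": range(20)})
flagged_points = find_outliers(X_demo, n_stdevs=3)
```

Lowering n_stdevs makes the check stricter and flags more points; a column with zero variance never produces a flag in this sketch because no deviation exceeds zero.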