Code Documentation: Data cleaner

mastml.data_cleaner Module

The data_cleaner module is used to clean missing or NaN values from pandas dataframes, for example by removing entries that contain NaN or by imputing replacement values.

Functions

columns_with_strings(df) Method that identifies which columns of the dataframe contain string entries.
flag_outliers(df, conf_not_input_features, …) Method that scans the values in each column of the X feature matrix and flags values that lie more than 3 standard deviations from that column's mean.
imputation(df, strategy[, cols_to_leave_out]) Method that imputes missing values based on the median, mean, etc. of the column.
orth(A[, rcond]) Construct an orthonormal basis for the range of A using SVD.
ppca(df[, cols_to_leave_out]) Method that performs a recursive PCA routine, using PCA of the known columns to fill in missing values in a particular column.
remove(df, axis) Method that removes an entire column or row if it contains a NaN or blank value.
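
The cleaning operations summarized above can be illustrated with plain pandas. The following is a minimal sketch, not the mastml implementation: the helper names (drop_missing, impute_columns, outlier_mask, string_columns), their defaults, and the demo data are all hypothetical, and the actual mastml functions may handle arguments and edge cases differently.

```python
import numpy as np
import pandas as pd


def string_columns(df):
    """Return the names of columns that hold at least one str entry."""
    return [c for c in df.columns
            if df[c].map(lambda v: isinstance(v, str)).any()]


def drop_missing(df, axis=0):
    """Drop rows (axis=0) or columns (axis=1) that contain any NaN."""
    return df.dropna(axis=axis, how="any")


def impute_columns(df, strategy="mean", cols_to_leave_out=None):
    """Fill NaN in numeric columns with that column's mean or median."""
    if strategy not in ("mean", "median"):
        raise ValueError(f"unsupported strategy: {strategy}")
    skip = set(cols_to_leave_out or [])
    out = df.copy()
    for col in out.select_dtypes(include="number").columns:
        if col in skip:
            continue
        # Series.mean() and Series.median() ignore NaN by default.
        fill_value = getattr(out[col], strategy)()
        out[col] = out[col].fillna(fill_value)
    return out


def outlier_mask(df, n_stdevs=3.0):
    """Boolean mask of numeric values more than n_stdevs from the column mean.

    Uses the pandas default sample standard deviation (ddof=1).
    """
    numeric = df.select_dtypes(include="number")
    z_scores = (numeric - numeric.mean()) / numeric.std()
    return z_scores.abs() > n_stdevs


# Hypothetical demo data: one string column and two features with gaps.
demo = pd.DataFrame({
    "composition": ["Al", "Cu", "Fe", "Ni"],
    "feature_a": [1.0, np.nan, 3.0, 5.0],
    "feature_b": [2.0, 4.0, np.nan, 6.0],
})

rows_kept = drop_missing(demo, axis=0)   # keeps only fully populated rows
cols_kept = drop_missing(demo, axis=1)   # keeps only fully populated columns
filled = impute_columns(demo, strategy="mean",
                        cols_to_leave_out=["composition"])

# 19 identical values plus one spike, so the spike is far beyond 3 sigma.
spikes = pd.DataFrame({"x": [1.0] * 19 + [100.0]})
flags = outlier_mask(spikes)
```

Returning a boolean mask rather than dropping rows leaves the decision about flagged values to the caller, in keeping with the word "flags" in the flag_outliers summary.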

Classes

PPCA() Class to perform probabilistic principal component analysis (PPCA) to fill in missing data.
SimpleImputer(*[, missing_values, strategy, …]) Imputation transformer for completing missing values.
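
The SimpleImputer signature and summary listed here match scikit-learn's sklearn.impute.SimpleImputer, so the sketch below assumes the class behaves the same way and imports it directly from scikit-learn rather than from mastml. The array values are made up for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical feature matrix with one missing value in each column.
X = np.array([
    [1.0, 2.0],
    [np.nan, 4.0],
    [3.0, np.nan],
    [5.0, 6.0],
])

# Replace each NaN with the median of the observed values in its column.
imputer = SimpleImputer(missing_values=np.nan, strategy="median")
X_filled = imputer.fit_transform(X)

# The per-column fill values learned during fit are kept on the estimator.
column_medians = imputer.statistics_
```

Because the fill values are stored on the fitted object, the same imputer can later be applied to new data with transform, so that held-out data is filled using statistics computed from the training data only.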

Class Inheritance Diagram

Inheritance diagram of mastml.data_cleaner.PPCA