Code Documentation: Data Cleaning

mastml.data_cleaning Module

This module provides various methods for cleaning data that has been imported into MAST-ML, prior to model fitting.

DataCleaning:

Class that enables easy use of various data cleaning methods, such as removal of missing values, different modes of data imputation, or using principal component analysis to fill in missing values.
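The two simplest strategies named above, removal and imputation, can be illustrated without MAST-ML. The following is a minimal standard-library sketch, not the DataCleaning implementation; the helper names drop_rows_with_missing and impute_column_means are hypothetical, and missing values are assumed to be represented as NaN.

```python
import math
from statistics import fmean


def drop_rows_with_missing(rows):
    """Keep only the rows in which no value is NaN (hypothetical helper)."""
    return [row for row in rows if not any(math.isnan(v) for v in row)]


def impute_column_means(rows):
    """Replace each NaN with the mean of the observed values in its column.

    Hypothetical helper; raises statistics.StatisticsError if a column
    has no observed values at all.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if not math.isnan(row[j])]
        means.append(fmean(observed))
    return [
        [means[j] if math.isnan(v) else v for j, v in enumerate(row)]
        for row in rows
    ]


data = [
    [1.0, 10.0],
    [2.0, math.nan],
    [math.nan, 30.0],
    [4.0, 40.0],
]

removed = drop_rows_with_missing(data)  # two complete rows remain
imputed = impute_column_means(data)     # all four rows kept, NaNs filled
```

Removal discards every row that has a gap, so it shrinks the dataset; mean imputation keeps all rows but pulls the filled entries toward the column average.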

DataUtilities:

Support class used to evaluate some basic statistics of imported data, such as its distribution, mean, etc. Also provides a means of flagging potential outlier datapoints based on their deviation from the overall data distribution.
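As an illustration of flagging values by their deviation from the overall distribution, the sketch below marks points whose z-score exceeds a threshold. It assumes a z-score criterion with a threshold of 2.0; the function name flag_outliers is hypothetical, and the criterion DataUtilities actually applies is not specified here.

```python
from statistics import mean, stdev


def flag_outliers(values, threshold=2.0):
    """Return indices of values more than `threshold` sample standard
    deviations away from the mean (hypothetical helper)."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all values identical, nothing to flag
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


measurements = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.1, 9.9, 25.0]
flagged = flag_outliers(measurements)
```

Because the mean and standard deviation are themselves inflated by extreme values, a z-score rule can miss outliers in small samples; a median-based rule is a common alternative.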

PPCA:

Class used by the PCA data cleaning routine in the DataCleaning class to perform probabilistic PCA to fill in missing data.
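The idea of filling gaps from a low-dimensional model of the data can be sketched with NumPy. The example below is not the PPCA class and does not implement the probabilistic (EM-based) algorithm; it swaps in a simpler relative, iterative low-rank imputation via SVD. The function name iterative_pca_impute and its parameters are hypothetical.

```python
import numpy as np


def iterative_pca_impute(X, n_components=1, n_iter=50):
    """Fill NaNs by repeatedly projecting onto the leading principal
    components (hypothetical helper, simplified stand-in for PPCA)."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Initial guess: column means of the observed entries.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)

    for _ in range(n_iter):
        mu = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mu, full_matrices=False)
        # Rank-limited reconstruction of the centered data, shifted back.
        low_rank = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components] + mu
        # Only the originally missing entries are overwritten.
        filled[missing] = low_rank[missing]
    return filled


# Second column is twice the first; one entry is missing.
X = np.array([
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, 6.0],
    [4.0, 8.0],
    [5.0, np.nan],
])
X_filled = iterative_pca_impute(X, n_components=1)
```

Unlike mean imputation, this approach uses the correlation between columns, so the filled value moves from the column mean toward the value implied by the linear trend in the observed rows.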

Classes

Counter(**kwds)

Dict subclass for counting hashable items.
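Counter comes from Python's standard collections module. How data_cleaning uses it is not stated here; the sketch below only shows one plausible use, tallying the types of entries in a column to spot stray strings. The column contents are made up for illustration.

```python
from collections import Counter

# A numeric column contaminated with two string placeholders.
column = [1.2, 3.4, "n/a", 5.6, "missing", 7.8]

# Count how many entries there are of each Python type.
type_counts = Counter(type(value).__name__ for value in column)

# The most frequent type is presumably the intended one.
dominant_type, dominant_count = type_counts.most_common(1)[0]
```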

DataCleaning()

Class to perform various data cleaning operations, such as imputation or NaN removal.

DataUtilities()

Class that contains some basic data analysis utilities, such as flagging columns that contain problematic string entries, or flagging potential outlier values based on threshold values.

Histogram()

Class to generate histogram plots, such as histograms of residual values.

PPCA()

Class to perform probabilistic principal component analysis (PPCA) to fill in missing data.

SimpleImputer(*[, missing_values, strategy, ...])

Imputation transformer for completing missing values.

datetime(year, month, day[, hour[, minute[, ...)

The year, month and day arguments are required.
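This is the standard-library datetime class. A short sketch of the required arguments follows; the timestamp format is an assumption chosen for illustration, not a format confirmed for this module.

```python
from datetime import datetime

# year, month and day are required; the time fields default to zero.
stamp = datetime(2024, 1, 15)

# Assumed format, for example for labelling an output folder.
label = stamp.strftime("%Y_%m_%d_%H_%M_%S")

# Omitting a required argument raises TypeError.
try:
    datetime(2024, 1)
    missing_day_ok = True
except TypeError:
    missing_day_ok = False
```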

Class Inheritance Diagram

Inheritance diagram of mastml.data_cleaning.DataCleaning, mastml.data_cleaning.DataUtilities, mastml.data_cleaning.PPCA