DataframeUtilities

class mastml.feature_generators.DataframeUtilities

Bases: object

Basic utilities for manipulating dataframes and for converting between pandas dataframes and numpy arrays

Args:
None
Methods:
clean_dataframe : clean a dataframe after feature generation by removing columns that contain any missing or NaN value and rows that are entirely empty
Args:
df: (dataframe), a post feature generation dataframe that needs cleaning
Returns:
df: (dataframe), the cleaned dataframe
merge_dataframe_columns : merge two dataframes by concatenating their columns (duplicate columns omitted)
Args:

dataframe1: (dataframe), a pandas dataframe object

dataframe2: (dataframe), a pandas dataframe object

Returns:
dataframe: (dataframe), merged dataframe
merge_dataframe_rows : merge two dataframes by concatenating their rows (duplicate rows omitted)
Args:

dataframe1: (dataframe), a pandas dataframe object

dataframe2: (dataframe), a pandas dataframe object

Returns:
dataframe: (dataframe), merged dataframe
get_dataframe_statistics : obtain basic statistics about data contained in the dataframe
Args:
dataframe: (dataframe), a pandas dataframe object
Returns:
dataframe_stats: (dataframe), dataframe containing input dataframe statistics
dataframe_to_array : transform a pandas dataframe to a numpy array
Args:
dataframe: (dataframe), a pandas dataframe object
Returns:
array: (numpy array), a numpy array representation of the input dataframe
array_to_dataframe : transform a numpy array to a pandas dataframe
Args:
array: (numpy array), a numpy array
Returns:
dataframe: (dataframe), a pandas dataframe representation of the input numpy array
concatenate_arrays : merge two numpy arrays by concatenating along the columns
Args:

X_array: (numpy array), a numpy array object

y_array: (numpy array), a numpy array object

Returns:
array: (numpy array), a numpy array merging the two input arrays
assign_columns_as_features : add column names to a dataframe based on the x and y feature names
Args:

dataframe: (dataframe), a pandas dataframe object

x_features: (list), list containing x feature names

y_feature: (str), target feature name

Returns:
dataframe: (dataframe), dataframe containing same data as input, with columns labeled with features
save_all_dataframe_statistics : obtain dataframe statistics and save them to a CSV file
Args:

dataframe: (dataframe), a pandas dataframe object

data_path: (str), file path to save dataframe statistics to

Returns:
fname: (str), name of the file the dataframe statistics were saved to

Methods Summary

array_to_dataframe(array)
assign_columns_as_features(dataframe, …[, …])
clean_dataframe(df)
concatenate_arrays(X_array, y_array)
dataframe_to_array(dataframe)
get_dataframe_statistics(dataframe)
merge_dataframe_columns(dataframe1, dataframe2)
merge_dataframe_rows(dataframe1, dataframe2)
remove_constant_columns(dataframe)
save_all_dataframe_statistics(dataframe, …)

Methods Documentation

classmethod array_to_dataframe(array)
classmethod assign_columns_as_features(dataframe, x_features, y_feature, remove_first_row=True)
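To illustrate the labeling step that assign_columns_as_features is described as performing, here is a minimal pandas sketch. It is not the MAST-ML implementation: the ordering (x features first, target last) is an assumption, and the remove_first_row flag from the signature is not described on this page, so it is left out.

```python
import numpy as np
import pandas as pd

# Hypothetical feature names, chosen only for illustration.
x_features = ["x1", "x2"]
y_feature = "target"

# A dataframe built from a bare array has integer column labels (0, 1, 2).
unlabeled = pd.DataFrame(np.array([[1.0, 2.0, 10.0], [3.0, 4.0, 20.0]]))

# Assumption: the x feature columns come first and the target column last.
labeled = unlabeled.copy()
labeled.columns = x_features + [y_feature]

print(labeled)
```

The number of names must match the number of columns, otherwise pandas raises a ValueError on assignment.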
classmethod clean_dataframe(df)
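The cleaning behavior described for clean_dataframe (drop fully empty rows, drop columns that contain missing values) can be approximated with plain pandas. The function below is a hypothetical stand-in, not the library's code, and the order of the two steps is an assumption.

```python
import numpy as np
import pandas as pd


def clean_dataframe_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for clean_dataframe, using pandas only."""
    # Drop rows in which every cell is missing.
    df = df.dropna(axis=0, how="all")
    # Drop columns that still contain at least one missing value.
    df = df.dropna(axis=1, how="any")
    return df


raw = pd.DataFrame(
    {
        "a": [1.0, 2.0, np.nan, 4.0],
        "b": [5.0, np.nan, np.nan, 8.0],
        "c": [9.0, 10.0, np.nan, 12.0],
    }
)
cleaned = clean_dataframe_sketch(raw)
print(cleaned)
```

In this sketch the third row is removed first because it is entirely empty, which lets columns a and c survive; column b is dropped because it still has a gap in the second row.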
classmethod concatenate_arrays(X_array, y_array)
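Concatenating along the columns, as described for concatenate_arrays, corresponds to numpy concatenation on axis 1. The snippet below is a sketch of that operation with made-up data, not the method itself.

```python
import numpy as np

# Three samples with two features each, plus one target value per sample.
X_array = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y_array = np.array([[10.0], [20.0], [30.0]])

# axis=1 appends y_array as additional columns to the right of X_array.
combined = np.concatenate((X_array, y_array), axis=1)

print(combined.shape)
```

Both inputs need the same number of rows and must be two-dimensional; a one-dimensional target can be reshaped with `y.reshape(-1, 1)` before concatenating.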
classmethod dataframe_to_array(dataframe)
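The two conversion methods, dataframe_to_array and array_to_dataframe, mirror a round trip that can be sketched with standard pandas calls. This is an illustration of the conversion itself, under the assumption that no column labels are passed back in; it does not call MAST-ML.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({"x1": [1.0, 2.0], "x2": [3.0, 4.0]})

# Dataframe -> numpy array: the values are kept, the column labels are not.
as_array = frame.to_numpy()

# Numpy array -> dataframe: pandas assigns default integer column labels.
back = pd.DataFrame(as_array)

print(list(frame.columns), list(back.columns))
```

The values survive the round trip unchanged, but the column labels have to be reattached separately.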
classmethod get_dataframe_statistics(dataframe)
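This page does not say which statistics get_dataframe_statistics returns. As a rough illustration of a "statistics dataframe", the sketch below uses pandas `describe()`; treating that as equivalent to the method's output is an assumption.

```python
import pandas as pd

data = pd.DataFrame({"x1": [1.0, 2.0, 3.0], "y": [10.0, 20.0, 30.0]})

# describe() returns a dataframe indexed by statistic name
# (count, mean, std, min, quartiles, max), one column per input column.
stats = data.describe()

print(stats)
```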
classmethod merge_dataframe_columns(dataframe1, dataframe2)
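A column-wise merge that omits duplicate columns, as described for merge_dataframe_columns, might look like the following pandas sketch. The helper name is hypothetical, and keeping the first occurrence of a repeated column name is an assumption about how duplicates are resolved.

```python
import pandas as pd


def merge_columns_sketch(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for merge_dataframe_columns."""
    # Place the two frames side by side, aligned on the row index.
    merged = pd.concat([df1, df2], axis=1)
    # Keep only the first occurrence of each column name.
    return merged.loc[:, ~merged.columns.duplicated()]


left = pd.DataFrame({"x1": [1, 2], "x2": [3, 4]})
right = pd.DataFrame({"x2": [3, 4], "y": [5, 6]})
merged_cols = merge_columns_sketch(left, right)

print(merged_cols)
```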
classmethod merge_dataframe_rows(dataframe1, dataframe2)
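The row-wise counterpart, merge_dataframe_rows, is described as stacking rows and omitting duplicates. The pandas sketch below shows one way to get that result; the helper name is hypothetical and the index reset is a choice made for readability, not something the page specifies.

```python
import pandas as pd


def merge_rows_sketch(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for merge_dataframe_rows."""
    # Stack the second frame underneath the first.
    merged = pd.concat([df1, df2], axis=0, ignore_index=True)
    # Remove rows that are exact repeats of an earlier row.
    return merged.drop_duplicates().reset_index(drop=True)


first = pd.DataFrame({"x": [1, 2], "y": [10, 20]})
second = pd.DataFrame({"x": [2, 3], "y": [20, 30]})
merged_rows = merge_rows_sketch(first, second)

print(merged_rows)
```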
classmethod remove_constant_columns(dataframe)
classmethod save_all_dataframe_statistics(dataframe, configdict)
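Saving statistics to a CSV file and returning the file name, as described for save_all_dataframe_statistics, can be sketched as below. Several things here are assumptions: the statistics come from pandas `describe()`, the file name is invented for the example, and the helper takes a plain directory path (as in the description's data_path) rather than the configdict shown in the signature.

```python
import os
import tempfile

import pandas as pd


def save_statistics_sketch(dataframe: pd.DataFrame, data_path: str) -> str:
    """Hypothetical stand-in for save_all_dataframe_statistics."""
    # Assumption: describe() stands in for the statistics the method computes.
    stats = dataframe.describe()
    # Hypothetical file name; the real output name is not given on this page.
    fname = os.path.join(data_path, "dataframe_statistics.csv")
    stats.to_csv(fname)
    return fname


data = pd.DataFrame({"x1": [1.0, 2.0, 3.0], "y": [10.0, 20.0, 30.0]})

# Write into a temporary directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    saved = save_statistics_sketch(data, tmp_dir)
    reloaded = pd.read_csv(saved, index_col=0)

print(reloaded)
```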