DataframeUtilities

class mastml.feature_generators.DataframeUtilities

Bases: object

Class of basic utilities for manipulating dataframes and for converting between dataframes and numpy arrays

Args:

None

Methods:
clean_dataframe: Method to clean dataframes after feature generation has occurred, removing columns that contain any missing or NaN value and rows that are fully empty
Args:

df: (dataframe), a post feature generation dataframe that needs cleaning

Returns:

df: (dataframe), the cleaned dataframe
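
The cleaning rule described for clean_dataframe can be sketched with plain pandas. This is an illustration under assumptions, not the MAST-ML implementation: the helper name clean_dataframe_sketch and the sample column names are hypothetical, and the order of the two steps (rows first, then columns) is an assumption.

```python
import numpy as np
import pandas as pd


def clean_dataframe_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in mirroring the behaviour described above."""
    # Drop rows in which every cell is missing.
    df = df.dropna(axis=0, how="all")
    # Drop columns that still contain at least one missing value.
    df = df.dropna(axis=1, how="any")
    return df


raw = pd.DataFrame(
    {
        "feature_a": [1.0, 2.0, np.nan, 4.0],
        "feature_b": [0.5, np.nan, np.nan, 2.0],
        "feature_c": [10.0, 20.0, np.nan, 40.0],
    }
)
cleaned = clean_dataframe_sketch(raw)
print(cleaned)
```

In this example the third row is fully empty and is removed first; feature_b still has a gap afterwards, so that column is removed as well.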

merge_dataframe_columns: merge two dataframes by concatenating the column names (duplicate columns omitted)
Args:

dataframe1: (dataframe), a pandas dataframe object

dataframe2: (dataframe), a pandas dataframe object

Returns:

dataframe: (dataframe), merged dataframe
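
A column-wise merge that omits duplicate columns might look like the following in plain pandas. The helper name merge_columns_sketch is hypothetical, and keeping the first occurrence of a repeated column name is an assumption; the library may resolve duplicates differently.

```python
import pandas as pd


def merge_columns_sketch(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in: join columns side by side, dropping repeats."""
    merged = pd.concat([df1, df2], axis=1)
    # Keep only the first occurrence of each column name (assumption).
    return merged.loc[:, ~merged.columns.duplicated()]


left = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
right = pd.DataFrame({"b": [3, 4], "c": [5, 6]})
merged_columns = merge_columns_sketch(left, right)
print(merged_columns)
```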

merge_dataframe_rows: merge two dataframes by concatenating the row contents (duplicate rows omitted)
Args:

dataframe1: (dataframe), a pandas dataframe object

dataframe2: (dataframe), a pandas dataframe object

Returns:

dataframe: (dataframe), merged dataframe
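
The row-wise counterpart can be sketched the same way. The helper name merge_rows_sketch is hypothetical, and resetting the index after removing duplicates is an assumption made only to keep the output tidy.

```python
import pandas as pd


def merge_rows_sketch(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in: stack rows, then drop exact duplicate rows."""
    merged = pd.concat([df1, df2], axis=0, ignore_index=True)
    return merged.drop_duplicates().reset_index(drop=True)


top = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
bottom = pd.DataFrame({"a": [2, 5], "b": [4, 6]})
merged_rows = merge_rows_sketch(top, bottom)
print(merged_rows)
```

The row (2, 4) appears in both inputs and is kept only once.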

get_dataframe_statistics: obtain basic statistics about data contained in the dataframe
Args:

dataframe: (dataframe), a pandas dataframe object

Returns:

dataframe_stats: (dataframe), dataframe containing input dataframe statistics
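
The documentation does not say which statistics are returned. As a sketch under that caveat, pandas' DataFrame.describe() yields a dataframe of count, mean, standard deviation, minimum, quartiles and maximum per numeric column. The helper name statistics_sketch is hypothetical.

```python
import pandas as pd


def statistics_sketch(dataframe: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in: summary statistics via DataFrame.describe()."""
    return dataframe.describe()


data = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [10.0, 20.0, 30.0]})
dataframe_stats = statistics_sketch(data)
print(dataframe_stats)
```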

dataframe_to_array: transform a pandas dataframe to a numpy array
Args:

dataframe: (dataframe), a pandas dataframe object

Returns:

array: (numpy array), a numpy array representation of the inputted dataframe

array_to_dataframe: transform a numpy array to a pandas dataframe
Args:

array: (numpy array), a numpy array

Returns:

dataframe: (dataframe), a pandas dataframe representation of the inputted numpy array
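
The two conversions, dataframe to array and back, can be illustrated as a round trip with plain pandas and numpy. This is a sketch only; the variable names are hypothetical and the library's own conversion may differ in detail. Column names are not carried by a numpy array, so the rebuilt dataframe gets default integer labels here.

```python
import numpy as np
import pandas as pd

original = pd.DataFrame({"x1": [1.0, 2.0], "x2": [3.0, 4.0]})

# Dataframe -> numpy array: values only, column names are dropped.
as_array = original.to_numpy()

# Numpy array -> dataframe: columns are labeled 0, 1, ... by default.
restored = pd.DataFrame(as_array)

print(as_array.shape)
print(restored)
```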

concatenate_arrays: merge two numpy arrays by concatenating along the columns
Args:

X_array: (numpy array), a numpy array object

y_array: (numpy array), a numpy array object

Returns:

array: (numpy array), a numpy array merging the two input arrays
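
Concatenating along the columns corresponds to numpy's axis=1. The sketch below uses the hypothetical helper concatenate_arrays_sketch; reshaping a one-dimensional target into a single column is an added assumption, since the documentation does not state the expected shapes.

```python
import numpy as np


def concatenate_arrays_sketch(X_array: np.ndarray, y_array: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in: place y to the right of X, column-wise."""
    # Assumption: a 1-D y is treated as a single column.
    y_2d = y_array.reshape(-1, 1) if y_array.ndim == 1 else y_array
    return np.concatenate((X_array, y_2d), axis=1)


X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([10.0, 20.0, 30.0])
combined = concatenate_arrays_sketch(X, y)
print(combined.shape)
```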

assign_columns_as_features: adds column names to dataframe based on the x and y feature names
Args:

dataframe: (dataframe), a pandas dataframe object

x_features: (list), list containing x feature names

y_feature: (str), target feature name

Returns:

dataframe: (dataframe), dataframe containing same data as input, with columns labeled with features
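
Labeling columns from the feature names might be sketched as follows. The helper name assign_columns_sketch is hypothetical. The remove_first_row flag appears in the method signature but is not described in the Args list, so its meaning here (dropping a first row that holds header text) is an assumption.

```python
import pandas as pd


def assign_columns_sketch(dataframe, x_features, y_feature, remove_first_row=True):
    """Hypothetical stand-in: label columns as x features followed by y."""
    labeled = dataframe.copy()
    labeled.columns = list(x_features) + [y_feature]
    if remove_first_row:
        # Assumption: the first row repeats the header and carries no data.
        labeled = labeled.iloc[1:]
    return labeled


unlabeled = pd.DataFrame([["x1", "x2", "y"], [1, 2, 3], [4, 5, 6]])
labeled = assign_columns_sketch(unlabeled, ["x1", "x2"], "y")
print(labeled)
```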

save_all_dataframe_statistics: obtain dataframe statistics and save them to a csv file
Args:

dataframe: (dataframe), a pandas dataframe object

data_path: (str), file path to save dataframe statistics to

Returns:

fname: (str), name of file dataframe stats saved to
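
Computing statistics and writing them to a csv file can be sketched as below. The helper name save_statistics_sketch and the file name dataframe_statistics.csv are hypothetical, and the sketch follows the data_path argument from the Args list above rather than any particular library signature.

```python
import os
import tempfile

import pandas as pd


def save_statistics_sketch(dataframe: pd.DataFrame, data_path: str) -> str:
    """Hypothetical stand-in: describe() the data and write it as csv."""
    stats = dataframe.describe()
    # Hypothetical file name; the library's actual name is not documented here.
    fname = os.path.join(data_path, "dataframe_statistics.csv")
    stats.to_csv(fname)
    return fname


with tempfile.TemporaryDirectory() as tmp_dir:
    frame = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [10.0, 20.0, 30.0]})
    fname = save_statistics_sketch(frame, tmp_dir)
    # Read the file back while the temporary directory still exists.
    saved_stats = pd.read_csv(fname, index_col=0)

print(saved_stats)
```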

Methods Summary

array_to_dataframe(array)

assign_columns_as_features(dataframe, ...[, ...])

clean_dataframe(df)

concatenate_arrays(X_array, y_array)

dataframe_to_array(dataframe)

get_dataframe_statistics(dataframe)

merge_dataframe_columns(dataframe1, dataframe2)

merge_dataframe_rows(dataframe1, dataframe2)

remove_constant_columns(dataframe)

save_all_dataframe_statistics(dataframe, ...)

Methods Documentation

classmethod array_to_dataframe(array)
classmethod assign_columns_as_features(dataframe, x_features, y_feature, remove_first_row=True)
classmethod clean_dataframe(df)
classmethod concatenate_arrays(X_array, y_array)
classmethod dataframe_to_array(dataframe)
classmethod get_dataframe_statistics(dataframe)
classmethod merge_dataframe_columns(dataframe1, dataframe2)
classmethod merge_dataframe_rows(dataframe1, dataframe2)
classmethod remove_constant_columns(dataframe)
classmethod save_all_dataframe_statistics(dataframe, configdict)