LocalDatasets
- class mastml.datasets.LocalDatasets(file_path, feature_names=None, target=None, extra_columns=None, group_column=None, testdata_columns=None, average_duplicates=False, average_duplicates_col=None, as_frame=False)
Bases: object
Class to handle import and organization of a dataset stored locally.
- Args:
file_path: (str), path to the data file to import
feature_names: (list), list of strings containing the X feature names
target: (str), string denoting the y data (target) name
extra_columns: (list), list of strings containing additional column names that are not features or target
group_column: (str), string denoting the name of an input column to be used to group data
testdata_columns: (list), list of strings containing column names denoting sets of left-out data. Entries should be marked with a 0 (not left out) or 1 (left out)
average_duplicates: (bool), whether to average duplicate entries from the imported data.
average_duplicates_col: (str), string denoting the name of the column on which to perform averaging of duplicate entries. Must be specified if average_duplicates is True.
as_frame: (bool), whether to return data as pandas dataframe (otherwise will be numpy array)
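The 0/1 convention described for testdata_columns can be illustrated with a small standard-library sketch. This is not MAST-ML's implementation; the column name `holdout`, the sample values, and the helper `split_left_out` are hypothetical and exist only to show how a flag column partitions rows into retained and left-out sets.

```python
import csv
import io

# Hypothetical data: "holdout" plays the role of one testdata column.
RAW = """feat_a,feat_b,target,holdout
1.0,10.0,0.5,0
2.0,20.0,0.7,1
3.0,30.0,0.9,0
4.0,40.0,1.1,1
"""


def split_left_out(rows, flag_column):
    """Split rows into (kept, left_out) using a 0/1 flag column.

    0 means the row is not left out, 1 means it is left out,
    matching the convention stated for testdata_columns.
    """
    kept, left_out = [], []
    for row in rows:
        flag = int(float(row[flag_column]))
        if flag not in (0, 1):
            raise ValueError(f"flag must be 0 or 1, got {row[flag_column]!r}")
        (left_out if flag == 1 else kept).append(row)
    return kept, left_out


rows = list(csv.DictReader(io.StringIO(RAW)))
kept, left_out = split_left_out(rows, "holdout")
print(len(kept), "rows kept;", len(left_out), "rows left out")
```

Each name listed in testdata_columns would define one such partition, so several independent left-out sets can live in the same file.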
- Methods:
- _import: Imports the data. The file should be in either .csv or .xlsx format
- Args:
None
- Returns:
df: (pd.DataFrame), pandas dataframe of full dataset
- _get_features: Method to determine which columns belong to the target and to feature_names
- Args:
df: (pd.DataFrame), pandas dataframe of full dataset
- Returns:
None
- load_data: Method to import the data and ascertain which columns are features, target and extra based on provided input.
- Args:
copy: (bool), whether or not to copy the imported data to the designated savepath
savepath: (str), path to save the data to (used if copy=True)
- Returns:
data_dict: (dict), dictionary containing dataframes of X, y, groups, X_extra, X_testdata
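A minimal usage sketch, based only on the signature and argument descriptions above. The CSV contents, the column names, and the helpers `write_example_csv` and `load_with_mastml` are hypothetical. The MAST-ML import is deferred into the function so that the file-writing part runs without the package installed. Whether extra_columns should also list the group or test-data columns is not stated here, so the sketch keeps them separate as an assumption.

```python
import csv
import os
import tempfile


def write_example_csv(directory):
    """Write a small hypothetical dataset and return its path."""
    path = os.path.join(directory, "example_data.csv")
    header = ["sample_id", "feat_a", "feat_b", "group", "holdout", "target"]
    rows = [
        ["s1", 1.0, 10.0, "g1", 0, 0.5],
        ["s2", 2.0, 20.0, "g1", 1, 0.7],
        ["s3", 3.0, 30.0, "g2", 0, 0.9],
    ]
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)
        writer.writerows(rows)
    return path


def load_with_mastml(path):
    """Load the file with LocalDatasets (requires MAST-ML to be installed)."""
    from mastml.datasets import LocalDatasets

    dataset = LocalDatasets(
        file_path=path,
        feature_names=["feat_a", "feat_b"],
        target="target",
        extra_columns=["sample_id"],
        group_column="group",
        testdata_columns=["holdout"],
        as_frame=True,
    )
    # The documentation describes the returned dictionary as holding
    # X, y, groups, X_extra and X_testdata.
    return dataset.load_data()


csv_path = write_example_csv(tempfile.mkdtemp())
print("example file written to", csv_path)
```

With MAST-ML installed, `load_with_mastml(csv_path)` would be expected to return the data_dict described above.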
Methods Summary
load_data([copy, savepath])
Methods Documentation