LocalDatasets
class mastml.datasets.LocalDatasets(file_path, feature_names=None, target=None, extra_columns=None, group_column=None, testdata_columns=None, as_frame=False)
Bases: object
Class to handle import and organization of a dataset stored locally.
- Args:
file_path: (str), path to the data file to import
feature_names: (list), list of strings containing the X feature names
target: (str), string denoting the y data (target) name
extra_columns: (list), list of strings containing additional column names that are not features or target
group_column: (str), string denoting the name of an input column to be used to group data
testdata_columns: (list), list of strings containing column names denoting sets of left-out data. Entries should be marked with a 0 (not left out) or 1 (left out)
as_frame: (bool), whether to return data as pandas dataframe (otherwise will be numpy array)
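The 0/1 convention described for testdata_columns can be illustrated with a small, library-independent sketch. The helper name split_left_out, the column names, and the sample rows are hypothetical, and this is not MAST-ML's internal implementation; it only shows how a marker column partitions rows into kept and left-out sets.

```python
def split_left_out(rows, marker_column):
    """Partition rows by a 0/1 marker column (hypothetical helper, not part of MAST-ML)."""
    kept, left_out = [], []
    for row in rows:
        # 1 marks a row as left out; 0 marks it as not left out
        if int(row[marker_column]) == 1:
            left_out.append(row)
        else:
            kept.append(row)
    return kept, left_out


# Made-up rows; "holdout_1" plays the role of one entry in testdata_columns
rows = [
    {"feature_a": 0.1, "target": 1.0, "holdout_1": 0},
    {"feature_a": 0.2, "target": 2.0, "holdout_1": 1},
    {"feature_a": 0.3, "target": 3.0, "holdout_1": 0},
]
kept, left_out = split_left_out(rows, "holdout_1")
```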
- Methods:
- _import: Method to import the data. The file should be in .csv or .xlsx format
- Args:
- None
- Returns:
- df: (pd.DataFrame), pandas dataframe of full dataset
- _get_features: Method to determine which columns belong to the target and to feature_names
- Args:
- df: (pd.DataFrame), pandas dataframe of full dataset
- Returns:
- None
- load_data: Method to import the data and ascertain which columns are features, target and extra based on provided input.
- Args:
- None
- Returns:
- data_dict: (dict), dictionary containing dataframes of X, y, groups, X_extra, X_testdata
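As a rough illustration of the column sorting that load_data describes, the sketch below assigns column names to roles using plain Python. The function assign_column_roles is hypothetical, and the rule that every unreserved column becomes a feature when feature_names is None is an assumption rather than confirmed library behavior. The returned keys mirror the data_dict entries listed above, but hold column names instead of dataframes.

```python
def assign_column_roles(columns, target, feature_names=None, extra_columns=None,
                        group_column=None, testdata_columns=None):
    """Sort column names into roles (hypothetical sketch, not MAST-ML code)."""
    extra = list(extra_columns or [])
    testdata = list(testdata_columns or [])

    # Columns that are claimed by a non-feature role
    reserved = set(extra) | set(testdata) | {target}
    if group_column is not None:
        reserved.add(group_column)

    if feature_names is None:
        # Assumption: with no explicit list, every remaining column is a feature
        features = [c for c in columns if c not in reserved]
    else:
        features = list(feature_names)

    # Fail early if a requested feature or the target is not in the file header
    missing = [c for c in features + [target] if c not in columns]
    if missing:
        raise ValueError(f"columns not found in dataset: {missing}")

    return {
        "X": features,
        "y": target,
        "groups": group_column,
        "X_extra": extra,
        "X_testdata": testdata,
    }


# Made-up header of a data file
roles = assign_column_roles(
    ["a", "b", "y", "grp", "note", "holdout"],
    target="y",
    extra_columns=["note"],
    group_column="grp",
    testdata_columns=["holdout"],
)
```

With the real class, the equivalent step would be constructing LocalDatasets with the same arguments plus file_path and calling load_data(), which returns the data rather than column names.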
Methods Summary
load_data()
Methods Documentation