LocalDatasets

class mastml.datasets.LocalDatasets(file_path, feature_names=None, target=None, extra_columns=None, group_column=None, testdata_columns=None, as_frame=False)

Bases: object

Class to handle import and organization of a dataset stored locally.

Args:

file_path: (str), path to the data file to import

feature_names: (list), list of strings containing the X feature names

target: (str), string denoting the y data (target) name

extra_columns: (list), list of strings containing additional column names that are not features or target

group_column: (str), string denoting the name of an input column to be used to group data

testdata_columns: (list), list of strings containing column names denoting sets of left-out data. Entries should be marked with a 0 (not left out) or 1 (left out)

as_frame: (bool), whether to return data as pandas dataframe (otherwise will be numpy array)
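A minimal usage sketch, assuming MAST-ML is installed and that a small CSV file is a representative input. The column names (`composition`, `feat_a`, `feat_b`, `band_gap`, `host`, `holdout`) and the file name are hypothetical and only illustrate how each constructor argument maps onto columns of the file. The import is guarded so the snippet still runs where MAST-ML is unavailable.

```python
import csv
import os
import tempfile

# Hypothetical dataset: every column name below is invented for illustration.
rows = [
    {"composition": "s1", "feat_a": 1.0, "feat_b": 2.0, "band_gap": 0.5, "host": "A", "holdout": 0},
    {"composition": "s2", "feat_a": 3.0, "feat_b": 4.0, "band_gap": 1.5, "host": "B", "holdout": 1},
    {"composition": "s3", "feat_a": 5.0, "feat_b": 6.0, "band_gap": 2.5, "host": "A", "holdout": 0},
]

# Write the rows to a temporary .csv file (one of the two documented formats).
file_path = os.path.join(tempfile.mkdtemp(), "example.csv")
with open(file_path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)

# Keyword arguments mirror the class signature documented above.
kwargs = dict(
    file_path=file_path,
    feature_names=["feat_a", "feat_b"],   # X columns
    target="band_gap",                    # y column
    extra_columns=["composition"],        # neither feature nor target
    group_column="host",                  # used to group rows
    testdata_columns=["holdout"],         # 0 = not left out, 1 = left out
    as_frame=True,                        # request pandas objects
)

try:
    from mastml.datasets import LocalDatasets
except ImportError:
    LocalDatasets = None  # MAST-ML not installed; the call below is skipped

if LocalDatasets is not None:
    data_dict = LocalDatasets(**kwargs).load_data()
    print(sorted(data_dict.keys()))
```

Passing `feature_names` explicitly, as done here, avoids relying on whatever inference the class performs when it is left as `None`.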

Methods:

_import: imports the data. The file should be in either .csv or .xlsx format
    Args:
        None
    Returns:
        df: (pd.DataFrame), pandas dataframe of full dataset

_get_features: Method to assess which columns belong to target and feature_names
    Args:
        df: (pd.DataFrame), pandas dataframe of full dataset
    Returns:
        None

load_data: Method to import the data and ascertain which columns are features, target and extra based on provided input.
    Args:
        None
    Returns:
        data_dict: (dict), dictionary containing dataframes of X, y, groups, X_extra, X_testdata
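The column roles described above can be pictured with a standard-library sketch. This is an illustration only, not MAST-ML's implementation: the helper names `split_columns` and `left_out_indices` are hypothetical, plain lists and dicts stand in for dataframes, and the rule used when `feature_names` is None (every column not otherwise claimed becomes a feature) is an assumption.

```python
def split_columns(rows, target, feature_names=None, extra_columns=None,
                  group_column=None, testdata_columns=None):
    """Partition a list of row dicts into the five documented pieces.

    Hypothetical helper for illustration; not part of MAST-ML.
    """
    extra_columns = list(extra_columns or [])
    testdata_columns = list(testdata_columns or [])
    all_columns = list(rows[0].keys())

    if feature_names is None:
        # Assumption: any column not claimed by another role is a feature.
        reserved = set(extra_columns) | set(testdata_columns) | {target}
        if group_column is not None:
            reserved.add(group_column)
        feature_names = [c for c in all_columns if c not in reserved]

    def take(columns):
        # Keep only the requested columns from every row.
        return [{c: row[c] for c in columns} for row in rows]

    return {
        "X": take(feature_names),
        "y": [row[target] for row in rows],
        "groups": [row[group_column] for row in rows] if group_column else None,
        "X_extra": take(extra_columns) if extra_columns else None,
        "X_testdata": take(testdata_columns) if testdata_columns else None,
    }


def left_out_indices(data_dict, column):
    """Return row indices marked 1 (left out) in one testdata column."""
    return [i for i, row in enumerate(data_dict["X_testdata"]) if row[column] == 1]


# Hypothetical rows; column names are invented for this example.
rows = [
    {"name": "s1", "feat_a": 1.0, "feat_b": 2.0, "y_value": 0.5, "family": "A", "holdout": 0},
    {"name": "s2", "feat_a": 3.0, "feat_b": 4.0, "y_value": 1.5, "family": "B", "holdout": 1},
    {"name": "s3", "feat_a": 5.0, "feat_b": 6.0, "y_value": 2.5, "family": "A", "holdout": 0},
]

data_dict = split_columns(
    rows,
    target="y_value",
    extra_columns=["name"],
    group_column="family",
    testdata_columns=["holdout"],
)
print(sorted(data_dict.keys()))
print(left_out_indices(data_dict, "holdout"))
```

The returned keys match the names listed for `data_dict` above; the actual container types returned by `load_data` depend on `as_frame`.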

Methods Summary

load_data()

Methods Documentation

load_data()