OneHotGroupGenerator

class mastml.feature_generators.OneHotGroupGenerator(featurize_df, remove_constant_columns=False)

Bases: BaseGenerator

Class to generate one-hot encoded values from a list of categories using scikit-learn's OneHotEncoder. More info at: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html
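The encoding step itself can be illustrated with scikit-learn directly. The sketch below is not MAST-ML code; it assumes a single column of hypothetical group labels and shows that OneHotEncoder produces one 0/1 column per unique category, in the (sorted) order reported by `categories_`.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical group labels; in MAST-ML these would come from featurize_df.
groups = pd.DataFrame({"group": ["metal", "oxide", "metal", "nitride"]})

encoder = OneHotEncoder()
# fit_transform returns a sparse matrix by default; densify it for display.
encoded = encoder.fit_transform(groups.values.reshape(-1, 1)).toarray()

# One column per unique category, named in the order given by encoder.categories_.
categories = list(encoder.categories_[0])
onehot_df = pd.DataFrame(encoded, columns=categories, index=groups.index)
print(onehot_df)
```

Each row has exactly one 1, placed in the column of its group.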

Args:

featurize_df: (pd.DataFrame), pandas dataframe of group (category) names to make one-hot features from

remove_constant_columns: (bool), whether to remove constant columns from the generated feature set. It is recommended that this be set to False, to preserve as many features as possible and to avoid potential issues at inference time, when features for new test points need to be generated.
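One way the inference-time issue can arise is illustrated below with plain pandas. This is a hypothetical sketch, not MAST-ML's implementation: `CATEGORIES`, `one_hot`, and `drop_constant_columns` are illustrative names. A group that never occurs in the training data yields an all-zero (constant) column; if that column is dropped, a new test point from that group loses its group information once its features are restricted to the training columns.

```python
import pandas as pd

# Hypothetical full list of groups known when features are generated.
CATEGORIES = ["A", "B", "C"]


def one_hot(groups):
    """One 0/1 column per category in CATEGORIES, in a fixed order."""
    return pd.DataFrame({cat: [int(g == cat) for g in groups] for cat in CATEGORIES})


def drop_constant_columns(df):
    """Hypothetical helper: keep only columns with more than one distinct value."""
    return df.loc[:, df.nunique(dropna=False) > 1]


# Training data never contains group "C", so column "C" is all zeros (constant).
train = one_hot(["A", "B", "A", "B"])
train_reduced = drop_constant_columns(train)

# A new test point from group "C" arrives at inference time.
test = one_hot(["C", "A"])
# Restricting the test features to the columns kept for training erases its group:
# the "C" row is encoded as all zeros.
test_aligned = test[list(train_reduced.columns)]

print(list(train_reduced.columns))  # ['A', 'B']
print(test_aligned.iloc[0].tolist())  # [0, 0]
```

Keeping the constant column (remove_constant_columns=False) keeps the training and test feature sets identical.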

Methods:
fit: pass-through; copies the input columns as pre-generated features
Args:

X: (pd.DataFrame), input dataframe containing X data

y: (pd.Series), series containing y data

transform: generate the one-hot encoded features. One column is created for each unique category in the groups (n columns for n unique categories).
Args:

None.

Returns:

df: (dataframe), output dataframe containing generated features

y: (series), output y data as series
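The fit/transform contract described above can be mimicked by a small stand-in class. `OneHotGroupSketch` is hypothetical and is not MAST-ML's implementation; it assumes `featurize_df` has a single column aligned row-by-row with `X`, and that `transform()` returns the columns stored by `fit()` followed by the one-hot columns, together with `y`.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder


class OneHotGroupSketch:
    """Hypothetical stand-in mirroring the documented fit/transform contract."""

    def __init__(self, featurize_df, remove_constant_columns=False):
        self.featurize_df = featurize_df
        self.remove_constant_columns = remove_constant_columns

    def fit(self, X, y=None):
        # Pass-through: keep the inputs so transform() can return them.
        self.X = X
        self.y = y
        return self

    def transform(self, X=None):
        # Assumes a single group column aligned with the rows of self.X.
        encoder = OneHotEncoder()
        values = self.featurize_df.values.reshape(-1, 1)
        encoded = encoder.fit_transform(values).toarray()
        onehot = pd.DataFrame(encoded, columns=list(encoder.categories_[0]),
                              index=self.X.index)
        if self.remove_constant_columns:
            # Drop columns holding a single value across all rows.
            onehot = onehot.loc[:, onehot.nunique() > 1]
        # Assumed output layout: input columns followed by one-hot columns.
        df = pd.concat([self.X, onehot], axis=1)
        return df, self.y


X = pd.DataFrame({"x1": [0.1, 0.2, 0.3]})
y = pd.Series([1.0, 2.0, 3.0])
groups = pd.DataFrame({"group": ["a", "b", "a"]})
df, y_out = OneHotGroupSketch(groups).fit(X, y).transform()
print(df.columns.tolist())  # ['x1', 'a', 'b']
```

Returning `self` from `fit` allows the chained `fit(...).transform()` call used above.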

Methods Summary

fit(X[, y])

transform([X])

Methods Documentation

fit(X, y=None)
transform(X=None)