MatminerFeatureGenerator

class mastml.feature_generators.MatminerFeatureGenerator(featurize_df, featurizer, composition_feature_types=['magpie', 'deml', 'matminer'], structure_feature_type='CoulombMatrix', remove_constant_columns=False, **kwargs)

Bases: BaseGenerator

Wraps the feature generation routines of the matminer package so that they conform to the MAST-ML workflow and are all available through a single class.

Args:

featurize_df: (pd.DataFrame), input dataframe to be featurized. Must contain at least one column of chemical compositions or pymatgen Structure objects

featurizer: (str), type of featurization to conduct. Valid names are “composition” or “structure”

composition_feature_types: (list of str), if featurizer=‘composition’, the types of composition-based features to include. Valid values are ‘magpie’, ‘deml’, ‘matminer’, ‘matscholar_el’, ‘megnet_el’

structure_feature_type: (str), if featurizer=‘structure’, the type of structure-based featurization to conduct. See list below for valid names.

remove_constant_columns: (bool), whether to remove feature columns whose values are constant. Default is False.

kwargs: additional keyword arguments required when generating structure-based features
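
As an illustration of what remove_constant_columns=True is described as doing, here is a minimal sketch of dropping constant feature columns. It is not the MAST-ML implementation: the helper name drop_constant_columns, the column names, and the values are hypothetical, and a plain dict of lists stands in for the pd.DataFrame so that the sketch has no dependencies.

```python
def drop_constant_columns(table):
    """Return a copy of a column-oriented table without constant columns.

    `table` maps column name -> list of values (a stand-in for a DataFrame).
    """
    kept = {}
    for name, values in table.items():
        # A column is constant when no value differs from the first one.
        # Note: float("nan") != float("nan"), so an all-NaN column is NOT
        # treated as constant by this simple check.
        if any(value != values[0] for value in values[1:]):
            kept[name] = list(values)
    return kept


# Hypothetical generated features for three materials.
features = {
    "mean_atomic_mass": [55.8, 26.9, 63.5],
    "n_elements": [2, 2, 2],  # constant, so it carries no information
    "max_electronegativity": [3.4, 3.4, 3.0],
}

filtered = drop_constant_columns(features)
print(sorted(filtered))
```

Constant columns cannot help a model distinguish between samples, which is presumably why the option exists; it is off by default, so the generated feature set is returned unfiltered unless requested.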

Available structure featurizer types for structure_feature_type:

matminer.featurizers.structure
‘BagofBonds’, x
‘BondFractions’, x
‘ChemicalOrdering’, x
‘CoulombMatrix’, x
‘DensityFeatures’, x
‘Dimensionality’, x
‘ElectronicRadialDistributionFunction’, NEEDS TO BE FLATTENED
‘EwaldEnergy’, x, returns all NaN
‘GlobalInstabilityIndex’, x, returns all NaN
‘GlobalSymmetryFeatures’, x
‘JarvisCFID’, x
‘MaximumPackingEfficiency’, x
‘MinimumRelativeDistances’, x
‘OrbitalFieldMatrix’, x
‘PartialRadialDistributionFunction’, x
‘RadialDistributionFunction’, NEEDS TO BE FLATTENED
‘SineCoulombMatrix’, x
‘SiteStatsFingerprint’, x
‘StructuralComplexity’, x
‘StructuralHeterogeneity’, x
‘XRDPowderPattern’, x

matminer.featurizers.site [‘AGNIFingerprints’, ‘AngularFourierSeries’, ‘AseAtomsAdaptor’, ‘AverageBondAngle’, ‘AverageBondLength’, ‘BondOrientationalParameter’, ‘ChemEnvSiteFingerprint’, ‘ChemicalSRO’, ‘ConvexHull’, ‘CoordinationNumber’, ‘CrystalNN’, ‘CrystalNNFingerprint’, ‘Element’, ‘EwaldSiteEnergy’, ‘EwaldSummation’, ‘Gaussian’, ‘GaussianSymmFunc’, ‘GeneralizedRadialDistributionFunction’, ‘Histogram’, ‘IntersticeDistribution’, ‘LocalGeometryFinder’, ‘LocalPropertyDifference’, ‘LocalStructOrderParams’, ‘MagpieData’, ‘MultiWeightsChemenvStrategy’, ‘OPSiteFingerprint’, ‘SOAP’, ‘SimplestChemenvStrategy’, ‘SiteElementalProperty’, ‘VoronoiFingerprint’, ‘VoronoiNN’]
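
The "NEEDS TO BE FLATTENED" notes in the structure list suggest that some featurizers return an array-like value per structure rather than one scalar per column. The sketch below shows one way such a value could be flattened into named scalar columns. It is an assumption about what flattening would involve, not code from MAST-ML or matminer; the helper name, the column naming scheme, and the numbers are all hypothetical.

```python
def flatten_matrix_feature(name, matrix):
    """Turn a 2-D feature (list of rows) into a flat {column_name: value} dict."""
    flat = {}
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            # Hypothetical naming scheme: <feature>_<row index>_<column index>.
            flat[f"{name}_{i}_{j}"] = value
    return flat


# Made-up 2x2 matrix standing in for a matrix-valued structure feature.
matrix_feature = [[36.9, 5.5], [5.5, 0.5]]

row = flatten_matrix_feature("feature", matrix_feature)
print(row)
```

Each flattened entry becomes its own column, so every structure must yield a matrix of the same shape (or be padded to one) for the resulting rows to line up in a single dataframe.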

Methods:
fit: present for convenience; it is a pass-through that performs no fitting
Args:

X: (pd.DataFrame), the X feature matrix

Returns:

self

transform: generates the new features and returns a new dataframe containing them
Args:

X: (pd.DataFrame), the X feature matrix

Returns:

df: (pd.DataFrame), the transformed dataframe containing generated features

y: (pd.Series), the target y-data

generate_matminer_features: method to generate the composition or structure features of interest
Args:

X: (pd.DataFrame), the X feature matrix

Returns:

df: (pd.DataFrame), the transformed dataframe containing generated features
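
The division of labor among the three methods above (a pass-through fit, a transform that delegates to a feature-generating method) can be sketched with a toy stand-in. This is not the MAST-ML class: ToyCompositionGenerator, its single n_elements feature, and the use of a list of dicts in place of a pd.DataFrame are all hypothetical, chosen only to keep the sketch dependency-free.

```python
import re


class ToyCompositionGenerator:
    """Hypothetical stand-in mimicking the fit/transform shape described above."""

    def fit(self, X, y=None):
        # Nothing is learned from the data; fit only returns self so the
        # object can be used wherever a fit/transform pair is expected.
        return self

    def generate_features(self, X):
        # Toy featurization: count the distinct element symbols per formula.
        rows = []
        for record in X:
            symbols = re.findall(r"[A-Z][a-z]?", record["composition"])
            rows.append({**record, "n_elements": len(set(symbols))})
        return rows

    def transform(self, X):
        # transform delegates to the feature-generating method and returns
        # a new table; the input rows are left unmodified.
        return self.generate_features(X)


X = [{"composition": "Fe2O3"}, {"composition": "NaCl"}, {"composition": "Al"}]
generator = ToyCompositionGenerator()
rows = generator.fit(X).transform(X)
print([r["n_elements"] for r in rows])
```

Because fit returns the object itself, the calls can be chained as fit(X).transform(X), which is the usual reason for keeping a no-op fit on a feature generator.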

Methods Summary

fit(X[, y])

generate_matminer_features(X)

transform(X)

Methods Documentation

fit(X, y=None)
generate_matminer_features(X)
transform(X)