MatminerFeatureGenerator
- class mastml.feature_generators.MatminerFeatureGenerator(featurize_df, featurizer, composition_feature_types=['magpie', 'deml', 'matminer'], structure_feature_type='CoulombMatrix', remove_constant_columns=False, **kwargs)
Bases: BaseGenerator
Class that wraps the feature generation routines in the matminer package so that they conform more neatly to the MAST-ML working environment, collecting them under a single class.
- Args:
featurize_df: (pd.DataFrame), input dataframe to be featurized. Must contain at least one column of chemical compositions or pymatgen Structure objects.
featurizer: (str), type of featurization to conduct. Valid names are 'composition' or 'structure'.
composition_feature_types: (list of str), if featurizer='composition', the types of composition-based features to include. Valid values are 'magpie', 'deml', 'matminer', 'matscholar_el', and 'megnet_el'.
structure_feature_type: (str), if featurizer='structure', the type of structure-based featurization to conduct. See the list below for valid names.
remove_constant_columns: (bool), whether to remove feature columns that contain only constant values. Default is False.
kwargs: additional keyword arguments needed when generating structure-based features.
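As a minimal, standard-library-only sketch of what remove_constant_columns=True is meant to do (not MAST-ML's actual code), the hypothetical helper drop_constant_columns below drops every column that holds a single distinct value. A dict of lists stands in for a pandas DataFrame, and the feature names are made up for illustration.

```python
def drop_constant_columns(table):
    """Hypothetical helper: return a copy of table without constant columns.

    table maps column name -> list of values (a stand-in for a DataFrame).
    """
    kept = {}
    for name, values in table.items():
        # A column is constant if it has at most one distinct value.
        if len(set(values)) > 1:
            kept[name] = list(values)
    return kept


# Made-up feature table; "is_metal_flag" never changes, so it is dropped.
features = {
    "MagpieData mean Number": [14.0, 26.5, 8.0],
    "is_metal_flag": [1, 1, 1],
    "band_center": [0.12, -0.40, 0.33],
}
print(sorted(drop_constant_columns(features)))
# ['MagpieData mean Number', 'band_center']
```

With pandas, roughly the same filter is `df.loc[:, df.nunique() > 1]`; note that `nunique` ignores NaN by default, so an all-NaN column would also be dropped.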
- Available featurizer types for structure_feature_type:
matminer.featurizers.structure: ['BagofBonds', 'BondFractions', 'ChemicalOrdering', 'CoulombMatrix', 'DensityFeatures', 'Dimensionality', 'ElectronicRadialDistributionFunction', 'EwaldEnergy', 'GlobalInstabilityIndex', 'GlobalSymmetryFeatures', 'JarvisCFID', 'MaximumPackingEfficiency', 'MinimumRelativeDistances', 'OrbitalFieldMatrix', 'PartialRadialDistributionFunction', 'RadialDistributionFunction', 'SineCoulombMatrix', 'SiteStatsFingerprint', 'StructuralComplexity', 'StructuralHeterogeneity', 'XRDPowderPattern']
Notes: the output of 'ElectronicRadialDistributionFunction' and 'RadialDistributionFunction' still needs to be flattened, and 'EwaldEnergy' and 'GlobalInstabilityIndex' return all-NaN features.
matminer.featurizers.site: ['AGNIFingerprints', 'AngularFourierSeries', 'AseAtomsAdaptor', 'AverageBondAngle', 'AverageBondLength', 'BondOrientationalParameter', 'ChemEnvSiteFingerprint', 'ChemicalSRO', 'ConvexHull', 'CoordinationNumber', 'CrystalNN', 'CrystalNNFingerprint', 'Element', 'EwaldSiteEnergy', 'EwaldSummation', 'Gaussian', 'GaussianSymmFunc', 'GeneralizedRadialDistributionFunction', 'Histogram', 'IntersticeDistribution', 'LocalGeometryFinder', 'LocalPropertyDifference', 'LocalStructOrderParams', 'MagpieData', 'MultiWeightsChemenvStrategy', 'OPSiteFingerprint', 'SOAP', 'SimplestChemenvStrategy', 'SiteElementalProperty', 'VoronoiFingerprint', 'VoronoiNN']
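For the featurizers noted above whose output needs to be flattened (ElectronicRadialDistributionFunction and RadialDistributionFunction), the flattening step could look roughly like the sketch below. The nested dict is an illustrative assumption, not the exact return value of either matminer featurizer, and flatten_features is a hypothetical helper.

```python
def flatten_features(prefix, value):
    """Hypothetical helper: turn nested dicts/lists into {column_name: scalar}."""
    flat = {}
    if isinstance(value, dict):
        # Each dict key extends the column name.
        for key, sub in value.items():
            flat.update(flatten_features(f"{prefix}_{key}", sub))
    elif isinstance(value, (list, tuple)):
        # Each list position extends the column name with its index.
        for i, sub in enumerate(value):
            flat.update(flatten_features(f"{prefix}_{i}", sub))
    else:
        # Scalars become a single column.
        flat[prefix] = value
    return flat


# Assumed nested output for one structure (illustrative shape only).
nested = {"distances": [0.5, 1.0], "distribution": [0.0, 2.3]}
row = flatten_features("rdf", nested)
print(row)
# {'rdf_distances_0': 0.5, 'rdf_distances_1': 1.0, 'rdf_distribution_0': 0.0, 'rdf_distribution_1': 2.3}
```

Every structure must flatten to the same set of column names for the rows to stack into one feature matrix. NumPy arrays would need to be converted with `tolist()` first, since the sketch only recurses into dicts, lists, and tuples.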
- Methods:
- fit: present for convenience only; no fitting is performed and the call passes straight through
- Args:
X: (pd.DataFrame), the X feature matrix
y: (pd.Series), optional target y-data
- Returns:
self
- transform: generates new features and returns a new dataframe containing them
- Args:
X: (pd.DataFrame), the X feature matrix
- Returns:
df: (pd.DataFrame), the transformed dataframe containing generated features
y: (pd.Series), the target y-data
- generate_matminer_features: method to generate the composition or structure features of interest
- Args:
X: (pd.DataFrame), the X feature matrix
- Returns:
df: (pd.DataFrame), the transformed dataframe containing generated features
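The fit/transform contract above can be illustrated with a dependency-free toy class. ToyFeatureGenerator and its one-number "features" are hypothetical placeholders for the matminer calls. The sketch also assumes that fit stores y so that transform, which only receives X, can return it alongside the features; this may not match how MAST-ML does it.

```python
VALID_FEATURIZERS = ("composition", "structure")


class ToyFeatureGenerator:
    """Hypothetical stand-in mirroring the documented fit/transform methods."""

    def __init__(self, featurizer):
        # Reject anything other than the two documented featurizer names.
        if featurizer not in VALID_FEATURIZERS:
            raise ValueError(
                f"featurizer must be one of {VALID_FEATURIZERS}, got {featurizer!r}"
            )
        self.featurizer = featurizer
        self.y = None

    def fit(self, X, y=None):
        # Nothing is learned; y is kept so transform can return it (assumption).
        self.y = y
        return self

    def generate_matminer_features(self, X):
        # Toy featurization in place of matminer: one feature per input row.
        if self.featurizer == "composition":
            return [{"n_chars": len(comp)} for comp in X]
        return [{"n_sites": len(struct)} for struct in X]

    def transform(self, X):
        # Build the new feature rows and hand back the stored target.
        rows = self.generate_matminer_features(X)
        return rows, self.y


gen = ToyFeatureGenerator("composition").fit(["NaCl", "Fe2O3"], y=[1.0, 2.0])
features, target = gen.transform(["NaCl", "Fe2O3"])
print(features, target)
# [{'n_chars': 4}, {'n_chars': 5}] [1.0, 2.0]
```

Validating featurizer in the constructor surfaces a typo such as 'structures' immediately, instead of at transform time after the data has been loaded.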
Methods Summary
fit(X[, y])
transform(X)
Methods Documentation