Code Documentation: Feature Generators

mastml.feature_generators Module

This module contains a collection of classes for generating the input features used to fit machine learning models.

BaseGenerator:

Base class that provides MAST-ML-type functionality to all feature generators. All other feature generator classes should inherit from this base class.

CBFVGenerator:

Class that is used to create elemental-type features using the Composition-based feature vector (CBFV) package: https://github.com/Kaaiian/CBFV

ElementalFractionGenerator:

Class that encodes the element fractions of a material composition as a full 118-element vector per material, where each entry in the vector corresponds to one element of the periodic table.
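The encoding can be sketched in plain Python. This is a minimal illustration, not MAST-ML's implementation: the helpers `parse_formula` and `element_fraction_vector` are hypothetical, the parser assumes flat formulas without parentheses or brackets, and the element list is truncated to the first ten elements for brevity.

```python
import re

# Hypothetical, truncated element ordering used only for illustration;
# the real generator spans the full element list.
ELEMENTS = ["H", "He", "Li", "Be", "B", "C", "N", "O", "F", "Ne"]

# An element symbol followed by an optional (possibly fractional) amount.
# Assumption: flat formulas only, no parentheses or sublattice brackets.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*\.?\d*)")


def parse_formula(formula):
    """Return {element: amount} for a flat formula such as 'H2O'."""
    amounts = {}
    for symbol, count in TOKEN.findall(formula):
        amounts[symbol] = amounts.get(symbol, 0.0) + (float(count) if count else 1.0)
    return amounts


def element_fraction_vector(formula, elements=ELEMENTS):
    """Encode a composition as a fixed-length vector of atomic fractions."""
    amounts = parse_formula(formula)
    total = sum(amounts.values())
    index = {el: i for i, el in enumerate(elements)}
    vector = [0.0] * len(elements)
    for symbol, amount in amounts.items():
        # Symbols missing from `elements` raise KeyError.
        vector[index[symbol]] = amount / total
    return vector
```

For example, `element_fraction_vector("H2O")` places 2/3 at the H position and 1/3 at the O position, with zeros everywhere else, so every material maps to a vector of the same length regardless of how many elements it contains.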

ElementalFeatureGenerator:

Class written for MAST-ML to generate features for material compositions based on properties of the elements comprising the composition. A number of mathematically derived variants are included, such as the arithmetic average, composition-weighted average, range, max, and min. This generator also supports sublattice-based generation, where the elemental features can be averaged for each sublattice rather than only over the total composition. To use the sublattice feature of this generator, composition strings must include square brackets to separate the sublattices; for example, the perovskite La0.75Sr0.25MnO3 would be written as [La0.75Sr0.25][Mn][O3].
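The summary statistics and the sublattice split can be illustrated with a short, stdlib-only sketch. It is hypothetical: atomic numbers stand in for the elemental property data the generator actually uses, and `parse_formula`, `property_stats` and `sublattices` are illustrative names, not MAST-ML functions.

```python
import re

# Hypothetical property table: atomic numbers stand in for real elemental data.
ATOMIC_NUMBER = {"O": 8, "Mn": 25, "Sr": 38, "La": 57}

TOKEN = re.compile(r"([A-Z][a-z]?)(\d*\.?\d*)")


def parse_formula(formula):
    """Return {element: amount} for a flat formula such as 'La0.75Sr0.25MnO3'."""
    amounts = {}
    for symbol, count in TOKEN.findall(formula):
        amounts[symbol] = amounts.get(symbol, 0.0) + (float(count) if count else 1.0)
    return amounts


def property_stats(amounts, table):
    """Arithmetic average, composition-weighted average, max, min and range."""
    values = [table[el] for el in amounts]  # one value per distinct element
    total = sum(amounts.values())
    weighted = sum(table[el] * amt for el, amt in amounts.items()) / total
    return {
        "avg": sum(values) / len(values),  # unweighted mean over elements
        "weighted_avg": weighted,          # weighted by element amounts
        "max": max(values),
        "min": min(values),
        "range": max(values) - min(values),
    }


def sublattices(formula):
    """Split '[La0.75Sr0.25][Mn][O3]' into ['La0.75Sr0.25', 'Mn', 'O3']."""
    return re.findall(r"\[([^\]]+)\]", formula)
```

For [La0.75Sr0.25][Mn][O3], the whole-composition weighted average of the atomic number is (57*0.75 + 38*0.25 + 25*1 + 8*3) / 5 = 20.25, while applying `property_stats` to the A-site sublattice La0.75Sr0.25 alone gives 52.25, which is the kind of per-sublattice distinction the bracket syntax enables.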

PolynomialFeatureGenerator:

Class used to construct new features based on a polynomial expansion of existing features. The degree of the polynomial is given as input. For example, for two features x1 and x2, the quadratic features x1^2, x2^2 and x1*x2 would be generated if the degree is set to 2.
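A stdlib-only sketch of the expansion, assuming each sample is a dict of named numeric features (`polynomial_terms` and `expand` are hypothetical helpers). Alongside the squares and cross term from the degree-2 example above, it keeps the original linear features; scikit-learn's `PolynomialFeatures`, which this generator wraps, additionally emits a constant bias column by default (`include_bias=True`).

```python
from itertools import combinations_with_replacement
from math import prod


def polynomial_terms(names, degree):
    """List monomials of total degree 1..degree as tuples of feature names."""
    terms = []
    for d in range(1, degree + 1):
        # Each combination with repetition is one monomial, e.g. ('x1', 'x1') = x1^2.
        terms.extend(combinations_with_replacement(names, d))
    return terms


def expand(row, names, degree):
    """Evaluate every monomial for one sample given as {name: value}."""
    return {"*".join(t): prod(row[n] for n in t) for t in polynomial_terms(names, degree)}
```

With x1 = 2 and x2 = 3 at degree 2, `expand` returns x1, x2, x1*x1, x1*x2 and x2*x2 with values 2, 3, 4, 6 and 9. The number of generated terms grows combinatorially with the degree and the number of input features, so high degrees quickly produce very wide feature matrices.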

OneHotGroupGenerator:

Class used to create a set of one-hot encoded features based on a single feature containing assorted categories. For example, if a feature contains strings denoting each data point as belonging to one of three groups, such as “metal”, “semiconductor” and “insulator”, then the generated one-hot features are three feature columns, each containing a 1 or 0 to denote which group each data point belongs to.
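A minimal sketch of this encoding in plain Python. `one_hot_groups` is a hypothetical helper rather than the class's API, and the columns are sorted alphabetically here to give a deterministic order.

```python
def one_hot_groups(labels):
    """Return (categories, rows) with one 0/1 column per distinct label."""
    categories = sorted(set(labels))  # deterministic column order
    # Each row has exactly one 1, in the column of that data point's group.
    rows = [[1 if label == cat else 0 for cat in categories] for label in labels]
    return categories, rows
```

For the labels "metal", "semiconductor", "insulator", "metal", the columns are insulator, metal and semiconductor, and the first data point is encoded as [0, 1, 0].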

OneHotElementGenerator:

Class used to create a set of one-hot encoded features based on the elements present in a supplied chemical composition string. For example, if the data set contains alloys with chemical formulas such as “GaAs”, “InAs”, “InP”, etc., then the generated one-hot features are four feature columns, each containing a 1 or 0 to denote whether a particular data point contains the corresponding element, in this case Ga, As, In or P.
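The same idea for element presence can be sketched with a regular expression that treats each capital letter plus an optional lowercase letter as one element symbol. This is an assumption that ignores amounts, parentheses and charges; `one_hot_elements` is a hypothetical helper, not the generator's API.

```python
import re

# One element symbol: a capital letter plus an optional lowercase letter.
SYMBOL = re.compile(r"[A-Z][a-z]?")


def one_hot_elements(formulas):
    """Return (elements, rows): a 0/1 column per unique element across all formulas."""
    symbols = [SYMBOL.findall(f) for f in formulas]
    elements = []
    for syms in symbols:  # keep elements in first-seen order
        for el in syms:
            if el not in elements:
                elements.append(el)
    rows = [[1 if el in syms else 0 for el in elements] for syms in symbols]
    return elements, rows
```

For GaAs, InAs and InP this yields the four columns Ga, As, In and P, with GaAs encoded as [1, 1, 0, 0] and InP as [0, 0, 1, 1].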

MaterialsProjectFeatureGenerator:

Class used to search the Materials Project database for computed material property information for the supplied composition. This only works if the material composition matches an entry present in the Materials Project. It returns material properties such as formation energy, volume, electronic bandgap, elastic constants, etc.

MatminerFeatureGenerator:

Class used to combine various composition and structure-based feature generation routines in the matminer package into MAST-ML. The use of structure-based features will require pymatgen structure objects in the input dataframe, while composition-based features require only a composition string. See the class documentation for more information on the different types of feature generation this class supports.

DataframeUtilities:

Collection of helper routines for various common dataframe operations, such as concatenation, merging, etc.

Functions

copy(x)

Shallow copy operation on arbitrary Python objects.

make_composition(composition_string)

Classes

BaseEstimator()

Base class for all estimators in scikit-learn.

BaseGenerator()

Class functioning as a base generator to support directory organization and the evaluation of different feature generators.

CBFVGenerator(featurize_df[, ...])

Class that is used to create elemental-type features using the Composition-based feature vector (CBFV) package: https://github.com/Kaaiian/CBFV

Composition(*args[, strict])

Represents a Composition, a mapping of {element/species: amount} with enhanced functionality tailored for handling chemical compositions.

CompositionToOxidComposition([...])

Utility featurizer to add oxidation states to a pymatgen Composition.

DataframeUtilities()

Class of basic utilities for dataframe manipulation, and for exchanging between dataframes and numpy arrays.

DeepChemFeatureGenerator(featurize_df, ...)

Class that is used to create molecular property features using the DeepChem and RDKit packages.

Element(value)

Enum representing an element in the periodic table.

ElementProperty(data_source, features, stats)

Class to calculate elemental property attributes.

ElementalFeatureGenerator(featurize_df[, ...])

Class that is used to create elemental-based features from material composition strings.

ElementalFeatureGenerator_Extra(featurize_df)

ElementalFractionGenerator(featurize_df[, ...])

Class that is used to create an 86-element vector of element fractions from material composition strings.

MPRester([api_key, include_user_agent])

Pymatgen's implementation of MPRester.

MaterialsProjectFeatureGenerator(...)

Class that wraps MaterialsProjectFeatureGeneration, giving it a scikit-learn structure.

MatminerFeatureGenerator(featurize_df, ...)

Class that wraps feature generator routines from the matminer package so that they conform more neatly to the MAST-ML working environment and are all available under a single class.

Miedema([struct_types, ss_types, ...])

Formation enthalpies of intermetallic compounds, from Miedema et al.

OneHotElementGenerator(featurize_df[, ...])

Class to generate new categorical features (i.e. values of 1 or 0) based on whether an input composition contains a certain designated element.

OneHotEncoder(*[, categories, drop, ...])

Encode categorical features as a one-hot numeric array.

OneHotGroupGenerator(featurize_df[, ...])

Class to generate one-hot encoded values from a list of categories using scikit-learn's one-hot encoder method. More info at: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html

OxidationStates([stats])

Statistics about the oxidation states for each specie.

PolynomialFeatureGenerator([featurize_df, ...])

Class to generate polynomial features using scikit-learn's polynomial features method. More info at: http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html

PolynomialFeatures([degree, ...])

Generate polynomial and interaction features.

StrToComposition([reduce, target_col_id, ...])

Utility featurizer to convert a string to a Composition.

TransformerMixin()

Mixin class for all transformers in scikit-learn.

datetime(year, month, day[, hour[, minute[, ...)

The year, month and day arguments are required.

tqdm(*_, **__)

Decorate an iterable object, returning an iterator which acts exactly like the original iterable, but prints a dynamically updating progressbar every time a value is requested.

Class Inheritance Diagram

BaseGenerator inherits from scikit-learn's BaseEstimator and TransformerMixin. CBFVGenerator, DeepChemFeatureGenerator, ElementalFeatureGenerator, ElementalFeatureGenerator_Extra, ElementalFractionGenerator, MaterialsProjectFeatureGenerator, MatminerFeatureGenerator, OneHotElementGenerator, OneHotGroupGenerator and PolynomialFeatureGenerator each inherit from BaseGenerator. DataframeUtilities appears as a standalone class with no parent in the diagram.

Within scikit-learn, BaseEstimator derives from ReprHTMLMixin, _HTMLDocumentationLinkMixin and _MetadataRequester, and TransformerMixin derives from _SetOutputMixin.