Code Documentation: Feature Generators

mastml.feature_generators Module

This module contains a collection of classes for generating the input features used to fit machine learning models.

BaseGenerator:

Base class that provides MAST-ML functionality to all feature generators. All other feature generator classes should inherit from this base class.

ElementalFractionGenerator:

Class written to encode the element fractions in a material composition as a full 118-element vector per material, where each entry in the vector corresponds to one element of the periodic table.
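The encoding described above can be illustrated with a minimal, standard-library sketch. It is not the MAST-ML implementation: the names parse_composition and element_fractions are hypothetical, the regular expression only handles flat formulas without parentheses, and the element list is a four-element subset standing in for the full periodic table.

```python
import re

# Illustrative subset of element symbols (an assumption for brevity);
# a full encoder would list every element of the periodic table.
ELEMENTS = ["O", "Mn", "Sr", "La"]

# An element symbol followed by an optional integer or decimal amount.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")


def parse_composition(formula):
    """Return {symbol: amount} for a flat formula such as 'La0.75Sr0.25MnO3'."""
    amounts = {}
    for symbol, amount in TOKEN.findall(formula):
        # A missing amount means one atom of that element.
        amounts[symbol] = amounts.get(symbol, 0.0) + (float(amount) if amount else 1.0)
    return amounts


def element_fractions(formula, elements=ELEMENTS):
    """Return one fraction per entry of `elements`, 0.0 for absent elements."""
    amounts = parse_composition(formula)
    total = sum(amounts.values())
    return [amounts.get(symbol, 0.0) / total for symbol in elements]


vector = element_fractions("La0.75Sr0.25MnO3")
print(dict(zip(ELEMENTS, vector)))
```

Each material maps to a vector of the same length, and the entries sum to 1, which is what lets the fractions be used directly as model input columns.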

ElementalFeatureGenerator:

Class written for MAST-ML to generate features for material compositions based on properties of the elements comprising the composition. Several mathematically derived variants are included, such as the arithmetic average, composition-weighted average, range, max, and min. This generator also supports sublattice-based generation, where the elemental features are averaged over each sublattice rather than only over the total composition. To use the sublattice feature of this generator, composition strings must use square brackets to separate the sublattices, e.g. the perovskite material La0.75Sr0.25MnO3 would be written as [La0.75Sr0.25][Mn][O3].
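As a rough illustration of the bracket notation and the derived statistics, the sketch below splits a composition string into sublattices and summarizes one elemental property per sublattice. Everything here is an assumption made for illustration: the function names are hypothetical, atomic number stands in for the elemental property tables that the generator actually uses, and the exact definitions of the statistics in MAST-ML may differ.

```python
import re

# Atomic numbers serve as a stand-in elemental property for this sketch.
ATOMIC_NUMBER = {"La": 57, "Sr": 38, "Mn": 25, "O": 8}

# An element symbol followed by an optional integer or decimal amount.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")


def parse_amounts(formula):
    """Return {symbol: amount} for a flat formula such as 'La0.75Sr0.25'."""
    amounts = {}
    for symbol, amount in TOKEN.findall(formula):
        amounts[symbol] = amounts.get(symbol, 0.0) + (float(amount) if amount else 1.0)
    return amounts


def split_sublattices(formula):
    """'[La0.75Sr0.25][Mn][O3]' -> ['La0.75Sr0.25', 'Mn', 'O3']"""
    return re.findall(r"\[([^\]]+)\]", formula)


def summarize(amounts, prop=ATOMIC_NUMBER):
    """Compute simple statistics of one property over a set of elements."""
    values = [prop[symbol] for symbol in amounts]
    total = sum(amounts.values())
    return {
        "arithmetic_average": sum(values) / len(values),
        "composition_average": sum(prop[s] * a for s, a in amounts.items()) / total,
        "max": max(values),
        "min": min(values),
        "range": max(values) - min(values),
    }


def sublattice_features(formula):
    """Return one statistics dict per bracketed sublattice, in order."""
    return [summarize(parse_amounts(part)) for part in split_sublattices(formula)]


for part, stats in zip(split_sublattices("[La0.75Sr0.25][Mn][O3]"),
                       sublattice_features("[La0.75Sr0.25][Mn][O3]")):
    print(part, stats)
```

For the first sublattice the arithmetic average ignores the amounts (47.5), while the composition-weighted average uses them (52.25), which is the distinction between the two variants named above.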

PolynomialFeatureGenerator:

Class used to construct new features based on a polynomial expansion of existing features. The degree of the polynomial is given as input. For example, for two features x1 and x2, the quadratic features x1^2, x2^2 and x1*x2 would be generated if the degree is set to 2.
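The quadratic example above can be reproduced with a short standard-library sketch. The helper polynomial_terms is hypothetical and generates only the monomials of exactly the requested degree. scikit-learn's PolynomialFeatures, which this class is documented as using, by default also emits a bias column and the lower-degree terms, so its output can contain more columns than shown here.

```python
from itertools import combinations_with_replacement
from math import prod


def polynomial_terms(row, degree):
    """Return {name: value} for every monomial of exactly `degree`.

    `row` maps feature names to numeric values for a single data point.
    """
    names = sorted(row)
    terms = {}
    # Each multiset of feature names of size `degree` is one monomial.
    for combo in combinations_with_replacement(names, degree):
        terms["*".join(combo)] = prod(row[name] for name in combo)
    return terms


terms = polynomial_terms({"x1": 2.0, "x2": 3.0}, degree=2)
print(terms)  # {'x1*x1': 4.0, 'x1*x2': 6.0, 'x2*x2': 9.0}
```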

OneHotGroupGenerator:

Class used to create a set of one-hot encoded features based on a single feature containing assorted categories. For example, if a feature contains strings denoting each data point as belonging to one of three groups such as “metal”, “semiconductor”, “insulator”, then the generated one-hot features are three feature columns containing a 1 or 0 to denote which group each data point is in.
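A minimal sketch of the three-group example follows. The function one_hot_groups is hypothetical and uses plain Python lists in place of a scikit-learn encoder; the alphabetical column order is a choice made here for reproducibility and is not taken from MAST-ML.

```python
def one_hot_groups(labels):
    """Return (categories, rows) with one 0/1 column per distinct label."""
    categories = sorted(set(labels))
    rows = [
        [1 if label == category else 0 for category in categories]
        for label in labels
    ]
    return categories, rows


categories, rows = one_hot_groups(
    ["metal", "insulator", "semiconductor", "metal"]
)
print(categories)  # ['insulator', 'metal', 'semiconductor']
for row in rows:
    print(row)
```

Every row contains exactly one 1, since each data point belongs to exactly one group.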

OneHotElementEncoder:

Class used to create a set of one-hot encoded features based on elements present in a supplied chemical composition string. For example, if the data set contains alloys of materials with chemical formulas such as “GaAs”, “InAs”, “InP”, etc., then the generated one-hot features are four feature columns containing a 1 or 0 to denote whether a particular data point contains each of the unique elements, in this case Ga, As, In, or P.
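The GaAs, InAs, InP example can be sketched as follows. The function one_hot_elements is hypothetical; it extracts element symbols with a simple regular expression and ignores amounts, which is an assumption that suffices for presence/absence features.

```python
import re


def one_hot_elements(formulas):
    """Return (elements, rows) marking which elements each formula contains."""
    parsed = [re.findall(r"[A-Z][a-z]?", formula) for formula in formulas]

    # Collect unique elements in order of first appearance.
    elements = []
    for symbols in parsed:
        for symbol in symbols:
            if symbol not in elements:
                elements.append(symbol)

    rows = [
        [1 if element in symbols else 0 for element in elements]
        for symbols in parsed
    ]
    return elements, rows


elements, rows = one_hot_elements(["GaAs", "InAs", "InP"])
print(elements)  # ['Ga', 'As', 'In', 'P']
for row in rows:
    print(row)
```

Unlike the group encoding, a row here can contain several 1s, one for each element present in the formula.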

MaterialsProjectFeatureGenerator:

Class used to search the Materials Project database for computed material property information for the supplied composition. This only works if the material composition matches an entry present in the Materials Project. It returns material properties such as formation energy, volume, electronic bandgap, elastic constants, etc.

MatminerFeatureGenerator:

Class used to combine various composition and structure-based feature generation routines in the matminer package into MAST-ML. The use of structure-based features will require pymatgen structure objects in the input dataframe, while composition-based features require only a composition string. See the class documentation for more information on the different types of feature generation this class supports.

DataframeUtilities:

Collection of helper routines for various common dataframe operations, like concatenation, merging, etc.

Classes

BaseEstimator()

Base class for all estimators in scikit-learn.

BaseGenerator()

Class functioning as a base generator to support directory organization and evaluating different feature generators.

Composition(*args[, strict])

Represents a Composition, which is essentially a {element:amount} mapping type.

DataframeUtilities()

Class of basic utilities for dataframe manipulation and for converting between dataframes and numpy arrays.

Element(value)

Enum representing an element in the periodic table.

ElementalFeatureGenerator(composition_df[, ...])

Class that is used to create elemental-based features from material composition strings

ElementalFractionGenerator(composition_df[, ...])

Class that is used to create an 86-element vector of element fractions from material composition strings.

MPRester([api_key, endpoint, ...])

A class to conveniently interface with the Materials Project REST interface. The recommended way to use MPRester is with the "with" context manager to ensure that sessions are properly closed after usage.

MaterialsProjectFeatureGenerator(...)

Class that wraps MaterialsProjectFeatureGeneration, giving it scikit-learn structure

MatminerFeatureGenerator(featurize_df, ...)

Class to wrap feature generator routines contained in the matminer package so that they conform more neatly to the MAST-ML working environment and are all available under a single class.

OneHotElementEncoder(composition_df[, ...])

Class to generate new categorical features (i.e. values of 1 or 0) based on the elements present in a composition string.

OneHotEncoder(*[, categories, drop, sparse, ...])

Encode categorical features as a one-hot numeric array.

OneHotGroupGenerator(groups[, ...])

Class to generate one-hot encoded values from a list of categories using scikit-learn's one-hot encoder method. More info at: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html

PolynomialFeatureGenerator([features, ...])

Class to generate polynomial features using scikit-learn's polynomial features method. More info at: http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html

PolynomialFeatures([degree, ...])

Generate polynomial and interaction features.

TransformerMixin()

Mixin class for all transformers in scikit-learn.

datetime(year, month, day[, hour[, minute[, ...)

The year, month and day arguments are required.

Class Inheritance Diagram

Inheritance diagram of mastml.feature_generators.BaseGenerator, mastml.feature_generators.DataframeUtilities, mastml.feature_generators.ElementalFeatureGenerator, mastml.feature_generators.ElementalFractionGenerator, mastml.feature_generators.MaterialsProjectFeatureGenerator, mastml.feature_generators.MatminerFeatureGenerator, mastml.feature_generators.OneHotElementEncoder, mastml.feature_generators.OneHotGroupGenerator, mastml.feature_generators.PolynomialFeatureGenerator