Matminer

class mastml.legos.feature_generators.Matminer(structural_features, structure_col)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Class to generate structural features from the matminer structure module

Args:

    structural_features: the structure feature(s) the user wants to instantiate and generate
    structure_col: the dataframe column that contains the pymatgen structure object. Matminer needs a pymatgen structure object in order to instantiate the structural feature

Methods:

    fit: pass-through, needed to maintain the scikit-learn class structure

        Args:
            df: (dataframe), dataframe of input x and y data

    transform: main method that iterates through the rows of the dataframe to create pymatgen structure objects for the matminer routines. Iterates through the list of structural features from the conf file and instantiates each structural feature; drops unused dataframe columns and returns the generated features dataframe

        Args:
            df: (dataframe), dataframe containing, under the structure_col column, the path of the file used to create the pymatgen structure object

        Returns:
            (dataframe), the generated features dataframe
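
The fit/transform contract described above can be illustrated with a minimal, dependency-free sketch. It is not the mastml implementation: the class name StructureFeatureSketch, the featurizers argument, and the toy "structures" (plain lists of element symbols) are all hypothetical stand-ins for the dataframe, the matminer featurizers, and the pymatgen structure objects the real class works with.

```python
class StructureFeatureSketch:
    """Hypothetical stand-in for a fit/transform feature generator."""

    def __init__(self, featurizers, structure_col):
        # featurizers: dict mapping a feature name to a callable. This is an
        # assumption for illustration; the real class instantiates matminer
        # structure featurizers instead.
        self.featurizers = featurizers
        self.structure_col = structure_col

    def fit(self, rows, y=None):
        # Pass-through: nothing is learned, but returning self keeps the
        # scikit-learn fit/transform convention.
        return self

    def transform(self, rows, y=None):
        generated = []
        for row in rows:
            structure = row[self.structure_col]
            # Keep only the generated features; the input columns are dropped.
            generated.append(
                {name: func(structure) for name, func in self.featurizers.items()}
            )
        return generated


# Toy rows: dicts standing in for dataframe rows, lists standing in for structures.
rows = [
    {"structure": ["Fe", "Fe", "O", "O", "O"], "target": 1.2},
    {"structure": ["Na", "Cl"], "target": 0.4},
]
featurizers = {
    "n_sites": len,
    "n_species": lambda s: len(set(s)),
}
generator = StructureFeatureSketch(featurizers, structure_col="structure")
features = generator.fit(rows).transform(rows)
print(features)
```

Because fit returns self, the two calls chain in the usual scikit-learn style, and the output contains only the generated feature columns.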

Methods Summary

fit(df[, y])
retrieve_AFLOW(criteria, properties[, …])
retrieve_MDF(criteria[, anonymous, …])
retrieve_MPDS(criteria[, properties, …])
retrieve_citrine(criteria, properties, …) Gets a Pandas dataframe object from data retrieved from the Citrine API.
retrieve_mp(criteria[, properties, …]) Gets data from MP in a dataframe format.
transform(df[, y])

Methods Documentation

fit(df, y=None)[source]
retrieve_AFLOW(criteria, properties, files=None, request_size=10000, request_limit=0, index_auid=True)[source]
retrieve_MDF(criteria, anonymous=False, properties=None, unwind_arrays=True)[source]
retrieve_MPDS(criteria, properties=None, api_key=None, endpoint=None)[source]
retrieve_citrine(criteria, properties, common_fields, secondary_fields, print_properties_options, api_key)[source]

Gets a Pandas dataframe object from data retrieved from the Citrine API.

Args:

    criteria (dict): see get_data method for supported keys except prop; prop should be included in properties.
    properties ([str]): requested properties/fields/columns. For example, ["Seebeck coefficient", "Band gap"]. If unsure about the exact words, capitalization, etc., try something like ["gap"] and "max_results": 3 and print_properties_options=True to see the exact options for this field.
    common_fields ([str]): fields that are common to all the requested properties. A common example is "chemicalFormula". Look for suggested common fields after a quick query for more info.
    secondary_fields (bool): if True, fields not included in properties may be added to the output (e.g. references). Recommended only if len(properties)==1.
    print_properties_options (bool): whether to print available options for the "properties" and "common_fields" arguments.
    api_key (str): your Citrine API key, or None if you've set the CITRINE_KEY environment variable.

Returns: (object) Pandas dataframe object containing the results

Notes/bugs: criteria needs to be a dictionary, which is not specified in get_data() as mentioned, and the example on the documentation webpage does not work. What to fix for dataframe integration into mastml?
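
The api_key behaviour described above (an explicit key, or None with the CITRINE_KEY environment variable set) can be sketched as follows. The helper name resolve_api_key and the choice to raise ValueError when neither source is available are assumptions made for illustration; they are not taken from mastml or matminer.

```python
import os


def resolve_api_key(api_key=None, env_var="CITRINE_KEY"):
    """Return the explicit key if given, else the value of the environment variable."""
    if api_key is not None:
        return api_key
    key = os.environ.get(env_var)
    if key is None:
        # Assumed behaviour for this sketch: fail early with a clear message.
        raise ValueError(f"No API key passed and {env_var} is not set")
    return key


# Placeholder values only; a real key would come from your Citrine account.
os.environ["CITRINE_KEY"] = "placeholder-from-environment"
print(resolve_api_key())                        # falls back to the environment
print(resolve_api_key("placeholder-explicit"))  # explicit argument wins
```

An explicit argument takes precedence, so a key passed in code is never silently replaced by whatever is set in the environment.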
retrieve_mp(criteria, properties=['band_gap', 'volume', 'density', 'formation_energy_per_atom'], index_mpid=True, api_key=None)[source]

Gets data from MP in a dataframe format. See api_link for more details.

Args:

    criteria (str/dict): see MPRester.query() for a description of this parameter. String examples: "mp-1234", "Fe2O3", "Li-Fe-O", "*2O3". Dict example: {"band_gap": {"$gt": 1}}
    properties ([str]): see MPRester.query() for a description of this parameter. Example: ["formula", "formation_energy_per_atom"] plus: "structure", "initial_structure", "final_structure", "bandstructure" (line mode), "bandstructure_uniform", "phonon_bandstructure", "phonon_ddb", "phonon_dos". Note that for a long list of compounds, it may take a long time to retrieve some of these objects.
    index_mpid (bool): whether to set the materials_id as the dataframe index.
    api_key (str): your Materials Project API key, or None if you've set up your pymatgen config.

Returns (pandas.DataFrame): dataframe containing the results

Notes/bugs: works well; the API is easy to use and accurate. What to fix for dataframe integration into mastml?
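
The dict form of criteria, such as {"band_gap": {"$gt": 1}}, uses MongoDB-style comparison operators. The sketch below is a local illustration of what such a filter means, applied to made-up records. It does not call the Materials Project API; the matches helper, the operator subset, and the record values are all hypothetical.

```python
import operator

# Comparison operators supported by this sketch (a small subset of the
# MongoDB query operators).
_OPERATORS = {
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$eq": operator.eq,
    "$ne": operator.ne,
}


def matches(record, criteria):
    """Return True if record satisfies every condition in criteria."""
    for field, condition in criteria.items():
        if field not in record:
            return False
        value = record[field]
        if isinstance(condition, dict):
            # e.g. {"$gt": 1, "$lt": 3}: every operator clause must hold.
            if not all(_OPERATORS[op](value, bound) for op, bound in condition.items()):
                return False
        elif value != condition:
            # A bare value means an equality test.
            return False
    return True


# Made-up records; the ids and band gaps are not real Materials Project data.
records = [
    {"material_id": "example-a", "band_gap": 0.0},
    {"material_id": "example-b", "band_gap": 2.1},
    {"material_id": "example-c", "band_gap": 1.0},
]
criteria = {"band_gap": {"$gt": 1}}
selected = [r["material_id"] for r in records if matches(r, criteria)]
print(selected)
```

Note that "$gt" is a strict comparison, so a record with band_gap exactly equal to 1 is excluded; "$gte" would include it.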
transform(df, y=None)[source]