Matminer
class mastml.legos.feature_generators.Matminer(structural_features, structure_col)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Class to generate structural features from the matminer structure module.

Args:
- structural_features: the structure feature(s) the user wants to instantiate and generate.
- structure_col: the dataframe column that contains the pymatgen structure object. Matminer needs a pymatgen structure object in order to instantiate the structural feature.

Methods:
- fit: pass-through, needed to maintain the scikit-learn class structure.
  Args:
  - df (dataframe): dataframe of input x and y data.
- transform: main method that iterates through the rows of the dataframe to create pymatgen structure objects for the matminer routines. Iterates through the list of structural features from the conf file and instantiates each structure; drops unused dataframe columns and returns the generated features dataframe.
  Args:
  - df (dataframe): dataframe containing, under the structure_col column, the path of the file used to create the pymatgen structure object.
  Returns:
  - (dataframe): the generated features dataframe.
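The pass-through fit and row-wise transform described above can be sketched without any dependencies. This is only an illustration of the pattern, not the mastml implementation: RowwiseFeatureGenerator is a hypothetical class, plain dicts stand in for dataframe rows, lists of element symbols stand in for pymatgen structure objects, and the two featurizer callables stand in for matminer structural features.

```python
class RowwiseFeatureGenerator:
    """Hypothetical stand-in for the fit/transform pattern (not mastml code)."""

    def __init__(self, featurizers, structure_col):
        # featurizers: mapping of feature name -> callable(structure) -> value
        self.featurizers = featurizers
        self.structure_col = structure_col

    def fit(self, rows, y=None):
        # Pass-through: nothing is learned. Returning self allows chaining.
        return self

    def transform(self, rows, y=None):
        generated = []
        for row in rows:
            structure = row[self.structure_col]
            # Keep only the generated features; every input column is dropped.
            generated.append(
                {name: func(structure) for name, func in self.featurizers.items()}
            )
        return generated


# Placeholder rows: the "structure" values are toy lists, not real structures.
rows = [
    {"structure": ["Fe", "Fe", "O", "O", "O"], "target": 1.2},
    {"structure": ["Si", "Si"], "target": 0.5},
]
featurizers = {
    "n_sites": len,
    "n_species": lambda structure: len(set(structure)),
}

generator = RowwiseFeatureGenerator(featurizers, structure_col="structure")
features = generator.fit(rows).transform(rows)
```

Because fit learns nothing, calling transform on its own gives the same result; fit exists only so the object can sit inside scikit-learn style workflows.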
Methods Summary
fit(df[, y])
retrieve_AFLOW(criteria, properties[, …])
retrieve_MDF(criteria[, anonymous, …])
retrieve_MPDS(criteria[, properties, …])
retrieve_citrine(criteria, properties, …)
    Gets a Pandas dataframe object from data retrieved from the Citrine API.
retrieve_mp(criteria[, properties, …])
    Gets data from MP in a dataframe format.
transform(df[, y])

Methods Documentation
retrieve_AFLOW(criteria, properties, files=None, request_size=10000, request_limit=0, index_auid=True)

retrieve_citrine(criteria, properties, common_fields, secondary_fields, print_properties_options, api_key)
Gets a Pandas dataframe object from data retrieved from the Citrine API.

Args:
- criteria (dict): see the get_data method for supported keys, except prop; prop should be included in properties.
- properties ([str]): requested properties/fields/columns. For example, ["Seebeck coefficient", "Band gap"]. If unsure about the exact wording, capitalization, etc., try something like ["gap"] with "max_results": 3 and print_properties_options=True to see the exact options for this field.
- common_fields ([str]): fields that are common to all the requested properties. A common example is "chemicalFormula". Look for suggested common fields after a quick query for more info.
- secondary_fields (bool): if True, fields not included in properties may be added to the output (e.g. references). Recommended only if len(properties)==1.
- print_properties_options (bool): whether to print the available options for the "properties" and "common_fields" arguments.
- api_key (str): your Citrine API key, or None if you've set the CITRINE_KEY environment variable.

Returns: (object) Pandas dataframe object containing the results.

Notes/bugs: criteria needs a dictionary, which is not specified in get_data() as mentioned, and the example on the documentation webpage does not work. What to fix for dataframe integration into mastml?
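
The advice above about probing for exact property names can be sketched as a small exploratory call. Treat this as an untested illustration: build_citrine_probe and probe_citrine are hypothetical helpers, generator is assumed to be an already constructed Matminer instance, and placing "max_results" inside criteria is an assumption based on how the argument description mentions it.

```python
def build_citrine_probe():
    """Arguments for a small exploratory query (hypothetical helper)."""
    return {
        # Assumption: "max_results" is a criteria key, as the text suggests.
        "criteria": {"max_results": 3},
        # A loose search term; the printed options reveal the exact names.
        "properties": ["gap"],
        "common_fields": ["chemicalFormula"],
        # Extra fields are only recommended when one property is requested;
        # left off here to keep the probe output small.
        "secondary_fields": False,
        "print_properties_options": True,
        # None relies on the CITRINE_KEY environment variable being set.
        "api_key": None,
    }


def probe_citrine(generator):
    """Run the probe on any object exposing retrieve_citrine (assumed)."""
    return generator.retrieve_citrine(**build_citrine_probe())
```

Once the printed options show the exact property names, the same dictionary can be reused with the full names in "properties" and a larger "max_results".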

retrieve_mp(criteria, properties=['band_gap', 'volume', 'density', 'formation_energy_per_atom'], index_mpid=True, api_key=None)
Gets data from MP in a dataframe format. See api_link for more details.

Args:
- criteria (str/dict): see MPRester.query() for a description of this parameter. String examples: "mp-1234", "Fe2O3", "Li-Fe-O", "*2O3". Dict example: {"band_gap": {"$gt": 1}}
- properties ([str]): see MPRester.query() for a description of this parameter. Example: ["formula", "formation_energy_per_atom"], plus: "structure", "initial_structure", "final_structure", "bandstructure" (line mode), "bandstructure_uniform", "phonon_bandstructure", "phonon_ddb", "phonon_dos". Note that for a long list of compounds, it may take a long time to retrieve some of these objects.
- index_mpid (bool): whether to set the materials_id as the dataframe index.
- api_key (str): your Materials Project API key, or None if you've set up your pymatgen config.

Returns (pandas.DataFrame): dataframe containing the results.

Notes/bugs: works well; the API is easy to use and accurate. What to fix for dataframe integration into mastml?
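
To make the dict form of criteria concrete, the sketch below builds keyword arguments for a band-gap query and applies the same "$gt" condition to a few local records. The helper names are hypothetical and the sample records are made-up placeholders, not Materials Project data. The local filter only mimics what "$gt" means; it is not how MPRester evaluates a query, and generator is assumed to be an already constructed Matminer instance.

```python
def build_mp_query():
    """Keyword arguments for retrieve_mp (hypothetical helper)."""
    return {
        # Dict criteria from the docs: band gap strictly greater than 1.
        "criteria": {"band_gap": {"$gt": 1}},
        "properties": ["formula", "formation_energy_per_atom"],
        "index_mpid": True,
        # None relies on the key stored in the pymatgen config.
        "api_key": None,
    }


def fetch_from_mp(generator):
    """Run the query on any object exposing retrieve_mp (assumed)."""
    return generator.retrieve_mp(**build_mp_query())


def matches_gt(record, criteria):
    """Local illustration of {field: {"$gt": bound}}; handles only "$gt"."""
    return all(
        record[field] > condition["$gt"]
        for field, condition in criteria.items()
    )


# Made-up placeholder records, used only to show the comparison.
sample = [
    {"label": "sample-a", "band_gap": 0.0},
    {"label": "sample-b", "band_gap": 1.0},
    {"label": "sample-c", "band_gap": 2.5},
]
criteria = build_mp_query()["criteria"]
kept = [record["label"] for record in sample if matches_gt(record, criteria)]
```

Here kept contains only "sample-c", because "$gt" is a strict comparison and a band gap of exactly 1 does not qualify.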