make_prediction

mastml.mastml_predictor.make_prediction(X_test, model, X_test_extra=None, preprocessor=None, calibration_file=None, featurize=False, featurizer=None, features_to_keep=None, featurize_on=None, **kwargs)

Function that takes a saved preprocessor, model, and calibration file and outputs predictions and calibrated uncertainties for new test data.

Args:
X_test: (pd.DataFrame or str), dataframe of featurized test data to be used to make predictions, or a string giving the path to a file containing featurized test data in .xlsx or .csv format, ready for import with pandas. Only the features used to fit the original model should be included, and they should be in the same order as in the training data used to fit the original model.

model: (str), path of saved model in .pkl format (e.g., RandomForestRegressor.pkl)

X_test_extra: (pd.DataFrame, list or str), dataframe containing the extra data associated with X_test, or a list of strings denoting extra columns present in X_test not to be used in prediction. If a string is provided, it is interpreted as a path to a .xlsx or .csv file containing the extra column data.

preprocessor: (str), path of saved preprocessor in .pkl format (e.g., StandardScaler.pkl)

calibration_file: (str), path of file containing the recalibration parameters (typically recalibration_parameters_average_test.xlsx)

featurize: (bool), whether or not featurization of the provided X_test data needs to be performed

featurizer: (str), string denoting a mastml.feature_generators class, e.g., "ElementalFeatureGenerator"

features_to_keep: (list), list of strings denoting column names of features to keep for running model prediction

featurize_on: (str), name of the column in X_test to perform featurization on

**kwargs: additional key-value pairs of parameters for the feature generator, e.g., composition_df=composition_df['Compositions'] if running ElementalFeatureGenerator
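
The X_test requirement above (only the training features, in the training order) can be met before calling make_prediction. The following is a minimal sketch using only the Python standard library. The feature names, the sample_id column, and the helper select_training_columns are hypothetical and not part of MAST-ML. With a pandas dataframe, indexing with the ordered feature list (df[training_features]) should give the same result.

```python
import csv
import io

# Hypothetical feature names, in the order used when the model was fit.
training_features = ["feature_a", "feature_b", "feature_c"]

# Hypothetical raw test data: columns out of order, plus one extra column.
raw_csv = (
    "sample_id,feature_c,feature_a,feature_b\n"
    "s1,3.0,1.0,2.0\n"
    "s2,6.0,4.0,5.0\n"
)


def select_training_columns(csv_text, feature_order):
    """Return CSV text holding only the given columns, in the given order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [name for name in feature_order if name not in reader.fieldnames]
    if missing:
        raise ValueError(f"test data is missing features: {missing}")
    out = io.StringIO()
    # extrasaction="ignore" drops columns that are not in feature_order.
    writer = csv.DictWriter(
        out, fieldnames=feature_order, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(reader)
    return out.getvalue()


x_test_csv = select_training_columns(raw_csv, training_features)
print(x_test_csv)
```

The resulting text can be saved as a .csv file and its path passed as X_test.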

Returns:
pred_df: (pd.DataFrame), dataframe containing a column of model predictions (y_pred) and, if applicable, calibrated uncertainties (y_err). Will also include any extra columns specified via the X_test_extra parameter.
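
A call might be assembled as in the sketch below. The module path, argument names, and the .pkl and .xlsx file names come from the signature and examples above. The test data path, the sample_id column, and the helpers build_prediction_args and predict are hypothetical. The MAST-ML import is deferred into predict so the argument-building step runs without the library installed.

```python
def build_prediction_args(data_path, model_path, preprocessor_path=None,
                          calibration_path=None, extra_columns=None):
    """Collect keyword arguments for make_prediction, omitting unset options."""
    args = {"X_test": data_path, "model": model_path}
    optional = {
        "X_test_extra": extra_columns,
        "preprocessor": preprocessor_path,
        "calibration_file": calibration_path,
    }
    # Leave out anything not provided so make_prediction uses its defaults.
    args.update({key: value for key, value in optional.items() if value is not None})
    return args


def predict(args):
    """Run make_prediction; requires MAST-ML and the saved files to exist."""
    from mastml.mastml_predictor import make_prediction
    return make_prediction(**args)


args = build_prediction_args(
    data_path="featurized_test_data.csv",  # hypothetical path
    model_path="RandomForestRegressor.pkl",
    preprocessor_path="StandardScaler.pkl",
    calibration_path="recalibration_parameters_average_test.xlsx",
    extra_columns=["sample_id"],  # hypothetical extra column in X_test
)

# Uncomment once MAST-ML is installed and the files above are in place:
# pred_df = predict(args)
# print(pred_df[["y_pred", "y_err"]])
```

Without a calibration file, the returned dataframe would be expected to carry y_pred but not necessarily y_err, since uncertainties are included only "if applicable".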