make_prediction
- mastml.mastml_predictor.make_prediction(X_test, X_train, y_train, model, preprocessor=None, calibration_file=None, featurizers=None, featurize_on=None, domain=None, composition_column=None, *args, **kwargs)
Method that takes a saved preprocessor, model, and calibration file and outputs predictions and calibrated uncertainties on new test data.
- Args:
- X_test: (pd.DataFrame or str), dataframe of featurized test data used to make predictions, or path to a file containing featurized test data in .xlsx or .csv format, ready for import with pandas. If passing an already featurized dataframe, include only the features used to fit the original model, in the same order as in the training data used to fit that model.
- X_train: (pd.DataFrame or str), dataframe of training data used to train the original model, or path to a file containing featurized training data in .xlsx or .csv format, ready for import with pandas. Used to extract the features used in training, in order to downselect from newly generated features on the test data.
- y_train: (pd.DataFrame or str), dataframe of training target data used to train the original model, or path to a file containing training target data in .xlsx or .csv format, ready for import with pandas. Used to return the true value of a test data point if that point is present in the training data.
- model: (str), path of saved model in .pkl format (e.g., RandomForestRegressor.pkl)
- preprocessor: (str), path of saved preprocessor in .pkl format (e.g., StandardScaler.pkl)
- calibration_file: (str), path of file containing the recalibration parameters (typically recalibration_parameters_average_test.xlsx)
- featurizers: (list), list of strings denoting paths to saved mastml feature generators, e.g., ["myfolder/ElementalFeatureGenerator.pkl", "myfolder/PolynomialFeatureGenerator.pkl"]
- featurize_on: (list), list of column names (or lists of column names) in X_test to perform featurization on. Must be the same length and in the same order as the featurizers listed above, e.g., ['Composition', ['feature1', 'feature2']]
- domain: (list), list of strings denoting filenames of saved domain.pkl objects, e.g., ['domain_gpr.pkl']
- composition_column: (str), name of the X_test column containing material compositions. Needed when assessing domain with the "elemental" method.
- Returns:
- pred_df: (pd.DataFrame), dataframe containing a column of model predictions (y_pred) and, if applicable, calibrated uncertainties (y_err). Also includes any extra columns specified in the extra_columns parameter.
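
Example (a minimal sketch, not taken from the MAST-ML sources): the snippet below assembles the arguments described above from file paths and checks the one stated constraint, that featurize_on has the same length as featurizers, before calling make_prediction. The helper names build_prediction_kwargs and predict_from_saved_run and the file names new_data.csv, X_train.csv and y_train.csv are hypothetical. The remaining paths reuse the examples given in the argument descriptions. The final call is left commented out because it needs MAST-ML installed and the saved files on disk.

```python
def build_prediction_kwargs(featurizers=None, featurize_on=None, **paths):
    """Collect keyword arguments for make_prediction (hypothetical helper).

    Checks only the documented constraint that featurize_on pairs up
    one-to-one with featurizers when both are supplied.
    """
    if featurizers is not None and featurize_on is not None:
        if len(featurizers) != len(featurize_on):
            raise ValueError(
                "featurize_on must have the same length as featurizers: "
                f"got {len(featurize_on)} and {len(featurizers)}"
            )
    kwargs = dict(paths)
    if featurizers is not None:
        kwargs["featurizers"] = list(featurizers)
    if featurize_on is not None:
        kwargs["featurize_on"] = list(featurize_on)
    return kwargs


def predict_from_saved_run(kwargs):
    """Call make_prediction with the assembled arguments (hypothetical helper)."""
    # Imported lazily so the argument check above works without MAST-ML installed.
    from mastml.mastml_predictor import make_prediction
    return make_prediction(**kwargs)


kwargs = build_prediction_kwargs(
    X_test="new_data.csv",   # hypothetical file name
    X_train="X_train.csv",   # hypothetical file name
    y_train="y_train.csv",   # hypothetical file name
    model="RandomForestRegressor.pkl",
    preprocessor="StandardScaler.pkl",
    calibration_file="recalibration_parameters_average_test.xlsx",
    featurizers=[
        "myfolder/ElementalFeatureGenerator.pkl",
        "myfolder/PolynomialFeatureGenerator.pkl",
    ],
    # One entry per featurizer, in the same order as featurizers.
    featurize_on=["Composition", ["feature1", "feature2"]],
)

# Requires MAST-ML and the saved files listed above:
# pred_df = predict_from_saved_run(kwargs)
```

Checking the lengths up front reports a mismatched featurizers/featurize_on pair before any saved model or generator is loaded.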