Overview of MAST-ML tutorials and examples

MAST-ML tutorials

There are numerous MAST-ML tutorial and example Jupyter notebooks. They are in the examples folder of the MAST-ML repository, and each can also be opened in Google Colab using the link listed with it. A brief overview of the contents of each tutorial is provided below:

Tutorial 1: Getting Started (MASTML_Tutorial_1_GettingStarted.ipynb):

Tutorial 1 link: https://colab.research.google.com/github/uw-cmg/MAST-ML/blob/master/examples/MASTML_Tutorial_1_GettingStarted.ipynb

In this notebook, we will perform a first, basic run where we:

Import example data of Boston housing prices

Define a data preprocessor to normalize the data

Define linear regression and kernel ridge models to fit the data

Evaluate each of our models with 5-fold cross validation

Add a random forest model to our run and compare model performance
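The workflow above can be sketched in plain scikit-learn. This is not the MAST-ML API used in the notebook; it is a minimal illustration under two assumptions: a synthetic dataset from make_regression stands in for the Boston housing data (whose scikit-learn loader, load_boston, was removed in scikit-learn 1.2), and StandardScaler plays the role of the normalizing preprocessor.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the Boston housing data (assumption, see lead-in)
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

# Linear and kernel ridge models first, then a random forest for comparison
models = {
    "linear": LinearRegression(),
    "kernel_ridge": KernelRidge(kernel="rbf", alpha=1.0),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# 5-fold CV; the scaler sits in the pipeline so it is fit on training folds only
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = {}
for name, model in models.items():
    pipe = make_pipeline(StandardScaler(), model)
    rmse = -cross_val_score(pipe, X, y, cv=cv, scoring="neg_root_mean_squared_error")
    scores[name] = rmse.mean()
    print(f"{name}: mean RMSE = {scores[name]:.2f}")
```

Putting the scaler inside the pipeline means it is refit on each training fold, so no statistics from the validation fold leak into the normalization.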

Tutorial 2: Data Import and Cleaning (MASTML_Tutorial_2_DataImport.ipynb):

Tutorial 2 link: https://colab.research.google.com/github/uw-cmg/MAST-ML/blob/master/examples/MASTML_Tutorial_2_DataImport.ipynb

In this notebook, we will learn different ways to download and import data into a MAST-ML run:

Import example datasets from scikit-learn

Conduct different data cleaning methods

Import and prepare a real dataset that is stored locally

Download data from various materials databases
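Two common cleaning strategies, dropping incomplete rows and imputing missing values with a column mean, can be sketched with pandas and scikit-learn. The toy table and its column names are hypothetical, and the cleaning methods the notebook actually uses may differ.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy table with missing entries
df = pd.DataFrame({
    "feature_a": [1.0, 2.0, np.nan, 4.0],
    "feature_b": [10.0, np.nan, 30.0, 40.0],
    "target": [0.5, 0.7, 0.9, 1.1],
})
features = ["feature_a", "feature_b"]

# Strategy 1: drop every row that contains a missing value
dropped = df.dropna()

# Strategy 2: fill missing feature values with the column mean
imputed = df.copy()
imputed[features] = SimpleImputer(strategy="mean").fit_transform(df[features])

print(dropped)
print(imputed)
```

Dropping rows is simplest but discards data; mean imputation keeps every row at the cost of shrinking the variance of the imputed columns.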

Tutorial 3: Feature Generation and Selection (MASTML_Tutorial_3_FeatureEngineering.ipynb):

Tutorial 3 link: https://colab.research.google.com/github/uw-cmg/MAST-ML/blob/master/examples/MASTML_Tutorial_3_FeatureEngineering.ipynb

In this notebook, we will learn different ways to generate, preprocess, and select features:

Generate features based on material composition

Generate one-hot encoded features based on group labels

Normalize the features as a preprocessing step

Select features using an ensemble model-based approach

Generate learning curves using a basic feature selection approach

Select features using forward selection
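Two of the steps above, one-hot encoding group labels and forward feature selection, can be illustrated with pandas and scikit-learn. This sketch uses pandas.get_dummies and SequentialFeatureSelector (available in scikit-learn 0.24 and later) rather than MAST-ML's own classes; the group labels and the synthetic regression data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

# One-hot encode a hypothetical column of group labels
groups = pd.Series(["oxide", "metal", "oxide", "nitride"], name="group")
onehot = pd.get_dummies(groups, prefix="group")
print(onehot.columns.tolist())  # one column per distinct label

# Forward selection: starting from no features, greedily add the one that most
# improves the 5-fold CV score until 3 features are selected
X, y = make_regression(n_samples=100, n_features=10, n_informative=3, random_state=0)
selector = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=3, direction="forward", cv=5
)
selector.fit(X, y)
selected = np.flatnonzero(selector.get_support())
print("selected feature indices:", selected)
```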

Tutorial 4: Model Fits and Data Split Tests (MASTML_Tutorial_4_Models_and_Tests.ipynb):

Tutorial 4 link: https://colab.research.google.com/github/uw-cmg/MAST-ML/blob/master/examples/MASTML_Tutorial_4_Models_and_Tests.ipynb

In this notebook, we will learn how to run several types of models on a selected dataset and apply several data-splitting schemes to evaluate model performance. In this tutorial, we will:

Run a variety of model types from the scikit-learn package

Run a bootstrapped ensemble of neural networks

Compare performance of scikit-learn’s gradient boosting method and XGBoost

Compare performance of scikit-learn’s neural network and Keras-based neural network regressor

Compare model performance using random k-fold cross validation and leave out group cross validation

Explore the limits of model performance when up to 90% of data is left out using leave out percent cross validation
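The splitting schemes above can be compared with scikit-learn splitters: KFold for random k-fold CV, LeaveOneGroupOut for leave out group CV, and ShuffleSplit with test_size=0.9 as an approximation of leaving out 90% of the data. This is a sketch, not MAST-ML's splitter classes; the data and group labels are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, LeaveOneGroupOut, ShuffleSplit, cross_val_score

X, y = make_regression(n_samples=120, n_features=5, noise=5.0, random_state=0)
# Hypothetical group labels (e.g. material classes): 4 groups of 30 samples
groups = np.repeat(np.arange(4), 30)

model = RandomForestRegressor(n_estimators=50, random_state=0)
splitters = {
    "random_kfold": (KFold(n_splits=5, shuffle=True, random_state=0), None),
    "leave_out_group": (LeaveOneGroupOut(), groups),
    # Leave out 90% of the data: train on 10%, test on the remaining 90%
    "leave_out_90pct": (ShuffleSplit(n_splits=5, test_size=0.9, random_state=0), None),
}
results = {}
for name, (cv, g) in splitters.items():
    scores = cross_val_score(model, X, y, groups=g, cv=cv, scoring="r2")
    results[name] = scores
    print(f"{name}: {len(scores)} splits, mean R2 = {scores.mean():.2f}")
```

Because these group labels are arbitrary, random k-fold and leave out group CV should give similar scores here; with physically meaningful groups (e.g. chemical families), leave out group CV typically gives a more pessimistic estimate that better reflects performance on unseen classes of materials.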

Tutorial 5: Left-out data, Nested cross-validation, and Optimized models (MASTML_Tutorial_5_NestedCV_and_OptimizedModels.ipynb):

Tutorial 5 link: https://colab.research.google.com/github/uw-cmg/MAST-ML/blob/master/examples/MASTML_Tutorial_5_NestedCV_and_OptimizedModels.ipynb

In this notebook, we will perform more advanced model fitting routines, including nested cross validation and hyperparameter optimization. In this tutorial, we will learn how to use MAST-ML to:

Assess performance on manually left-out test data

Perform nested cross validation to assess model performance on unseen data

Optimize the hyperparameters of our models to create the best model
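Nested cross validation wraps a hyperparameter search (inner loop) inside an outer CV loop, so the outer test folds are never used for tuning. A minimal scikit-learn sketch, assuming kernel ridge regression and an illustrative grid of alpha and gamma values (not the models or settings used in the notebook):

```python
from sklearn.datasets import make_regression
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

X, y = make_regression(n_samples=150, n_features=6, noise=5.0, random_state=0)

# Inner loop: grid search over hyperparameters on each outer training fold
param_grid = {"alpha": [0.01, 0.1, 1.0], "gamma": [0.01, 0.1]}
inner_cv = KFold(n_splits=3, shuffle=True, random_state=1)
search = GridSearchCV(KernelRidge(kernel="rbf"), param_grid, cv=inner_cv,
                      scoring="neg_mean_absolute_error")

# Outer loop: score the tuned model on folds it never saw during tuning
outer_cv = KFold(n_splits=5, shuffle=True, random_state=0)
outer_scores = -cross_val_score(search, X, y, cv=outer_cv,
                                scoring="neg_mean_absolute_error")
print("nested CV MAE per outer fold:", outer_scores.round(2))

# Final model: tune once on all data after the performance estimate is obtained
search.fit(X, y)
print("best hyperparameters:", search.best_params_)
```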

Tutorial 6: Model Error Analysis, Uncertainty Quantification (MASTML_Tutorial_6_ErrorAnalysis_UncertaintyQuantification.ipynb):

Tutorial 6 link: https://colab.research.google.com/github/uw-cmg/MAST-ML/blob/master/examples/MASTML_Tutorial_6_ErrorAnalysis_UncertaintyQuantification.ipynb

In this notebook tutorial, we will learn how MAST-ML can be used to:

Assess the true and predicted errors of our model, and some useful measures of their statistical distributions

Explore different methods of quantifying and calibrating model uncertainties.

Compare the uncertainty quantification behavior of Bayesian and ensemble-based models.
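One ensemble-based way to obtain a predicted error is the spread of the individual trees in a random forest. The sketch below assumes synthetic data and uses scikit-learn's RandomForestRegressor rather than MAST-ML's implementation. It compares true errors (residuals r) with predicted errors (σ) through the standardized residuals r/σ, whose standard deviation is close to 1 when the uncertainties are well calibrated.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)

# Ensemble uncertainty: mean and spread of the individual trees' predictions
per_tree = np.stack([tree.predict(X_test) for tree in forest.estimators_])
y_pred = per_tree.mean(axis=0)
sigma = per_tree.std(axis=0)

# True errors (residuals) vs. predicted errors (ensemble std)
residuals = y_test - y_pred
print("RMSE of residuals:", np.sqrt(np.mean(residuals**2)).round(2))
print("mean predicted sigma:", sigma.mean().round(2))

# Standardized residuals; a std far from 1 means sigma is miscalibrated
z = residuals / sigma
print("std of r/sigma:", z.std().round(2))
```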

Tutorial 7: Model predictions with calibrated error bars on new data, hosting on Foundry/DLHub (MASTML_Tutorial_7_ModelPredictions_with_CalibratedErrorBars_HostModelonFoundry.ipynb):

Tutorial 7 link: https://colab.research.google.com/github/uw-cmg/MAST-ML/blob/master/examples/MASTML_Tutorial_7_ModelPredictions_with_CalibratedErrorBars_HostModelonFoundry.ipynb

In this notebook tutorial, we will learn how MAST-ML can be used to:

Fit a model and use it to predict on new data.

Make predictions on new materials using only their chemical composition as input.

Use nested CV to obtain error bar recalibration parameters and get predictions with calibrated error bars.
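A recalibration step can be sketched as follows. As a simplifying assumption, it uses plain 5-fold CV instead of nested CV and a single scale factor s, fit by maximizing the Gaussian likelihood of the out-of-fold residuals r ~ N(0, (sσ)²), which gives s = sqrt(mean((r/σ)²)). MAST-ML's own recalibration scheme may use more parameters, and the helper forest_predict_with_sigma is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold


def forest_predict_with_sigma(forest, X):
    """Hypothetical helper: mean and std of the individual trees' predictions."""
    per_tree = np.stack([tree.predict(X) for tree in forest.estimators_])
    return per_tree.mean(axis=0), per_tree.std(axis=0)


X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X_new = X[-20:]              # stand-in for "new data" (assumption)
X, y = X[:-20], y[:-20]

# Collect out-of-fold standardized residuals r/sigma with uncalibrated sigmas
z_list = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    forest = RandomForestRegressor(n_estimators=100, random_state=0)
    forest.fit(X[train_idx], y[train_idx])
    mu, sigma = forest_predict_with_sigma(forest, X[test_idx])
    z_list.append((y[test_idx] - mu) / sigma)
z = np.concatenate(z_list)

# Maximum-likelihood scale factor for r ~ N(0, (s * sigma)^2)
s = np.sqrt(np.mean(z**2))

# Refit on all data, then predict on new data with recalibrated error bars
final = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
mu_new, sigma_new = forest_predict_with_sigma(final, X_new)
sigma_calibrated = s * sigma_new
print("scale factor s:", round(float(s), 2))
```

A single scale factor can only stretch or shrink all error bars uniformly; schemes with additional parameters can also correct error bars that are too large for some predictions and too small for others.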