Overview of MAST-ML tutorials and examples
MAST-ML tutorials
MAST-ML ships with a number of tutorial and example Jupyter notebooks, located in the mastml/examples folder. A brief overview of the contents of each tutorial is provided below:
Tutorial 1: Getting Started (MASTML_Tutorial_1_GettingStarted.ipynb):
Tutorial 1 link: https://colab.research.google.com/github/uw-cmg/MAST-ML/blob/master/examples/MASTML_Tutorial_1_GettingStarted.ipynb
In this notebook, we will perform a first, basic run where we:
Import example data of Boston housing prices
Define a data preprocessor to normalize the data
Define a linear regression model and kernel ridge model to fit the data
Evaluate each of our models with 5-fold cross validation
Add a random forest model to our run and compare model performance
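The steps above can be sketched with plain scikit-learn (MAST-ML wraps these same components); a synthetic regression dataset stands in for the Boston housing data here, since that dataset has been removed from recent scikit-learn releases:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the Boston housing data (hypothetical data)
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

models = {
    "linear": LinearRegression(),
    "kernel_ridge": KernelRidge(alpha=1.0),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

scores = {}
for name, model in models.items():
    # Normalize the features, then fit and evaluate with 5-fold cross validation
    pipe = make_pipeline(StandardScaler(), model)
    scores[name] = cross_val_score(pipe, X, y, cv=5, scoring="r2").mean()

for name, r2 in scores.items():
    print(f"{name}: mean 5-fold R^2 = {r2:.3f}")
```

The tutorial performs the same loop through MAST-ML's own run interface; the sketch only shows the underlying scikit-learn operations.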
Tutorial 2: Data Import and Cleaning (MASTML_Tutorial_2_DataImport.ipynb):
Tutorial 2 link: https://colab.research.google.com/github/uw-cmg/MAST-ML/blob/master/examples/MASTML_Tutorial_2_DataImport.ipynb
In this notebook, we will learn different ways to download and import data into a MAST-ML run:
Import example datasets shipped with scikit-learn
Apply different data cleaning methods
Import and prepare a real dataset that is stored locally
Download data from various materials databases
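As a rough sketch of the import-and-clean steps, assuming pandas and scikit-learn are available (the diabetes dataset stands in for the tutorial's data, and the missing values are injected artificially to have something to clean):

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_diabetes

# Load an example dataset shipped with scikit-learn as a DataFrame
df = load_diabetes(as_frame=True).frame.copy()

# Artificially blank out ~5% of cells to simulate a messy real dataset
rng = np.random.default_rng(0)
df = df.mask(rng.random(df.shape) < 0.05)

# Two common cleaning strategies
dropped = df.dropna()           # remove any row with a missing value
imputed = df.fillna(df.mean())  # impute missing values with column means

print(f"rows: original={len(df)}, after dropna={len(dropped)}")
print(f"missing values after imputation: {int(imputed.isna().sum().sum())}")
```

A locally stored dataset would be read the same way via pd.read_csv or pd.read_excel before cleaning.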
Tutorial 3: Feature Generation and Selection (MASTML_Tutorial_3_FeatureEngineering.ipynb):
Tutorial 3 link: https://colab.research.google.com/github/uw-cmg/MAST-ML/blob/master/examples/MASTML_Tutorial_3_FeatureEngineering.ipynb
In this notebook, we will learn different ways to generate, preprocess, and select features:
Generate features based on material composition
Generate one-hot encoded features based on group labels
Normalize features as a preprocessing step
Select features using an ensemble model-based approach
Generate learning curves using a basic feature selection approach
Select features using forward selection
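Several of these feature-engineering steps can be sketched in scikit-learn; the group labels below are hypothetical, and make_regression stands in for composition-derived features:

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel, SequentialFeatureSelector
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for composition-derived features (hypothetical data)
X, y = make_regression(n_samples=150, n_features=8, n_informative=3, random_state=0)

# One-hot encode a hypothetical group-label column
groups = np.random.default_rng(0).choice(["A", "B", "C"], size=len(y))
onehot = pd.get_dummies(pd.Series(groups, name="group")).to_numpy()

# Normalize the numeric features
X_scaled = StandardScaler().fit_transform(X)

# Ensemble model-based selection, using random forest feature importances
sfm = SelectFromModel(
    RandomForestRegressor(n_estimators=50, random_state=0)
).fit(X_scaled, y)

# Forward selection with a linear model
sfs = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=3, direction="forward"
).fit(X_scaled, y)

print("one-hot columns:", onehot.shape[1])
print("model-based selection kept:", int(sfm.get_support().sum()))
print("forward selection kept:", int(sfs.get_support().sum()))
```

The tutorial additionally generates composition-based features (e.g. elemental property statistics), which MAST-ML derives from chemical formula strings.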
Tutorial 4: Model Fits and Data Split Tests (MASTML_Tutorial_4_Models_and_Tests.ipynb):
Tutorial 4 link: https://colab.research.google.com/github/uw-cmg/MAST-ML/blob/master/examples/MASTML_Tutorial_4_Models_and_Tests.ipynb
In this notebook, we will learn how to run a few different types of models on a select dataset, and conduct a few different types of data splits to evaluate our model performance. In this tutorial, we will:
Run a variety of model types from the scikit-learn package
Run a bootstrapped ensemble of neural networks
Compare performance of scikit-learn’s gradient boosting method and XGBoost
Compare performance of scikit-learn’s neural network and Keras-based neural network regressor
Compare model performance using random k-fold cross validation and leave-out-group cross validation
Explore the limits of model performance when up to 90% of the data is left out using leave-out-percent cross validation
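The comparison between random k-fold and leave-out-group splits can be sketched as follows; the group labels are hypothetical stand-ins for, e.g., alloy families in a materials dataset:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold, LeaveOneGroupOut, cross_val_score

X, y = make_regression(n_samples=120, n_features=6, noise=5.0, random_state=1)
# Hypothetical group labels: four groups of 30 samples each
groups = np.repeat(np.arange(4), 30)

model = GradientBoostingRegressor(random_state=0)

# Random 5-fold split vs. leaving out one whole group at a time
kfold_scores = cross_val_score(model, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0))
logo_scores = cross_val_score(model, X, y, groups=groups, cv=LeaveOneGroupOut())

print(f"random 5-fold mean R^2:   {kfold_scores.mean():.3f}")
print(f"leave-out-group mean R^2: {logo_scores.mean():.3f}")
```

On real materials data, leave-out-group scores are typically lower than random k-fold scores, which is exactly the gap this tutorial examines.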
Tutorial 5: Left-out data, Nested cross-validation, and Optimized models (MASTML_Tutorial_5_NestedCV_and_OptimizedModels.ipynb):
Tutorial 5 link: https://colab.research.google.com/github/uw-cmg/MAST-ML/blob/master/examples/MASTML_Tutorial_5_NestedCV_and_OptimizedModels.ipynb
In this notebook, we will perform more advanced model fitting routines, including nested cross validation and hyperparameter optimization. In this tutorial, we will learn how to use MAST-ML to:
Assess performance on manually left-out test data
Perform nested cross validation to assess model performance on unseen data
Optimize the hyperparameters of our models to obtain the best-performing model
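A minimal nested-CV sketch in scikit-learn, assuming a kernel ridge model and a small hyperparameter grid: an inner GridSearchCV loop handles hyperparameter optimization, an outer cross_val_score loop estimates performance on unseen data, and a manually left-out test set gives a final check:

```python
from sklearn.datasets import make_regression
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split

X, y = make_regression(n_samples=150, n_features=5, noise=5.0, random_state=0)

# Manually leave out a test set that the optimization never sees
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Inner loop: hyperparameter search; outer loop: unbiased performance estimate
param_grid = {"alpha": [0.01, 0.1, 1.0], "kernel": ["linear", "rbf"]}
search = GridSearchCV(KernelRidge(), param_grid, cv=3)
nested_scores = cross_val_score(search, X_train, y_train, cv=5)

# Refit the search on all training data, then score the left-out test set
search.fit(X_train, y_train)
test_r2 = search.score(X_test, y_test)
print(f"nested CV mean R^2: {nested_scores.mean():.3f}")
print(f"left-out test R^2:  {test_r2:.3f}")
print("best params:", search.best_params_)
```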
Tutorial 6: Model Error Analysis, Uncertainty Quantification (MASTML_Tutorial_6_ErrorAnalysis_UncertaintyQuantification.ipynb):
Tutorial 6 link: https://colab.research.google.com/github/uw-cmg/MAST-ML/blob/master/examples/MASTML_Tutorial_6_ErrorAnalysis_UncertaintyQuantification.ipynb
In this notebook, we will learn how MAST-ML can be used to:
Assess the true and predicted errors of our model, and some useful measures of their statistical distributions
Explore different methods of quantifying and calibrating model uncertainties
Compare the uncertainty quantification behavior of Bayesian and ensemble-based models
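The Bayesian vs. ensemble comparison can be sketched with scikit-learn (synthetic data, not the MAST-ML API itself): the ensemble uncertainty is the spread of individual random forest tree predictions, and the Bayesian uncertainty is the predictive standard deviation of a Gaussian process:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Ensemble-based uncertainty: standard deviation across the individual trees
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
tree_preds = np.stack([t.predict(X_test) for t in rf.estimators_])
rf_std = tree_preds.std(axis=0)

# Bayesian uncertainty: predictive standard deviation of a Gaussian process
gp = GaussianProcessRegressor(alpha=100.0, normalize_y=True).fit(X_train, y_train)
gp_mean, gp_std = gp.predict(X_test, return_std=True)

print(f"RF mean predicted std: {rf_std.mean():.2f}")
print(f"GP mean predicted std: {gp_std.mean():.2f}")
```

The tutorial then compares these predicted uncertainties against the observed residuals to judge how well-calibrated each model type is.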
Tutorial 7: Model predictions with calibrated error bars on new data, hosting on Foundry/DLHub (MASTML_Tutorial_7_ModelPredictions_with_CalibratedErrorBars_HostModelonFoundry.ipynb):
In this notebook, we will learn how MAST-ML can be used to:
Fit a model and use it to predict on new data
Use our model to predict on new data using only composition as input
Use nested cross validation to obtain error bar recalibration parameters and get predictions with calibrated error bars
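A simplified one-parameter version of the recalibration idea can be sketched with a random forest: fit a scale factor on held-out residuals so the raw ensemble spread matches the typical observed error, then apply it when predicting on new data. The predict_with_std helper is hypothetical, and the tutorial itself obtains the recalibration parameters via nested cross validation rather than a single holdout:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=0)
X_train, X_cal, y_train, y_cal = train_test_split(X, y, test_size=0.3, random_state=0)

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

def predict_with_std(model, X):
    """Hypothetical helper: ensemble mean and per-sample tree spread."""
    preds = np.stack([t.predict(X) for t in model.estimators_])
    return preds.mean(axis=0), preds.std(axis=0)

# One-parameter recalibration: scale the raw ensemble std so that, on the
# held-out calibration split, it matches the typical residual magnitude
cal_mean, cal_std = predict_with_std(rf, X_cal)
scale = np.sqrt(np.mean((y_cal - cal_mean) ** 2 / np.maximum(cal_std, 1e-8) ** 2))

# Predict on "new" data (freshly generated hypothetical inputs)
X_new = np.random.default_rng(1).normal(size=(5, 4))
new_mean, new_std = predict_with_std(rf, X_new)
calibrated_std = scale * new_std
print("predictions:", np.round(new_mean, 1))
print("calibrated error bars:", np.round(calibrated_std, 1))
```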