Overview of MAST-ML tutorials and examples

MAST-ML tutorials

MAST-ML includes numerous tutorial and example Jupyter notebooks, which can be found in the mastml/examples folder. A brief overview of the contents of each tutorial is provided below:

Tutorial 1: Getting Started (MASTML_Tutorial_1_GettingStarted.ipynb):
In this notebook, we will perform a first, basic run where we:
  1. Import example data of Boston housing prices
  2. Define a data preprocessor to normalize the data
  3. Define a linear regression model and kernel ridge model to fit the data
  4. Evaluate each of our models with 5-fold cross validation
  5. Add a random forest model to our run and compare model performance (see the code sketch after this list)
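
As a rough illustration of what this workflow looks like in code, the sketch below uses scikit-learn directly rather than MAST-ML's own wrappers; the California housing dataset stands in for the Boston housing data here, since load_boston has been removed from recent scikit-learn releases.

    from sklearn.datasets import fetch_california_housing
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import make_pipeline
    from sklearn.linear_model import LinearRegression
    from sklearn.kernel_ridge import KernelRidge
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import cross_val_score

    # Stand-in housing dataset; subsample so kernel ridge stays fast for a quick demo
    X, y = fetch_california_housing(return_X_y=True)
    X, y = X[:2000], y[:2000]

    models = {
        "linear_regression": LinearRegression(),
        "kernel_ridge": KernelRidge(kernel="rbf"),
        "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    }
    for name, model in models.items():
        # Normalize the data, then evaluate each model with 5-fold cross validation
        pipeline = make_pipeline(StandardScaler(), model)
        rmse = -cross_val_score(pipeline, X, y, cv=5,
                                scoring="neg_root_mean_squared_error").mean()
        print(f"{name}: mean CV RMSE = {rmse:.3f}")
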
Tutorial 2: Data Import and Cleaning (MASTML_Tutorial_2_DataImport.ipynb):
In this notebook, we will learn different ways to download and import data into a MAST-ML run:
  1. Import example datasets from scikit-learn
  2. Conduct different data cleaning methods
  3. Import and prepare a real dataset that is stored locally
  4. Download data from various materials databases (see the sketch after this list)
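
The sketch below shows the flavor of these import and cleaning steps using pandas and scikit-learn directly; the notebook performs them with MAST-ML's dataset and cleaning utilities, and the local file name used in the comment is only a hypothetical example.

    import numpy as np
    import pandas as pd
    from sklearn.datasets import load_diabetes
    from sklearn.impute import SimpleImputer

    # Import an example dataset from scikit-learn as a pandas DataFrame
    df = load_diabetes(as_frame=True).frame

    # Artificially blank out a few values to illustrate cleaning
    df.iloc[0:5, 0] = np.nan

    # Cleaning option 1: drop rows that contain missing values
    df_dropped = df.dropna()

    # Cleaning option 2: impute missing values with the column mean
    df_imputed = pd.DataFrame(SimpleImputer(strategy="mean").fit_transform(df),
                              columns=df.columns)

    # A locally stored dataset would be read in the same way, e.g.:
    # df_local = pd.read_csv("my_local_data.csv")
    print(df.shape, df_dropped.shape, df_imputed.shape)
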
Tutorial 3: Feature Generation and Selection (MASTML_Tutorial_3_FeatureEngineering.ipynb):
In this notebook, we will learn different ways to generate, preprocess, and select features:
  1. Generate features based on material composition
  2. Generate one-hot encoded features based on group labels
  3. Preprocess features by normalizing them
  4. Select features using an ensemble model-based approach
  5. Generate learning curves using a basic feature selection approach
  6. Select features using forward selection (see the sketch after this list)
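
A minimal sketch of the preprocessing and selection steps is shown below, using scikit-learn directly and an invented group-label column for the one-hot encoding; composition-based feature generation is specific to MAST-ML's generators and is not reproduced here.

    import pandas as pd
    from sklearn.datasets import load_diabetes
    from sklearn.preprocessing import StandardScaler
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.feature_selection import SelectFromModel, SequentialFeatureSelector
    from sklearn.linear_model import LinearRegression

    data = load_diabetes(as_frame=True)
    X, y = data.data, data.target

    # One-hot encode a hypothetical group label column
    groups = pd.Series(["A", "B"] * (len(X) // 2), name="group")
    X = pd.concat([X, pd.get_dummies(groups, prefix="group")], axis=1)

    # Preprocess features by normalizing them
    X = pd.DataFrame(StandardScaler().fit_transform(X), columns=X.columns)

    # Ensemble model-based selection: keep features a random forest ranks as important
    rf_select = SelectFromModel(RandomForestRegressor(n_estimators=100, random_state=0)).fit(X, y)
    print("Model-based selection kept:", list(X.columns[rf_select.get_support()]))

    # Forward selection: greedily add the features that most improve a linear model
    forward = SequentialFeatureSelector(LinearRegression(), n_features_to_select=5,
                                        direction="forward").fit(X, y)
    print("Forward selection kept:", list(X.columns[forward.get_support()]))
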
Tutorial 4: Model Fits and Data Split Tests (MASTML_Tutorial_4_Models_and_Tests.ipynb):

In this notebook, we will learn how to run several different types of models on a selected dataset and conduct several types of data splits to evaluate model performance. In this tutorial, we will:

  1. Run a variety of model types from the scikit-learn package
  2. Run a bootstrapped ensemble of neural networks
  3. Compare performance of scikit-learn’s gradient boosting method and XGBoost
  4. Compare performance of scikit-learn’s neural network and Keras-based neural network regressor
  5. Compare model performance using random k-fold cross validation and leave-out-group cross validation
  6. Explore the limits of model performance when up to 90% of the data is left out, using leave-out-percent cross validation (see the sketch after this list)
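
A rough sketch of the data split comparisons is given below using scikit-learn splitters and arbitrary, made-up group labels; the XGBoost, Keras, and bootstrapped-ensemble comparisons require those extra packages and are not shown.

    import numpy as np
    from sklearn.datasets import load_diabetes
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import (KFold, LeaveOneGroupOut, ShuffleSplit,
                                         cross_val_score)

    X, y = load_diabetes(return_X_y=True)
    model = GradientBoostingRegressor(random_state=0)

    # Random k-fold cross validation
    kfold = cross_val_score(model, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0))

    # Leave-out-group cross validation, with arbitrary group labels for illustration
    groups = np.arange(len(y)) % 5
    logo = cross_val_score(model, X, y, groups=groups, cv=LeaveOneGroupOut())

    # Leave-out-percent: train on 10% of the data and test on the remaining 90%
    lop = cross_val_score(model, X, y, cv=ShuffleSplit(n_splits=5, test_size=0.9, random_state=0))

    print("k-fold R^2:", kfold.mean())
    print("leave-out-group R^2:", logo.mean())
    print("leave-out-90% R^2:", lop.mean())
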
Tutorial 5: Left-out data, Nested cross-validation, and Optimized models (MASTML_Tutorial_5_NestedCV_and_OptimizedModels.ipynb):

In this notebook, we will perform more advanced model fitting routines, including nested cross validation and hyperparameter optimization. In this tutorial, we will learn how to use MAST-ML to:

  1. Assess performance on manually left-out test data
  2. Perform nested cross validation to assess model performance on unseen data
  3. Optimize the hyperparameters of our models to create the best model (see the sketch after this list)
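
The sketch below outlines these ideas with scikit-learn's GridSearchCV nested inside an outer cross validation loop; the notebook carries out the equivalent steps through MAST-ML, and the hyperparameter grid here is just an illustrative choice.

    from sklearn.datasets import load_diabetes
    from sklearn.kernel_ridge import KernelRidge
    from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split

    X, y = load_diabetes(return_X_y=True)

    # Manually leave out test data for a final performance assessment
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

    # The inner loop optimizes hyperparameters; the outer loop estimates performance on unseen data
    param_grid = {"alpha": [0.01, 0.1, 1.0], "gamma": [0.001, 0.01, 0.1]}
    inner = GridSearchCV(KernelRidge(kernel="rbf"), param_grid, cv=5)
    nested_scores = cross_val_score(inner, X_train, y_train, cv=5)
    print("Nested CV R^2:", nested_scores.mean())

    # Refit the optimized model on all training data and score it on the left-out test set
    best_model = inner.fit(X_train, y_train).best_estimator_
    print("Left-out test R^2:", best_model.score(X_test, y_test))
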
Tutorial 6: Model Error Analysis, Uncertainty Quantification (MASTML_Tutorial_6_ErrorAnalysis_UncertaintyQuantification.ipynb):
In this notebook, we will learn how MAST-ML can be used to:
  1. Assess the true and predicted errors of our model, and some useful measures of their statistical distributions
  2. Explore different methods of quantifying and calibrating model uncertainties
  3. Compare the uncertainty quantification behavior of Bayesian and ensemble-based models (see the sketch after this list)
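
As a rough illustration, the sketch below contrasts an ensemble-based uncertainty estimate (the spread of individual random forest trees) with a Bayesian one (a Gaussian process posterior standard deviation) using scikit-learn; MAST-ML's error analysis adds the recalibration and distribution statistics discussed in the notebook.

    import numpy as np
    from sklearn.datasets import load_diabetes
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel
    from sklearn.model_selection import train_test_split

    X, y = load_diabetes(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

    # Ensemble-based uncertainty: spread of the individual tree predictions
    forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
    tree_preds = np.stack([tree.predict(X_test) for tree in forest.estimators_])
    ens_pred, ens_std = tree_preds.mean(axis=0), tree_preds.std(axis=0)

    # Bayesian uncertainty: Gaussian process posterior standard deviation
    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    gp_pred, gp_std = gp.fit(X_train, y_train).predict(X_test, return_std=True)

    # Crude calibration check: compare the average predicted uncertainty to the true error
    print("Forest: mean |error| =", np.abs(ens_pred - y_test).mean(), "mean std =", ens_std.mean())
    print("GP:     mean |error| =", np.abs(gp_pred - y_test).mean(), "mean std =", gp_std.mean())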