Overview of MAST-ML tutorials and examples
MAST-ML tutorials
There are numerous MAST-ML tutorial and example Jupyter notebooks, which can be found in the mastml/examples folder. A brief overview of the contents of each tutorial is provided below:
- Tutorial 1: Getting Started (MASTML_Tutorial_1_GettingStarted.ipynb):
- In this notebook, we will perform a first, basic run where we:
- Import example data of Boston housing prices
- Define a data preprocessor to normalize the data
- Define a linear regression model and kernel ridge model to fit the data
- Evaluate each of our models with 5-fold cross validation
- Add a random forest model to our run and compare model performance
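The Tutorial 1 workflow can be sketched directly with scikit-learn (this is an illustrative sketch, not the MAST-ML API; since the Boston housing dataset has been removed from recent scikit-learn releases, a synthetic regression dataset stands in for the example data):

```python
from sklearn.datasets import make_regression
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.kernel_ridge import KernelRidge
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the housing data (avoids a download)
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

models = {
    "linear": LinearRegression(),
    "kernel_ridge": KernelRidge(kernel="rbf", alpha=1.0),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

for name, model in models.items():
    # Normalize features, then fit the model, within each CV fold
    pipe = make_pipeline(StandardScaler(), model)
    scores = cross_val_score(pipe, X, y, cv=5, scoring="r2")
    print(f"{name}: mean 5-fold R^2 = {scores.mean():.3f}")
```

MAST-ML wraps these same steps (preprocessor, model, data splitter) in its own evaluation classes; the notebook shows the actual interface.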
- Tutorial 2: Data Import and Cleaning (MASTML_Tutorial_2_DataImport.ipynb):
- In this notebook, we will learn different ways to download and import data into a MAST-ML run:
- Import built-in example datasets from scikit-learn
- Conduct different data cleaning methods
- Import and prepare a real dataset that is stored locally
- Download data from various materials databases
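The import-and-clean steps in Tutorial 2 can be illustrated with scikit-learn and pandas (a minimal sketch outside the MAST-ML wrappers; the injected missing value and the two cleaning strategies are illustrative choices):

```python
import numpy as np
from sklearn.datasets import load_diabetes

# Import a built-in example dataset from scikit-learn as a pandas DataFrame
df = load_diabetes(as_frame=True).frame

# Inject a missing value, then clean it two common ways
df_missing = df.copy()
df_missing.iloc[0, 0] = np.nan

dropped = df_missing.dropna()                   # remove rows with missing entries
imputed = df_missing.fillna(df_missing.mean())  # mean imputation

print(dropped.shape, int(imputed.isna().sum().sum()))
```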
- Tutorial 3: Feature Generation and Selection (MASTML_Tutorial_3_FeatureEngineering.ipynb):
- In this notebook, we will learn different ways to generate, preprocess, and select features:
- Generate features based on material composition
- Generate one-hot encoded features based on group labels
- Normalize features with a data preprocessor
- Select features using an ensemble model-based approach
- Generate learning curves using a basic feature selection approach
- Select features using forward selection
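Two of the selection strategies listed for Tutorial 3 can be sketched with scikit-learn (illustrative only, on synthetic data; MAST-ML provides its own feature-selection classes):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel, SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=120, n_features=10, n_informative=3, random_state=0)

# Ensemble model-based selection: keep features a random forest ranks as important
rf_selector = SelectFromModel(RandomForestRegressor(n_estimators=50, random_state=0))
X_rf = rf_selector.fit_transform(X, y)

# Forward selection: greedily add the features that most improve a linear model
fwd = SequentialFeatureSelector(LinearRegression(), n_features_to_select=3,
                                direction="forward")
X_fwd = fwd.fit_transform(X, y)

print(X_rf.shape, X_fwd.shape)
```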
- Tutorial 4: Model Fits and Data Split Tests (MASTML_Tutorial_4_Models_and_Tests.ipynb):
- In this notebook, we will run a few different types of models on a selected dataset and conduct different types of data splits to evaluate model performance. In this tutorial, we will:
- Run a variety of model types from the scikit-learn package
- Run a bootstrapped ensemble of neural networks
- Compare performance of scikit-learn’s gradient boosting method and XGBoost
- Compare performance of scikit-learn’s neural network and Keras-based neural network regressor
- Compare model performance using random k-fold cross validation and leave-out-group cross validation
- Explore the limits of model performance when up to 90% of the data is left out, using leave-out-percent cross validation
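The contrast between random k-fold and leave-out-group splits in Tutorial 4 can be sketched with scikit-learn's splitters (synthetic data and hypothetical group labels, standing in for e.g. material families):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, LeaveOneGroupOut, cross_val_score

X, y = make_regression(n_samples=90, n_features=5, noise=5.0, random_state=0)
groups = np.repeat([0, 1, 2], 30)  # hypothetical group labels (three "families")

model = RandomForestRegressor(n_estimators=50, random_state=0)

# Random k-fold: every fold mixes data from all groups
kfold_scores = cross_val_score(model, X, y,
                               cv=KFold(n_splits=5, shuffle=True, random_state=0))

# Leave-out-group: each fold holds out one entire group as the test set
logo_scores = cross_val_score(model, X, y, groups=groups, cv=LeaveOneGroupOut())

print(kfold_scores.mean(), logo_scores.mean())
```

Leave-out-group scores are typically lower, since the model must extrapolate to a group it never saw during training.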
- Tutorial 5: Left-out data, Nested cross-validation, and Optimized models (MASTML_Tutorial_5_NestedCV_and_OptimizedModels.ipynb):
- In this notebook, we will perform more advanced model-fitting routines, including nested cross validation and hyperparameter optimization. In this tutorial, we will learn how to use MAST-ML to:
- Assess performance on manually left-out test data
- Perform nested cross validation to assess model performance on unseen data
- Optimize the hyperparameters of our models to create the best model
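The nested cross validation idea in Tutorial 5 can be sketched with scikit-learn (an illustrative pattern, not the MAST-ML interface; the parameter grid is an arbitrary example):

```python
from sklearn.datasets import make_regression
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = make_regression(n_samples=100, n_features=6, noise=5.0, random_state=0)

# Inner loop: optimize hyperparameters on the training folds only.
# Outer loop: assess the tuned model on folds it never saw.
param_grid = {"alpha": [0.01, 0.1, 1.0], "gamma": [0.01, 0.1, 1.0]}
inner = GridSearchCV(KernelRidge(kernel="rbf"), param_grid, cv=3)
nested_scores = cross_val_score(inner, X, y, cv=5)

print(f"nested CV R^2: {nested_scores.mean():.3f}")
```

Because hyperparameters are chosen inside each outer training fold, the outer scores estimate performance on truly unseen data rather than data that influenced the tuning.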
- Tutorial 6: Model Error Analysis, Uncertainty Quantification (MASTML_Tutorial_6_ErrorAnalysis_UncertaintyQuantification.ipynb):
- In this notebook, we will learn how MAST-ML can be used to:
- Assess the true and predicted errors of our model, and some useful measures of their statistical distributions
- Explore different methods of quantifying and calibrating model uncertainties
- Compare the uncertainty quantification behavior of Bayesian and ensemble-based models
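One common ensemble-based uncertainty estimate covered by Tutorial 6 can be sketched with a random forest: the spread of per-tree predictions serves as an (uncalibrated) error bar. This sketch uses scikit-learn directly on synthetic data; MAST-ML adds the calibration and error-analysis steps on top:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, _ = train_test_split(X, y, random_state=0)

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

# Stack each tree's predictions; their standard deviation per point is an
# (uncalibrated) uncertainty estimate for the ensemble mean
per_tree = np.stack([tree.predict(X_test) for tree in forest.estimators_])
mean_pred = per_tree.mean(axis=0)
uncertainty = per_tree.std(axis=0)

print(f"mean predictive std: {uncertainty.mean():.2f}")
```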