Baseline_tests

class mastml.baseline_tests.Baseline_tests

Bases: object

Methods:
test_mean: Compares the score of the model with a constant test value
Args:

X: (dataframe), dataframe of X features

y: (dataframe), dataframe of y data

metrics: (list), list of metric names to evaluate true vs. pred data in each split

Returns:

A dataframe of the results of the model for the selected metrics
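
As a rough illustration of the idea behind a mean baseline, the sketch below scores a constant prediction against the test targets and sets it beside the score of a model. It is not the mastml implementation: the helper names `mean_absolute_error` and `mean_baseline_score`, the use of the training mean as the constant, and the example numbers are all assumptions made for illustration.

```python
import numpy as np


def mean_absolute_error(y_true, y_pred):
    """Plain NumPy MAE (hypothetical helper, not a mastml function)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def mean_baseline_score(y_train, y_test):
    """Score a constant guess (assumed here: the training mean) against y_test."""
    constant = float(np.mean(y_train))
    naive_pred = np.full(len(y_test), constant)
    return mean_absolute_error(y_test, naive_pred)


y_train = [1.0, 2.0, 3.0, 4.0]   # mean is 2.5
y_test = [2.0, 3.0, 5.0]
model_pred = [2.1, 2.9, 4.6]     # hypothetical model output

real_score = mean_absolute_error(y_test, model_pred)
naive_score = mean_baseline_score(y_train, y_test)
print(f"real: {real_score:.3f}  naive: {naive_score:.3f}")
```

A model that cannot beat the constant guess on the chosen metric has learned little beyond the average of y.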

test_permuted: Compares the score of the model with a permuted test value
Args:

X: (dataframe), dataframe of X features

y: (dataframe), dataframe of y data

metrics: (list), list of metric names to evaluate true vs. pred data in each split

Returns:

A dataframe of the results of the model for the selected metrics
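
A permutation baseline can be sketched as follows: the test targets are shuffled so that they no longer line up with their features, and the model's predictions are scored against the shuffled values. This is an illustrative sketch only. Whether mastml shuffles the test targets or something else is not stated here, and `permuted_test_values`, the seed, and the data are hypothetical.

```python
import numpy as np


def mean_absolute_error(y_true, y_pred):
    """Plain NumPy MAE (hypothetical helper, not a mastml function)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def permuted_test_values(y_test, seed=0):
    """Return a shuffled copy of y_test (assumption: the test targets are what gets permuted)."""
    rng = np.random.default_rng(seed)
    return rng.permutation(np.asarray(y_test, dtype=float))


y_test = [1.0, 2.0, 3.0, 4.0, 5.0]
model_pred = [1.1, 1.9, 3.2, 3.8, 5.1]   # hypothetical model output

fake_test = permuted_test_values(y_test, seed=0)
real_score = mean_absolute_error(y_test, model_pred)
naive_score = mean_absolute_error(fake_test, model_pred)
print(f"real: {real_score:.3f}  naive: {naive_score:.3f}")
```

If the two scores are close, the predictions carry little information about which target belongs to which sample.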

test_nearest_neighbour_kdtree: Compares the score of the model with the test value of the nearest neighbour found using a k-d tree
Args:

X: (dataframe), dataframe of X features

y: (dataframe), dataframe of y data

metrics: (list), list of metric names to evaluate true vs. pred data in each split

Returns:

A dataframe of the results of the model for the selected metrics
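
The nearest-neighbour lookup with a k-d tree can be sketched with `scipy.spatial.KDTree`. The code below is an illustration under stated assumptions, not the mastml source: it assumes the baseline value for each test point is the target of its closest training point, and `nearest_neighbour_values` and the data are hypothetical.

```python
import numpy as np
from scipy.spatial import KDTree


def nearest_neighbour_values(X_train, y_train, X_test):
    """For each test row, return the target of the closest training row (Euclidean)."""
    tree = KDTree(np.asarray(X_train, dtype=float))
    # query returns (distances, indices); k=1 gives one neighbour per test row
    _, idx = tree.query(np.asarray(X_test, dtype=float), k=1)
    return np.asarray(y_train, dtype=float)[idx]


X_train = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
y_train = [10.0, 20.0, 50.0]
X_test = [[0.1, 0.2], [4.0, 4.5]]

naive_values = nearest_neighbour_values(X_train, y_train, X_test)
print(naive_values)
```

A k-d tree avoids computing every pairwise distance, which tends to matter once the training set is large and the feature dimension is small.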

test_nearest_neighbour_cdist: Compares the score of the model with the test value of the nearest neighbour found using cdist
Args:

X: (dataframe), dataframe of X features

y: (dataframe), dataframe of y data

metrics: (list), list of metric names to evaluate true vs. pred data in each split

d_metric: (str), distance metric used to calculate the distances. Default is 'euclidean'

Returns:

A dataframe of the results of the model for the selected metrics
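
The same lookup can be done by brute force with `scipy.spatial.distance.cdist`, which also makes the distance metric a free choice. The sketch below is illustrative rather than the mastml implementation: `nearest_neighbour_values_cdist` and the data are hypothetical, and only the `d_metric` name and its 'euclidean' default are taken from the signature on this page.

```python
import numpy as np
from scipy.spatial.distance import cdist


def nearest_neighbour_values_cdist(X_train, y_train, X_test, d_metric="euclidean"):
    """For each test row, return the target of the closest training row under d_metric."""
    # distances has shape (n_test, n_train)
    distances = cdist(np.asarray(X_test, dtype=float),
                      np.asarray(X_train, dtype=float),
                      metric=d_metric)
    idx = distances.argmin(axis=1)
    return np.asarray(y_train, dtype=float)[idx]


X_train = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
y_train = [10.0, 20.0, 50.0]
X_test = [[0.1, 0.2], [4.0, 4.5]]

euclid_values = nearest_neighbour_values_cdist(X_train, y_train, X_test)
manhattan_values = nearest_neighbour_values_cdist(X_train, y_train, X_test,
                                                  d_metric="cityblock")
print(euclid_values, manhattan_values)
```

The full distance matrix costs memory proportional to n_test × n_train, so this route suits small data sets or metrics a k-d tree does not support.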

test_classifier_random: Compares the score of the model with a test value of a random class
Args:

X: (dataframe), dataframe of X features

y: (dataframe), dataframe of y data

metrics: (list), list of metric names to evaluate true vs. pred data in each split

Returns:

A dataframe of the results of the model for the selected metrics
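
A random-class baseline might look like the sketch below, which draws one label per test sample uniformly from the classes seen in training. This is an assumption-laden illustration, not mastml code: uniform sampling, the `accuracy` helper, `random_class_predictions`, the seed, and the labels are all hypothetical.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of matching labels (hypothetical helper, not a mastml function)."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


def random_class_predictions(y_train, n_samples, seed=0):
    """Draw n_samples labels uniformly from the classes present in y_train."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y_train)
    return rng.choice(classes, size=n_samples)


y_train = ["a", "b", "a", "a", "c"]
y_test = ["a", "b", "a", "c"]

naive_pred = random_class_predictions(y_train, len(y_test), seed=0)
naive_score = accuracy(y_test, naive_pred)
print(naive_pred, naive_score)
```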

test_classifier_dominant: Compares the score of the model with a test value of the dominant class (i.e., the class with the highest count)
Args:

X: (dataframe), dataframe of X features

y: (dataframe), dataframe of y data

metrics: (list), list of metric names to evaluate true vs. pred data in each split

Returns:

A dataframe of the results of the model for the selected metrics
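
The dominant-class idea can be sketched by counting labels and predicting the most frequent one for every test sample. The code is a minimal illustration, not the mastml implementation: counting on the training labels, the `accuracy` helper, `dominant_class`, and the labels are assumptions.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of matching labels (hypothetical helper, not a mastml function)."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))


def dominant_class(y_train):
    """Return the label with the highest count in y_train."""
    classes, counts = np.unique(y_train, return_counts=True)
    return classes[np.argmax(counts)]


y_train = ["a", "b", "a", "a", "c"]
y_test = ["a", "b", "a", "c"]

dominant = dominant_class(y_train)
naive_pred = np.full(len(y_test), dominant)
naive_score = accuracy(y_test, naive_pred)
print(dominant, naive_score)
```

On imbalanced data this baseline can reach a high accuracy on its own, which is why a classifier is usually compared against it.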

print_results: Prints the comparison between the naive score and the real score
Args:

real_score: The actual score of the model

naive_score: The naive score, i.e. the score obtained when the model is tested with fake_test (the baseline test values)

Methods Summary

test_classifier_dominant(X_train, X_test, ...)

test_classifier_random(X_train, X_test, ...)

test_mean(X_train, X_test, y_train, y_test, ...)

test_nearest_neighbour_cdist(X_train, ...[, ...])

test_nearest_neighbour_kdtree(X_train, ...)

test_permuted(X_train, X_test, y_train, ...)

to_excel(real_score, naive_score)

Methods Documentation

test_classifier_dominant(X_train, X_test, y_train, y_test, model, metrics=['mean_absolute_error'])

test_classifier_random(X_train, X_test, y_train, y_test, model, metrics=['mean_absolute_error'])

test_mean(X_train, X_test, y_train, y_test, model, metrics=['mean_absolute_error'])

test_nearest_neighbour_cdist(X_train, X_test, y_train, y_test, model, metrics=['mean_absolute_error'], d_metric='euclidean')

test_nearest_neighbour_kdtree(X_train, X_test, y_train, y_test, model, metrics=['mean_absolute_error'])

test_permuted(X_train, X_test, y_train, y_test, model, metrics=['mean_absolute_error'])

to_excel(real_score, naive_score)