Baseline_tests

class mastml.baseline_tests.Baseline_tests

Bases: object

Methods:

test_mean: Compares the score of the model with a constant test value

Args:

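X_train: (dataframe), dataframe of X features for the training split

X_test: (dataframe), dataframe of X features for the test split

y_train: (dataframe), dataframe of y data for the training split

y_test: (dataframe), dataframe of y data for the test split

model: the model whose score is compared with the baseline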

metrics: (list), list of metric names to evaluate true vs. pred data in each split

Returns:

A dataframe of the results of the model for the selected metrics
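The idea behind a mean baseline can be sketched outside of MAST-ML as follows. This is a minimal illustration, not the library's implementation: the helper names are hypothetical, and it assumes that the constant test value is the mean of the training targets and that the naive score is the model's predictions evaluated against that constant.

```python
import numpy as np


def mean_absolute_error(y_true, y_pred):
    """Plain MAE, written out so the sketch depends only on NumPy."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def mean_baseline_scores(y_train, y_test, y_pred):
    """Hypothetical helper (not a mastml function)."""
    # Assumption: the constant test value is the mean of the training targets.
    fake_test = np.full(len(y_test), np.mean(y_train))
    real_score = mean_absolute_error(y_test, y_pred)
    naive_score = mean_absolute_error(fake_test, y_pred)
    return real_score, naive_score


y_train = [1.0, 2.0, 3.0, 4.0]
y_test = [2.0, 4.0]
y_pred = [2.0, 4.5]  # stand-in for model.predict(X_test)

real_score, naive_score = mean_baseline_scores(y_train, y_test, y_pred)
print(real_score, naive_score)
```

If the real score is not clearly better than the naive score, the model's predictions fit a constant about as well as they fit the actual test values.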

test_permuted: Compares the score of the model with a permuted test value

Args:

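X_train: (dataframe), dataframe of X features for the training split

X_test: (dataframe), dataframe of X features for the test split

y_train: (dataframe), dataframe of y data for the training split

y_test: (dataframe), dataframe of y data for the test split

model: the model whose score is compared with the baseline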

metrics: (list), list of metric names to evaluate true vs. pred data in each split

Returns:

A dataframe of the results of the model for the selected metrics
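A permuted baseline can be sketched as below. This is an illustration under stated assumptions rather than the MAST-ML code: the helper name and the `seed` parameter are hypothetical, and it assumes that the permuted test value is a random shuffle of the true test targets.

```python
import numpy as np


def permuted_baseline_score(y_test, y_pred, seed=0):
    """Hypothetical helper (not a mastml function)."""
    rng = np.random.default_rng(seed)
    # Assumption: the fake test values are the true test values in random order.
    fake_test = rng.permutation(np.asarray(y_test, dtype=float))
    # Mean absolute error of the predictions against the shuffled targets.
    naive_score = float(np.mean(np.abs(fake_test - np.asarray(y_pred, dtype=float))))
    return fake_test, naive_score


y_test = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.1, 1.9, 3.2, 3.8, 5.1]  # stand-in for model.predict(X_test)

fake_test, naive_score = permuted_baseline_score(y_test, y_pred, seed=0)
print(fake_test, naive_score)
```

Shuffling keeps the distribution of the targets but breaks their pairing with the features, so a model that has learned a real relationship should score noticeably worse against the shuffled values.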

test_nearest_neighbour_kdtree: Compares the score of the model with the test value of the nearest neighbour found using a k-d tree

Args:

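X_train: (dataframe), dataframe of X features for the training split

X_test: (dataframe), dataframe of X features for the test split

y_train: (dataframe), dataframe of y data for the training split

y_test: (dataframe), dataframe of y data for the test split

model: the model whose score is compared with the baseline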

metrics: (list), list of metric names to evaluate true vs. pred data in each split

Returns:

A dataframe of the results of the model for the selected metrics
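A nearest-neighbour lookup with a k-d tree can be sketched with SciPy as follows. The helper name is hypothetical, and the sketch assumes that each test point takes the target of its closest training point; it is not taken from the MAST-ML source.

```python
import numpy as np
from scipy.spatial import cKDTree


def nearest_neighbour_targets_kdtree(X_train, y_train, X_test):
    """Hypothetical helper (not a mastml function)."""
    # Build the tree once on the training features.
    tree = cKDTree(np.asarray(X_train, dtype=float))
    # For each test row, find the index of the single closest training row.
    _, idx = tree.query(np.asarray(X_test, dtype=float), k=1)
    # Assumption: the fake test value is the target of that nearest neighbour.
    return np.asarray(y_train, dtype=float)[idx]


X_train = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
y_train = [10.0, 20.0, 30.0]
X_test = [[0.1, 0.2], [4.0, 4.5]]

fake_test = nearest_neighbour_targets_kdtree(X_train, y_train, X_test)
print(fake_test)
```

A k-d tree search uses Euclidean distance by default, which is the main practical difference from a lookup based on a full distance matrix with a selectable metric.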

test_nearest_neighbour_cdist: Compares the score of the model with the test value of the nearest neighbour found using cdist

Args:

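X_train: (dataframe), dataframe of X features for the training split

X_test: (dataframe), dataframe of X features for the test split

y_train: (dataframe), dataframe of y data for the training split

y_test: (dataframe), dataframe of y data for the test split

model: the model whose score is compared with the baseline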

metrics: (list), list of metric names to evaluate true vs. pred data in each split

d_metric: (str), name of the distance metric used to find the nearest neighbour. Default is 'euclidean'

Returns:

A dataframe of the results of the model for the selected metrics
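The effect of the distance metric can be illustrated with `scipy.spatial.distance.cdist`. The helper below is a hypothetical sketch, not the MAST-ML implementation; it assumes that `d_metric` is passed through as the `metric` argument of `cdist` and that each test point takes the target of its closest training point.

```python
import numpy as np
from scipy.spatial.distance import cdist


def nearest_neighbour_targets_cdist(X_train, y_train, X_test, d_metric="euclidean"):
    """Hypothetical helper (not a mastml function)."""
    # Full (n_test, n_train) matrix of pairwise distances.
    distances = cdist(np.asarray(X_test, dtype=float),
                      np.asarray(X_train, dtype=float),
                      metric=d_metric)
    # Index of the closest training row for each test row.
    idx = distances.argmin(axis=1)
    return np.asarray(y_train, dtype=float)[idx]


X_train = [[3.0, 3.0], [0.0, 5.0]]
y_train = [1.0, 2.0]
X_test = [[0.0, 0.0]]

# Euclidean: (3, 3) is closer (about 4.24 vs 5.0).
euclidean_target = nearest_neighbour_targets_cdist(X_train, y_train, X_test)
# Manhattan ("cityblock"): (0, 5) is closer (5.0 vs 6.0).
cityblock_target = nearest_neighbour_targets_cdist(X_train, y_train, X_test, "cityblock")
print(euclidean_target, cityblock_target)
```

The same test point picks a different neighbour under the two metrics, so the choice of `d_metric` can change the baseline score.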

test_classifier_random: Compares the score of the model with a test value of a random class

Args:

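X_train: (dataframe), dataframe of X features for the training split

X_test: (dataframe), dataframe of X features for the test split

y_train: (dataframe), dataframe of y data for the training split

y_test: (dataframe), dataframe of y data for the test split

model: the model whose score is compared with the baseline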

metrics: (list), list of metric names to evaluate true vs. pred data in each split

Returns:

A dataframe of the results of the model for the selected metrics
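A random-class baseline can be sketched with the standard library alone. The helper name and `seed` parameter are hypothetical, and the sketch assumes that the random class is drawn uniformly from the classes seen in the training targets; the library may draw it differently.

```python
import random


def random_class_targets(y_train, n_test, seed=0):
    """Hypothetical helper (not a mastml function)."""
    rng = random.Random(seed)
    # Assumption: every class seen in training is equally likely to be drawn.
    classes = sorted(set(y_train))
    return [rng.choice(classes) for _ in range(n_test)]


y_train = ["a", "b", "a", "c"]
fake_test = random_class_targets(y_train, n_test=5, seed=0)
print(fake_test)
```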

test_classifier_dominant: Compares the score of the model with a test value of the dominant class (i.e., the class with the highest count)

Args:

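X_train: (dataframe), dataframe of X features for the training split

X_test: (dataframe), dataframe of X features for the test split

y_train: (dataframe), dataframe of y data for the training split

y_test: (dataframe), dataframe of y data for the test split

model: the model whose score is compared with the baseline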

metrics: (list), list of metric names to evaluate true vs. pred data in each split

Returns:

A dataframe of the results of the model for the selected metrics
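A dominant-class baseline can be sketched as below. The helper name is hypothetical, and the sketch assumes that the dominant class is counted over the training targets and that accuracy is the metric of interest; neither detail is confirmed by the documentation above.

```python
from collections import Counter


def dominant_class_targets(y_train, n_test):
    """Hypothetical helper (not a mastml function)."""
    # Assumption: the dominant class is the most frequent training label.
    dominant, _count = Counter(y_train).most_common(1)[0]
    return [dominant] * n_test


y_train = ["cat", "dog", "cat", "bird", "cat"]
y_pred = ["cat", "dog", "cat"]  # stand-in for model.predict(X_test)

fake_test = dominant_class_targets(y_train, len(y_pred))
# Fraction of predictions that agree with the dominant class.
naive_accuracy = sum(t == p for t, p in zip(fake_test, y_pred)) / len(y_pred)
print(fake_test, naive_accuracy)
```

On imbalanced data this baseline is a useful check: a classifier whose predictions agree with the dominant class almost everywhere may have learned little beyond the class frequencies.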

print_results: Prints the comparison between the naive score and the real score

Args:

real_score: The actual score of the model

naive_score: The naive score of the model, evaluated against the baseline test values (fake_test)

Methods Summary

`test_classifier_dominant`(X_train, X_test, ...)
`test_classifier_random`(X_train, X_test, ...)
`test_mean`(X_train, X_test, y_train, y_test, ...)
`test_nearest_neighbour_cdist`(X_train, ...[, ...])
`test_nearest_neighbour_kdtree`(X_train, ...)
`test_permuted`(X_train, X_test, y_train, ...)
`to_excel`(real_score, naive_score)

Methods Documentation

test_classifier_dominant(X_train, X_test, y_train, y_test, model, metrics=['mean_absolute_error'])

test_classifier_random(X_train, X_test, y_train, y_test, model, metrics=['mean_absolute_error'])

test_mean(X_train, X_test, y_train, y_test, model, metrics=['mean_absolute_error'])

test_nearest_neighbour_cdist(X_train, X_test, y_train, y_test, model, metrics=['mean_absolute_error'], d_metric='euclidean')

test_nearest_neighbour_kdtree(X_train, X_test, y_train, y_test, model, metrics=['mean_absolute_error'])

test_permuted(X_train, X_test, y_train, y_test, model, metrics=['mean_absolute_error'])

to_excel(real_score, naive_score)