
Let’s say we made some predictions with a machine-learning model using scikit-learn.
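If you want a self-contained example to follow along with, a minimal setup could look like this (the synthetic data set and the RandomForestClassifier are only placeholders; any fitted scikit-learn classifier with a test set works):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

## Generate a small binary classification data set for illustration
X, y = make_classification(n_samples=300, random_state=42)

## Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

## Fit a classifier on the training data
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)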

We want to evaluate how our model performs and create a confusion matrix:

from sklearn.metrics import confusion_matrix

## make predictions with the scikit-learn model on the test data set
y_preds = model.predict(X_test)

## Create confusion matrix on test data and predictions
cm = confusion_matrix(y_test, y_preds)
cm

You’ll get an array like this:

array([[24,  5],
       [ 4, 28]])

We can visualize it with pandas:

import pandas as pd

pd.crosstab(y_test,
            y_preds,
            rownames=["Actual"],
            colnames=["Predicted"])

The output looks similar to this:

Predicted    0    1
Actual
0           24    5
1            4   28

What does that mean?

The table shows where the predictions and the actual data are the same and where they differ.

For example, there were 29 occurrences with the answer 0. In 24 cases, the predictions were spot on: the machine-learning model predicted 0 where the actual value was 0 (true negative). But in 5 cases the model predicted 1 where the actual data was 0 (false positive).

The model correctly predicted a 1 in 28 cases (true positive). But there are 4 cases where the model’s output was 0 while the actual value was 1 (false negative).
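You can read those four counts straight out of the cm array from above. For a binary problem, scikit-learn’s confusion matrix flattens (via ravel()) into true negatives, false positives, false negatives, and true positives, which also gives you a quick way to compute a few common metrics:

## Unpack the binary confusion matrix:
## true negatives, false positives, false negatives, true positives
tn, fp, fn, tp = cm.ravel()

## Derive a few common metrics from those counts
accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print(f"Accuracy: {accuracy:.2f}, Precision: {precision:.2f}, Recall: {recall:.2f}")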

We want to show the confusion matrix as a plot using a Seaborn heatmap:

%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()

## Visualize seaborn heatmap

def plot_conf_mat(confusion_matrix):
    """
    Plots a confusion matrix using Seaborn's heatmap()
    """
    fig, ax = plt.subplots()
    ax = sns.heatmap(confusion_matrix, fmt='d', annot=True, cbar=False)
    plt.xlabel('Predicted label')
    plt.ylabel('Actual label');

Now, when you call the function with plot_conf_mat(cm), you should see a pretty graphic that matches the table above.
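As an aside, newer scikit-learn releases ship a ConfusionMatrixDisplay helper that produces a similar plot without Seaborn (check your installed version before relying on it):

from sklearn.metrics import ConfusionMatrixDisplay

## Plot the confusion matrix straight from the true labels and predictions
ConfusionMatrixDisplay.from_predictions(y_test, y_preds)
plt.show()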