Let’s say we made some predictions with a machine-learning model using scikit-learn.
We want to evaluate how our model performs, so we create a confusion matrix:
from sklearn.metrics import confusion_matrix
## make predictions with the scikit-learn model on the test data set
y_preds = model.predict(X_test)
## Create confusion matrix on test data and predictions
cm = confusion_matrix(y_test, y_preds)
cm
You’ll get an array like this:
array([[24,  5],
       [ 4, 28]])
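By the way, if you want to reproduce the example end to end, here is a minimal setup sketch for model, X_test and y_test. The data set and the classifier are assumptions; any scikit-learn classifier on a binary-classification problem will do (your exact counts will of course differ from the ones above):
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
## Assumed data: any binary-classification data set works here
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
## Assumed model: a random forest classifier stands in for "model"
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)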
We can visualize it with pandas:
import pandas as pd
pd.crosstab(y_test,
            y_preds,
            rownames=["Actual"],
            colnames=["Predicted"])
The output looks like this:
Actual \ Predicted | 0 | 1 |
---|---|---|
0 | 24 | 5 |
1 | 4 | 28 |
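As a side note, crosstab can also normalize the table, so you get rates per actual class instead of raw counts. This is an optional variant using the same y_test and y_preds:
## Same table, but each row is normalized to show rates per actual class
pd.crosstab(y_test,
            y_preds,
            rownames=["Actual"],
            colnames=["Predicted"],
            normalize="index")
But let's stick with the raw counts for now.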
What does that mean?
The table shows where the predictions and the actual data agree and where they differ.
For example, there were 29 occurrences where the actual value was 0. In 24 of those cases, the predictions were spot on: the machine-learning model predicted 0 where the actual value was 0 (true negatives). But in 5 cases the model predicted 1 where the actual value was 0 (false positives).
The model correctly predicted a 1 in 28 cases (true positives). But there are 4 cases where the model's output was 0 while the actual value was 1 (false negatives).
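By the way, scikit-learn can unpack those four numbers for you directly. For a binary problem, confusion_matrix returns the counts in exactly the order discussed above, so a short sketch looks like this:
## Unpack true negatives, false positives, false negatives and true positives
tn, fp, fn, tp = confusion_matrix(y_test, y_preds).ravel()
print(tn, fp, fn, tp)  ## e.g. 24 5 4 28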
We can also show the confusion matrix as a plot, using Seaborn's heatmap():
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()
## Visualize the confusion matrix with a Seaborn heatmap
def plot_conf_mat(confusion_matrix):
    """
    Plots a confusion matrix using Seaborn's heatmap()
    """
    fig, ax = plt.subplots()
    ax = sns.heatmap(confusion_matrix, fmt='d', annot=True, cbar=False)
    plt.xlabel('Predicted label')
    plt.ylabel('Actual label');
Now, when you run the function with plot_conf_mat(cm), you should see a nice graphic that matches the table above.
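If you'd rather stay within scikit-learn, recent versions (1.0 and later) also ship ConfusionMatrixDisplay, which produces a similar plot without seaborn. Treat this as an optional alternative:
from sklearn.metrics import ConfusionMatrixDisplay
## Plot the confusion matrix directly from the true and predicted labels
ConfusionMatrixDisplay.from_predictions(y_test, y_preds)
plt.show()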