Confusion Matrix

Confusion Matrix – Simple, Effective

Although it’s named the ‘Confusion’ matrix, this simple table is remarkably easy to understand, especially for binary classification problems.

Things can become a bit trickier when we extend the number of possible classes we’re trying to predict, but this is a performance measurement you’ll definitely want to know.

We’ll be discussing the principles of this table with a focus on the two class problem.

What is a confusion matrix?

When we are modelling a binary outcome, such as whether or not a person will respond No or Yes to a question, there are four possible scenarios in the accuracy of our model.

  • The actual result is No, and our model predicts No.
  • The actual result is No, but our model predicts Yes.
  • The actual result is Yes, but our model predicts No.
  • The actual result is Yes, and our model predicts Yes.

As you can see, there are two cases where the model predictions are correct.

Lets define Yes as our Positive result, and No as our Negative result.

When our model is correct, the prediction is True.  If the model is incorrect, the prediction is False.

Now lets revisit our four possible scenarios above.

  • Actual No, model No -> True Negative.
  • Actual No, model Yes -> False Positive.
  • Actual Yes, model No -> False Negative.
  • Actual Yes, model Yes -> True Positive

False positives are also known as Type 1 errors.

False negatives are Type 2 errors.

Below is a labelled confusion matrix.

We will go into some very helpful formulae that you can use to calculate performance metrics to assess your model in later posts, all using these values at their core.

But without going that far, it should be pretty obvious that this provides us with a great way of seeing where our model is performing well, and where it is performing not so well.

If we have no observations that our model is falsely predicting, then it is 100% accurate, both for negative and positive outcomes. This is the ideal situation, as long as it can continue the performance in real life!

If there are a lot of true positives, but also a lot of false positives, it may indicate that our model is too heavily weighted to label an observation as positive, at the detriment of overall performance.

Next Up – Precision and Recall

So that’s the confusion matrix. A very simple, but very helpful tool to visualise where our classification model is performing. Especially easy to use when there is a binary outcome!

We’ll take a deeper look into the uses of true and false, negative and positive results in the following posts. Precision and recall, these two will become your dear friends!

See you there!