# Confusion Matrix

## Confusion Matrix – Simple, Effective

Although it’s named the ‘Confusion’ matrix, this simple table is remarkably easy to understand, especially for binary classification problems.

Things can become a bit trickier when we extend the number of possible classes we’re trying to predict, but this is a performance measurement you’ll definitely want to know.

We’ll be discussing the principles of this table with a focus on the two class problem.

### What is a confusion matrix?

When we are modelling a binary outcome, such as whether or not a person will respond **No **or **Yes** to a question, there are four possible scenarios in the accuracy of our model.

- The actual result is
**No**, and our model predicts**No**. - The actual result is
**No**, but our model predicts**Yes**. - The actual result is
**Yes**, but our model predicts**No**. - The actual result is
**Yes**, and our model predicts**Yes**.

As you can see, there are two cases where the model predictions are correct.

Lets define **Yes** as our **Positive** result, and **No** as our **Negative **result.

When our model is correct, the prediction is **True**. If the model is incorrect, the prediction is **False.**

Now lets revisit our four possible scenarios above.

- Actual
**No**, model**No**->**True Negative**. - Actual
**No**, model**Yes**->**False Positive**. - Actual
**Yes**, model**No**->**False Negative**. - Actual
**Yes**, model**Yes**->**True Positive**

False positives are also known as *Type 1 errors.*

False negatives are *Type 2 errors*.

Below is a labelled confusion matrix.

We will go into some very helpful formulae that you can use to calculate performance metrics to assess your model in later posts, all using these values at their core.

But without going that far, it should be pretty obvious that this provides us with a great way of seeing where our model is performing well, and where it is performing not so well.

If we have no observations that our model is falsely predicting, then it is 100% accurate, both for negative and positive outcomes. This is the ideal situation, as long as it can continue the performance in real life!

If there are a lot of true positives, but also a lot of false positives, it may indicate that our model is too heavily weighted to label an observation as positive, at the detriment of overall performance.

### Next Up – Precision and Recall

So that’s the confusion matrix. A very simple, but very helpful tool to visualise where our classification model is performing. Especially easy to use when there is a binary outcome!

We’ll take a deeper look into the uses of true and false, negative and positive results in the following posts. Precision and recall, these two will become your dear friends!

See you there!