Model Accuracy

Welcome to the first post in our model evaluation series!

We’ll be starting with the simplest, and perhaps most obvious, technique: accuracy.

Model accuracy can be informally defined as the proportion of predictions our model gets right.
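
To make that definition concrete, here is a minimal sketch in Python. The `accuracy` helper and the yes/no example data are ours, made up purely for illustration; they are not from any particular library.

```python
def accuracy(predicted, actual):
    """Proportion of predictions that exactly match the true outcomes."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must be the same length")
    if not actual:
        raise ValueError("need at least one prediction")
    # True counts as 1 and False as 0, so sum() counts the matches.
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical example data: 4 of the 5 predictions match the truth.
predicted = ["yes", "no", "yes", "yes", "no"]
actual = ["yes", "no", "no", "yes", "no"]

print(accuracy(predicted, actual))  # 0.8
```

Note that the comparison is an exact match: a prediction is either right or wrong, with nothing in between.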

At first thought, this is all we need, right? If our model is accurate, then we can put it into production and pat ourselves on the back.

Turns out… things aren’t quite that simple.

Accuracy in Continuous Output

When we have an output on a continuous scale, such as predicting height, it’s actually pretty difficult to get a prediction exactly right.

Think about it: if your height is 182.5 cm and our model predicts you are 182 cm tall, our model will be classed as inaccurate. However, being only 0.5 cm out from the exact result is a really good prediction! If we were never more than 0.5 cm out in any height prediction, this would most likely be a well-performing model.

As you can see in the example above, the model passes reaaaaaaaaaally close to all the sample points we have, and the R-squared (how much of the variation in the data is explained by the model) of 96.54% is incredibly high. However, it doesn’t pass exactly through all of the points, and therefore the overall accuracy is low.
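
To put some numbers on this, here is a rough sketch using a handful of made-up heights (these are not the data behind the 96.54% figure). Every prediction is exactly 0.5 cm out, so exact-match accuracy is zero, even though the average error is tiny and the R-squared is very high. The average absolute error is just one illustrative way of measuring “how far out” we are.

```python
# Hypothetical heights in cm: every prediction is exactly 0.5 cm out.
actual_cm = [182.5, 170.0, 165.5, 190.0]
predicted_cm = [182.0, 170.5, 165.0, 190.5]

n = len(actual_cm)

# Exact-match accuracy: how many predictions hit the true value exactly?
exact_matches = sum(p == a for p, a in zip(predicted_cm, actual_cm))
exact_accuracy = exact_matches / n

# Average absolute error: how far out are we, on average?
mean_abs_error = sum(abs(p - a) for p, a in zip(predicted_cm, actual_cm)) / n

# R-squared: 1 minus (unexplained variation / total variation).
mean_actual = sum(actual_cm) / n
ss_res = sum((a - p) ** 2 for a, p in zip(actual_cm, predicted_cm))
ss_tot = sum((a - mean_actual) ** 2 for a in actual_cm)
r_squared = 1 - ss_res / ss_tot

print(exact_accuracy)  # 0.0
print(mean_abs_error)  # 0.5
print(r_squared)       # very close to 1
```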

Clearly, there is more to model performance than accuracy!

But that’s not all…

Accuracy in Classification Algorithms

In a classification problem, such as predicting whether or not someone clicks a link, or whether a photo is of a cat or a dog, it’s a bit easier to decide whether the model is right.

It’s a simple yes/no problem.

However, consider the following situation:

We’re trying to predict a no outcome, but we know that the no outcome is a really rare event. In almost all cases, the outcome is a yes.

If we just set our model to only ever predict yes, we’d have a really accurate model!

But when you consider the reality, we are not capturing any of the no events. We have a high-accuracy model but, in actual fact, this model is useless.
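
Here is a small sketch of that situation. The 95/5 split is an assumption we picked for illustration, and the “model” is nothing more than a list of yes predictions.

```python
# Hypothetical data: 95 "yes" outcomes and 5 rare "no" outcomes.
actual = ["yes"] * 95 + ["no"] * 5

# A "model" that ignores its input and always predicts "yes".
predicted = ["yes"] * len(actual)

# Proportion of predictions that match the true outcome.
always_yes_accuracy = sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# How many of the rare "no" events did the model actually catch?
no_caught = sum(p == "no" and a == "no" for p, a in zip(predicted, actual))

print(always_yes_accuracy)  # 0.95
print(no_caught)            # 0
```

A 95% accurate model sounds great, until you notice it never finds a single one of the events we care about.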

Possible Model Outcomes

In a binary classification problem, there are 4 possible outcomes:

  • Model predicts yes, true outcome is yes.
  • Model predicts yes, true outcome is no.
  • Model predicts no, true outcome is no.
  • Model predicts no, true outcome is yes.
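
The four outcomes listed above can be counted with a few lines of Python. This is only a sketch: the `outcome_counts` helper and the example data are hypothetical, made up for illustration.

```python
from collections import Counter


def outcome_counts(predicted, actual):
    """Count each (predicted, actual) combination in a yes/no problem."""
    # Counter returns 0 for combinations that never occur.
    pairs = Counter(zip(predicted, actual))
    return {
        "predicted yes, actually yes": pairs[("yes", "yes")],
        "predicted yes, actually no": pairs[("yes", "no")],
        "predicted no, actually no": pairs[("no", "no")],
        "predicted no, actually yes": pairs[("no", "yes")],
    }


# Hypothetical example data.
predicted = ["yes", "yes", "no", "no", "yes"]
actual = ["yes", "no", "no", "yes", "yes"]

counts = outcome_counts(predicted, actual)
for outcome, count in counts.items():
    print(f"{outcome}: {count}")

# Accuracy only looks at the two "correct" outcomes added together.
correct = counts["predicted yes, actually yes"] + counts["predicted no, actually no"]
print(correct / len(actual))  # 0.6
```

Accuracy lumps the two kinds of correct prediction together and says nothing about which kind of mistake the model is making.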

As you can see, there’s more to it than just getting a high proportion of correctly predicted events!
We’ll move on to the mixture of true and false outcomes in our next post!