Why SQL is the Essential Data Science Language Everyone Needs to Know

Speak to any data scientist, and they’ll tell you about the cool stuff they’re doing in Python or R: building neural net models across multiple clusters, or training an image classification algorithm to identify Gen 1 Pokémon.

Rarely do they talk about the work that goes into getting that data into a workable format, or the effort involved in dealing with huge volumes of it.

In most corporate environments, that work is done with SQL, or ‘Structured Query Language’. You’ve probably heard of it, but assumed it’s an aging language that no one wants anymore in this wonderful new world of fancy data science tools.

Thing is, SQL isn’t going anywhere, at least not anytime soon. You’ll want it on your CV, that’s for sure.

In this post, we’ll quickly explain why this language is still incredibly useful to have in your toolbox as a data science professional.

Ubiquity

SQL is the most commonly used database language. A huge proportion of companies that hold any sort of data keep it in data warehouses and data lakes, which is basically just where they store their data. Chances are, you’ll be using some form of SQL to query that data.

If you do data, you’ll use SQL in one way or another.

It’s very in demand

TIOBE, the software quality company, publishes an index that ranks programming languages by popularity (using a loose definition of ‘programming’). As of 2019, SQL is ranked 9th, above ‘popular’ languages such as R and Ruby, and way ahead of SAS (all the way down in 22nd). Companies need people who have this skill: data is ever more important and valuable, and if you can work with it, you’re already a more valuable prospect than someone who can’t.

Cool stuff starts with data

Whatever sort of job you do, whether that’s building machine learning models or Tableau visualizations, it all starts with the data. If you’re able to wrangle the data into a workable format yourself, you can iterate so much quicker and provide so much more value.

It’s supported by the big companies

Google has BigQuery, Microsoft has Azure, Amazon has AWS. They all support the use of SQL within their systems. Microsoft even has SQL Server, its own database system that it markets and supports, which other companies such as Amazon make use of.

If they’re using it, you should be using it too.

It’s SO easy to read

SQL is an incredibly intuitive language to read, understand, and write. It reads the way you would think. Look at the following very simple snippet of SQL code:

    SELECT
        name, age, income, expenses
    FROM
        accounting_table
    WHERE
        age > 21

Line by line, you’re telling the database what you want to retrieve from the ‘accounting_table’: SELECT lists the columns you want, FROM names the table they live in, and WHERE keeps only the rows where age is over 21. SQL works with keywords that are nice and simple, fantastically understandable. You just put a few of them together, in the correct (but fairly natural) order, and bam, results!
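If you want to try the query above for yourself, here is a minimal sketch using Python’s built-in sqlite3 module. The three sample rows are made up purely for illustration, and the ORDER BY clause is an addition of ours so the output order is predictable; your own database and data will look different.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Table and column names follow the snippet in the post;
# the column types are an assumption.
conn.execute(
    "CREATE TABLE accounting_table "
    "(name TEXT, age INTEGER, income REAL, expenses REAL)"
)

# Hypothetical sample rows, invented for this example.
conn.executemany(
    "INSERT INTO accounting_table VALUES (?, ?, ?, ?)",
    [
        ("Ash", 19, 18000.0, 12000.0),
        ("Misty", 24, 42000.0, 30000.0),
        ("Brock", 35, 55000.0, 41000.0),
    ],
)

# The query from the post, plus ORDER BY for a stable row order.
query = """
SELECT name, age, income, expenses
FROM accounting_table
WHERE age > 21
ORDER BY name
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)

conn.close()
```

Only Brock and Misty come back, because Ash fails the age > 21 filter.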

So there.

SQL is a simple, widely used language that forms a core part of your data science work. You may look down on it as not being as complex as other languages, but we think that simplicity is often a bonus. Sometimes, a potential employer may actually be suspicious if you claim to know a lot of great data science techniques but haven’t worked with SQL, as it might suggest that you’re only experienced in dealing with super clean datasets. In a normal work environment, nothing is ever as perfect as you’d like, and SQL is a fantastic tool to deal with some of that mess and turn it into a usable, workable dataset.
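To give a flavour of what dealing with that mess can look like, here is a small sketch, again using Python’s built-in sqlite3 module. The raw_customers table and its rows are hypothetical, invented for this example; the query tidies up stray whitespace, inconsistent capitalization, missing values and duplicates in one go.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical messy table: padded names, mixed case, missing cities.
conn.execute("CREATE TABLE raw_customers (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO raw_customers VALUES (?, ?)",
    [
        ("  alice ", "London"),
        ("ALICE", "London"),
        ("Bob", None),
        ("bob", ""),
    ],
)

# TRIM strips whitespace, LOWER normalizes case,
# NULLIF turns empty strings into NULL, COALESCE fills NULLs,
# and DISTINCT drops the rows that are now duplicates.
clean_query = """
SELECT DISTINCT
    LOWER(TRIM(name)) AS name,
    COALESCE(NULLIF(TRIM(city), ''), 'unknown') AS city
FROM raw_customers
ORDER BY name
"""

clean_rows = conn.execute(clean_query).fetchall()
for row in clean_rows:
    print(row)

conn.close()
```

Four messy rows collapse into two clean ones. Real-world cleaning is rarely this tidy, but the same handful of functions goes a long way.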


We’d love to get your opinion on the topic, so please, leave a comment below!