
Dr Julio Amador Diaz Lopez

Assistant Professorial Lecturer, LSE Data Science Institute

Simply doing data science does not necessarily translate into doing good data science.

Dr Julio Amador Diaz Lopez

As businesses, governments and academic institutions increasingly rely on data when making decisions, a key issue for these bodies is addressing the skills gap that exists in the field of data science. 

The LSE Data Science Institute (DSI) is proud to offer new undergraduate courses for 2021/22 as part of a range of data science study opportunities at LSE that equip students, researchers and executives with the skills they need to tackle business, science and social questions from a data perspective. 

One of these new DSI undergraduate modules is DS202 - Data Science for Social Scientists, which is led by Dr Julio Amador Diaz Lopez.

This module extends the foundation of probability and statistics with an introduction to the most important concepts in data science and applied machine learning, with social science examples.

In this data science spotlight, Julio reflects on teaching DS202 so far, asking the question "What makes a good data scientist?"


What makes a good data scientist?

Dr Julio Amador Diaz Lopez


Nowadays, data science is everywhere.

As businesses, governments and academic institutions increasingly rely on data-driven decision-making, many people with different backgrounds and training now consider themselves data scientists.

However, as is the case with everything, simply doing data science does not necessarily translate into doing good data science. 

So, what makes good data science? I think that this can be explained in three points:

Understand your task.
The main problem in data science is that, quite often, people do not understand what they are trying to do. Many practitioners use the 'spaghetti on the wall' procedure: they throw all the spaghetti at the wall and simply see what sticks. However, this method is unlikely to even begin to address their problems, as it never asks the question 'what is my aim?'

Understand your data.
Most of a data scientist's time is spent wrangling data. In fact, 80% of the work of a good data scientist is to 'get intimate' with the data. Knowing your data allows you to understand what types of questions you might be able to use it to answer. Most importantly, however, it also allows you to understand which answers you cannot extract from it.

Have an intuition for what your algorithms are doing.
Many practitioners resort to the 'spaghetti on the wall' procedure because they know how to implement it in code. Indeed, if they do it repeatedly, they might obtain some answers (although whether those are the right answers, or even the right questions, is another story). As the old data-scientific adage says: if you torture data long enough, it will speak. Without an intuition for what your algorithms are doing or how they operate, you miss the mark in applying data science.

These principles align with LSE's motto, rerum cognoscere causas, taken from Virgil's Georgics. Its English translation, "to know the causes of things", provides the foundation for the work of a good data scientist. We have applied these principles to our new undergraduate course, DS202, in order to teach good data science. We have been very impressed with students' progress thus far.