Applications for Summer School 2025 will open in late November. Please sign up to our mailing list for updates.

ME204: Data Engineering for the Social World

Subject Area: Research Methods, Data Science, and Mathematics

Course details

  • Department
    Data Science Institute
  • Application code
    SS-ME204

Apply

Applications are closed

We are not currently accepting applications for this course. Register your interest below to be notified when applications open again.

Overview

Data science has unlocked exciting possibilities for social scientists through its diverse toolkit, including big data analysis, visualisation, and machine learning models, enabling them to extract valuable insights from their data. 

Yet the success of a data-driven project hinges on data quality. This is where data engineering plays a pivotal role. Practitioners must ensure that the data they acquire is sufficient and accurate, and they must be able to adapt in order to handle 'messy data' effectively.

A substantial portion of time in data-driven projects (anecdotally 80%) is dedicated to cleaning and preprocessing data, with only 20% said to be devoted to building, evaluating, and deploying machine learning models. Despite the emergence of new AI technologies, which promise to automate many coding tasks, data manipulation is likely to remain an indispensable skill due to the inherent messiness of real-world data.

By the end of this course, you will be proficient in producing stunning web reports and visual dashboards to display your collected data and showcase your newly acquired data-wrangling abilities.

Key information

Prerequisites: Students should already be familiar with computer programming at an introductory level (variables, if-else, loops, functions). If you do not already use R, we strongly encourage you to familiarise yourself with it before the start of the course. Suggested reading: chapters 1-8 of the R for Data Science book.

Level: 200 level. Read more information on levels in our FAQs

Fees: Please see Fees and payments

Lectures: 36 hours

Classes: 18 hours

Assessment: A mid-term problem set (25%) and a final project (75%). 

Typical credit: 3-4 credits (US); 7.5 ECTS points (EU)

Please note: Assessment is optional but may be required for credit by your home institution. Your home institution will be able to advise how you can meet their credit requirements. For more information on exams and credit, read Teaching and assessment

Is this course right for you?

This course is ideal for those seeking hands-on experience with a data science project, whether you want to pursue a career in data science or simply to experience the data science way of doing things. It is also recommended if you want to strengthen your programming skills, and it will be relevant if you are starting an MSc or MBA programme of study and wish to learn introductory concepts in the area.

Outcomes

Aims of this course:

Develop the skills to collect public data from the Web or from APIs, connect multiple data sources and build dashboards to communicate insights obtained from data.

Learning Objectives:

In this course, you will learn the fundamentals of data engineering, including:

  • Reasoning about the structure and format of data
  • Collecting data from real websites and APIs
  • Best practices for efficient data storage
  • Basics of the SQL language
  • Tools available in the programming language R for data pre-processing and reshaping
  • Using AI tools (ChatGPT and GitHub Copilot) to write and debug code efficiently
  • Organising data into a 'tidy' format, suitable for future analysis
  • Conducting exploratory data analysis, including static and dynamic visualisations
  • Building simple websites to report and communicate your findings effectively

Content

Prachin Patel, India

I enjoyed that the course was practical. All of the theory we learned in lectures was then applied in classes, and the reinforcement of the ideas really helped me to learn.

Faculty

The design of this course is guided by LSE faculty, as well as industry experts, who will share their experience and in-depth knowledge with you throughout the course.

Dr Jonathan Cardoso-Silva

Assistant Professor (Education)

Department

The Data Science Institute (DSI) forms the institutional cornerstone of data science activity at the London School of Economics and Political Science. Working alongside the academic departments across the School, the DSI's mission is to foster the study of data science and new forms of data with a focus on their social, economic, and political aspects.

The DSI aims to host, facilitate and promote research in social and economic data science through an annual programme of seminars, workshops and research projects delivered by a range of academic experts and research students.
