Course details
- Department: Data Science Institute
- Application code: SS-ME204
Apply
Applications are open
We are accepting applications. Apply early to avoid disappointment.
Overview
Data science has unlocked exciting possibilities for social scientists through its diverse toolkit, including big data analysis, visualisation, and machine learning models, enabling them to extract valuable insights from their data.
Yet, the success of a data-driven project hinges on data quality. This is where data engineering plays a pivotal role. Professionals must ensure that the data they acquire is sufficient and accurate, and must be able to handle 'messy data' effectively.
A substantial portion of time in data-driven projects (anecdotally 80%) is dedicated to cleaning and preprocessing data, with only 20% said to be devoted to building, evaluating, and deploying machine learning models. Despite the emergence of new AI technologies, which promise to automate many coding tasks, data manipulation is likely to remain an indispensable skill due to the inherent messiness of real-world data.
By the end of this course, you will be proficient in producing stunning web reports and visual dashboards to display your collected data and showcase your newly acquired data-wrangling abilities.
Key information
Prerequisites: Students should already be familiar with computer programming at an introductory level (variables, if-else, loops, functions). If you have not used R before, we strongly encourage you to familiarise yourself with it before the start of the course. Suggested reading: R for Data Science book, chapters 1-8.
Level: 200 level. Read more information on levels in our FAQs
Fees: Please see Fees and payments
Lectures: 36 hours
Classes: 18 hours
Assessment: A mid-term problem set (25%) and a final project (75%).
Typical credit: 3-4 credits (US); 7.5 ECTS points (EU)
Please note: Assessment is optional but may be required for credit by your home institution. Your home institution will be able to advise how you can meet their credit requirements. For more information on exams and credit, read Teaching and assessment.
Is this course right for you?
This course is ideal for those seeking hands-on experience with a data science project, whether you want to pursue a career in data science or simply experience the data science way of doing things. It is also recommended if you want to strengthen your programming skills. This course will also be relevant if you are starting an MSc or MBA programme of study and wish to learn introductory concepts in the area.
Outcomes
Aims of this course:
Develop the skills to collect public data from the Web or from APIs, connect multiple data sources and build dashboards to communicate insights obtained from data.
Learning Objectives:
In this course, you will learn the fundamentals of data engineering, including:
- Reasoning about the structure and format of data
- Collecting data from real websites and APIs
- Best practices for efficient data storage
- Basics of the SQL language
- Tools available in the programming language R for data pre-processing and reshaping
- Using AI tools (ChatGPT and GitHub Copilot) to write and debug code efficiently
- Organising data into a 'tidy' format, suitable for future analysis
- Conducting exploratory data analysis, including static and dynamic visualisations
- Building simple websites to report and communicate your findings effectively
Content
Faculty
The design of this course is guided by LSE faculty, as well as industry experts, who will share their experience and in-depth knowledge with you throughout the course.
Dr Jonathan Cardoso-Silva
Assistant Professor (Education)
Department
The Data Science Institute (DSI) forms the institutional cornerstone of data science activity at the London School of Economics and Political Science. Working alongside the academic departments across the School, the DSI's mission is to foster the study of data science and new forms of data with a focus on their social, economic, and political aspects.
The DSI aims to host, facilitate and promote research in social and economic data science through an annual programme of seminars, workshops and research projects delivered by a range of academic experts and research students.