PP422     
Data Science for Public Policy

This information is for the 2024/25 session.

Teacher responsible

Dr Casey Kearney

Availability

This course is compulsory on the MPA in Data Science for Public Policy. This course is not available as an outside option.

Pre-requisites

Students must have completed the Pre-Sessional Coding and Mathematics Bootcamp (PP407).

This ensures that students have basic fluency in Maths and Statistics, as well as in Python and its main Data Science libraries.

Course content

This course covers the theory and practice of the Data Science project lifecycle in Python for Public Policy, from problem definition and data sourcing and cleaning to exploration, visualization, and modelling. Emphasis will be placed on identifying problems suited to different Data Science techniques and on good practices for managing data. Linear and logistic models and regularization techniques will be covered in the AT; Machine Learning, clustering, and introductory text analysis models will be covered in the WT. Key concepts underlying modelling (bias vs. variance, types of error, training vs. test data), together with data ethics and data science ethics, will be illustrated and implemented using examples from healthcare, education, urban policy, international development, and other policy areas. By the end of the course, students will have a strong coding workflow and will be able to source and experiment with data for analysis and research, both individually and collaboratively.

Teaching

15 hours of lectures and 15 hours of seminars in the AT. 15 hours of lectures and 15 hours of seminars in the WT.

Formative coursework

Students will be expected to complete weekly problem sets throughout the AT and WT.

Indicative reading


These books provide an excellent starting point and can be used as the main reference for many topics. A full reading list will be provided at the beginning of the course.

  1. James, Gareth, et al. An Introduction to Statistical Learning: With Applications in Python. Springer Nature, 2023.
  2. Chen, Jeffrey C., Edward A. Rubin, and Gary J. Cornwall. Data Science for Public Policy. Springer, 2021.
  3. Géron, Aurélien. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow. O'Reilly Media, Inc., 2022.
  4. Müller, Andreas C., and Sarah Guido. Introduction to Machine Learning with Python: A Guide for Data Scientists. O'Reilly Media, Inc., 2016.
  5. Wilke, Claus O. Fundamentals of Data Visualization: A Primer on Making Informative and Compelling Figures. O'Reilly Media, Inc., 2019.

Assessment

Exam (40%, duration: 3 hours, reading time: 15 minutes) in the spring exam period.
Coursework (30%) in the AT and WT.
Group presentation (30%) in the WT.

Coursework consists of weekly coding notebooks completed by each student, together with in-class participation. Students will also prepare a group presentation and take a final exam for the course.

Key facts

Department: School of Public Policy

Total students 2023/24: 18

Average class size 2023/24: 18

Controlled access 2023/24: No

Value: One Unit


Course selection videos

Some departments have produced short videos to introduce their courses. Please refer to the course selection videos index page for further information.

Personal development skills

  • Leadership
  • Problem solving
  • Application of information skills
  • Communication
  • Application of numeracy skills
  • Specialist skills