Not available in 2020/21
ST115      Half Unit
Managing and Visualising Data

This information is for the 2020/21 session.

Teacher responsible

Prof Milan Vojnovic

Availability

This course is available on the BSc in Actuarial Science. This course is available with permission as an outside option to students on other programmes where regulations permit and to General Course students.

This course will be core (paper 3) on the new BSc in Data Science programme.

Course content

The primary focus of the course is on the fundamental principles of effective manipulation and visualisation of data. It covers the key steps of a data analytics pipeline: formulating a data science problem, manipulating and visualising data, and, finally, creating actionable insights. Topics on data manipulation include methods for data cleaning and transformation, manipulation of data using tabular data structures, relational database models, structured query languages (e.g. SQL), and processing of various human-readable data formats (e.g. JSON and XML). Topics on data visualisation include methods for exploratory data analysis using statistical plots such as histograms and boxplots, plots for time series and multivariate data, dimensionality reduction methods for visualisation of high-dimensional data, graph data visualisation methods, and metrics and plots for evaluating the accuracy of classification algorithms.

The course will cover basic concepts and principles and will give students hands-on experience in using Python for manipulation and visualisation of data. This will include the use of standard modules and libraries such as numpy, pandas, matplotlib, ggplot2, and scikit-learn, and programming environments such as Jupyter notebooks.

The course will use examples drawn from a wide range of applications, including those that arise in online services, social media, social networks, finance, and machine learning. The principles and methods learned will enable students to effectively derive insights from data and communicate results to end users.  

Teaching

20 hours of lectures and 15 hours of seminars in the LT.

Students are required to install Python on their own laptops and use their own laptops in the seminar sessions.

Students who do not have a laptop suitable for the course will be offered the use of personal computers available in the seminar rooms.

Formative coursework

Students will be expected to produce 10 exercises in the LT.

Weekly exercises will be set, in which students use Python and its libraries to apply data manipulation and visualisation methods to data.

Indicative reading


Essential Reading:

  1. W. McKinney, Python for Data Analysis, 2nd Edition, O’Reilly, 2017
  2. H. Wickham, ggplot2: Elegant Graphics for Data Analysis, Springer, 2009
  3. A. C. Müller and S. Guido, Introduction to Machine Learning with Python, O’Reilly, 2016
  4. A. Géron, Hands-On Machine Learning with Scikit-Learn & TensorFlow, O’Reilly, 2017
  5. R. Ramakrishnan and J. Gehrke, Database Management Systems, 3rd Edition, McGraw Hill, 2002

Additional Reading: 

  1. NumPy, https://numpy.org/
  2. Python Data Analysis Library, https://pandas.pydata.org/
  3. Matplotlib, https://matplotlib.org
  4. Seaborn: statistical data visualization, https://seaborn.pydata.org
  5. scikit-learn: Machine Learning in Python, http://scikit-learn.org

Assessment

Coursework (40%) and project (60%) in the LT.

Students are required to hand in solutions to 4 sets of exercises using Python (or R), each accounting for 10% of the final assessment, and to hand in a report on an individual project (accounting for 60% of the final assessment). The project consists of applying data manipulation and visualisation methods to a particular dataset.

Important information in response to COVID-19

Please note that during the 2020/21 academic year some variation to teaching and learning activities may be required to respond to changes in public health advice and/or to account for the situation of students in attendance on campus and those studying online during the early part of the academic year. For assessment, this may involve changes to the mode of delivery and/or the format or weighting of assessments. Changes will only be made if required, and students will be notified about any changes to teaching or assessment plans at the earliest opportunity.

Key facts

Department: Statistics

Total students 2019/20: Unavailable

Average class size 2019/20: Unavailable

Capped 2019/20: No

Value: Half Unit

Guidelines for interpreting course guide information

Personal development skills

  • Self-management
  • Problem solving
  • Application of information skills
  • Communication
  • Application of numeracy skills
  • Specialist skills