ST443 Half Unit
Machine Learning and Data Mining
This information is for the 2023/24 session.
Teacher responsible
Dr Xinghao Qiao
Availability
This course is compulsory on the MSc in Data Science. This course is available on the MPA in Data Science for Public Policy, MSc in Applied Social Data Science, MSc in Econometrics and Mathematical Economics, MSc in Geographic Data Science, MSc in Health Data Science, MSc in Quantitative Methods for Risk Management, MSc in Statistics, MSc in Statistics (Financial Statistics), MSc in Statistics (Financial Statistics) (Research), MSc in Statistics (Research), MSc in Statistics (Social Statistics) and MSc in Statistics (Social Statistics) (Research). This course is available with permission as an outside option to students on other programmes where regulations permit.
This course has a limited number of places (it is controlled access) and demand is typically high. This may mean that you’re not able to get a place on this course.
Pre-requisites
The course will be taught from a statistical perspective, and students must have a very solid understanding of linear regression models.
Students are not permitted to take this course alongside Algorithmic Techniques for Data Mining (MA429).
Course content
Machine learning and data mining are emerging fields between statistics and computer science which focus on the statistical objectives of prediction, classification and clustering and are particularly orientated to contexts where datasets are large, the so-called world of 'big data'. This course will start from the classical statistical methodology of linear regression and then build on this framework to provide an introduction to machine learning and data mining methods from a statistical perspective. Thus, machine learning will be conceived of as 'statistical learning', following the titles of the books in the reading list. The course will aim to cover modern non-linear methods such as spline methods, generalised additive models, decision trees, random forests, bagging, boosting and support vector machines, as well as more advanced linear approaches such as ridge regression, the lasso and linear discriminant analysis, together with k-means clustering and nearest neighbours.
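As an informal illustration of how the shrinkage methods named above build on linear regression, the following is a minimal sketch and not course material. It assumes Python with NumPy (the course does not specify a package here), uses simulated data, and the function name `ridge_fit` and the penalty value `lam=10.0` are hypothetical choices made only for this example.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge estimate (X'X + lam*I)^{-1} X'y, without an intercept.

    With lam = 0 this reduces to ordinary least squares.
    """
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
beta_true = np.array([2.0, -1.0, 0.0, 0.5, 0.0])
y = X @ beta_true + rng.normal(scale=0.5, size=50)

beta_ols = ridge_fit(X, y, lam=0.0)     # ordinary least squares
beta_ridge = ridge_fit(X, y, lam=10.0)  # coefficients shrunk towards zero

print("OLS coefficients:  ", np.round(beta_ols, 3))
print("Ridge coefficients:", np.round(beta_ridge, 3))
```

The ridge coefficient vector always has a smaller Euclidean norm than the least-squares one when the penalty is positive; in practice the penalty would be chosen by cross-validation rather than fixed in advance.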
Teaching
The first part of the course reviews regression methods and covers logistic regression, linear and quadratic discriminant analysis, cross-validation, variable selection, nearest neighbours and shrinkage methods. The second part of the course introduces non-linear models and covers splines, generalised additive models, tree methods, bagging, random forests, boosting, support vector machines, principal components analysis, k-means and hierarchical clustering.
This course will be delivered through a combination of classes, lectures and Q&A sessions totalling a minimum of 35 hours across Autumn Term. This course includes a reading week in Week 6 of Autumn Term.
Formative coursework
Students will be expected to complete 5 problem sets in the Autumn Term (AT).
The problem sets will consist of theory questions and data problems that require implementing the methods covered in class using a computer package.
Indicative reading
James, G., Witten, D., Hastie, T. and Tibshirani, R. An Introduction to Statistical Learning. 2nd Edition, Springer, 2021. Available online at https://www.statlearning.com/
Hastie, T., Tibshirani, R. and Friedman, J. The Elements of Statistical Learning: Data Mining, Inference and Prediction. 2nd Edition, Springer, 2009. Available online at http://statweb.stanford.edu/~tibs/ElemStatLearn/index.html
Assessment
Exam (70%, duration: 2 hours) in the spring exam period.
Project (30%) in Week 11 of the AT.
Student performance results
(2019/20 - 2021/22 combined)
| Classification | % of students |
|---|---|
| Distinction | 52 |
| Merit | 35.3 |
| Pass | 8.1 |
| Fail | 4.5 |
Key facts
Department: Statistics
Total students 2022/23: 78
Average class size 2022/23: 20
Controlled access 2022/23: Yes
Lecture capture used 2022/23: Yes (MT)
Value: Half Unit
Personal development skills
- Self-management
- Team working
- Problem solving
- Application of information skills
- Communication
- Application of numeracy skills
- Specialist skills