MA429 Half Unit
Algorithmic Techniques for Data Mining
This information is for the 2017/18 session.
Teacher responsible
Professor Gregory Sorkin
Availability
This course is available on the MSc in Applicable Mathematics, MSc in Marketing and MSc in Operations Research & Analytics. This course is available as an outside option to students on other programmes where regulations permit.
The course will be capped at 36 students.
Pre-requisites
Students are not permitted to take this course alongside ST443, Machine Learning and Data Mining.
Students must have knowledge of Statistics and the programming language R to the level of ST447, Data Analysis and Statistical Methods.
Course content
Data Mining is an interdisciplinary field that has developed over the last three decades. Vast quantities of data are available today in all areas of business, science, and technology. The main goal of data mining is to extract previously unknown, useful information from such massive-scale data. The aim of the course is to equip students with theoretically grounded and practically applicable knowledge of data mining. The theoretical foundations of the field come from mathematics, statistics, computer science and artificial intelligence.
The course introduces fundamental machine learning methods and algorithms for basic data analytics problems. These methods include algorithms for classification and regression problems, such as tree construction, support vector machines, nearest-neighbour methods and Bayesian networks. The course will also cover unsupervised learning methods such as association rule mining and clustering.
The methods are illustrated on practical problems arising from various fields. The course will use data mining packages in R.
Teaching
20 hours of lectures and 15 hours of seminars in the LT. 2 hours of lectures in the ST.
Formative coursework
There will be weekly homework assignments, some of which will be submitted for formative feedback, and some for summative assessment (10% of the course mark). A mock project will be given, as preparation for the summative group project.
Indicative reading
James, Witten, Hastie, Tibshirani, An Introduction to Statistical Learning: with Applications in R (2016)
Torgo, Data Mining with R: Learning with Case Studies (2010)
Hastie, Tibshirani, Friedman, The Elements of Statistical Learning: Data Mining, Inference and Prediction, Second Edition (2009)
Assessment
Exam (40%, duration: 2 hours) in the main exam period.
Project (50%) in the ST.
Coursework (10%) in the LT.
Key facts
Department: Mathematics
Total students 2016/17: Unavailable
Average class size 2016/17: Unavailable
Controlled access 2016/17: No
Value: Half Unit
Personal development skills
- Self-management
- Team working
- Problem solving
- Application of information skills
- Communication
- Application of numeracy skills
- Commercial awareness
- Specialist skills