ST309      Half Unit
Elementary Data Analytics

This information is for the 2021/22 session.

Teacher responsible

Prof Qiwei Yao Col.7.16

Availability

This course is available on the BSc in Accounting and Finance. This course is available as an outside option to students on other programmes where regulations permit. This course is available with permission to General Course students.

This course is available as an outside option to students who are interested in data analytics and who have a statistical background at least equivalent to ST107 or ST108. No prior knowledge of programming is required. However, students who have no previous experience in R are required to take an online pre-sessional R course from the Digital Skills Lab (https://moodle.lse.ac.uk/course/view.php?id=7022).

This course is capped at 60 for the 2019/20 session. 

This course cannot be taken with ST310 Machine Learning.

Pre-requisites

Students must have completed a statistics course at least equivalent to Quantitative Methods (Statistics) (ST107) or Statistical Methods for the Social Sciences (ST108).

Students who have no previous experience in R are required to take an online pre-sessional R course from the Digital Skills Lab (https://moodle.lse.ac.uk/course/view.php?id=7745).

Course content

The primary focus of this course is to help students view problems from business, economy/finance, and social domains from a data perspective, and to understand the principles of extracting useful information and knowledge from data. Students will also gain hands-on experience using R, a programming language and software environment for data analysis and visualisation. Basic data analytic methods and techniques are taught alongside real-life examples.

The core contents of the course include data cleansing, data transformation, data visualisation, R programming, classification, regression, clustering, over-fitting avoidance and model evaluation. The course also covers a subset of the following topics: illustrations of accessing databases and big data platforms from R, illustrations of parallel computing in R, similarity matching, market-basket analysis, link prediction, text mining, network analysis and causal modelling.

This is not a course on the algorithms and IT technologies required for handling massive data, which deserve separate courses. The focus is on the fundamental principles and concepts of data analytics, or data science. In this information age it is increasingly important to gain an adequate understanding of data science, even if one never intends to apply it oneself.

Teaching

This course will be delivered through a combination of classes, lectures and Q&A sessions totalling a minimum of 30 hours in Michaelmas Term. This year, some of this teaching may be delivered through a combination of virtual classes and flipped lectures delivered as short online videos.

Students are encouraged to install R on their own laptops, and to use their own laptops in the workshops.

 

Formative coursework

Students will be expected to produce 6 exercises in the MT.

Students are expected to complete six sets of exercises involving substantial data analysis using R.

Indicative reading

Wickham, H. and Grolemund, G. (2017). R for Data Science. O'Reilly. Available online at http://r4ds.had.co.nz

James, G., Witten, D., Hastie, T. and Tibshirani, R. (2013). An Introduction to Statistical Learning with Applications in R. Springer. Available online at http://www-bcf.usc.edu/~gareth/ISL

Provost, F. and Fawcett, T. (2013). Data Science for Business. O'Reilly.

Zuur, A., Ieno, E. and Meesters, E. (2009). A Beginner’s Guide to R. Springer. Available online from LSE Library.

Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd Edition. Springer. Available online at https://web.stanford.edu/~hastie/Papers/ESLII.pdf

Silge, J. and Robinson, D. (2017). Text Mining with R: a tidy approach. O’Reilly. Available online at https://www.tidytextmining.com

Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer. Available online at http://moderngraphics11.pbworks.com/f/ggplot2-Book09hWickham.pdf

Assessment

Coursework (30%) in the MT.
Project (70%) in the LT.

The project will be a group project with a maximum of 3 members per group. Detailed instructions will be handed out in Week 5 of Michaelmas Term, and students must submit a written report by Week 5 of Lent Term.

Students are required to hand in solutions to 3 sets of exercises, which together account for 30% of the final grade.

Course selection videos

Some departments have produced short videos to introduce their courses. Please refer to the course selection videos index page for further information.

Student performance results

(2018/19 - 2020/21 combined)

Classification    % of students
First             63.8
2:1               25.5
2:2               7.4
Third             3.4
Fail              0

Important information in response to COVID-19

Please note that during the 2021/22 academic year some variation to teaching and learning activities may be required to respond to changes in public health advice and/or to account for the differing needs of students in attendance on campus and those who might be studying online. For example, this may involve changes to the mode of teaching delivery and/or the format or weighting of assessments. Changes will only be made if required, and students will be notified about any changes to teaching or assessment plans at the earliest opportunity.

Key facts

Department: Statistics

Total students 2020/21: 64

Average class size 2020/21: 20

Capped 2020/21: Yes (70)

Value: Half Unit

Personal development skills

  • Self-management
  • Problem solving
  • Application of information skills
  • Communication
  • Application of numeracy skills
  • Specialist skills