The Data Science Institute (DSI) proudly works alongside academic departments across LSE to foster the study of data science, with a focus on its social, economic, and political aspects.
The Department of Statistics is one such department and is home to internationally respected experts in statistics and data science. One of these experts is Dr Joshua Loftus, whose research in social and economic data science is the focus of this data science spotlight.
Joshua, who recently became a DSI Affiliate, is interested in improving practices in data science and machine learning. His research aims to reduce the impact of bias, particularly biases that cause social harm or undermine scientific reproducibility. Joshua outlines this in greater detail below.
The only data science worth doing is good data science. But what does it mean to be good? Should it use rigorous mathematics and the fastest, most scalable algorithms, accurately model the real world, and bring us closer to economic or social goals? In practice we often find we can't have it all. For example, a mathematically justified algorithm may perform worse on real data than one we understand less well. How should we decide?
Doing good data science requires making judgment calls on these and other philosophically challenging questions. But as a new discipline, data science has not yet developed the sort of conventions for ethical training and practice that exist in other professions such as medicine, engineering, or statistics. The Engineers' Creed of the National Society of Professional Engineers includes the pledge, "To place service before profit, the honor and standing of my profession before personal advantage, and the public welfare above all other considerations." Contrast this with Mark Zuckerberg's (supposedly former) motto of "move fast and break things." We can see the consequences in news headlines and in our own lives: as the information economy innovates and disrupts, it can also scale up existing social problems and create new ones along the way.
My research and teaching are dedicated to changing this. I work on methods that do not try to minimize the amount of human judgment involved, but instead make those judgments transparent and open to scrutiny and discussion. With my collaborators I have developed definitions and methods for understanding fairness and discrimination in algorithmic decision processes, focusing on causality and causal inference. It may take some time, but I am hopeful that regulation of information systems can be better informed by an understanding of their causes and effects.
I'm a pragmatist who believes we can evolve institutions and culture for the better, and that's the purpose of education. I have taught courses ranging from introductory to advanced statistics and machine learning, and in all of them I emphasize ethical practice and values like reproducibility and social beneficence. Recently I have been developing an exciting new Ethical Data Science course at LSE. The course will draw on knowledge from statistics, philosophy, and social science, including my own research. Students will learn from historical and current case studies, dive deep into the philosophical aspects of data science methods and applications, and apply rigorous, quantitative reasoning to their ethical implications.
I am fortunate and grateful to be able to work on such topics. It's a great responsibility, and I am honored and even a bit daunted by its importance. I believe all data scientists have a responsibility to do their jobs ethically. But this responsibility is also an opportunity, because everyone can find great meaning in actively engaging with the questions of how to live a good life and do good work.