Research Showcase 2024

Thursday 20 June - Friday 21 June 

Lee Kuan Yew Room, 5th floor, Arundel House
6 Temple Place, London WC2R 2PG

Please note: the location of this event has been changed from the Marshall Building (LSE Campus) to Arundel House (near Temple Station).

This two-day in-person event is an opportunity to find out more about the Department's research. It takes the form of a series of short talks and will provide an overview of the research activities of the Department's four research groups: Data Science, Probability in Finance and Insurance, Social Statistics, and Time Series and Statistical Learning.

The presentations will be accompanied by a poster session on the evening of 20 June, combined with a reception.

This event takes place from 10.00am to 8.00pm on Thursday 20 June and from 10.00am to 3.30pm on Friday 21 June 2024. 

Please register your place!

Thursday 20 June

9.57am-10.00am Zoltan Szabo
Opening Words

10.00am-10.30am Wicher Bergsma (Social Statistics)
Model-based estimation of a Gaussian covariance or precision kernel

In a Gaussian setting, a variety of useful models involve linear restrictions on the covariance kernel or the precision kernel. A key example is graphical models, which involve patterns of zeroes in the precision matrix. Alternatively, stationary Gaussian distributions involve linear restrictions on the covariance kernel or, equivalently, the precision kernel. Furthermore, covariate information can be encoded via linear restrictions, in order to improve both estimation and understanding of the population distribution.

As a mathematical framework for sets of linearly restricted positive definite kernels, incorporating the aforementioned examples, we introduce a class of families of reproducing kernel Krein spaces. For each family, a generalized Wishart/inverse-Wishart prior can serve as a prior on the convex cone of positive definite kernels, allowing an (empirical) Bayes estimator for the covariance or precision kernel. This approach also addresses the difficulty of ensuring that the estimated covariance/precision kernel is positive definite.
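
As a point of reference for the Wishart/inverse-Wishart idea, the sketch below shows its simplest finite-dimensional special case: a conjugate inverse-Wishart prior on the covariance matrix of a zero-mean Gaussian sample. The data, prior parameters, and use of scipy are illustrative assumptions; the reproducing kernel Krein space framework of the talk is not reproduced here.

```python
# Minimal sketch (illustrative only): Bayesian estimation of a covariance
# matrix with a conjugate inverse-Wishart prior, the finite-dimensional
# special case of the generalized Wishart/inverse-Wishart priors above.
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(0)

d, n = 3, 200
true_cov = np.array([[1.0, 0.5, 0.0],
                     [0.5, 1.0, 0.3],
                     [0.0, 0.3, 1.0]])
X = rng.multivariate_normal(np.zeros(d), true_cov, size=n)  # zero-mean data

# Prior: Sigma ~ IW(nu0, Psi0).  With a known zero mean, the posterior is
# IW(nu0 + n, Psi0 + X'X).
nu0, Psi0 = d + 2, np.eye(d)
nu_post, Psi_post = nu0 + n, Psi0 + X.T @ X

# Posterior mean of Sigma (well defined since nu_post > d + 1)
print(Psi_post / (nu_post - d - 1))

# Posterior draws, e.g. for uncertainty quantification
draws = invwishart(df=nu_post, scale=Psi_post).rvs(size=1000)
```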

10.30am-11.00am Tom Dorrington Ward (Engage Smarter)
Evaluating and assuring AI Agents for financial services

The arrival of ChatGPT in November 2022 started a new wave of “generative AI” applications. One area with huge potential for generative AI to make a difference is in helping people to make better financial decisions. However, using AI Agents to provide consumer financial guidance requires assurance: there are risks in providing incorrect guidance, and advice is a regulated activity. In this short talk, Tom Dorrington Ward, CTO & Co-Founder of Engage Smarter AI, will survey emerging techniques, including AI architectures and evaluation processes, for evaluating and assuring AI Agents. He will also highlight the elements which make expert financial guidance a particularly complex use case. Finally, he will describe how Engage Smarter AI’s own framework for evaluating and assuring AI Agents in financial services brings together these different elements.

11.00am-11.30am Ieva Kazlauskaite (Data Science - from August)
Calculating exposure to extreme sea level risk will require high resolution ice sheet models

The West Antarctic Ice Sheet (WAIS) is losing ice and its annual contribution to sea level is increasing. The future behaviour of WAIS will impact societies worldwide, yet deep uncertainty remains in the expected rate of ice loss. High-impact, low-likelihood scenarios of sea level rise are needed by risk-averse stakeholders but are particularly difficult to constrain. In this work, we combine traditional model simulations of the Amundsen Sea sector of WAIS with Gaussian process emulation to show that ice-sheet models capable of resolving kilometre-scale basal topography will be needed to assess the probability of extreme scenarios of sea-level rise. This resolution is finer than that of many state-of-the-art continent-scale simulations. Our ice-sheet model simulations show that coarser resolutions tend to project higher sea-level contributions than finer resolutions, inflating the tails of the distribution. We therefore caution against relying purely upon simulations coarser than 4-5 km when assessing the potential for societally important high-impact sea level rise.
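
For readers unfamiliar with emulation, here is a minimal Gaussian process emulator fitted with scikit-learn to a synthetic design; the input variables, the stand-in "simulator", and the kernel choices are placeholders rather than the ice-sheet ensemble used in this work.

```python
# Minimal sketch of Gaussian process emulation on synthetic data (not the
# authors' setup): fit a GP to (simulator input, sea-level contribution)
# pairs and predict, with uncertainty, at untried inputs.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(1)

# Hypothetical design: columns = (grid resolution in km, basal friction scaling)
X_design = rng.uniform([1.0, 0.5], [10.0, 1.5], size=(30, 2))
# Stand-in "simulator": output grows with coarser resolution
y = (0.3 * X_design[:, 0] + 2.0 * (X_design[:, 1] - 1.0) ** 2
     + rng.normal(scale=0.05, size=30))

kernel = RBF(length_scale=[3.0, 0.5]) + WhiteKernel(noise_level=1e-3)
emulator = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
emulator.fit(X_design, y)

# Predict at new inputs with predictive standard deviations
X_new = np.array([[2.0, 1.0], [8.0, 1.0]])
mean, std = emulator.predict(X_new, return_std=True)
print(mean, std)
```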

(11.30am-12.00pm Break)

12.00pm-12.30pm Despoina Makariou (St. Gallen)
Estimation of heterogeneous treatment effects in the primary catastrophe bond market using causal forests

We introduce a causal random forest approach to predict treatment heterogeneity in alternative capital markets. We focus on predicting the effect of issuance timing on the spreads of an insurance-linked security known as a catastrophe bond. Studying issuance timing is important for optimising the cost of capital and ensuring the success of the bond offering. We construct a causal random forest and find that issuing a catastrophe bond in the first half of a calendar year is associated with a lower spread, and that this result varies with several factors, such as market conditions, the type of the underlying asset, and the size of the issuance.
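
The sketch below illustrates the general technique on synthetic data, using the open-source econml package's CausalForestDML estimator (the package choice and its exact API are assumptions here, not part of the study); it is not the authors' catastrophe-bond analysis.

```python
# Illustrative sketch of heterogeneous treatment effect estimation with a
# causal forest, on synthetic (not cat-bond) data.
import numpy as np
from econml.dml import CausalForestDML

rng = np.random.default_rng(2)
n = 2000

X = rng.normal(size=(n, 3))          # stand-ins for market conditions, asset type, size
T = rng.binomial(1, 0.5, size=n)     # "issued in first half of year" indicator
tau = 0.5 + 0.8 * X[:, 0]            # heterogeneous effect on the spread
Y = (2.0 + X @ np.array([0.3, -0.2, 0.1])
     - tau * T                        # treatment lowers the spread by tau
     + rng.normal(scale=0.5, size=n))

cf = CausalForestDML(discrete_treatment=True, n_estimators=200, random_state=0)
cf.fit(Y, T, X=X)

cate = cf.effect(X)                  # estimated effects (negative = lower spread)
print(cate[:5])
```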

12.30pm-1.00pm Zoltan Szabo (Data Science)
Minimax Rate of HSIC Estimation

Kernel techniques, such as the Hilbert-Schmidt independence criterion (HSIC; also called distance covariance), are among the most powerful approaches in data science and statistics for measuring the statistical independence of M ≥ 2 random variables. Despite the various HSIC estimators designed since its introduction close to two decades ago, the fundamental question of the rate at which HSIC can be estimated is still open; this forms the focus of the talk for translation-invariant kernels on R^d. [This is joint work with Florian Kalinke. Preprint: https://arxiv.org/abs/2403.07735]
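
To make the quantity in the title concrete, here is a plain biased (V-statistic) HSIC estimator with Gaussian kernels written in numpy; it is not the estimator analysed in the preprint.

```python
# Biased (V-statistic) HSIC estimate: HSIC = trace(K H L H) / n^2, where K, L
# are kernel Gram matrices of the two samples and H is the centring matrix.
import numpy as np

def gaussian_gram(X, bandwidth):
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-d2 / (2 * bandwidth**2))

def hsic_biased(X, Y, bw_x=1.0, bw_y=1.0):
    n = X.shape[0]
    K = gaussian_gram(X, bw_x)
    L = gaussian_gram(Y, bw_y)
    H = np.eye(n) - np.ones((n, n)) / n          # centring matrix
    return np.trace(K @ H @ L @ H) / n**2

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 2))
Y_dep = X[:, :1] + 0.1 * rng.normal(size=(500, 1))   # dependent on X
Y_ind = rng.normal(size=(500, 1))                     # independent of X
print(hsic_biased(X, Y_dep), hsic_biased(X, Y_ind))   # first should be larger
```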

(1.00pm-2.30pm Lunch)

2.30pm-3.00pm Yining Chen (Data Science / Time Series and Statistical Learning)
Detecting changes in production frontiers

3.00pm-3.30pm Kostas Kardaras (Probability in Finance and Insurance)
Equilibrium models of production and capacity expansion

We consider a model with producers making decisions on how much to produce and how much to invest in expanding the capacity of future production. With demand functions exogenously given, we study a multi-agent setting where prices are formed in equilibrium. Depending on the form of the production function, this leads to either a singular or a standard control problem. The solutions to the latter are either given explicitly or characterised via a second-order non-linear ODE. (Based on works with Junchao Jia, Alexander Pavlis and Michael Zervos.)

3.30pm-4.00pm Dima Karamshuk (Meta)
Content Moderation at Scale – Protecting Integrity of Online Communities on Meta Platforms

To enable content moderation on large social media platforms, it is important to detect harmful viral content in a timely manner. The detection problem is difficult because content virality results from interactions between user interests, content characteristics, feed ranking, and community structure.

This talk will shed light on the design of algorithms that can efficiently solve this problem at Meta's scale.

(4.00pm-4.30pm Break)

4.30pm-5.00pm Giulia Livieri (Probability in Finance and Insurance)
On Mean Field Games and Applications

Mean field game theory is a branch of game theory, namely a set of concepts, mathematical tools, theorems, and algorithms which, like all game theory, helps (micro- or macro-) economists, sociologists, engineers, and even urban planners to model situations of agents who take decisions in a context of strategic interactions. In this talk, I will introduce mean field games through some “toy models” to progressively discover the concepts and the mathematics behind this theory. I may conclude with the presentation of some very preliminary results on a mean field game model of shipping, where the model is also calibrated on real data (co-authors: Michele Bergami, Simone Moawad, Barath Raaj Suria Narayanan (PG students, LSE); Evan Chien Yi Chow (ADIA); Charles-Albert Lehalle).

5.00pm-5.30pm Chengchun Shi (Data Science / Time Series and Statistical Learning)
Switchback designs can enhance policy evaluation in reinforcement learning

Time series experiments, in which experimental units receive a sequence of treatments over time, are prevalent in technological companies, including ride-sharing platforms and trading companies. These companies frequently employ such experiments for A/B testing, to evaluate the performance of a newly developed policy, product, or treatment relative to a baseline control. Many existing solutions require that the experimental environment be fully observed to ensure the data collected satisfies the Markov assumption. This condition, however, is often violated in real-world scenarios. Such a gap between theoretical assumptions and practical realities challenges the reliability of existing approaches and calls for more rigorous investigations of A/B testing procedures.

In this paper, we study the optimal experimental design for A/B testing in partially observable environments. We introduce a controlled (vector) autoregressive moving average model to effectively capture a rich class of partially observable environments. Within this framework, we derive closed-form expressions, i.e., efficiency indicators, to assess the statistical efficiency of various sequential experimental designs in estimating the average treatment effect (ATE). A key innovation of our approach lies in the introduction of a weak signal assumption, which significantly simplifies the computation of the asymptotic mean squared errors of ATE estimators in time series experiments. We next proceed to develop two data-driven algorithms to estimate the optimal design: one utilizing constrained optimization, and the other employing reinforcement learning. We demonstrate the superior performance of our designs using a dispatch simulator and two real datasets from a ride-sharing company. 
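
As a toy illustration of a switchback design (far simpler than the controlled VARMA framework of the paper), the sketch below alternates treatment in fixed-length blocks over an AR(1) environment and estimates the ATE by a difference in means; all quantities are synthetic.

```python
# Toy switchback simulation: treatment alternates every `block` periods, the
# environment carries over through an AR(1) state, and the ATE is estimated
# by a simple difference in means between treated and control periods.
import numpy as np

rng = np.random.default_rng(4)
T_steps, block = 10_000, 50
ate_true, phi = 0.3, 0.8

# Switchback assignment: alternate treatment every `block` periods
A = (np.arange(T_steps) // block) % 2

y = np.zeros(T_steps)
state = 0.0
for t in range(T_steps):
    state = phi * state + rng.normal(scale=0.5)   # carry-over from the past
    y[t] = state + ate_true * A[t] + rng.normal(scale=0.2)

ate_hat = y[A == 1].mean() - y[A == 0].mean()
print(f"estimated ATE: {ate_hat:.3f} (truth {ate_true})")
```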

6.00pm-8.00pm Poster Session & Reception

Friday 21 June

10.00am-10.30am Joshua Loftus (Data Science)
Model-agnostic explanation tools and their limitations

Tools for interpretable machine learning or explainable artificial intelligence can be used to audit algorithms for fairness or other desired properties. In a "black-box" setting (one without access to the algorithm's internal structure), the methods available to an auditor may be model-agnostic. These methods are based on varying inputs while observing differences in outputs, and include some of the most popular interpretability tools like Shapley values and Partial Dependence Plots. Such explanation methods have important limitations. Moreover, their limitations can impact audits, with consequences for outcomes such as fairness. This talk will highlight key lessons that regulators, auditors, or other users of model-agnostic explanation tools must keep in mind when interpreting their output. Although we focus on a selection of tools for interpretation and on fairness as an example auditing goal, our lessons generalize to many other applications of model-agnostic explanations. These tools are increasing in popularity, which makes understanding their limitations an important research direction. That popularity is driven largely by their ease of use and portability. In high-stakes settings like an audit, however, it may be worth the extra work to use tools based on causal modeling that can incorporate background information and be tailored to each specific application.
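
For concreteness, the snippet below computes two such model-agnostic quantities, partial dependence and permutation importance, with scikit-learn on synthetic data; Shapley-value tools follow the same vary-the-inputs logic but are not shown.

```python
# Two model-agnostic explanation tools on a synthetic regression task:
# partial dependence (average prediction as one feature varies) and
# permutation importance (accuracy drop when a feature is shuffled).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import partial_dependence, permutation_importance

rng = np.random.default_rng(5)
X = rng.normal(size=(1000, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=1000)

model = GradientBoostingRegressor().fit(X, y)

# Partial dependence of the prediction on feature 0
pd_result = partial_dependence(model, X, features=[0], grid_resolution=20)
print(pd_result["average"].shape)

# Permutation importance: how much does shuffling each feature hurt accuracy?
imp = permutation_importance(model, X, y, n_repeats=10, random_state=0)
print(imp.importances_mean)
```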

10.30am-11.00am Anoushka Gupta (Illuma Technology)
Application of Contextual Advertising in a Cookieless World

Anoushka Gupta, an alumna of the MSc Data Science program, Class of 2022, is currently a Senior Data Analyst at Illuma Technology Ltd, a pioneering British AI company specializing in contextual ad targeting. Illuma's innovative technology operates without relying on cookies or identifiers, instead using real-time insights from audience browsing behaviour to identify relevant new audiences at scale. 
 
In her upcoming talk at LSE, Anoushka will delve into the application of Illuma's technology in a cookieless world. She will discuss how Illuma leverages advanced AI to optimize advertising campaigns across the EMEA region. Anoushka's expertise in data science plays a crucial role in enhancing campaign performance, ensuring advertisements reach relevant audiences effectively and efficiently. Her insights will shed light on how businesses can navigate the challenges posed by the phasing out of cookies, demonstrating the potential of AI-driven solutions for successful ad targeting without traditional tracking methods. 

11.00am-11.30am Sara Geneletti (Social Statistics)
Using an interrupted time series design to understand the impact of austerity measures in the UK

This talk gives an overview of the research projects I am currently involved in. The Wellcome grant aims to quantitatively assess the impact of austerity policies on mental health, including cuts to welfare such as the introduction of Universal Credit, as well as the impact of the hostile environment policy on the mental health of minority communities in England. This research uses the Understanding Society dataset. The ESRC grant uses causal inference methods, such as causal DAGs, to formally describe and explore how racial inequalities affect sentencing and remand. This research uses HMCTS data, which are collected for every instance of a court appearance.
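
A minimal interrupted time series (segmented regression) sketch is given below, fitted with statsmodels on simulated monthly data; the variable names and policy date are invented, and the actual analyses use the Understanding Society panel with richer models.

```python
# Segmented regression for an interrupted time series: a level-change term
# ("post") and a slope-change term ("time_since") around the policy date.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
n_months, cut = 120, 60                  # hypothetical policy at month 60

df = pd.DataFrame({"time": np.arange(n_months)})
df["post"] = (df["time"] >= cut).astype(int)            # level change
df["time_since"] = np.maximum(df["time"] - cut, 0)      # slope change
df["score"] = (10 + 0.01 * df["time"] + 1.5 * df["post"]      # simulated
               + 0.05 * df["time_since"]                      # mental-health
               + rng.normal(scale=1.0, size=n_months))        # score

fit = smf.ols("score ~ time + post + time_since", data=df).fit()
print(fit.params)   # 'post' = immediate shift, 'time_since' = trend change
```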

(11.30am-12.00pm Break)

12.00pm-12.30pm Xuewen Yu (Social Statistics)
Causal inference in continuous time

Although randomised controlled trials are the ideal way to assess real-world medication use, they are costly and sometimes unethical. With increasingly available real-world data, e.g. ‘minute-by-minute’ electronic health records, causal inference methods such as the g-methods have become crucial for evaluating treatment effects in observational studies. G-methods, including inverse probability weighting with marginal structural models, the parametric g-formula, and g-estimation of structural nested models, are well developed for longitudinal data, where changes in treatment and confounding occur at a grid of time points common to all individuals. But real-world scenarios often involve sporadic changes at irregular intervals. Although several continuous-time g-methods have been proposed, the literature is dispersed and involves technical complexities. This talk will summarise these methods and demonstrate their application using the UK ‘Towards A CurE for rheumatoid arthritis’ cohort data.
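
As a stepping stone to the longitudinal and continuous-time versions discussed in the talk, here is the simplest g-method, inverse probability weighting at a single time point, on simulated data.

```python
# Inverse probability weighting (single time point): estimate propensity
# scores, form stabilised weights, and compare weighted outcome means.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
n = 5000

L = rng.normal(size=(n, 2))                                  # confounders
p_treat = 1 / (1 + np.exp(-(0.8 * L[:, 0] - 0.5 * L[:, 1])))
A = rng.binomial(1, p_treat)                                 # treatment
Y = 1.0 * A + L[:, 0] + 0.5 * L[:, 1] + rng.normal(size=n)   # true effect = 1

# Estimate propensity scores and form stabilised weights
ps = LogisticRegression().fit(L, A).predict_proba(L)[:, 1]
sw = np.where(A == 1, A.mean() / ps, (1 - A.mean()) / (1 - ps))

# Weighted difference in means = IPW estimate of the marginal effect
ate = (np.average(Y[A == 1], weights=sw[A == 1])
       - np.average(Y[A == 0], weights=sw[A == 0]))
print(ate)
```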

12.30pm-1.00pm Qiwei Yao (Time Series and Statistical Learning)
Autoregressive dynamic networks

We give a brief introduction to the autoregressive (AR) model for dynamic network processes. The model depicts dynamic changes explicitly, and it facilitates simple and efficient statistical inference, such as MLEs and a permutation test for model diagnostic checking. We illustrate how this AR model can serve as a building block to accommodate more complex structures such as stochastic latent blocks and change-points. We also elucidate how some stylized features often observed in real network data, including node heterogeneity, edge sparsity, persistence, transitivity and density dependence, can be embedded in the AR framework. The framework then needs to be extended for dynamic networks with dependent edges, which poses new technical challenges. The practical relevance of the proposed AR framework is also illustrated with real network data.
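
To convey the flavour of an autoregressive edge process (a stylised stand-in, not the full model specification of the talk), the sketch below evolves a network in which absent edges form with probability alpha and existing edges dissolve with probability beta.

```python
# Stylised AR(1) edge process for an undirected dynamic network: each edge
# switches on with probability alpha if absent and off with probability beta
# if present, independently across edges.
import numpy as np

rng = np.random.default_rng(8)
p, T_steps = 30, 50           # number of nodes, number of time points
alpha, beta = 0.05, 0.30      # formation and dissolution probabilities

A = rng.binomial(1, 0.1, size=(p, p))
A = np.triu(A, 1); A = A + A.T                 # symmetric, no self-loops
networks = [A.copy()]

for _ in range(T_steps):
    U = rng.random((p, p))
    form = (A == 0) & (U < alpha)              # new edges appear
    keep = (A == 1) & (U >= beta)              # existing edges persist
    A = (form | keep).astype(int)
    A = np.triu(A, 1); A = A + A.T             # re-symmetrise
    networks.append(A.copy())

print([net.sum() // 2 for net in networks[:5]])   # edge counts over time
```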

(1.00pm-2.30pm Lunch)

2.30pm-3.00pm Kostas Kalogeropoulos (Data Science / Social Statistics / Time Series and Statistical Learning)
Bayesian sequential learning for hidden semi-Markov models

In this work, we explore the class of hidden semi-Markov models (HSMMs), a flexible extension of the popular hidden Markov models (HMMs) that allows the underlying stochastic process to be a semi-Markov chain. HSMMs are typically used less frequently than HMMs due to the increased computational challenges in evaluating the likelihood function. Moreover, despite both families of models being sequential in nature, existing inference methods mainly target batch data settings. We address these issues by developing a computational scheme for Bayesian inference on HSMMs that allows for estimation (1) in a computationally feasible time, (2) in an exact manner, i.e. subject only to Monte Carlo error, and (3) in a sequential setting. Additionally, we explore the performance of HSMMs in two settings: a financial time series application on the VIX index, and stochastic epidemic models on data from the COVID-19 pandemic. In both cases we demonstrate how the developed methodology can be used for tasks such as regime switching, model selection and clustering.
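
The generative side of an HSMM is easy to write down; the sketch below simulates a two-state Gaussian HSMM with Poisson state durations (all parameters invented). The sequential Bayesian inference scheme, which is the contribution of the work, is not attempted here.

```python
# Simulate a two-state hidden semi-Markov model: explicit Poisson sojourn
# times, deterministic jumps between the two states, Gaussian emissions.
import numpy as np

rng = np.random.default_rng(9)

trans = np.array([[0.0, 1.0],       # on leaving a state, where to jump next
                  [1.0, 0.0]])
dur_mean = np.array([20, 5])        # mean sojourn time per state (Poisson)
emit_mu = np.array([0.0, 3.0])      # Gaussian emission means
emit_sd = np.array([1.0, 1.5])

states, obs = [], []
s = 0
while len(obs) < 500:
    d = 1 + rng.poisson(dur_mean[s])            # explicit duration
    states.extend([s] * d)
    obs.extend(rng.normal(emit_mu[s], emit_sd[s], size=d))
    s = rng.choice(2, p=trans[s])               # semi-Markov jump
states, obs = np.array(states[:500]), np.array(obs[:500])
print(obs[:10])
```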

3.00pm-3.30pm Tengyao Wang (Data Science / Time Series and Statistical Learning)
Multiple-output composite quantile regression via optimal transport

Composite quantile regression has been used to obtain robust estimators of regression coefficients in linear models with good statistical efficiency. By revealing an intrinsic link between the composite quantile regression loss function and the Wasserstein distance from the residuals to the set of quantiles, we establish a generalization of composite quantile regression to the multiple-output setting. Theoretical convergence rates of the proposed estimator are derived both under the setting where the additive error possesses only a finite q-th moment (for q > 2) and where it exhibits a sub-Weibull tail. In doing so, we develop novel techniques for analyzing M-estimation problems that involve a Wasserstein distance in the loss. Numerical studies confirm the practical effectiveness of our proposed procedure.
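
As background, the snippet below fits a plain single-output composite quantile regression by minimising the sum of pinball losses over several quantile levels with scipy; the optimal-transport generalisation to multiple outputs, which is the talk's contribution, is not attempted.

```python
# Single-output composite quantile regression: one shared slope vector, one
# intercept per quantile level, fitted by minimising the summed pinball loss.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(10)
n, d = 500, 3
X = rng.normal(size=(n, d))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.standard_t(df=3, size=n)    # heavy-tailed noise

taus = np.array([0.1, 0.3, 0.5, 0.7, 0.9])

def composite_loss(params):
    beta, b = params[:d], params[d:]                # shared slope, per-tau intercepts
    r = y[:, None] - X @ beta[:, None] - b[None, :] # residuals for each tau
    return np.sum(np.maximum(taus * r, (taus - 1) * r))   # pinball losses

res = minimize(composite_loss, x0=np.zeros(d + len(taus)), method="Nelder-Mead",
               options={"maxiter": 20000, "xatol": 1e-6, "fatol": 1e-6})
print(res.x[:d])        # estimated regression coefficients
```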

How to find us:

Have a question? Please contact the event organisers:
Professor Zoltan Szabo
Penny Montague