Statistics is a way of seeing things. It is not really a subject, but a way of looking at the world

Monday 12 and Tuesday 13 May 2025

Matrix- and Tensor-Valued Time-varying Main Effects Factor Model (MEFM): Identifiability, Sparsity, and Regression

Matrix- and tensor-valued factor models with explicit time-varying main effects capture fibre-specific weak factors and enhance interpretability in high-dimensional matrix and tensor time series (Lam and Cen, 2024). We propose two complementary frameworks in two separate projects: the Sparse Main Effect Factor Model (SMEFM) and the Main Effect generalized TEnsor Regression (METER). A new identification condition is introduced in both projects to cope with sparsity in the main effects. In SMEFM, we introduce a doubly adaptive fused lasso (DAFL) estimator that enforces sparsity and temporal coherence in the main-effects blocks, and we prove its consistency and oracle-style guarantees. Simulation and real data analysis are attempted. METER generalises the MEFM structure from matrix to general tensor time series, applying adaptive lasso penalties to jointly estimate sparse main effects and tensor factors, which serve as predictors in regression models for diverse outcome types, with the aim of improving interpretability and predictive accuracy.

Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning

Reinforcement learning from human feedback (RLHF) has emerged as a key technique for aligning the output of large language models (LLMs) with human preferences. To learn the reward function, most existing RLHF algorithms use the Bradley-Terry model, which relies on assumptions about human preferences that may not reflect the complexity and variability of real-world judgments. In this paper, we propose a robust algorithm to enhance the performance of existing approaches under such reward model misspecifications. Theoretically, our algorithm reduces the variance of reward and policy estimators, leading to improved regret bounds. Empirical evaluations on LLM benchmark datasets demonstrate that the proposed algorithm consistently outperforms existing methods, with 77-81% of responses being favoured over baselines on the Anthropic Helpful and Harmless dataset.
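For readers unfamiliar with the Bradley-Terry reward model mentioned above, the following is a minimal sketch of the baseline preference likelihood only. It is not the robust algorithm proposed in the talk; the function names are illustrative, and the rewards are assumed to be scalar scores already produced by some reward model.

```python
import numpy as np

def bt_preference_prob(r_first, r_second):
    """Bradley-Terry probability that the first response is preferred,
    given scalar rewards for the two responses."""
    return 1.0 / (1.0 + np.exp(-(r_first - r_second)))

def bt_negative_log_likelihood(r_chosen, r_rejected):
    """Average negative log-likelihood of observed preferences, where
    r_chosen[i] is the reward of the response a human preferred."""
    diff = np.asarray(r_chosen, dtype=float) - np.asarray(r_rejected, dtype=float)
    # -log(sigmoid(diff)) = log(1 + exp(-diff)), computed stably
    return float(np.mean(np.logaddexp(0.0, -diff)))
```

Under this model the preference probability depends only on the reward difference, which is the kind of structural assumption that may be violated by real human judgments.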

Pricing Parisian Options Using Lattice-Based Methods

Parisian options are path-dependent options first introduced by Chesney et al. (1997), which are activated when the time spent by an underlying asset in a pre-defined excursion region exceeds some window length. This pricing problem is well studied and has been solved using Laplace transforms, recursion, Monte Carlo simulation, finite-difference and lattice-based methods. Most of the literature on Parisian option pricing problems considers at most two risky assets. If we wanted to formulate a Parisian pricing problem with m assets, such as basket options with a Parisian feature, then some of these methods might not be suitable. We consider a multi-asset lattice model which is combined with the forward-shooting grid algorithm to price Parisian options with m underlying assets.

Causal Dependence Plots for Fairness

Causal explainability is a way to improve understanding of how machine learning models interact with the world by providing an explanation of how they depend causally on data inputs. This is particularly important for applications such as algorithmic fairness, recourse and scientific machine learning. Motivated by the causal limitations of partial dependence plots (PDPs), we develop causal variants called causal dependence plots (CDPs). CDPs visualize how a model’s predicted outcome depends on changes in a variable, along with consequent causal changes in other predictor variables. These plots are a generalization of PDPs and include measures of total, natural direct and natural indirect effects. Our experiments with simulated and real data show that CDPs can be combined in a modular way with methods for causal learning or sensitivity analysis. We demonstrate how in fairness applications CDPs provide more useful information than non-causal interpretability tools, and we encourage causal explainability more generally.
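The contrast between a PDP and a total-effect style dependence curve can be illustrated with a toy example. This is a sketch under stated assumptions, not the authors' implementation: the structural equation `x1 = 2 * x0 + noise`, the model `predict`, and the helper `propagate` are all hypothetical and chosen only so the two curves visibly differ.

```python
import numpy as np

def partial_dependence(predict, X, feature, grid):
    """Classical PDP: overwrite one column, leave all other columns as observed."""
    curve = []
    for v in grid:
        X_mod = X.copy()
        X_mod[:, feature] = v
        curve.append(predict(X_mod).mean())
    return np.array(curve)

def total_dependence(predict, X, feature, grid, propagate):
    """Causal variant: after intervening on one column, downstream columns
    are recomputed by a user-supplied structural model `propagate`."""
    curve = []
    for v in grid:
        X_int = X.copy()
        X_int[:, feature] = v
        curve.append(predict(propagate(X, X_int)).mean())
    return np.array(curve)

# Hypothetical data: x1 is caused by x0 through x1 = 2 * x0 + noise.
rng = np.random.default_rng(0)
x0 = rng.normal(size=500)
x1 = 2.0 * x0 + rng.normal(size=500)
X = np.column_stack([x0, x1])

def predict(X):
    # Hypothetical black-box model.
    return X[:, 0] + X[:, 1]

def propagate(X_obs, X_int):
    # Keep each unit's own noise term and push the intervention through to x1.
    noise = X_obs[:, 1] - 2.0 * X_obs[:, 0]
    X_out = X_int.copy()
    X_out[:, 1] = 2.0 * X_int[:, 0] + noise
    return X_out

grid = np.array([0.0, 1.0])
pdp = partial_dependence(predict, X, 0, grid)
cdp = total_dependence(predict, X, 0, grid, propagate)
```

In this toy setting the PDP rises by 1 per unit of `x0`, while the total-effect curve rises by 3, because the intervention also shifts `x1`.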

Change region detection on $d$-dimensional manifolds

While change point detection in time series data has been extensively studied, little attention has been given to its generalization to data observed on manifolds, where changes may occur within spatially complex regions with irregular boundaries, posing significant challenges. A new class of estimators is proposed to locate changes in the mean function of a signal-plus-noise model defined on d-dimensional spheres. This approach applies to scenarios with a single change region and multiple change regions. The convergence rate of the estimator is shown to depend on the VC dimension of the hypothesis class that characterizes the change regions. The results extend to data observed on d-dimensional manifolds under further assumptions. Simulations confirm the consistency of this approach, and the estimator's practical applicability is demonstrated through a global temperature dataset.

Discrete latent representations and transformer-based return forecasting

Modelling high-frequency limit order book (LOB) dynamics to predict future log-returns remains a core challenge in financial modelling. We address this by introducing a two-stage framework: first, we learn a compact, discrete representation of raw LOB events; then, we leverage a transformer-based sequence model to forecast returns. In the first stage, a vector-quantized autoencoder compresses tick-level LOB snapshots into a codebook of latent tokens. By jointly training the encoder and decoder, the model filters out high-frequency noise and distills the market’s underlying drivers. In the second stage, we train a transformer to predict future returns from historical token sequences. Our approach combines robust noise filtering with sequence modelling, offering a potential solution for return prediction in high-frequency markets.
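The quantization step of a vector-quantized autoencoder can be sketched as a nearest-neighbour lookup in the codebook. The snippet below assumes the encoder outputs and the codebook are already available as arrays; the values are made up for illustration, and training of the encoder, decoder and codebook is not shown.

```python
import numpy as np

def quantize(latents, codebook):
    """Map each latent vector to the index of its nearest codebook entry
    (squared Euclidean distance) and return the tokens and quantized vectors."""
    d2 = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    tokens = d2.argmin(axis=1)
    return tokens, codebook[tokens]

# Hypothetical 2-dimensional encoder outputs and a 3-entry codebook.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]])
latents = np.array([[0.1, -0.1], [0.9, 1.2], [-0.8, 1.7], [0.2, 0.1]])
tokens, quantized = quantize(latents, codebook)
```

The resulting integer `tokens` are the kind of discrete sequence that a second-stage sequence model could consume.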

Bayesian Inference Methods for Quantifying Uncertainty, Designing Optimized Experiments, and Conducting Robust Dose-Response Modeling

Dose-response assessment is an important component of pharmacology, toxicology, and public health, but faces significant statistical challenges including sparse, noisy data, model uncertainty, biological heterogeneity, and the need for efficient experimentation and computation. We explore Bayesian inference to create more robust, efficient, and scalable solutions. Our work includes research to understand and simulate variability by characterizing noise structures for realistic simulation used in method validation; the development of a novel ensemble method that propagates uncertainty about model weights; exploration of the impact of experimental design, data handling decisions, and the sensitivity of results to prior specifications, using both simulation and real-world toxicology data; and non-parametric and machine learning methods for signal denoising and data generation.

Controlled Forward-Backward Dynamics in Interbank Systems

In this talk, I will introduce a dynamic model of interbank borrowing and lending that takes place due to targeted liquidity levels, akin to the work of Capponi, Sun, and Yao. Departing from the existing literature, we will consider a given finite horizon with a target for the terminal time. Banks are then borrowing or lending according to the expected deviations from their targets conditionally on the current information, leading to a system of forward-backward dynamics. On top of this, we finally devise a linear-quadratic control problem, whereby banks can adjust their drift at a cost. We first study the existence of Nash equilibria in the case of a finite network and then we proceed to discuss existence and uniqueness in the simplified mean field regime. Several interesting insights are obtained when comparing with control problems for corresponding forward-only systems.

Device-constrained offline policy optimization

We propose a novel offline reinforcement learning (RL) framework for adaptive deep brain stimulation (DBS) in Parkinson's disease, addressing key challenges of device constraints. Our method learns deterministic, non-smooth policies through trust-region optimization, with theoretical guarantees of cube-root asymptotics and super-efficiency.

Pairwise Comparisons without Stochastic Transitivity: Model, Theory and Applications

Most statistical models for pairwise comparisons, including the Bradley-Terry (BT) and Thurstone models and many extensions, make a relatively strong assumption of stochastic transitivity. This assumption imposes the existence of an unobserved global ranking among all the players/teams/items and monotone constraints on the comparison probabilities implied by the global ranking.

However, the stochastic transitivity assumption does not hold in many real-world scenarios of pairwise comparisons, especially games involving multiple skills or strategies. As a result, models relying on this assumption can have suboptimal predictive performance. In this paper, we propose a general family of statistical models for pairwise comparison data without a stochastic transitivity assumption, substantially extending the BT and Thurstone models.

In this model, the pairwise probabilities are determined by an (approximately) low-dimensional skew-symmetric matrix. Likelihood-based estimation methods and computational algorithms are developed, which allow for sparse data with only a small proportion of observed pairs. Theoretical analysis shows that the proposed estimator achieves minimax-rate optimality, which adapts effectively to the sparsity level of the data. The spectral theory for skew-symmetric matrices plays a crucial role in the implementation and theoretical analysis. The proposed method’s superiority over the BT model, along with its broad applicability across diverse scenarios, is further supported by simulations and real data analysis.
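To see why a skew-symmetric parametrisation can drop stochastic transitivity, consider the sketch below. It assumes a logistic link between the matrix entry and the win probability, which is an illustrative choice and not necessarily the link used in the paper; the matrices are toy examples.

```python
import numpy as np

def win_probabilities(M):
    """P[i, j] = probability that i beats j, assuming a logistic link
    applied to a skew-symmetric matrix M (so that P[i, j] + P[j, i] = 1)."""
    M = np.asarray(M, dtype=float)
    if not np.allclose(M, -M.T):
        raise ValueError("M must be skew-symmetric")
    return 1.0 / (1.0 + np.exp(-M))

# Bradley-Terry as a special case: M[i, j] = u[i] - u[j] for latent scores u.
u = np.array([1.0, 0.0, -1.0])
M_bt = u[:, None] - u[None, :]

# A rock-paper-scissors pattern that no global ranking can reproduce.
M_cycle = np.array([[0.0, 1.0, -1.0],
                    [-1.0, 0.0, 1.0],
                    [1.0, -1.0, 0.0]])

P_bt = win_probabilities(M_bt)
P_cycle = win_probabilities(M_cycle)
```

In `P_cycle`, item 0 is favoured over 1, 1 over 2, and 2 over 0, which violates stochastic transitivity while remaining a valid skew-symmetric model.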

Application of Causal Inference Methods to Longitudinal Data

Causal inference has vast application in a variety of research areas, particularly those that deal with longitudinal data. Much of the literature thus far has focused on epidemiological contexts; the goal of our project is to see how applicable these methods are when dealing with complex educational and criminological data. In preparation for this, a detailed knowledge of causal inference is required, which will enable us to seek methods that can deal with issues that induce bias in the chosen causal effect measure. We will first discuss the basics of causal inference in relation to cross-sectional data, extending these ideas to a longitudinal setting via the use of causal directed acyclic graphs (DAGs). We will then proceed to explain the main methods that have been focused on thus far in our research - namely the g-formula and IP weighting - as well as the problems that they can help rectify. To conclude, a brief overview will be given of the data to which we will apply these methods, as well as the conclusions that we hope to draw from them.

Variable Selection for Gaussian Process Regression

The main thrust of this study is to develop a framework that uses Gaussian Process Regression (GPR) to select significant variables. This framework involves setting spike and slab priors on the inverse length-scale parameters in the Automatic Relevance Determination kernel, incorporating a binary inclusion parameter to determine if the variable is substantial enough to be included in the model. Firstly, we propose a novel Metropolis independence sampler with Laplace approximation, a method not previously used within GPR to sample from the posterior distribution. A simulation study is conducted to validate the predictive performance of the proposed method. To further address the challenge posed by multimodal posterior distributions, we develop a Sequential Monte Carlo method for parameter inference.
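As a rough illustration of how binary inclusion parameters can switch variables on and off inside an Automatic Relevance Determination kernel, consider the sketch below. The parametrisation (inclusion indicator times inverse length-scale weighting each squared difference) and all names are assumptions made for illustration; the talk's exact prior and kernel form may differ.

```python
import numpy as np

def ard_kernel(X1, X2, inv_lengthscale, include, variance=1.0):
    """Squared-exponential ARD kernel in which variable j contributes only
    when include[j] == 1 (hypothetical parametrisation)."""
    weights = np.asarray(include, dtype=float) * np.asarray(inv_lengthscale, dtype=float)
    diff = X1[:, None, :] - X2[None, :, :]
    sq_dist = (weights * diff ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * sq_dist)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
X_changed = X.copy()
X_changed[:, 1] = rng.normal(size=5)  # perturb only the excluded variable

K = ard_kernel(X, X, inv_lengthscale=[1.5, 0.7], include=[1, 0])
K_changed = ard_kernel(X_changed, X_changed, inv_lengthscale=[1.5, 0.7], include=[1, 0])
```

Because the second variable is excluded, `K` and `K_changed` coincide, so the fitted Gaussian process cannot depend on that variable.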

Autoregressive Hypergraphs

We propose a first‐order autoregressive (AR(1)) model for dynamic hypergraph processes that generalizes traditional network models to capture higher‐order interactions among multiple nodes. In this framework, hyperedges evolve over time while the node set remains fixed. Building on this, we introduce an AR(1) stochastic block model for hypergraphs in which latent communities are characterized by time‐varying transition probabilities. To uncover these communities, we develop a new spectral clustering algorithm based on the hypergraph Laplacian. We also integrate change‐point inference into the AR(1) hypergraph stochastic block model to detect potential structural shifts. Finally, applications to two real‐world datasets demonstrate the practical value and effectiveness of both the model and its associated inference techniques.

Estimating a Kernel Exponential Family

This talk addresses the problem of estimating a kernel exponential family (KEF) model within a Bayesian framework. We show that the natural parameter of a KEF in an infinite-dimensional reproducing kernel Hilbert space (RKHS) can be decomposed into a finite-dimensional projection and a residual component. We demonstrate that the essential information can be captured by the finite-dimensional projection using a finite number of weights. A prior is placed on these weights, and they are estimated via a maximum a posteriori (MAP) approach. We explore hyperparameter selection using various methods, including maximum marginal likelihood. The effectiveness of the proposed approach is demonstrated through both simulation studies and real data analysis.

Deep Learning Approaches for Short-Term Load Prediction

Accurate load forecasting is essential for ensuring the stability and efficiency of power systems, especially in the context of increasing renewable integration and dynamic consumer demand. This project explores data-driven approaches to short-term load prediction using the EDF dataset, which includes a mix of numerical and categorical features. We evaluate several machine learning models, including deep neural networks and transformer-based architectures, to capture both temporal patterns and complex feature interactions. Our results demonstrate that model performance can be significantly enhanced through appropriate feature engineering, model selection, and device-aware training strategies. The insights gained from this study provide practical guidance for deploying scalable and robust load forecasting systems in real-world energy markets.

Doubly Robust Preference Optimization for Large Language Models Fine-Tuning

Reinforcement Learning from Human Feedback (RLHF) is crucial for aligning Large Language Models (LLMs) with human preferences. However, existing methods suffer from challenges such as potential reward model misspecification (e.g., reliance on the Bradley-Terry model), sensitivity and inconsistency in PPO-style policy optimization, and strong dependence on behaviour policy accuracy in DPO-style approaches. This paper introduces Doubly Robust Preference Optimization (DRPO), a novel RLHF framework inspired by the principles of doubly robust estimation. DRPO aims to mitigate these drawbacks by combining offline direct policy optimization from human preference data with on-policy optimization guided by a preference model, without mandating restrictive Bradley-Terry model assumptions. A key feature of DRPO is its enhanced robustness; under Bradley-Terry assumptions, it requires only accurate specification of either the reward function or the reference policy, not necessarily both, and it is designed to remain effective even when these underlying assumptions are violated. We propose the DRPO pipeline to enhance optimization stability and improve sample efficiency. Furthermore, we will provide a rigorous analysis of DRPO's statistical properties and validate its performance through extensive experiments on LLM benchmarks, thereby demonstrating the novel application of doubly robust techniques to LLM fine-tuning.

The joint distribution of Parisian and hitting times of the CIR process with application to digit Asian option pricing

We study the joint law of the Parisian time and hitting time of a Bessel process by using a three-state semi-Markov model, obtained through perturbation. We obtain a martingale, to which we can apply the optional sampling theorem and derive the double Laplace transform. This general result is applied to address problems in option pricing. We introduce a new option related to Parisian and Asian options, which is triggered when the age of an excursion exceeds a certain time and/or a barrier is hit. We obtain an explicit expression for the Laplace transform of its fair price.

Monday 10 and Tuesday 11 June 2024

Learning High-dimensional Latent Variable Models via Doubly Stochastic Optimisation by Unadjusted Langevin

Latent variable models are widely used in social and behavioural sciences, including education, psychology, and political science. High-dimensional latent variable models have become increasingly common for analysing large and complex data in recent years. The marginal maximum likelihood estimation of high-dimensional latent variables is computationally demanding due to the high complexity of integrals of the latent variables. Stochastic optimisation, which combines stochastic approximation and sampling techniques, has been proven powerful for tackling this computational challenge. This method iterates between two steps -- (1) sampling the latent variables from their posterior distribution determined by the current parameter estimate and (2) updating the fixed parameters using an approximate stochastic gradient constructed by plugging in the latent variable samples. In this paper, we propose a computationally more efficient stochastic optimisation algorithm. Improvement is achieved via two ingredients -- (1) using a minibatch of observations when sampling latent variables and constructing stochastic gradients, and (2) an unadjusted Langevin sampler that utilises the gradient of the negative log complete-data likelihood to sample latent variables. Theoretical results are established for the proposed algorithm, showing that the iterative parameter update converges to the marginal maximum likelihood estimate as the number of iterations goes to infinity. The proposed algorithm is shown to scale well to high-dimensional settings via simulation studies and a personality test application that involves 30,000 respondents, 300 items, and 30 latent dimensions.
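The two ingredients described above (minibatching and an unadjusted Langevin sampler for the latent variables) can be sketched on a deliberately simple toy model. Everything here is an assumption made for illustration: the model `z_i ~ N(0, 1)`, `y_i | z_i ~ N(theta + z_i, 1)`, the step sizes, and the function name. For this toy model the marginal maximum likelihood estimate of `theta` is the sample mean of `y`, which gives a convenient check.

```python
import numpy as np

def fit_theta(y, n_iter=2000, batch=100, ula_steps=5, h=0.1, seed=0):
    """Doubly stochastic optimisation sketch for the toy model
    z_i ~ N(0, 1), y_i | z_i ~ N(theta + z_i, 1)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    z = np.zeros(n)   # persistent latent state, one per observation
    theta = 0.0
    for t in range(n_iter):
        idx = rng.choice(n, size=batch, replace=False)   # minibatch of observations
        for _ in range(ula_steps):
            # Gradient of the negative log complete-data likelihood w.r.t. z.
            grad_u = z[idx] - (y[idx] - theta - z[idx])
            # Unadjusted Langevin step: no Metropolis accept/reject correction.
            z[idx] = z[idx] - h * grad_u + np.sqrt(2.0 * h) * rng.normal(size=batch)
        # Stochastic gradient of the complete-data log-likelihood w.r.t. theta.
        grad_theta = np.mean(y[idx] - theta - z[idx])
        theta += (t + 10.0) ** -0.6 * grad_theta   # decaying step size
    return theta

rng = np.random.default_rng(1)
y = 2.0 + rng.normal(size=2000) + rng.normal(size=2000)
theta_hat = fit_theta(y)
```

With these settings `theta_hat` should land close to `y.mean()`; in realistic high-dimensional models the same loop structure applies, but the gradients and step sizes need model-specific care.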

A Latent Variable Approach to Learning High-dimensional Multivariate Longitudinal Data

High-dimensional multivariate longitudinal data, which arise when many outcome variables are measured repeatedly over time, are becoming increasingly common in social, behavioural and health sciences. We propose a latent variable model for drawing statistical inferences on covariate effects and predicting future outcomes based on high-dimensional multivariate longitudinal data. This model introduces unobserved factors to account for the between-variable and across-time dependence and assist the prediction. Statistical inference and prediction tools are developed under a general setting that allows outcome variables to be of mixed types and possibly unobserved for certain time points, for example, due to right censoring. A central limit theorem is established for drawing statistical inferences on regression coefficients. Additionally, an information criterion is introduced to choose the number of factors. The proposed model is applied to customer grocery shopping records to predict and understand shopping behaviour.

Matrix-valued Factor Model with Time-varying Main Effects

Factor models for matrix-valued time series have been well studied by researchers, but the existence of weak factors undermines the estimation of the latent factor structure. We propose a generalised matrix factor model with main effects that can potentially vary over time, so that weak factors in the form of row effects or column effects are featured explicitly, with the additional benefit of better interpretation. Convergence rates for the estimators are spelt out, and asymptotic normality is established, which is used to test whether a traditional factor model is sufficient. Our theoretical results are demonstrated by extensive simulation, and an NYC taxi traffic data set is also analysed.

Change Region Detection on d-Dimensional Spheres

Change point detection in time series data has been extensively studied, but little attention has been given to its generalization to higher dimensional spaces, where changes may occur in different regions with irregular boundaries, posing significant challenges. This paper introduces a method to locate changes in the mean function of a signal-plus-noise model on d-dimensional spheres. We find that the convergence rate depends on the VC dimension of the hypothesis class that characterizes the underlying change regions. Our results extend to data lying on manifolds, under the assumption of a single change region. Furthermore, we adapt the method to address scenarios with multiple change regions. Simulation studies confirm the consistency of our approach for both single and multiple change scenarios with varying mean values across regions.

Systemic risk among banks: an FBSDE approach

Systemic risk has become a more central research topic since the financial crisis: the default of one bank can lead to a contagion effect among banks and wipe out the whole existing system. Thus, it is crucial to monitor the risk, and banks should learn how to act strategically. We propose a model where each bank has a certain target that it wants to meet at the final time T (say, the end of the year, as suggested by Tobias Adrian and Hyun Song Shin (2008)). Before the final time, banks will try to borrow and lend money among themselves and from the central bank, based on the difference between the target and the final cash flow projected using today’s information, subject to a quadratic cost. Our model can be treated as a generalization of the model put forward by Rene Carmona et al. (2013); the crucial difference is that we introduce a forward and backward stochastic differential equation (FBSDE) model instead of a stochastic differential equation (SDE). Due to the technicality of FBSDEs, some important theorems are not yet in place. Prerequisite theorems and further results that need to be established will be discussed briefly.

Topics on the weak convergence of stochastic processes

I shall introduce the concepts of weak error convergence and convergence in distribution of random variables taking values in Polish spaces – and especially sequences of $\mathbb{R}^d$-valued continuous-path and càdlàg-path stochastic processes. As an application, I will present the continuous mapping theorem for functionals of these random variables with notable examples from standard literature and my own research. In particular, I present the alpha-quantiles of stochastic processes as functionals with explicit continuity sets over the Skorokhod space. I shall then explain the concept of functional scaling limits with further examples, like the small-time functional central limit theorem for semimartingales of Gerhold et al. (2015). I conclude by explaining the importance of weak errors and their rates of convergence in stochastic simulation for financial applications. I present notable examples, such as the killed diffusion studied in Gobet (1999) and Cetin and Hok (2022).

Autoregressive Networks: Sparsity and Degree Heterogeneity

This paper proposes a new dynamic network model with sparsity and degree heterogeneity. Sparsity means that the expected edge density tends to 0, while degree heterogeneity captures the node-level differences in forming and dissolving an edge. We develop a new concentration inequality for the dependent sequence while the $\alpha$-mixing coefficient goes to 0 as the number of nodes $p$ goes to infinity.

We propose a two-step estimation strategy, where the first step is maximum likelihood estimation ignoring the degree heterogeneity structure, and the second step is an M-estimation based on the maximum likelihood estimator from the first step.

To evaluate the estimator's performance, we give theoretical results regarding the upper bound for the estimation error for the proposed estimator. The theory of the model is supported by extensive simulation.

Autoregressive Hypergraphs

We propose a first-order autoregressive (AR(1)) model for dynamic hypergraph processes, which extends traditional network models to accommodate higher-order interactions among multiple nodes. This model allows for dynamic changes in hyperedges over time while the nodes remain fixed. It facilitates efficient statistical inference with maximum likelihood estimators that are consistent and asymptotically normal, and model diagnostic checking can be conducted using permutation tests. The proposed model can be applied to any Erdős-Rényi hypergraph process with various underlying structures. As an illustration, an autoregressive stochastic block model for hypergraphs is explored in detail, characterising latent communities through time-varying transition probabilities. Such latent communities can be identified based on the hypergraph Laplacian.
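A minimal simulation sketch of an AR(1)-type hyperedge process follows. It assumes, as an illustration rather than the talk's exact specification, that every potential hyperedge of a fixed order evolves independently: at each step it is formed with probability `alpha`, dissolved with probability `beta`, and otherwise keeps its previous state. Under that assumption the maximum likelihood estimates of the two rates are simple transition frequencies.

```python
import numpy as np
from itertools import combinations

def simulate_ar1_hypergraph(n_nodes, order, alpha, beta, n_steps, seed=0):
    """Simulate independent two-state hyperedge chains on a fixed node set."""
    rng = np.random.default_rng(seed)
    hyperedges = list(combinations(range(n_nodes), order))
    # Start from the stationary edge probability alpha / (alpha + beta).
    state = rng.random(len(hyperedges)) < alpha / (alpha + beta)
    path = [state.copy()]
    for _ in range(n_steps):
        u = rng.random(len(hyperedges))
        form = u < alpha
        dissolve = (u >= alpha) & (u < alpha + beta)
        # Form, dissolve, or keep the previous state.
        state = np.where(form, True, np.where(dissolve, False, state))
        path.append(state.copy())
    return hyperedges, np.array(path)

def estimate_rates(path):
    """Transition-frequency estimates of (alpha, beta) from a simulated path."""
    prev, curr = path[:-1], path[1:]
    alpha_hat = np.sum(~prev & curr) / np.sum(~prev)
    beta_hat = np.sum(prev & ~curr) / np.sum(prev)
    return alpha_hat, beta_hat

hyperedges, path = simulate_ar1_hypergraph(8, 3, alpha=0.3, beta=0.2, n_steps=200)
alpha_hat, beta_hat = estimate_rates(path)
```

A block-model version would let `alpha` and `beta` depend on the community memberships of the nodes in each hyperedge instead of being global constants.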

Feature Effect and Importance Explainability Methods for Fairness and Discrimination: Evaluation and Applications

Tools for interpretable machine learning or explainable artificial intelligence can be used to audit algorithms for fairness or other desired properties. In a "black-box" setting--one without access to the algorithm's internal structure--the interpretation methods available to an auditor may only be model-agnostic methods. But these explanation methods have important limitations. We start by studying how such limitations can impact audits, with important consequences for outcomes such as fairness. We highlight key lessons that regulators or auditors must keep in mind when interpreting the output of such model explanation tools. By using causal models for our simulated data, we know the ground truth against which to compare the model explanations. We demonstrate these lessons with a selection of the most popular interpretability tools, including Shapley values and Partial Dependence Plots--as well as newer causal variants of both. We then show initial results for using machine learning and explanation methods to analyse UK sentencing discrimination, inspired by similar work on asylum cases in the US (Raman 2022). Future directions for this research include using causal explainability methods to analyse UK sentencing discrimination and looking at model-specific feature effect and importance methods rather than just model-agnostic methods.

Generalized Fitted Q Iterations with Application in Cluster Data

In this study, we tackle the application of Reinforcement Learning (RL) to clustered data, which consists of offline data collected from a population segmented into several distinct clusters. Each cluster encapsulates a group of individuals exhibiting potentially correlated characteristics.

Recognizing the complexities introduced by clustered data structures and the limited data availability typical in healthcare research, we propose an innovative approach that integrates Generalized Estimating Equations (GEE) with Fitted-Q iterations. This integration aims to optimize the individual policy by accounting for intra-cluster correlations, thus enhancing the efficacy of interventions.
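For orientation, the sketch below shows plain Fitted-Q iteration with an ordinary least-squares regression step on a tiny, made-up two-state problem. It does not implement the GEE integration described above: in the proposed approach, the least-squares fit would presumably be replaced by a GEE fit with a working correlation structure for individuals in the same cluster. The toy environment and all names are hypothetical.

```python
import numpy as np

def fitted_q_iteration(S, A, R, S_next, n_states, n_actions, gamma, n_iter=60):
    """Fitted-Q iteration with one-hot state-action features and OLS."""
    def features(s, a):
        phi = np.zeros((len(s), n_states * n_actions))
        phi[np.arange(len(s)), s * n_actions + a] = 1.0
        return phi

    Phi = features(S, A)
    w = np.zeros(n_states * n_actions)
    for _ in range(n_iter):
        Q = w.reshape(n_states, n_actions)
        # Bellman target built from the current Q estimate.
        target = R + gamma * Q[S_next].max(axis=1)
        # Regression step (plain least squares in this sketch).
        w, *_ = np.linalg.lstsq(Phi, target, rcond=None)
    return w.reshape(n_states, n_actions)

# Toy offline data: action 0 stays, action 1 switches state;
# the reward is 1 whenever the next state is state 1.
S = np.array([0, 0, 1, 1])
A = np.array([0, 1, 0, 1])
S_next = np.where(A == 0, S, 1 - S)
R = (S_next == 1).astype(float)
Q_hat = fitted_q_iteration(S, A, R, S_next, n_states=2, n_actions=2, gamma=0.5)
```

For this toy problem the optimal action values can be worked out by hand as `[[1, 2], [2, 1]]`, which the iteration recovers.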

We show that, under certain conditions, this approach ensures the variance of the estimated Q function coefficients, adjusted for GEE correlations, remains competitive with or surpasses traditional estimations.

G-Computation: An Application of Causal Inference to Longitudinal Data

Causal inference in the realm of longitudinal data has been well studied up to this point, and the errors that arise in using standard, regression-based methods to assess the strength of a given causal relationship are well known. In this presentation, I will outline the focus of my preliminary research – g-Computation. This is a sequential modelling and simulation technique designed to circumvent the biases that can arise when we condition on any confounders in our data. Details of how we can neatly summarise the direct causal relationship between the explanatory variables (interventions) and the outcome variable via Marginal Structural Models will be discussed. Finally, the concept of dynamic regimes (scenarios in which the intervention depends on past interventions and confounders, rather than being determined prior to any simulation) will also be presented.

Methods in Policy Evaluation and Learning in Reinforcement Learning

Reinforcement learning (RL) has emerged as an important machine learning framework, with its success in modeling and solving sequential decision problems. Key application areas include games (like the game of Go), autonomous driving and financial investment. In this talk we begin by introducing the basic terminology in RL, then provide a quick review of both online and offline policy learning problems, and later discuss several current challenges. To deal with some of the highlighted challenges, we introduce the proposed new methods in both policy evaluation and learning.

Controlling the False Discovery Rate of Exploratory Factor Analysis Model

Exploratory factor analysis models aim to accurately identify indicators that load onto each latent trait. However, controlling the false discovery rate (FDR) for the loading matrix is a crucial issue in variable selection, as it aims to limit the proportion of falsely identified links between indicators and latent factors. The proposed method, based on mirror statistics, overcomes limitations of popular methods, such as low power and the requirement for accurate estimation of the marginal distribution of latent factors.

The method uses data splitting and applies different estimation methods to each subset of data, which provides theoretical guarantees for controlling FDR when the true loading matrix is sparse. To further stabilize variable selection, multiple data splitting is used, and results are aggregated using e-values.
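The selection rule behind mirror statistics can be sketched as follows. The snippet assumes a common form of the mirror statistic from the data-splitting literature, combining two independent estimates `b1` and `b2` of the same loadings (one per data split); the exact construction used in the talk may differ, and the numbers are made up for illustration.

```python
import numpy as np

def mirror_select(b1, b2, q):
    """Select entries using mirror statistics at nominal FDR level q.
    Null entries give statistics roughly symmetric about zero, so the left
    tail is used to estimate the number of false positives in the right tail."""
    b1 = np.asarray(b1, dtype=float)
    b2 = np.asarray(b2, dtype=float)
    m = np.sign(b1 * b2) * (np.abs(b1) + np.abs(b2))
    for t in np.sort(np.abs(m[m != 0])):
        fdp_hat = np.sum(m < -t) / max(np.sum(m > t), 1)
        if fdp_hat <= q:
            return np.flatnonzero(m > t), t
    return np.array([], dtype=int), np.inf

# Hypothetical estimates from two halves of the data: the first two entries
# are strong and agree in sign, the remaining ones look like noise.
b1 = np.array([3.0, 2.5, 0.1, -0.2, 0.05])
b2 = np.array([2.8, 3.1, -0.1, 0.15, 0.02])
selected, threshold = mirror_select(b1, b2, q=0.2)
```

Repeating this over several random splits and aggregating the results is what the multiple data splitting step refers to; that aggregation is not shown here.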

Numerical experiments demonstrate the effectiveness of the proposed method in controlling FDR while achieving high statistical power. We further illustrate the method using an application to the Big Five personality assessment. This method has broad applicability to a wide range of studies that use exploratory factor analysis models.

Estimating a kernel exponential family

Kernel Exponential Family (KEF) estimation offers a novel approach to estimating exponential family densities by targeting the mean value parameter rather than the natural parameter. Unlike traditional density estimation methods that use kernel smoothing, our method focuses on the kernel mean embedding of the KEF probability density. This allows us to derive an estimator for the natural parameter as a weighted sum of reproducing kernel Hilbert space (RKHS) basis functions, where the weights are determined by an I-prior. This approach results in a smoother estimation, particularly in regions with sparse observations, compared to conventional kernel smoothing density estimation techniques.

Variational Deep Learning

We introduce probabilistic activation functions in Bayesian neural networks (BNNs), which extend beyond the Gaussian assumptions of traditional BNNs. These probabilistic activation functions are retrieved by introducing augmented or latent variables. Typically, moving beyond Gaussian assumptions in Bayesian inferences introduces intractability issues. To address these challenges, we implement semiparametric mean field variational approximations to manage the intractable posteriors in the parameter learning processes.

Tuesday 30 and Wednesday 31 May 2023

Imputation for Tensor Time Series

Missing data is ubiquitous in areas such as econometrics and finance. Restricting attention to the complete observations is inefficient and discards information. Imputation is one of the main approaches to tackling the problem, and we focus on missing value imputation for tensor time series, which are becoming increasingly common. We first introduce the tensor structure, then discuss the imputation approach based on the latent factor structure. More specifically, we adopt Chen and Lam (2022) to estimate the tensor factor model and generalise Bai and Ng (2021) to perform imputation. Iterative imputation is performed and gives better simulation results. Further results to be established will be discussed briefly, including testing for structural breaks and extension to more general missing patterns.

Methods for Ethical Machine Learning: Fairness, Explainability and Auditing

Machine learning algorithms are increasingly used in scenarios which greatly impact people such as policing and legal systems, housing, banking, insurance, education and employment. Fairness is an area that aims to reduce bias and discrimination proliferated by algorithms, and explainability aims to create interpretable explanations to explain why decisions were made by the algorithm. Although distinct research fields, they are both related in being necessary tools to allow the public to question decisions made about them, namely through algorithmic audits. We look at combining the motives of these fields and use causal modelling, from interventions to counterfactuals, to improve on existing model-agnostic explainability methods, allowing for better understanding to inform a discrimination audit of an algorithm. We highlight improving the partial dependence plot with several causal dependence plots, but the concept can be extended to other model-agnostic explainability methods such as LIME and SHAP.
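The partial dependence plot mentioned above has a simple sample form: for each grid value of the feature of interest, set that feature to the grid value in every row and average the model's predictions. The sketch below shows this standard, non-causal version only; the function name `partial_dependence` and the toy model are illustrative assumptions, not the speaker's implementation.

```python
from statistics import fmean
from typing import Callable, Sequence


def partial_dependence(
    model: Callable[[Sequence[float]], float],
    rows: Sequence[Sequence[float]],
    feature: int,
    grid: Sequence[float],
) -> list:
    """Standard partial dependence of `model` on column `feature`."""
    curve = []
    for v in grid:
        preds = []
        for row in rows:
            x = list(row)      # copy so the data set is not modified
            x[feature] = v     # overwrite the feature of interest
            preds.append(model(x))
        curve.append(fmean(preds))  # average over the observed rows
    return curve


def toy_model(x):
    # Hypothetical fitted model, used only to exercise the function.
    return 2.0 * x[0] + x[1]


rows = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]]
pd_curve = partial_dependence(toy_model, rows, feature=0, grid=[0.0, 1.0])
```

Because the average is taken over the observed rows with one column overwritten, dependence between features is ignored, which is one motivation for looking at causal alternatives.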

Permutation Tests for Identifying Number of Factors for High-Dimensional Time Series

Factor modelling for high-dimensional time series data has been used as an effective dimension reduction method. This project studies the latent lower-dimensional factor process within the factor model, and focuses on developing a new method for identifying the number of factors. The proposed estimator allows the dimension P of the time series to be greater than the length N of the observed series, and is robust when the factors differ in strength. The proposed method involves eigenanalysis of an N by N non-negative definite matrix, and permutation tests in a multiple testing setting. We present the properties of this estimator in a simulation study.

Doubly Inhomogeneous Reinforcement Learning

We studied reinforcement learning in doubly inhomogeneous environments under temporal non-stationarity and subject heterogeneity. In many applications, it is commonplace to encounter datasets generated by system dynamics that change over time and population, challenging high-quality sequential decision making. In this paper, we propose an original algorithm to determine the “best data chunks” that display similar dynamics over time and population for policy learning, which alternates between change point detection and cluster identification. Our method is general, multiply robust and efficient.

Empirically, we demonstrate the usefulness of our method through extensive simulations and a real data application.

Factor Models for High-dimensional Multi-type Discrete-Time-to-Event Data

We propose a new class of factor models for high-dimensional discrete-time-to-event data, designed to extract factors that affect hazard rates of different event types. Both static and time-dependent observed covariates are included to allow for broader applicability. Our approach involves a joint maximum likelihood method, treating both item and person parameters as fixed yet unknown, and estimating them via alternating optimization. Preliminary simulation results and theoretical results on the convergence of parameters will be discussed.

Equilibrium and price impact in limit order market

We consider a one-period Nash equilibrium among informed traders and competitive liquidity suppliers in the limit order market, where the number of informed traders is random. In particular, we allow for the case in which there may or may not be informed traders (i.e. asymmetric information may or may not be present) in the market. We show that, in equilibrium, fat-tailed returns of the traded asset lead to power-law asymptotic market impact, while light tails lead to logarithmic market impact. The exponents and parameters in the power-law and logarithmic impact satisfy their corresponding fixed-point equations.

Lp Rotation and False Discovery Control for Exploratory Factor Analysis

Exploratory factor analysis (EFA) is a widely adopted technique for uncovering the underlying latent structure in multivariate data. To enhance interpretability, rotation methods are commonly employed for the factors. In this talk, we present a novel family of oblique rotations based on component-wise Lp loss functions (0 < p ≤ 1) that offer statistical consistency in estimating the true loading matrix with a sparse structure. Moreover, under sparsity conditions, Lp rotation enables effective control of the false discovery rate (FDR) through the construction of mirror statistics. This method overcomes limitations encountered by other popular approaches for FDR control in the EFA model, such as low power or the requirement for accurate estimation of the marginal distribution of latent factors.
We demonstrate the superiority of the Lp rotation method over traditional rotation and regularized estimation techniques through various experiments, showcasing its statistical accuracy and computational efficiency. Additionally, numerical experiments highlight the efficacy of mirror statistics in controlling FDR for the EFA model and achieving high statistical power.

Sequential Knockoffs for Variable Selection in Reinforcement Learning

In real-world applications of reinforcement learning, it is often challenging to obtain a state representation that is parsimonious and satisfies the Markov property without prior knowledge. Consequently, it is common practice to construct a state which is larger than necessary, e.g., by concatenating measurements over contiguous time points. However, needlessly increasing the dimension of the state can slow learning and obfuscate the learned policy. We introduce the notion of a minimal sufficient state in a Markov decision process (MDP) as the smallest subvector of the original state under which the process remains an MDP and shares the same optimal policy as the original process. We propose a novel sequential knockoffs (SEEK) algorithm that estimates the minimal sufficient state in a system with high-dimensional complex nonlinear dynamics. In large samples, the proposed method controls the false discovery rate, and selects all sufficient variables with probability approaching one. As the method is agnostic to the reinforcement learning algorithm being applied, it benefits downstream tasks such as policy optimization. Empirical experiments verify theoretical results and show the proposed approach outperforms several competing methods in terms of variable selection accuracy and regret.

Local / Global Maxima of Gaussian Process Regression

Gaussian Process Regression (GPR) is a powerful tool in Statistics and Machine Learning. A large body of literature focuses on hyperparameter estimation and optimization. However, the existence of local and global maxima of the marginal likelihood has received little attention. The aim of the project is to explore a systematic way to locate local / global maxima and to find a reliable numerical methodology to search for them under different settings.
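For reference, under a zero-mean GP prior with Gaussian observation noise, the objective in question is the standard log marginal likelihood. The notation here ($\mathbf{y}$ for the $n$ responses, $K_{\theta}$ for the kernel matrix with hyperparameters $\theta$, $\sigma^2$ for the noise variance) is assumed for illustration and is not taken from the talk.

```latex
\log p(\mathbf{y} \mid X, \theta)
  = -\tfrac{1}{2}\,\mathbf{y}^{\top} \left(K_{\theta} + \sigma^{2} I_n\right)^{-1} \mathbf{y}
    - \tfrac{1}{2}\,\log\det\left(K_{\theta} + \sigma^{2} I_n\right)
    - \tfrac{n}{2}\,\log 2\pi
```

The first term rewards data fit and the second penalises model complexity; the trade-off between them is one way the surface can become multimodal in $\theta$.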

Topics on weak convergence in the Skorokhod/J1 topology: from ultra-high frequency price modelling to Euler-Maruyama schemes

UHF asset price data appear as near-continuous-time pure-jump (thus piecewise-constant) tick-valued processes: this is fundamentally at odds with the established corpus of stochastic modelling based on processes with (jump-)diffusive paths. We can reconcile these successful models used in practice with the reality of UHF prices by arguing that appropriate scaling limits of UHF price models converge weakly to familiar stochastic models with particular stylized facts. The functional space of càdlàg functions over the positive real line equipped with the Skorokhod J1 topology is a natural space in which we can study the weak convergence of (the laws of) stochastic processes and their path functionals, such as exit times from open regions, barrier hitting times and time integrals. As an example, I will present some of my results on notable classes of processes and on sequences of processes stopped at their own exit times, along with some further research questions I am exploring. The same functional space (more accurately, a subset of it) is the natural framework for studying the weak convergence of interpolated Euler-Maruyama schemes, which appear as approximate UHF models at high-frequency observation scales. I will also present some of my research questions in this direction.

Minimax gradient boosting in causal inference

Estimating causal effects is a critical task in numerous scientific and practical fields, as it allows for a deeper understanding of the impact of interventions or treatments. While a variety of machine learning techniques have been employed to determine causal effects, there is a pressing need for more versatile and dependable methodologies. In this study, we propose an innovative, generalized gradient boosting framework specifically designed for tackling minimax modelling in causal inference.

Autoregressive Networks: sparsity and node heterogeneity

The dynamics of the network are modelled by an edge-independent AR process. Several time-series properties (stationarity, strong $\alpha$-mixing, variance bounds, etc.) are established accordingly. For a further parametrisation which takes sparsity and node heterogeneity into account, we propose an estimator and derive its tail bound using moment-generating function arguments. To bound the moment-generating function, a novel 'Cantor set construction' argument is applied.

Multiple-output quantile regression via Optimal Transport

We present a novel extension of the celebrated composite quantile regression (CQR) method proposed by Zou and Yuan (2008) to handle multiple-output cases. Building upon the concept of multivariate quantiles introduced by Chernozhukov et al. (2017) and Hallin et al. (2021), we generalize the univariate CQR estimator. We show that the univariate CQR estimator can be formulated as an optimal transport problem, and this formulation naturally extends to the multivariate case. We establish the consistency of the proposed estimator and provide the rate of convergence. Additionally, we highlight the robustness of our method by showing that it remains valid even when the random errors follow heavy-tail distributions or when the support of the random errors is non-convex. To validate our findings, we conduct comprehensive simulations that illustrate the effectiveness of the proposed approach.
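For orientation, the univariate CQR estimator of Zou and Yuan (2008) that is being generalised can be written as below. The notation is assumed for illustration: $K$ quantile levels $\tau_k$ (typically equally spaced), a slope vector $\beta$ shared across levels, and level-specific intercepts $b_k$.

```latex
\left(\hat b_1, \dots, \hat b_K, \hat\beta\right)
  = \arg\min_{b_1, \dots, b_K,\, \beta}
    \sum_{k=1}^{K} \sum_{i=1}^{n}
    \rho_{\tau_k}\!\left(y_i - b_k - x_i^{\top}\beta\right),
\qquad
\rho_{\tau}(u) = u\left(\tau - \mathbf{1}\{u < 0\}\right),
\qquad
\tau_k = \frac{k}{K+1}.
```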

Monday 9 and Friday 13 May 2022

Generalised additive latent variable models for location, shape, and scale

Latent variable models are used to analyse multivariate data using a small number of factors. There are well-established modelling frameworks for this purpose, but often higher order characteristics of the observed items are ignored. In this talk, we extend the Generalized Additive Models for Location, Shape and Scale framework (GAMLSS, Rigby and Stasinopoulos, 2005) to models with latent variables (Bartholomew et al., 2011). The proposed framework allows for linear and nonlinear predictors, as well as heteroscedastic error terms. More specifically, we assume different regression equations for the location, scale, and shape parameters of the univariate (conditional) distributions for each of the observed variables. Modelling the mean, scale, and shape as a function of latent variables and covariates allows for a more flexible and general modelling framework than the classical factor analysis model. A computationally efficient penalised maximum likelihood estimation is proposed. Examples from large scale surveys are used to demonstrate its applicability.

Rank and Factor Loadings Estimation in Time Series Tensor Factor Model by Pre-averaging

Due to modern data collection capabilities, analysis of tensor time series has become one of the most active research areas in statistics and machine learning. One effective approach for dimension reduction of high-dimensional tensor time series is to use a factor model structure similar to Tucker decomposition. In this paper, we propose a pre-averaging procedure to estimate the factor loading directions of tensor factor models, assuming the noise has both serial and cross-sectional dependence. Based on the initial estimated directions, we introduce an algorithm for iterative projection direction refinement, which improves the accuracy of the factor loading estimators. The same projected method can be further utilized to estimate the rank of core tensor by correlation thresholding through bootstrap. The empirical performances of the proposed estimation procedure are illustrated and compared with other competitors through simulation studies.

INAR Approximation of Bivariate Linear Birth and Death Process

In this presentation, we propose a new type of univariate and bivariate integer-valued autoregressive model of order one (INAR(1)) to approximate univariate and bivariate linear birth and death processes with constant rates. Under a specific parametric setting, the dynamics of the transition probabilities and the probability generating function of the INAR(1) model converge to those of the birth and death process as the length of the subintervals goes to 0. Owing to the simplicity of the Markov structure, maximum likelihood estimation is feasible for the INAR(1) model, which is not the case for bivariate and multivariate birth and death processes. This means that statistical inference for the bivariate birth and death process can be achieved via maximum likelihood estimation of a bivariate INAR(1) model.
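As background for the INAR(1) structure referred to above, here is a minimal simulation of the textbook Poisson INAR(1) recursion, X_t = alpha ∘ X_{t-1} + eps_t, where ∘ is binomial thinning and eps_t is Poisson. This is the classical model, not the new variant proposed in the talk; the parameter values and function names are assumptions chosen for illustration.

```python
import math
import random


def poisson(rng: random.Random, lam: float) -> int:
    """Draw a Poisson(lam) variate by Knuth's multiplication method (small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def thin(rng: random.Random, count: int, alpha: float) -> int:
    """Binomial thinning: each of `count` individuals survives with probability alpha."""
    return sum(1 for _ in range(count) if rng.random() < alpha)


def simulate_inar1(alpha: float, lam: float, n: int, seed: int = 0) -> list:
    """Simulate n steps of the classical Poisson INAR(1) process."""
    rng = random.Random(seed)
    # Start from the stationary law, which is Poisson(lam / (1 - alpha)).
    x = poisson(rng, lam / (1.0 - alpha))
    path = []
    for _ in range(n):
        x = thin(rng, x, alpha) + poisson(rng, lam)  # survivors + new arrivals
        path.append(x)
    return path


path = simulate_inar1(alpha=0.5, lam=2.0, n=20000)
sample_mean = sum(path) / len(path)  # should be close to lam / (1 - alpha) = 4
```

The thinning step plays the role of deaths and the Poisson innovation the role of arrivals over one subinterval, which is the intuition behind using such a recursion as a discrete-time approximation.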

Bootstrap Inference on Tensor Factor Model

Tensors, as multi-dimensional array data structures, naturally extend matrices to accommodate data of ever-growing complexity. Tensor factor models can be estimated through Tucker decomposition. In estimating the mode-k column space, a crucial problem is how to aggregate information from fibres in other modes to avoid signal cancellation. One solution is to synthesize a fibre in mode-k after projections by the leading eigenvector in all other modes and then iteratively apply PCA to such fibres in each mode. To assess its performance, we could bootstrap more fibres, while preserving both the serial and (within-mode) cross-sectional correlations. This presentation will demonstrate some of the results and issues.

High Frequency Trading in Kyle-Back Model

We consider a special version of the Kyle-Back model in which not only does the insider receive a private signal that converges to the true value of the asset, but there is also a public signal, likewise converging to the true value of the asset, that is shared with the market makers. We show that the expectation of the true value of the asset given the insider’s information is a linear combination of the public and private signals and is a martingale in the insider’s filtration. Furthermore, we will discuss the behaviour of the market makers and what is expected from the equilibrium(a) of the model.

Robust Inference for Change Points in Piecewise Polynomials using Confidence Sets

Multiple change point detection has become popular with the routine collection of complex non-stationary time series. An equally important but comparatively neglected question concerns quantifying the level of uncertainty around each putative change point. Though a handful of procedures exist in the literature, almost all make assumptions on the density of the contaminating noise which are impossible to verify in practice. Moreover, most procedures are only applicable in the canonical piecewise-constant mean (median, or quantile) setting. We present a simple procedure which, under minimal assumptions, returns localised regions of a data sequence which must contain a change point at some global significance level chosen by the user. Our procedure is based on properties of confidence sets for the underlying regression function obtained by inverting certain multi-resolution tests, and is immediately applicable to change points in higher order polynomials. We will discuss some appealing theoretical properties of our procedure, and show its good practical performance on real and simulated data.

Blind Source Separation over Space

Modelling multivariate spatial data, due to its dependence over space, is both computationally and theoretically complex. Recently a blind source separation model for spatial data was introduced. For this model, it is assumed that the multivariate observations are linear mixtures of an underlying latent field with independent components. We propose a new estimation method for the latent components. The new method is based on an eigenanalysis of a positive definite matrix defined in terms of multiple spatial local covariance matrices, and, therefore, can handle moderately high-dimensional random fields.

Regression with Gaussian Process Prior for spatial and spatio-temporal data

Regression with a Gaussian Process (GP) prior is a powerful statistical tool for modelling a wide variety of data with both Gaussian and non-Gaussian likelihoods. In the spatial statistics community, GP regression, also known as Kriging, has a long-standing history. It has proven useful since its introduction, owing to its capability of modelling the autocorrelation of spatial and spatio-temporal data. Other than space and time, real-life applications often contain additional information with different characteristics. In applied research, interest often lies in exploring whether there exists a space-time interaction, or in investigating relationships between covariates and the outcome while controlling for space and time effects. Additive GP regression allows such flexible relationships to be modelled by exploiting the structure of the GP covariance function (kernel), adding and multiplying different kernels for different types of covariates. This has only partially been adopted in spatial and spatio-temporal analysis. In this study, we use an ANOVA decomposition of kernels and introduce a unified approach to modelling spatio-temporal data, using the full flexibility of additive GP models. This permits not only modelling of main effects and interactions of space and time, but also the inclusion of covariates whose effects may vary with time and space. We consider various types of outcomes, including continuous, categorical and count data. By exploiting kernels for graphs and networks, we show that areal data can be modelled in the same manner as data that are geo-coded using coordinates.
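The idea of adding and multiplying kernels can be illustrated with a small sketch: a space kernel and a time kernel are summed to give main effects, and their product gives a space-time interaction. This is a simplified illustration under stated assumptions (one-dimensional space, squared-exponential kernels, unit variances, no constant term or kernel centring), not the model used in the study; all names are hypothetical.

```python
import math


def rbf(a: float, b: float, length_scale: float) -> float:
    """Squared-exponential kernel on the real line."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def space_time_kernel(p, q, ls_space: float = 1.0, ls_time: float = 1.0) -> float:
    """Main effect of space + main effect of time + space-time interaction."""
    (s1, t1), (s2, t2) = p, q
    k_space = rbf(s1, s2, ls_space)
    k_time = rbf(t1, t2, ls_time)
    # Sums and products of valid kernels are again valid kernels.
    return k_space + k_time + k_space * k_time


def gram(points):
    """Gram matrix of the combined kernel over (space, time) points."""
    return [[space_time_kernel(p, q) for q in points] for p in points]


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
K = gram(points)
```

Dropping the product term gives a purely additive model with no space-time interaction, so comparing the two fits is one way to ask whether an interaction is present.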

Rotation to Sparse Loadings using $L^p$ Functions

We propose a family of loss functions, the component-wise $L^p$ loss, for oblique rotations in exploratory factor analysis. The proposed loss functions take the form of the sum of the $p^{th}$ power of the absolute loadings, for $p\leq 1$. They are special cases of the concave component-wise loss functions (Jennrich, 2006), but the cases when $p<1$ have been overlooked in the past. We establish the connection between the proposed rotation method and regularized estimation based on $L^p$ penalty functions, showing that the former is a limiting case of the latter when the tuning parameter in regularized estimation converges to zero. The statistical consistency of the rotation-based estimator is established. In addition, procedures are developed for drawing statistical inference on the sparse true loading matrix, such as hypothesis testing and constructing confidence intervals. It is worth noting that since the objective function is non-smooth, classical statistical inference methods for rotation-based procedures fail (as the delta method is no longer applicable). A computationally efficient iteratively reweighted least squares algorithm is developed that is suitable for the entire family of loss functions. The proposed method is evaluated via simulations and compared with the regularized estimation methods.
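Written out, the loss described above is, for a $J \times K$ loading matrix $\Lambda = (\lambda_{jk})$, as follows. The symbols $Q_p$, $J$ and $K$ are notation assumed here for illustration.

```latex
Q_p(\Lambda) = \sum_{j=1}^{J} \sum_{k=1}^{K} \lvert \lambda_{jk} \rvert^{p},
\qquad 0 < p \le 1 .
```

The function is not differentiable wherever a loading equals zero, which is consistent with the remark that the delta method does not apply.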

TBC

Improved Euler Schemes for Killed Diffusions

It is a well-known problem in the area of diffusion processes that the presence of killing on boundaries introduces a loss of accuracy and reduces the weak convergence rate to $1/2$. We discuss the introduction of a class of recurrent transformations accompanied by a drift-implicit Euler scheme which brings the convergence rate back to 1, i.e. the optimal rate in the absence of killing. Consequently, we discuss how these ideas could be extended to the case of the Cox-Ingersoll-Ross model in the recurrent domain.
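The loss of accuracy caused by killing can be seen in a toy case. The sketch below simulates standard Brownian motion killed on first reaching an upper barrier, checking the barrier only at grid times, and compares the Monte Carlo survival probability with the exact value from the reflection principle. It illustrates the naive discretely monitored scheme only, not the improved scheme discussed in the talk; the barrier, horizon and sample sizes are assumed values.

```python
import math
import random


def survival_prob_euler(barrier, horizon, n_steps, n_paths, seed=0):
    """Monte Carlo estimate of P(no exit above `barrier` before `horizon`)
    for standard Brownian motion, checking the barrier only at grid times."""
    rng = random.Random(seed)
    step_sd = math.sqrt(horizon / n_steps)
    survived = 0
    for _ in range(n_paths):
        x = 0.0
        alive = True
        for _ in range(n_steps):
            x += step_sd * rng.gauss(0.0, 1.0)  # Euler step (exact for Brownian motion)
            if x >= barrier:                    # killing is only detected on the grid
                alive = False
                break
        survived += alive
    return survived / n_paths


def survival_prob_exact(barrier, horizon):
    """Reflection principle: P(max W_t < b) = 2*Phi(b/sqrt(T)) - 1 = erf(b/sqrt(2T))."""
    return math.erf(barrier / math.sqrt(2.0 * horizon))


estimate = survival_prob_euler(barrier=1.0, horizon=1.0, n_steps=50, n_paths=20000)
exact = survival_prob_exact(barrier=1.0, horizon=1.0)
bias = estimate - exact  # positive: excursions between grid points are missed
```

The discretised scheme overestimates survival because a path can cross the barrier and return between two grid times without being killed.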

Wavelet-based Long Run Variance (LRV) Estimation

The change point detection (CPD) problem arises from sudden changes in data observed over time. In this research, we are particularly interested in finding the “features” in the levels/trends of a process consisting of a piecewise-parametric signal plus dependent noise. Because the error process is dependent, many existing methods, which rely primarily on the assumption of independent noise, are not applicable in this case. Therefore, we start by modifying existing estimation methods under more general assumptions. To test for a break in the signal, it is common to obtain a consistent estimator of the so-called long run variance (LRV). In this presentation, we introduce the definition and properties of several novel wavelet-based LRV estimators, built on discrete or maximal-overlap Haar wavelets, and compare their performance with that of several existing estimators. We may then show their performance when applied within some existing CPD methods.

Vector Composite Quantile Regression via Optimal Transport

Composite quantile regression (CQR) is a celebrated coefficient estimation and variable selection method proposed by Zou and Yuan (2008). Based on the novel concept of multivariate quantiles introduced by Chernozhukov et al. (2017) and Hallin et al. (2021), we are able to generalize the univariate CQR to the multiple-output case. We demonstrate that the univariate CQR is essentially equivalent to an optimal transport formulation, which can be naturally generalized to the multivariate case. Moreover, we propose a vector quantile regression by concentrating the weights of CQR on a single vector quantile point. Some numerical simulations have been carried out to demonstrate the proposed method.

Monday 17 and Tuesday 18 May; Tuesday 25 and Wednesday 26 May 2021

Sequential Bayesian Learning on State Space Models

Particle MCMC (PMCMC), developed in Andrieu et al. (2010), is a new class of sampling algorithms designed for models with intractable likelihoods that allows jointly sampling the underlying parameter of a model and the filtered trajectory of the hidden states. This framework can be made sequential by estimating the model parameter and latent trajectories at each time point, introduced as Sequential Monte Carlo Squared (SMC$^2$) in Chopin et al. (2012). Traditionally, PMCMC, and by extension SMC$^2$, are challenging to use for models with higher-dimensional parameter spaces.

We discuss ways and methods to make these algorithms more suitable in such scenarios and explore possible use cases for state space models, such as hidden (semi) Markov and stochastic volatility models, on financial time series data.

A flexible class of Latent Variable Models

Factor analysis type models are used to analyse multivariate correlated observed variables using a small number of latent variables. In this talk, we extend the Generalized Additive Models for Location, Shape and Scale (GAMLSS) framework (Rigby and Stasinopoulos, 2005) to models with latent variables (Bartholomew et al., 2011).

The proposed framework allows for linear and nonlinear predictors for not only the mean, but also higher-order moments. More specifically, we assume different regression equations for the location, scale, and shape parameters of the univariate (conditional) distributions for each of the observed variables. Modelling mean, scale, and shape as a function of latent variables and covariates provides a more flexible and general model framework than the classical factor analysis model.

A computationally efficient penalised estimation method is proposed. Some simulation results and empirical examples from educational and attitudinal surveys are used to demonstrate its applicability.

A New Form of Consistency for Large Covariance Matrix Estimators

Over the past decade, the problem of estimating large covariance matrices has attracted a great deal of attention. Two major branches of regularization methods have been proposed. The first branch assumes special structures on the covariance or precision matrix, and the second branch shrinks the eigenvalues of the sample covariance matrix.

In general, structural assumptions are needed for consistent estimation, while eigenvalue-regularized estimators lack theoretical guarantees compared to structural ones.

In this project, we prove that a recently proposed matrix convergence criterion, normalized consistency, holds for the Nonparametric Eigenvalue-Regularized Covariance Matrix Estimator (NERCOME) under certain conditions. Similarly, the corresponding precision matrix estimator also satisfies normalized consistency. Normalized consistency can be applied in high-dimensional hypothesis testing, and our simulation results show that NERCOME performs well in practice.

The last part of the presentation will briefly discuss a new project for the next stage: threshold factor models for high-dimensional tensor time series.

Cluster point processes and Poisson thinning INARMA

We consider Poisson thinning Integer-valued time series models, namely Integer-valued moving average model (INMA) and Integer-valued Autoregressive Moving Average model (INARMA), and their relationship with cluster point processes, the Cox point process and the dynamic contagion process.

We derive the probability generating functionals of INARMA models and compare them to those of cluster point processes.

The main aim of this work is to prove that, under a specific parametric setting, INMA and INARMA models are just discrete versions of continuous cluster point processes and hence converge weakly when the length of subintervals goes to zero.

Tensor factor model

Tensors, as multi-dimensional array data structures, naturally extend beyond matrices to accommodate data collected with growing complexity. To better analyse tensors of high orders, it is essential to extract and exploit their low-rank structures, for instance, through Tucker decomposition, which resembles SVD. For the purpose of dimension reduction, statistical factor models have long been used in many areas. The goal of the project is to study the properties of statistical factor models on tensors, particularly when some degree of serial correlation is present. The presentation will share findings from the readings done so far and motivate future developments.

Adaptive functional thresholding for sparse covariance function estimation in high dimensions

Covariance function estimation is a fundamental task in multivariate functional data analysis and arises in many applications. In this paper, we consider estimating sparse covariance functions for high-dimensional functional data, where the number of random functions p is comparable to, or even larger than the sample size n.

Aided by the Hilbert-Schmidt norm of functions, we introduce a new class of functional thresholding operators that combine functional versions of thresholding and shrinkage, and propose the adaptive functional thresholding of the sample covariance function capturing the variability of individual functional entries.

We investigate the convergence and support recovery properties of our proposed estimator under a high-dimensional regime where p can grow exponentially with n. Our simulations demonstrate that the adaptive functional thresholding estimators significantly outperform the competing estimators.

Finally, we illustrate the proposed method by the analysis of brain functional connectivity using two neuroimaging datasets.

Tail Risk in US Credit Markets

Theory suggests that heavy-tailed shocks have a significant impact on asset prices and vary over time. However, modelling time-varying tail risk and testing the effects on asset prices remain challenging due to the rare nature of extreme events. This problem is particularly acute in univariate time-series of relatively new assets with short historical data, e.g. credit default swaps (CDS) on US sovereign or corporate debt. Firstly, to overcome this problem, we devise a new measure of time-varying tail risk in credit markets that is directly determinable from returns across different maturities of US sovereign CDS. We measure aggregate tail risk dynamics with a dynamic power-law model and estimate the credit tail risk exponent with the Hill (1975) estimator. We find that a one-standard-deviation increase in tail risk forecasts an average increase in US sovereign credit default swap spreads of 7.6 bps, which is highly significant. We explore the robustness of the forecasting power of the credit tail risk measure to controlling for a large set of alternative predictors.

We conclude that increases in credit tail risk significantly predict increases in CDS spreads for different maturities. Secondly, we model the term structure of tail risk to measure the relationship among extreme returns of CDS that differ only in their maturity. Using a broad cross-country sample, we exploit sovereign CDS price crashes, month-by-month, to identify common tail risk variations for a specific maturity. The term-structure of credit tail risk exhibits the same level of credit tail risk independently of the maturity. Thirdly, we construct a measure for corporate credit tail risk. While tail risk estimates are highly correlated across industries, the US financial sector has the highest correlation of 89% with the US economy. Furthermore, we find that the cross-section of US corporate CDS returns reflects a premium for tail risk sensitivity. Cross-sectionally, firms that highly covary with tail risk earn average expected annual returns 8.1% higher than CDSs with low tail risk covariation. We show that the credit tail risk premium is different from the premiums on market risk, idiosyncratic volatility and coskewness, and robust to controlling for these alternative risk factors. We conclude that sovereign and corporate tail risk is persistent and sellers of protection demand additional compensation for default insurance contracts with high sensitivities to extreme events.
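To make the tail exponent concrete, the sketch below implements the plain Hill (1975) estimator from the k largest observations and checks it on a simulated Pareto sample. It is the static textbook estimator, not the dynamic power-law model of the talk; the choice of k, the simulated data and the function name are assumptions for illustration.

```python
import math
import random


def hill_tail_exponent(losses, k):
    """Hill (1975) estimate of the tail exponent from the k largest observations."""
    if not 1 <= k < len(losses):
        raise ValueError("k must satisfy 1 <= k < len(losses)")
    ordered = sorted(losses, reverse=True)
    threshold = ordered[k]  # the (k+1)-th largest value
    if threshold <= 0:
        raise ValueError("the threshold must be positive")
    # Average log-excess of the k largest values over the threshold.
    mean_log_excess = sum(math.log(x / threshold) for x in ordered[:k]) / k
    return 1.0 / mean_log_excess


rng = random.Random(0)
sample = [rng.paretovariate(3.0) for _ in range(20000)]  # true tail exponent 3
alpha_hat = hill_tail_exponent(sample, k=2000)
```

A smaller exponent means a heavier tail. In practice the estimate is sensitive to k, so it is usually inspected over a range of thresholds.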

High Frequency Modelling in Kyle-Back Model

We show that the expectation of the true value of the asset given the insider's information is a linear combination of the public and private signals, and that it is a martingale in its own filtration. We also discuss the behaviour of the market makers and what can be expected from the equilibria of the model.

Detecting Misaligned Changes in Multivariate Data using the Narrowest Significance Pursuit

This talk will be concerned with the setting in which a large number of time series are observed together as a panel; the time series means are known to follow a linear model, possibly driven by covariates, and it is suspected that the parameters change an unknown number of times at unknown locations. This setting is very general and, for example, nests the canonical changepoint detection problem, which has been studied in the statistics literature, as well as panel data regression with piecewise constant parameters, which is popular in applied work.

We propose a generic algorithm for detecting time intervals which must, at a prescribed significance level, contain one or more changepoint-like events. Our procedure is based on multi-resolution tests first proposed by Fryzlewicz (2020) in the univariate setting, and extends the literature on multivariate change detection in two directions. Firstly, compared to local tests common in the changepoint detection literature, such as the CUSUM and MOSUM statistics, our multi-resolution tests are automatically applicable in any scenario where the data are generated by a linear model with piecewise constant parameters. Secondly, by focusing on interval rather than point estimates for the changepoint locations, we are able to relax the common assumption of aligned channel-wise changepoint locations, which is rarely satisfied in real data settings and can lead to misleading statistical results when unknowingly violated.

Conditionally Heteroskedastic Dynamic Factor Models

Dynamic factor models are increasingly used in data-rich settings. Several procedures, such as PCA and the Kalman filter, have been developed for the correct estimation of latent factors and the corresponding model parameters with large numbers of both cross-section units (N) and time series observations (T). In previous applications, however, the dynamics of the factors have been modelled as i.i.d. white noise. Yet, we know that many financial variables exhibit a non-zero correlation.

In the literature, this feature is modelled by GARCH models, which assume that returns are conditionally heteroskedastic, i.e., their variances and covariances conditional on the past evolve with time. Diebold and Nerlove (1989) first specified a multivariate time series model, applied to exchange rates, with an underlying latent variable whose innovations were ARCH. Harvey, Ruiz, and Sentana (1992) subsequently derived the Kalman filter estimator in the finite N case, arguing against the capacity of the estimator to correctly extract the unobserved state. However, in the high-dimensional N setting typical of factor analysis, the estimator can be shown to be efficient. First, the model is converted into state-space form, explicitly taking heteroskedasticity into account. Subsequently, the Kalman filter is applied to jointly estimate the parameters via the Expectation Maximization algorithm.

Blind Source Separation over Space

Modelling multivariate spatial data, due to its dependence over space, is both computationally and theoretically complex. Recently a blind source separation model for spatial data was introduced. For this model, it is assumed that the multivariate observations are linear mixtures of an underlying latent field with independent components. An estimator for recovering the latent components, obtained by jointly diagonalizing two or more local covariance matrices, was suggested. Several simulations are conducted to investigate the properties and efficiency of this method.

In this project, we propose a new method for recovering the latent components. The new method pre-processes the multivariate observations by standardizing them. This makes the covariance matrix of the multivariate observations an identity matrix, which allows direct estimation of the mixing matrix. Based on eigen-analysis only, this new approach should be more direct and effective both theoretically and computationally.

High Dimensional Linear Regression with Correlated Errors

Linear regression is commonly used for modeling the relationship between response and regressors. It is common practice to include many covariates in the model specification for confounding control.

The goal of this project is to present an inference method that is valid in a high-dimensional setting, where the dimension $d$ of the control variables need not be a vanishing fraction of the sample size $n$, in the presence of serial correlation. The proposed method is demonstrated in simulations.

The next stage is to extend our findings to one-way fixed effects panel data models.

Signal estimation, multiple testing and estimating the proportion of false null hypotheses

This talk has two parts. In the first, we consider the signal estimation and multiple testing problems. The two are related and widely studied in the literature. We propose a simple procedure called Tail Summed Scores and analyse it from both perspectives.

In the second part, we consider the problem of estimating the proportion of false null hypotheses among a large number of independently tested hypotheses. We propose a method referred to as Difference of Slopes and consider its theoretical properties under different assumptions on the distribution of p-values.

Cheating Detection in Educational Tests Using Double Explanatory IRT Model

Fairness is essential to the validity of educational tests. Test scores are no longer a fair indicator of examinees' true ability if some test questions favour some test takers over others. Therefore, it is necessary to detect compromised items and test takers who benefit from prior access to these items.

We have developed a double mixture Item Response Theory (IRT) model for simultaneously detecting potential cheaters and compromised items based on item response data, without any prior knowledge of person or item compromise status. The model adds a latent class model component to an IRT model. The additional latent class component is designed to capture the cheating effect due to item pre-knowledge, while the IRT model represents normal item response behaviour.

We now aim to extend the model to an explanatory framework, where observed and latent predictors on both the person and item sides are used to predict the status of test takers and items. The new doubly explanatory mixture IRT model is estimated under a full Bayesian framework. The proposed model is used to re-examine the licensure test data. The detections will be compared with previous results based on the double mixture IRT model. We focus on two areas: (a) improving the detection of compromised examinees and items and (b) investigating person- and item-level contributing factors to cheating.

Interrupted Brownian Motion

We find the joint distribution of the process and the number of interruptions, N(t), using martingale methodology. The excursions of the interrupted Brownian motion are also studied, using the perturbation method with exponential time.

We then identify the joint Laplace transforms of occupation times until the first time the occupation time above the level a reaches a certain level for Brownian motion.

Sketching stochastic set utility functions

In this presentation, we look at the problem of finding a good sketch (representation) of item distributions for approximating stochastic set utility functions. This problem has practical applications in gaming, crowdsourcing, digital advertising, etc.

Our goal is to approximate the set utility function everywhere and find an algorithm to select the best set. Specifically, we would like to propose a systematic way of sketching such that we have certain guarantees for approximation and controls on the size of individual representation. Moreover, we are also interested in the trade-off between approximation accuracy and complexity of the representation.

We will discuss related past studies on this topic using test-score based and quantization methods, and then show some preliminary results based on the quantization method. We conclude with a brief discussion of possible frameworks to be used for the remaining open questions.

Automatic Model Selection Method for Time Series

To describe the dependence structure of models, we should deal with the important issue of finding a consistent estimator of the long run variance (LRV). This presentation will first provide an introduction to some wavelet-based estimators, which lie between the two broad classes of LRV estimators.

They are proposed using the traditional or maximal overlapping Haar wavelets and the structure of an existing difference-based estimator, while relying on different measures. We will compare the performance of several LRV estimators, including our wavelet-based ones.

Then we will discuss their performance when they are applied in existing change point detection methods.

Dynamics and Inference for Voter Model Processes

In this talk, we consider a discrete-time voter model process where each node state takes one of two possible values, either 0 or 1. In each time step, each node adopts the state of a randomly sampled neighbour, according to sampling probabilities which we refer to as node interaction parameters. We study the maximum likelihood estimation of node interaction parameters from observed states of nodes over time for a given number of realizations of a voter model process. In contrast to previous work on parameter estimation of network autoregressive processes, whose long-run behaviour is according to a stationary stochastic process, the voter model process we study eventually reaches a consensus state. This requires developing a framework for deriving sampling complexity bounds for parameter estimation by using observations of node states from several realizations of a voter model process.

We present sampling complexity upper bounds for estimation of node interaction parameters within a given accuracy by analysis of M-estimators with decomposable regularizers. This is achieved by interpreting the observation data as being generated according to an extended voter model that consists of cycles, each corresponding to a realization of a voter model process until absorption to a consensus state. In order to obtain these results, we derived new bounds for the expected value of, and probability tail bounds for, the consensus time, which may be of independent interest. We also present a sampling complexity lower bound by using the framework of locally stable estimators. Finally, we will discuss future work on voter model processes and other network models obtained by adding edge signs to the network.
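The dynamics described in this abstract can be sketched in a few lines of Python. This is a hedged illustration of the forward process only, not of the estimators studied in the talk: the synchronous update, the function names, and the 4-node complete graph with uniform sampling probabilities are assumptions chosen for the example.

```python
import random


def voter_step(states, weights, rng):
    """One synchronous update: node i copies node j with probability weights[i][j]."""
    nodes = range(len(states))
    return [states[rng.choices(nodes, weights=weights[i])[0]] for i in nodes]


def simulate_until_consensus(initial, weights, rng, max_steps=10_000):
    """Return the visited state vectors, stopping once all nodes agree."""
    path = [list(initial)]
    while len(set(path[-1])) > 1 and len(path) <= max_steps:
        path.append(voter_step(path[-1], weights, rng))
    return path


# Hypothetical example: 4-node complete graph, each node samples
# one of the other three nodes uniformly at random.
n = 4
weights = [[0.0 if i == j else 1.0 / (n - 1) for j in range(n)] for i in range(n)]
path = simulate_until_consensus([0, 1, 0, 1], weights, random.Random(1))
consensus_time = len(path) - 1
```

Each call produces one realization that ends in absorption, which corresponds to one "cycle" in the extended voter model; repeating the call with fresh initial states would generate the several realizations used for estimation.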

Research posters

Wang, Yiliu (2019) Representation Learning for relational data

Vamvourellis, Konstantinos, Kalogeropoulos, Kostas, and Phillips, L. (2019)
Bayesian Modelling for Benefit-Risk Balance Analysis: Rosiglitazone for Type II Diabetes

De Santis, Davide (2018)
Mixed Impulse/Stopping Nonzero-Sum Stochastic Differential Games

Pedraza Ramírez, José Manuel (2018)
Predicting the last zero of a spectrally negative Lévy process

Vamvourellis, Konstantinos, Kalogeropoulos, Kostas, and Phillips, L. (2017)
Bayesian Modelling for Benefit-Risk Balance Analysis: Rosiglitazone for Type II Diabetes

Schröder, Anna Louise and Ombao, Hernando (2015)
FreSprD: frequency-specific change-point detection in multichannel EEG seizure recordings

Habibnia, Ali (2014)
Nonlinear forecasting with many predictors by neural network factor models

Huang, Na and Fryzlewicz, Piotr (2014)
NOVELIST estimator for large covariance matrix

Terzi, Tayfun (2014)
Methods for the identification of semi-plausible response patterns

Sienkiewicz, Ewelina, Thompson, E. L. and Smith, Leonard (2014)
Consistency of regional climate projections with the global conditions that stimulated them

Doretti, Marco, Geneletti, Sara and Stanghellini, Elena (2014)
Measuring the efficacy of the counterweight programme via g-computation algorithm

Yan, Yang, Shang, Dajing and Linton, Oliver (2013)
Efficient estimation of risk measures in a semiparametric GARCH model

Hafez, Mai (2013)
Multivariate longitudinal data subject to dropout and item non-response: a latent variable approach

Sienkiewicz, Ewelina, Smith, Leonard and Thompson, E. L. (2013)
How to quantify the predictability of a chaotic system

Jarman, Alex and Smith, Leonard (2013)
Forecasting the probability of tropical cyclone formation - the reliability of NHC forecasts from the 2012 hurricane season

Wheatcroft, Edwards and Smith, Leonard (2013)
Will it rain tomorrow? Improving probabilistic forecasts

Higgins, Sarah and Smith, Leonard (2013)
The impact of weather on maize wheat

Huang, Na and Fryzlewicz, Piotr (2012)
Large precision matrix estimation via pairwise tilting

Korkas, Karolos and Fryzlewicz, Piotr (2012)
Adaptive estimation for locally stationary autoregressions

Dureau, Joseph and Kalogeropoulos, Konstantinos (2011)
Inference on epidemic models with time-varying parameters: methodology and preliminary applications

Giammarino, Flavia and Barrieu, Pauline (2011)
Indifference pricing with uncertainty averse preferences

Zhao, Hongbiao and Dassios, Angelos (2011)
A dynamic contagion process and an application to credit risk


Monday 1 and Tuesday 2 June; Monday 8 and Tuesday 9 June 2020

Bayesian Inference on State Space Models

State space models (SSMs) are a flexible class of models that can describe various time series phenomena. However, analytical forms of their likelihood functions are only available in special cases, and thus standard parameter optimization routines might be infeasible. Particle MCMC (PMCMC) is a new class of sampling algorithms, designed for models with intractable likelihoods, that allows joint sampling of the underlying parameters of a model and a filtered trajectory of the hidden states. We discuss ways to make the algorithm more suitable for longer time series data and for models with many parameters. Moreover, after explaining some potential use cases for a hidden semi-Markov model, an SSM where the latent state is a semi-Markov chain, we fit this model to financial time series data using the PMCMC algorithm, and discuss ways to do further inference.

On a (more) flexible class of Latent Variable Models

The Generalized Linear Latent Variable Model (GLLVM) is a general class of statistical models commonly used in multivariate data analysis to explain the underlying relationships between observed random variables and to reduce data dimensionality. To do so, associations between $p$ observed variables are explained through a set of $q$ latent variables (factors) that belong to a lower dimensional vector space, i.e., $q \leq p$. The GLLVM has two main parts: the measurement part (which models the linear relationships between manifest random variables in the exponential family and latent variables) and the structural part (which models the relationships among factors). Albeit powerful, the GLLVM faces limitations when the common modelling assumptions (e.g. linearity or homoscedasticity) fail. We expand the class of GLLVMs by allowing for greater flexibility in the measurement part of the model and by going beyond the exponential family of distributions. We achieve this by accommodating nonlinear relationships between manifest and latent variables in the measurement equations through a generalized additive model specification on the location, scale and shape (GAMLSS) parameters of the univariate conditional distribution of the manifest variables. This aims to provide a better understanding of (possibly nonlinear) associations between observed and latent variables and to improve model fit by expanding the modelling framework to include higher order moments. We present a more flexible semiparametric latent variable model (SPLVM) framework, and discuss the challenges that arise under a more general (yet complex) modelling setting (estimation, parameter identification, computational burden, etc.).

A Two-Phase Dynamic Contagion Model for COVID-19

We propose a continuous-time stochastic intensity model, namely the two-phase dynamic contagion process, for modelling the epidemic contagion of COVID-19 and investigating the lockdown effect, based on the dynamic contagion model introduced by Dassios and Zhao (2011). It allows randomness in the infectivity of individuals rather than a constant reproduction number as assumed by standard models. Key epidemiological quantities, such as the distribution of the final epidemic size and the expected epidemic duration, have been derived and estimated based on real data for various regions and countries. The associated time lag of the effect of intervention in each country or region is estimated, and our results are consistent with the incubation time of COVID-19 for most people found by existing medical studies. We demonstrate that our model could potentially be a valuable tool in the modelling of COVID-19. More importantly, variations of the dynamic contagion model of this kind could also be used as an important tool in epidemiological modelling, as contagion models of this type, with very simple structures, are adequate to describe the evolution of regional epidemics and worldwide pandemics.

Robust Estimation of High Dimensional Covariance Matrix

Covariance matrices are handy tools to summarise how random variables vary together. Such relationships are of particular interest in high dimensional datasets, which often have complex structure and noise. Typical sample covariance estimators, however, rely on 1) the dimensionality being of smaller order than the sample size and 2) sub-Gaussianity. We build on recent developments in non-parametric eigenvalue shrinkage and robust measures to guarantee performance for a wider family of distributions when $p/n \to c$. Simulations show encouraging improvements over sample covariances as measured by the Frobenius norm.

Zero-sum stochastic differential games with impulse controls: a stochastic Perron’s method approach

The main objective of this work is to apply the stochastic Perron method to zero-sum stochastic differential games with impulse controls. We consider a symmetric game in which the two agents play feedback impulse control strategies instead of defining them in an Elliott-Kalton fashion. The upper and lower value functions are naturally associated with two double obstacle partial differential equations (PDEs) and, in particular, it turns out that the upper value function’s PDE, the upper Isaacs equation, coincides with the case in which P1, the maximiser, has precedence of intervention over P2, the minimiser, whereas the lower value function’s PDE, the lower Isaacs equation, coincides with the case in which P2 has precedence. Once the upper and lower value functions are defined, we suitably characterise the stochastic sub- and supersolutions of the upper and lower Isaacs equations as those functions dominated by and dominating the respective Isaacs equation. Then, we show that the infimum of stochastic supersolutions of the upper Isaacs equation is its viscosity subsolution, whereas the supremum of stochastic subsolutions of the upper Isaacs equation is its viscosity supersolution, so that a viscosity comparison result gives us the unique and continuous viscosity solution characterising the value of the game.

A New Perspective on Dependence in High-Dimensional Functional/Scalar Time Series: Finite Sample Theory and Applications

Statistical analysis of high-dimensional functional time series arises in various applications. Under this scenario, in addition to the intrinsic infinite-dimensionality of functional data, the number of functional variables can grow with the number of serially dependent functional observations. In this paper, we focus on the theoretical analysis of relevant estimated cross-(auto)covariance terms between two multivariate functional time series or a mixture of multivariate functional and scalar time series beyond the Gaussianity assumption. We introduce a new perspective on dependence by proposing a functional cross-spectral stability measure to characterize the effect of dependence on these estimated cross terms, which are essential in the estimates for additive functional linear regressions. With the proposed functional cross-spectral stability measure, we develop useful concentration inequalities for estimated cross-(auto)covariance matrix functions to accommodate more general sub-Gaussian functional linear processes and, furthermore, establish finite sample theory for relevant estimated terms under a commonly adopted functional principal component analysis framework. Using our derived non-asymptotic results, we investigate the convergence properties of the regularized estimates for two additive functional linear regression applications under sparsity assumptions, including functional linear lagged regression and partially functional linear regression in the context of high-dimensional functional/scalar time series.

Prospective Novelties in Inside Trading

In 1985 Kyle proposed a discrete-time model of inside trading, which was extended to continuous time by Back in 1992, originating the Kyle-Back model research agenda. Since then, many contributions to this agenda have been made by considering models more general than those previously developed. With this in mind, this talk will present two possible research topics to be developed as extensions of the current literature on inside trading using Kyle-Back models.

The first extension would be to consider, instead of a single insider, many interacting inside traders. The number of insiders could be either random or deterministic, with asymmetric dynamic signals about the true value of the asset. Another possible extension would be to consider a setting in which the asymmetry of information between the market participants leads to high-frequency trading by the insider.

Uses of High Dimensional Trend Segmentation

In many settings prominent features of univariate time series can be described well by a stylised model in which weakly dependent noise fluctuates around a piecewise linear trend. When experimental setups lead to many such time series being collected together as a panel, subgroups will often exhibit visual similarity. To motivate high dimensional trend segmentation, the first part of the talk will introduce a data example in which classical time series clustering methods fail the “visual similarity test” while a simple procedure based on changepoints performs surprisingly well.

The second part of the talk will review existing methods for univariate trend segmentation, and discuss strategies for extending to the multivariate setting. While the problem of high dimensional trend segmentation is relatively new, the problem of identifying prominent features or landmarks in a sample of curves dates back to Kneip & Gasser (1992). In functional data analysis, landmarks are used to perform curve registration. In the context of high dimensional time series, feature misalignment may in fact be informative, as it can be interpreted as reflecting components of a connected system responding to impulses at different speeds, analogous to dynamic factor models.

The final part of the talk will propose a factor model framework for high dimensional trend segmentation problems. We expect such a model to exhibit a blessing of dimensionality, in the sense that estimates for the locations of aligned changepoints improve with T and n. Estimating the model amounts to recovering a common basis for piecewise linear functions on [1,T]. This will be compared to the associated eigen-basis problem which arises when estimating parameters in a standard factor model, as well as to the problem of principal component analysis for functional data.

References

Kneip, A. & Gasser, T. (1992), ‘Statistical tools to analyze data representing a sample of curves’, The Annals of Statistics pp. 1266–1305.

Gaussian Process for Spatial and Spatio-temporal analysis

Spatial and spatio-temporal point patterns are data which typically contain information on event locations and times, e.g., disease, crime or natural disaster data. When data consist of more than one type of event, they are referred to as multivariate spatial or spatio-temporal point patterns. Whilst common scientific interest in the former is to estimate event rates, the latter often focuses on modelling relative rates and classification of the event type. For example, we may be interested in classifying disease with different strains of a virus such as flu or tuberculosis.

Despite the increasing demand for more efficient models for analysing this type of data, existing models are often difficult to implement. This is especially problematic for spatio-temporal analysis. In this paper, we propose an alternative approach: Gaussian process (GP) models, a class of non-parametric models for regression and classification. Estimation and inference procedures involved in GP models for categorical responses are not free from computational cost, and alternative analytical approximations may result in poor performance. Our method secures both efficiency and accuracy by combining Markov Chain Monte Carlo and analytical approximation in inference. Another benefit of GP models is their flexibility in incorporating covariates. This allows models to be easily extended to multivariate spatio-temporal point pattern analysis.

Capacity Expansion from a Mean Field Games Perspective

Mean field game (MFG) theory is a general yet realistic framework for studying the competitive behaviour of multiple agents influencing each other. In my first-year presentation, we introduced this framework into the study of the economics of capacity expansion within an industry consisting of many producers.

Also, in my second year, we proposed a numerical method, based on a particle formulation of MFG and symplectic integration, to simulate a large number of agents according to MFG theory.

This year, I improved the first model to incorporate more realistic economic phenomena, and then generalized and applied the numerical scheme to study the new model. This gives us further insight into real-world financial markets, especially some stylized facts in the world crude oil market.

Change-point detection methods for estimating the proportion of alternative hypotheses

We consider the problem of estimating the number of alternative hypotheses among a large number of independently tested hypotheses. This problem is easier than the related problem of multiple testing where the goal is to identify the particular alternative hypotheses. The main application is in the analysis of microarray data for estimating the extent of the change in the gene expressions. We fit segmented linear regression models with one breakpoint to the sequence of increasingly sorted p-values. The estimate of the number of alternative hypotheses is the estimated location of the change in slope, separating the smaller alternative p-values from the null p-values.
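The breakpoint idea in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions rather than the authors' estimator: it fits two separate least-squares lines (not constrained to join at the breakpoint) to the sorted p-values, searches all candidate breakpoints on a grid, and uses simulated p-values. The function name `estimate_num_alternatives` and the parameter `min_segment` are hypothetical.

```python
import random
from itertools import accumulate


def estimate_num_alternatives(p_values, min_segment=5):
    """Breakpoint k minimizing the total RSS of two lines fitted to sorted p-values."""
    p = sorted(p_values)
    m = len(p)
    xs = range(1, m + 1)

    def prefix(values):
        return [0.0] + list(accumulate(values))

    # Prefix sums let each segment's residual sum of squares be computed in O(1).
    sx, sy = prefix(xs), prefix(p)
    sxx = prefix(x * x for x in xs)
    sxy = prefix(x * y for x, y in zip(xs, p))
    syy = prefix(y * y for y in p)

    def rss(a, b):
        """RSS of a least-squares line through the sorted points a+1, ..., b."""
        n = b - a
        tx, ty = sx[b] - sx[a], sy[b] - sy[a]
        cxx = sxx[b] - sxx[a] - tx * tx / n
        cxy = sxy[b] - sxy[a] - tx * ty / n
        cyy = syy[b] - syy[a] - ty * ty / n
        return max(0.0, cyy - cxy * cxy / cxx)

    candidates = range(min_segment, m - min_segment + 1)
    return min(candidates, key=lambda k: rss(0, k) + rss(k, m))


# Simulated example: 400 alternatives with very small p-values, 1600 uniform nulls.
rng = random.Random(0)
alternative_p = [rng.uniform(0.0, 0.001) for _ in range(400)]
null_p = [rng.random() for _ in range(1600)]
k_hat = estimate_num_alternatives(alternative_p + null_p)
```

Here `k_hat` is the number of sorted p-values assigned to the left, flatter segment, which serves as the estimate of the number of alternative hypotheses. A continuity constraint at the breakpoint would bring the sketch closer to a standard segmented regression.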

A Nonparametric Bayesian Model for Directed, Sparse and Dynamic Networks

The Bayesian nonparametric link prediction model has become a widely used method to represent the sparsity of networks with large numbers of nodes and links. However, unlike the stochastic block model, the traditional link prediction model fails to reflect the community structure in which senders and receivers in the same class mostly overlap.

To overcome this limitation, we develop a Bayesian nonparametric approach to group the senders and receivers of links with overlaps, while preserving the sparsity pattern of the network. Our method is based on a correlated, instead of conditionally independent, pair of priors for senders and receivers.

Moreover, we present a time-dependent model by extending the proposed static model to a dynamic setting in a novel non-Markovian way. To achieve this, we construct a dependent Dirichlet process prior using an underlying Gaussian process and Poisson process.

Finally, we introduce fast variational inference algorithms for the proposed models. In particular, our variational inference framework embeds a deep neural network to explore the hidden non-Markovian dynamics. We are currently conducting the simulation and real data analysis.

The Forward Search for Detecting Non-invariant Items

The forward search approach bases the inferential results on a sequence of carefully selected data subsets. It starts with an outlier-free subset of the data and proceeds by adding observations until the whole data set is included. The progression of the forward search maintains robustness against departures from the model and thus can be used to detect outliers. We notice that the forward search can be adapted to trace non-invariant variables that have significantly different parametric forms across groups. To detect non-invariant variables, we fit a measurement-equivalent model to a series of data subsets established throughout the forward search. Statistics for model adequacy are monitored to indicate the entries of non-invariant items. We address the detection of non-equivalent items in the context of latent variable modelling.

A random forest-based approach for predicting spreads in the primary catastrophe bond market

We introduce a random forest approach to predict spreads in the primary catastrophe bond market. We investigate whether all information provided to investors in the offering circular prior to a new issuance is equally important in predicting its spread. The whole population of non-life catastrophe bonds issued from December 2009 to May 2018 is used. The random forest shows impressive predictive power on unseen primary catastrophe bond data, explaining 93% of total variability. For comparison, linear regression, our benchmark model, has inferior predictive performance, explaining only 47% of total variability. All details provided in the offering circular are predictive of spread, but to varying degrees. The stability of the results is studied. The use of random forests can speed up investment decisions in the catastrophe bond industry.

Note: First results of this work were presented last year; since then, further research has been performed on the stability analysis and the benchmark model.

Key-words: machine learning in insurance, non-life catastrophe risks, catastrophe bond pricing, primary market spread prediction, random forest, minimal depth importance, permutation importance.

Efficient subsampling for time-series ensemble learning

This article uses the artificial delete-d jackknife to structure an efficient subsampling scheme for time-series ensembles of trees. This methodology is superior to bootstrap since it builds a larger and unique range of subsamples, and it is particularly advantageous when the number of observed time periods is small. This paper shows forecasting results for economic activity, commodity prices and financial returns.
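The subsampling idea can be illustrated with the plain delete-d jackknife. The abstract does not spell out what makes the scheme "artificial", so the sketch below is only an assumption-laden illustration: it draws distinct index sets that each omit `d` of the observed time periods while keeping the remaining periods in time order. The function name `delete_d_subsamples` and all parameter values are hypothetical.

```python
import random
from math import comb


def delete_d_subsamples(n_obs, d, n_subsamples, rng):
    """Return distinct, time-ordered index lists, each omitting d of n_obs periods."""
    if not 0 < d < n_obs:
        raise ValueError("d must satisfy 0 < d < n_obs")
    if n_subsamples > comb(n_obs, d):
        raise ValueError("more subsamples requested than distinct deletions exist")
    deletions = set()
    while len(deletions) < n_subsamples:
        deletions.add(frozenset(rng.sample(range(n_obs), d)))
    ordered = sorted(deletions, key=sorted)  # deterministic order for a given seed
    return [[t for t in range(n_obs) if t not in gone] for gone in ordered]


subsamples = delete_d_subsamples(n_obs=12, d=3, n_subsamples=50, rng=random.Random(0))
```

With 12 periods and `d = 3` there are 220 distinct subsamples, each of length 9 and none containing repeated observations, whereas a bootstrap resample of the same series would typically repeat some periods. Each index list would then be used to fit one tree of the ensemble.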

Interrupted Brownian Motion

We study a variation of Brownian motion which we call interrupted Brownian motion. The interrupted Brownian motion can be interpreted as a continuous version of a Brownian motion whose paths within a specified interval have been eliminated. We then find the joint distribution of the process and the number of interruptions, N(t), using martingale methodology. We also study the excursions of the interrupted Brownian motion using the perturbation method with exponential time.

Bayesian Confirmatory Factor Analysis

Factor analysis (FA) and latent variable models (LVM) have been widely used for reducing the dimension of large data matrices and for testing directional and non-directional hypotheses among the latent variables and covariates. Our research focuses on Bayesian approaches to the latter case, referred to as Confirmatory Factor Analysis (CFA), which aims at examining a hypothesized structure for the data. Previous work by Muthen and Asparouhov (2012) proposed a Bayesian framework that can be seen as a Bayesian equivalent to modification indices. We extend that framework to allow i) inspection of differences between the CFA model and data, both at population and individual level, and ii) easier extensions outside the fully Normal context of continuous data. We also propose a new goodness of fit measure based on cross validation.

Representation learning for relational data

In this talk, we consider representation learning of relational data modeled by (hyper-)graphs. The talk is divided into two parts. First, we talk about the work on the beta model, a model of random hypergraphs. We will show easy-to-interpret conditions for the existence and uniqueness of the MLE. This result is derived from a polytope-type condition in the literature. We further extend the work to hypergraphs with random design matrices. The results include a tight threshold for the full rank condition and a sufficient condition for the existence and uniqueness of the MLE. In the rest of the talk, we discuss interesting open questions in the area of relational learning.

Learning dynamic GANs via Causal Optimal Transport

We introduce COT-GAN, an algorithm to train generative models to produce sequential data. We use transport-based costs and an entropic penalization that allows the use of Sinkhorn divergences. In order to take into account sequentiality, we impose the causality constraint on the transport plans. Remarkably, this naturally provides a way to parametrize the cost function that will be learned by the discriminator.

Automatic Model Selection Method for Time Series

Change point detection (CPD) is a prominent class of problems arising from sudden changes in time series data. Many approaches have been proposed for CPD problems, each fitting a different model structure. However, it is not a simple task for researchers, especially those unfamiliar with the area, to choose a proper method for their CPD work. An automatic model selection procedure is hence really useful for time series analysis under possible change-points or other kinds of non-stationarity. In this research, we are particularly interested in finding structural breaks in the levels/trends of a mean non-stationary process, which can be represented as a piecewise-parametric signal plus dependent noise. Because the error process is dependent, many existing studies, which rely primarily on the assumption of independent noise, are not applicable in this case. Therefore, we shall start by modifying existing estimation methods under more general assumptions. In particular, an appropriate estimator for the long run variance (LRV) of the error process should first be found to describe the dependence structure of the models.

In this presentation, we shall first summarize the two broad classes of LRV estimators: residual- and difference-based estimators. We will show the comparison results of the efficacy of several popular difference-based estimators. Also, we shall introduce a new wavelet-based estimator, which lies somewhere in between the two classes. It is developed by combining traditional Haar wavelets and the structure of a local linear difference-based estimator. Since LRV is given by summation of autocovariances at all lags, properly eliminating the signal before taking differences may lead to a better estimation performance.
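As a reminder of the quantity being estimated, the LRV can be written out as follows. This is a sketch in assumed notation: $\{\varepsilon_t\}$ for the stationary error process and $\gamma(k)$ for its lag-$k$ autocovariance are symbols introduced here, not taken from the talk.

$$
\sigma^2_{\mathrm{LRV}} \;=\; \sum_{k=-\infty}^{\infty} \gamma(k) \;=\; \gamma(0) + 2\sum_{k=1}^{\infty} \gamma(k),
\qquad \gamma(k) = \mathrm{Cov}(\varepsilon_t, \varepsilon_{t+k}).
$$

For independent noise all terms with $k \neq 0$ vanish and the LRV reduces to the ordinary variance $\gamma(0)$.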

Adaptive Learning to Match

We consider adaptive learning for finding a maximum matching in a graph with stochastic edge weights, which are independent across different edge experiments. The maximum matching problem is one of the most fundamental combinatorial optimisation problems with a variety of applications, e.g. in the context of online platforms, crowdsourcing, and recommender systems. Each graph edge has a stochastic weight whose expected value is a function of unknown parameters associated with its vertices. We consider the efficiency of learning with respect to both regret minimisation and sampling complexity. We present tight regret bounds for the noiseless case where vertices have {0,1} valued parameters and edge weights are given by either MAX, MIN, XOR or Constant Elasticity of Substitution functions. We then present our more recent results on the sampling complexity for the noisy case, with [0,1] valued vertex parameters and independent Bernoulli edge experiments with mean equal to the product of the vertex parameters. The latter problem is akin to the problem of low-rank matrix factorization from noisy observations.

Exact generation of two-parameter Poisson-Dirichlet random variables

We consider a random vector $(V_1, \dots , V_n)$ where $\{V_i\}_{i=1, \dots, n}$ are the first $n$ components of a two-parameter Poisson-Dirichlet distribution. Based on their joint Laplace transform, we propose an exact generation method for the random vector. Furthermore, a special case arises when $\theta /\alpha$ is an integer, for which we present a very fast modified generation method using a compound geometric representation of the Laplace transform. Numerical examples are provided to illustrate the accuracy and effectiveness of our methods.

Statistical Inference for the Voter Model

How to present, measure, and learn in social and economic network settings has been the subject of much research. Some of the key questions include understanding the limit points and long-term behaviour of the underlying dynamics, the time to convergence to a consensus state, and inferring the model parameters from observed data (e.g. inferring network interaction parameters from observed time series of node states).

In this talk, we will present results on inference for the discrete-time voter model. In the voter model, in each time step, each node adopts the state of a randomly sampled neighbour. The unknown parameter is a stochastic matrix defining the sampling probabilities. We will present upper bounds on the maximum likelihood estimation error by using the framework from high-dimensional statistics known as M-estimators. Our results are derived by using new bounds on the consensus time (both for expected value and probability tails) and recent concentration bounds for functionals of geometrically ergodic Markov chains. We have also derived a sampling complexity lower bound. Finally, we will present results of our numerical experiments to demonstrate some of the key properties of the underlying inference problem.
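The voter dynamics described above can be illustrated with a short simulation. This is a minimal sketch, not the speakers' code: the function name `simulate_voter_model`, the synchronous update, the binary states and the three-node matrix `P` are all assumptions made for illustration.

```python
import numpy as np

def simulate_voter_model(P, x0, n_steps, seed=0):
    """Simulate a discrete-time voter model (illustrative sketch).

    P  : (n, n) row-stochastic matrix; P[i, j] is the probability that
         node i samples node j in a given time step.
    x0 : length-n sequence of initial node states (e.g. 0/1).
    Returns an (n_steps + 1, n) integer array of node states over time.
    """
    rng = np.random.default_rng(seed)
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    states = np.empty((n_steps + 1, n), dtype=int)
    states[0] = x0
    for t in range(n_steps):
        # Each node independently samples one node from its row of P ...
        sampled = np.array([rng.choice(n, p=P[i]) for i in range(n)])
        # ... and all nodes simultaneously adopt the sampled node's state.
        states[t + 1] = states[t][sampled]
    return states

# Hypothetical example: three nodes, each sampling the other two uniformly.
P = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
path = simulate_voter_model(P, x0=[0, 1, 1], n_steps=20)
```

The rows of `path` form the kind of observed time series of node states from which the matrix `P` would be estimated; once all nodes agree, the state no longer changes, which is the consensus state mentioned above.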

Wednesday 29 and Thursday 30 May 2019

Bayesian inference for hidden semi-Markov models

Economies and financial markets move in cycles. A long period of expansion is often interrupted by an abrupt shock, followed by a prolonged recession period. Appropriately modelling such phenomena is of great value and importance in practice. Government policy, economic decision-making and several areas in financial institutions could all significantly benefit from advancement in this research area. As a starting point, basic hidden Markov models (HMM) have often been used to describe economic behaviour, but these kinds of models possess some major weaknesses, because (1) the duration in all states follows an implicit geometric distribution and (2) the number of hidden states must be set a priori. To address the first problem, one may explicitly model the duration in each cycle, which vastly enriches model capabilities and its suitability to describe economic and financial data at the cost of higher computational complexity and less tractability. Such models are known as hidden semi-Markov models (HSMM). There has also been increasing interest in HMMs and their extensions in a Bayesian (nonparametric) setting, which tries to tackle the second fundamental problem of hidden Markov models by providing a powerful framework for inferring arbitrarily large state complexity from data. Unfortunately, Bayesian implementations of HSMMs typically suffer from high time complexity and large auto-correlation. Recent advancements in MCMC research, however, provide a more general, powerful framework for such state space models. My research focus is thus on simultaneously tackling both weaknesses of basic HMMs by using recent developments in the MCMC literature to implement efficient Bayesian inference algorithms for HSMMs, which can then be used to describe economic cycles.
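The implicit geometric duration in weakness (1) can be made explicit. In the sketch below, $a_{ii}$ (the self-transition probability of hidden state $i$) and $D_i$ (the number of consecutive time steps spent in state $i$) are notation assumed here for illustration.

$$
P(D_i = d) \;=\; a_{ii}^{\,d-1}\,(1 - a_{ii}), \qquad d = 1, 2, \dots, \qquad \mathbb{E}[D_i] \;=\; \frac{1}{1 - a_{ii}}.
$$

The most likely duration is therefore always a single time step, whatever the value of $a_{ii}$, which is what an HSMM avoids by modelling the distribution of $D_i$ explicitly.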

An autocovariance-based framework for curve time series

It is commonly assumed in functional data analysis (FDA) that samples of each functional variable are independent realizations of an underlying stochastic process, and are observed over a grid of points contaminated by i.i.d. measurement errors. In practice, however, the temporal dependence across curve observations may exist and the parametric assumption on the error covariance structure could be unrealistic. We consider the model setting for serially dependent curve observations, when the contamination by errors is genuinely functional with a fully nonparametric covariance structure. The classical covariance-based methods in the FDA are not applicable here due to the contamination that can result in substantial estimation bias. We propose an autocovariance-based framework to address error-contaminated curve time series problems. Under the proposed framework, we discuss several important problems in FDA, e.g. dimension reduction, functional linear regression, singular component analysis and high dimensional applications.

Epstein-Zin Utility in Infinite Horizon

We study the existence and uniqueness of Stochastic Differential Utility of Epstein-Zin (EZ) type in infinite horizon. This work is motivated by the studies of Bansal and Yaron, who applied discrete-time EZ utilities in their so-called Long Run Risk Model to explain a number of asset pricing anomalies, such as the equity premium puzzle and asset price volatility puzzle. The time-additive utility approach, although technically convenient, struggles empirically to explain these phenomena - as it does not allow for independent parameterisation of the investor's attitude towards risk and intertemporal substitution. From a mathematical point of view, this involves solving a BSDE in infinite horizon with a non-Lipschitz generator, whose solution does not satisfy transversality conditions.

Nonzero-sum stochastic differential games between an impulse controller and a stopper

We study a two-player nonzero-sum stochastic differential game where one player controls the state variable via additive impulses while the other player can stop the game at any time. The main goal of this work is to characterize Nash equilibria through a verification theorem, which identifies a new system of quasi-variational inequalities whose solution gives equilibrium payoffs with the corresponding strategies. Moreover, we apply the verification theorem to a game with a one-dimensional state variable, evolving as a scaled Brownian motion, and with linear payoff and costs for both players. Two types of Nash equilibrium are fully characterized, i.e. semi-explicit expressions for the equilibrium strategies and associated payoffs are provided. Both equilibria are of threshold type: in one equilibrium the players’ interventions are not simultaneous, while in the other one the first player induces her competitor to stop the game. Finally, we provide some numerical results describing the qualitative properties of both types of equilibrium.

Function-on-Function Linear Lagged Regression in High Dimensions

In modern experiments, many problems involve the analysis of high-dimensional functional time series data. However, existing studies rely primarily on the assumption of independent samples. In this paper, we focus on Gaussian functional time series and investigate the properties of l₁-regularized estimates of the function-on-function linear lagged regression model. We propose a cross-spectral functional stability measure for two correlated multivariate stationary processes and establish some useful concentration bounds on the sample cross-covariance matrix. Under the functional principal component analysis (FPCA) framework, we establish some concentration properties of the relevant estimated terms, derive nonasymptotic upper bounds on the errors of the regularized estimates in high dimensional settings, and demonstrate their performance with simulation studies.

Keywords: Functional linear regression; functional time series; large p, small n.

Conditional Heteroschedastic Dynamic Factor Models

Dynamic factor models are increasingly used in data-rich settings. Several procedures, such as PCA and the Kalman Filter, have been developed for the correct estimation of latent factors and corresponding model parameters with large numbers of both cross-section units (N) and time series observations (T).

In previous applications, however, the dynamics of the factors have been modelled as i.i.d. white noise. Yet, we know that many financial variables exhibit a non-zero correlation. In the literature, this feature is modelled by GARCH models, which assume that returns are conditionally heteroschedastic, i.e. their variance and covariance conditional on the past evolve with time.

Diebold and Nerlove (1989) specify a multivariate time series model applied to exchange rates with an underlying latent variable whose innovations are ARCH. Harvey, Ruiz, and Sentana (1992) derive the KF estimator in the finite N case, arguing against the good capacity of the estimator to correctly extract the unobserved state.

In the setting of high N dimension - typical of factor analysis - the model is first converted into state-space form, explicitly taking into account heteroschedasticity. Subsequently, the Kalman Filter is applied to jointly estimate the parameters using the Expectation Conditional Maximization Either (ECME) algorithm.

Change-point detection for high-dimensional time series

In the 1950s, change-point detection methods were first developed in the context of statistical process control. The goal was to identify the time when the quality of the production starts to deteriorate. Nowadays, change point detection methods are applicable in many fields, including finance, economics, and neuroscience. We consider the simple model of change in the mean structure of a time series. Much attention has already been given to analysing this problem in the univariate and multivariate setting. Recently, the attention has shifted to high-dimensional data due to advancements in computer technology and its availability. We will review and compare some methods proposed in the literature for detecting change points in the mean of panel data. Furthermore, we will study the problem of estimating the number of coordinates with change.

Conditional Variational Inference for Hierarchical Bayesian Nonparametric Models

The current methods of inference for hierarchical Bayesian nonparametric models are computationally intensive due to the use of the stick-breaking representation and are biased because of uniformly finite truncation. We propose a novel approach by optimising the posterior functionally rather than updating the parameters in the truncated stick-breaking representation individually. Moreover, we use conditional instead of independent variational distributions, and therefore achieve more accurate and efficient algorithms to infer hierarchical Bayesian nonparametric models.

A Double Mixture IRT Model for Detecting Cheating Behaviour

This work was motivated by the need for simultaneously identifying test questions that may have been leaked beforehand and examinees who may have had prior access to them. We propose a double mixture model which incorporates a person-specific latent classifier and an item-specific latent classifier. We fit the model under an empirical Bayes framework and give optimal decision rules with regard to flagging compromised items and potential cheaters.

"Cat" Random Forest. A tool for predicting spreads in the primary catastrophe bond market

The goal of this study is to build a tool for predicting spreads in the primary market for catastrophe bonds. To do so, a machine learning method called random forests is used. This choice is based on the fact that the aim is to predict rather than explain the spreads; the method is also capable of embracing unorthodox characteristics of catastrophe bond data sets that previous literature has struggled to deal with, and it provides internal measures of variable importance in prediction. Apart from its forward-looking direction, novel aspects of the study include a rich data set of 934 observations, the largest studied to date in the primary market setting, and the incorporation of two new variables (coverage type and vendor) which have never been examined before. The random forest built, called "Cat" Random Forest, has impressive predictive power on unseen data, explaining 93% of total variability after the main hyperparameter of the method is appropriately tuned. Stability checks are performed both for "Cat" Random Forest and the two internal importance measures derived. "Cat" Random Forest prediction accuracy and one of the two importance measures are robust to changes in the catastrophe bond learning set. Links to previous literature with regard to variable importance results are drawn, and the applicability of the method in an industry context and further academic research opportunities are discussed. The findings are useful as, at a high level, they show that bespoke catastrophe bond deals can unite under a single pricing method as if they were traded under the umbrella of an "ordinary" financial market.

Key-words: machine learning in insurance, insurance-linked securities (ILS), non-life catastrophe risks, catastrophe bond pricing, primary market, random forest, prediction, minimal depth importance, permutation importance, per occurrence coverage, aggregate coverage, trapped capital, size and term interactions.

Predicting in an L_p sense the last zero of a spectrally negative Lévy process

Given a spectrally negative Lévy process drifting to infinity, we are interested in the last time g at which the process is below zero. At any time t, the value of g is unknown and it is only with the realisation of the whole process that we can know when the last zero of the process occurred. However, this is often too late: we are usually interested in knowing how close the process is to g at time t and in taking some actions based on this information. We are interested in finding a stopping time which is as close as possible to g (in an L_p distance). We prove that solving this optimal prediction problem is equivalent to solving an optimal stopping problem in terms of a two dimensional Markov process which involves the time of the current excursion away from the negative half line and the Lévy process. We state some basic properties of the last zero process and prove the existence of the solution of the optimal stopping problem. Then we show that the solution of the optimal stopping problem (and therefore the optimal prediction problem) is given as the first time that the process crosses above a non-increasing and non-negative curve dependent on the time of the last excursion away from the negative half line.

Selecting time-series hyperparameters with the artificial jackknife

Delete-d jackknife (Wu, 1986; Shao and Wu, 1989) is a powerful subsampling method. This article generalises it to multivariate stationary time series problems introducing the artificial delete-d jackknife: a methodology based on missing values imputation. This paper describes it focusing on hyperparameters selection problems in forecasting. As an illustration, it is used to regulate vector autoregressions with elastic-net penalty on the coefficients. Typical estimation methods for these models are not compatible with the artificial jackknife, since they cannot handle missing observations in the data. Therefore, this paper contributes further by developing an algorithm to overcome this complexity.

Interrupted Brownian Motion

Representation learning for relational data

In this talk, we will consider representation learning of relational data which can be represented by a hypergraph. A hypergraph is defined by a set of nodes and a set of hyperedges (subsets of nodes). Many relational data can be naturally represented by hypergraphs – e.g. sets of items in recommender systems, entities and relations in knowledge graphs, and teams of online users in online platforms. For example, in online labor platforms, nodes may represent online workers, hyperedges may represent teams of online workers, and the existence of a hyperedge may indicate a successful team collaboration. A challenge is to learn representations of individual items that explain observed data at a hyperedge level, e.g. to learn skills of individual users from observed team performance outputs. Based on these representations, we are able to answer queries such as identifying workers with high abilities and predicting the outcome for new projects.

We will consider a model of random hypergraphs known as the generalized beta model, where each node is associated with an unknown scalar parameter and a hyperedge exists with a probability given by a logistic function of the sum of individual node parameters. The well-known beta model of random graphs is accommodated as a special case. There are several fundamental open questions for generalized beta models such as (a) statistical inference questions on the existence and uniqueness of the maximum likelihood estimate of the model parameters and how this is related to graph-theoretic properties of the observed data, and (b) computational and statistical efficiency of algorithms for parameter estimation.
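The hyperedge probability described in words above can be written as follows. The symbols are assumptions introduced here: $\beta_i$ for the scalar parameter of node $i$, $e$ for a candidate hyperedge (a subset of nodes), and $E$ for the set of hyperedges present.

$$
P(e \in E) \;=\; \frac{\exp\!\big(\sum_{i \in e} \beta_i\big)}{1 + \exp\!\big(\sum_{i \in e} \beta_i\big)}.
$$

When every hyperedge contains exactly two nodes, $e = \{i, j\}$, this reduces to the beta model of random graphs, with edge probability $\exp(\beta_i + \beta_j) / (1 + \exp(\beta_i + \beta_j))$.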

We will discuss relations between different conditions for the existence and uniqueness of the maximum likelihood parameter estimator. A key property for bounding the mean squared error of the maximum likelihood parameter estimator is the smallest eigenvalue of the Gram matrix of the design matrix. For beta models, we will show that this eigenvalue is strictly positive if, and only if, a certain graph-theoretic measure of non-bipartiteness is strictly positive. We will further show how this correspondence can be extended to certain classes of hypergraphs.

For the rest of the talk, we will review known results on the efficiency of algorithms for parameter estimation and certain relevant statistical hypothesis testing problems. The goal is to identify all items or a set of items with high abilities with a minimum sampling complexity. This will cover the work on group testing, which is commonly studied for Boolean parameter vectors. This existing work provides results on the computational complexity of algorithms under certain assumptions on how hyperedges are tested (either in an adaptive or a non-adaptive manner). We will discuss interesting open questions in this direction, as well as some alternative models that try to capture intricate interactions between items and their group performance.

Model-based Reinforcement Learning: Recurrent Neural Networks for Planning in Robotic Control

Model-free reinforcement learning has drawn great attention due to the success of AlphaGo and the recent OpenAI Five, an algorithm that has defeated a human world champion team in the game of Dota 2. However, model-free reinforcement learning is typically not data efficient, meaning that it requires a huge number of games to learn the controller. To address this problem, model-based reinforcement learning is used to learn the underlying dynamics before training the controller from the learned model of dynamics. In my work, a recurrent neural network, which can also be seen as a deterministic version of a state-space model, is trained to learn the dynamics. In comparison to state-of-the-art model-free methods, this method has achieved a much faster speed of learning.

Learning for Assignment Problems under Uncertainty

Many problems in online advertising, recommender systems and crowdsourcing services can be formulated as an assignment problem, i.e., assigning items to slots to maximize some return under uncertainty. The uncertainty could be about the types of items/slots or the reward induced by an assignment. However, the pattern of this uncertainty is stationary and therefore learnable by assigning items to slots multiple times. We formulate this as a stochastic optimization problem with unknown but learnable parameters. Our goal is to maximize the cumulative return over time.

How to use the feedback information from the previous assignments to adaptively optimize the future reward is a challenging problem. There is a fundamental trade-off in this decision making process over time: exploration (i.e., making new and perhaps suboptimal assignments to learn the parameters) vs. exploitation (i.e., committing to the assignment with the best performance up to now). Besides, the reward of one assignment is usually a finite sum of submodular functions (i.e., functions with the diminishing marginal returns property), for which no polynomial-time algorithm can obtain an approximation ratio better than 1−1/e unless P=NP. The assignment problem under uncertainty extends the traditional multi-armed bandit problem, which can be viewed as assigning 1 item to slots. Moreover, the assignment problem offers a unified framework for analyzing many important problems in different areas, e.g. recommender systems (assigning items to users), group testing (assigning samples to pools), online advertising (assigning ads to slots), collaborative working (assigning workers to jobs), and cloud computing (assigning jobs to servers).

In this presentation, we will summarize related work on the optimisation of assignments under uncertainty. We will focus on a class of assignment problems where the objective function is a welfare function defined as a sum of valuation set functions for pairs of items. These valuation set functions are defined to be functions of individual item values, e.g. minimum value (weakest link) or maximum value (strongest link). Such models have been recently studied with motivating application scenarios in the context of collaborative teamwork. This problem amounts to learning a maximum matching of items by incurring a small regret. We will present some preliminary results and discuss interesting open research questions for this class of problems.

Exploiting Disagreements between High-Dimensional Variable Selection for Uncertainty Visualization

We propose a new variable selection method, which identifies the set of true covariates and visualizes selection uncertainties by exploiting the similarities and disagreements among different variable selection methods. Our proposed method selects covariates that appear most frequently among different variable selection methods on subsampling data and visualizes variable selection uncertainties by utilizing disagreement over the selections. The method is generic and can be used with different existing variable selection methods. We demonstrate its variable selection performance using real and simulated data. The new method and its uncertainty illustration tool are publicly available as an R package. The interactive version of the graphical tool is also available online.

Parisian time of Brownian motion on simple graph with skew semiaxes

We derive the Parisian time of a Brownian motion moving on a simple graph, where there are $n$ semiaxes sharing a common origin, and on each semiaxis there is a unique excursion time threshold. The Brownian motion switches between the semiaxes at the origin according to a predefined transition probability. We are interested in the first time that the excursion time threshold is exceeded. The Laplace transform of the Parisian time, as well as its density and asymptotic behaviour, are derived. Moreover, we provide an example to show how this simple graph reduces to the real line, and in this case our results correspond to the existing formulas on the real line. If time permits, the following topic will also be discussed: First hitting time of Brownian motion on simple graph with skew semiaxes.

Statistical Methods for Social Network Data Analysis

We believe that user behavioural features can be inferred through analysis of signals gathered in online systems. This is important for detecting abnormal activities in social networks and has become increasingly important due to events such as US presidential elections and the recent terrorist attack in New Zealand. Providing robust statistical methods to accurately detect abnormal patterns in social networks is much in demand.

We will focus on studying how opinions are formed based on information individuals observe through interactions in a network. This will be modelled by stochastic processes on graphs, where nodes represent individuals and edges represent relationships between individuals. Individual opinions may be affected by their past opinions and those of their network neighbours, newly available information (public or private) and the ability to learn information. We will consider the class of models known as majority-dynamics for opinion formation, which includes the classical voter model as a special case. In these models, individuals interact with other individuals with rates of interaction (peer-to-peer influence) which are assumed to be unknown parameters and need to be inferred from observed data.

The following questions will be studied in my research: (a) How can we accurately estimate node interaction rates in opinion formation models? (b) What can we say about the estimation of the unknown matrix of interaction rates from observed time series data of node states? (c) How can we design new models and methods to perform online analysis for social network data? Such questions have been addressed for some classes of opinion formation models such as the well-known Friedkin-Johnsen type of models and related linear and generalised linear autoregressive models. Our work will aim to advance the state of the art by developing statistical foundations for inference for the broad class of models known as majority-dynamics on networks.

Tuesday 8 and Thursday 10 May 2018

Title: Regression with Dependent Functional Errors-in-Predictors

Abstract: Functional regression is an important topic in functional data analysis. Traditionally, in functional regression, one often assumes that samples of the functional predictor are independent realizations of an underlying stochastic process, and are observed over a grid of points contaminated by independent and identically distributed measurement errors. However, in practice, dynamic dependence across different curves may exist and the parametric assumption on the measurement error covariance structure could be unrealistic. In this paper, we consider functional linear regression with serially dependent functional predictors, when the contamination of predictors by measurement error is "genuinely functional" with fully nonparametric covariance structure. Inspired by the fact that the autocovariance operator of the observed functional predictor automatically filters out the impact of the unobserved measurement error, we propose a novel generalized method of moments estimator of the slope parameter. The asymptotic property of the resulting estimator is established. We also demonstrate that the proposed method significantly outperforms possible competitors through intensive simulation studies. Finally, the proposed method is applied to a public financial dataset, revealing some interesting findings.

Title: GARCH Dynamic Factor Models

Abstract: Dynamic factor models are increasingly used in data-rich settings. Several procedures, such as PCA and the Kalman filter, have been developed for estimating the latent factors and the corresponding model parameters when both the number of cross-section units (N) and the number of time series observations (T) are large. In previous applications, however, the dynamics of the factors have been modelled as i.i.d. white noise. Yet we know that many financial variables exhibit non-zero correlation. In the literature, this feature is modelled by GARCH models, which assume that returns are conditionally heteroskedastic, i.e. their variance and covariance conditional on the past evolve with time. Diebold and Nerlove (1989) specify a multivariate time series model applied to exchange rates with an underlying latent variable whose innovations are ARCH. Harvey, Ruiz, and Sentana (1992) derive the Kalman filter estimator in the finite n case and argue against the capacity of the estimator to correctly extract the unobserved state. In the high-dimensional N setting typical of factor analysis, the model is first converted into state-space form, explicitly taking heteroskedasticity into account. Subsequently, the Kalman filter is applied to jointly estimate the parameters by maximum likelihood.

Title: A Mean-Field Game Production Model on Market Saturation

Abstract: Conceptually, market saturation is a situation where a product has become diffused (distributed) within a market, the industry has matured, and the price, after rising rapidly for decades, has finally stabilised. Despite the importance of these issues, relatively little literature addresses them quantitatively. Of the few existing studies, many describe them only phenomenologically, using, e.g., a logistic curve. In this paper, we try to find the rationale behind this effect using mean field game theory. In a new industry, when demand is still low and the total capacity of the industry is small, producers tend not to invest much in expansion. As demand grows, the original capacity no longer suffices, allowing the price to increase quickly. At this stage, however, producers obtain sufficient cash flow and thus have a much greater ability to invest in expanding capacity. Sooner or later, the expansion of capacity catches up with the growth of demand, preventing further significant growth of the price. At this stage, the industry becomes saturated. In our paper, we use mean field game theory to model this process. We assume there are many producers, each of whose capacity is a state variable that the producer can control. They compete in a market where demand is given exogenously. In maximizing their own profit, producers compete with each other and, at the same time, can choose to invest to improve their capacity. This process finally results in market saturation.

Title: An Economic Bubble Model and Its First Passage Time

Abstract: We introduce a new diffusion process to describe economic bubble dynamics. The process can be treated as a scaled version of the log-transform of the Shiryaev process, and our study shows that the new scaling parameter is crucial for modelling economic bubbles. We conduct a fundamental analysis and prove that the process and its first passage time are well-defined. In addition, a series of closed-form solutions for the process and its distribution functions are given. In particular, by solving the Fokker-Planck equation we show that the process follows an exponential-Gamma distribution in the infinite-time limit. Moreover, by employing a perturbation technique, we deduce the closed-form density of the downward first passage time. Based on the model, the burst time of an economic bubble can therefore be estimated. The objective of this study is to understand asset price dynamics when a financial bubble is believed to be forming, and correspondingly to provide estimates of the bubble’s crash time. Calibration examples on the US dot-com bubble and the 2007 Chinese stock market crash verify the effectiveness of the new model. The example on Bitcoin prediction confirms that we can provide a meaningful estimate of the downward probability of asset prices.

Title: Bayesian Nonparametric Modelling for Bitcoin Network

Abstract: The price of Bitcoin has recently shown high volatility, with booms and crashes. Bitcoin transactions form a network if each address is regarded as a vertex and each transaction as an edge. However, there is no good probabilistic model of this network, owing to its sparsity, instability and large size. In my research, I try to use Bayesian nonparametric methods to model the weighted and directed network of Bitcoin transactions.

Title: The forward search for detecting outliers and clusters in multivariate data

Abstract: The forward search (Atkinson, 1994) is a graphical method which aims to detect masked outliers. The search starts from a small, outlier-free subset and increases the subset size in such a way that outliers are unlikely to be included until the very last steps. Outliers can be revealed by plotting the evolution of appropriate monitored statistics (e.g. model-implied residuals and t measures). We extend this method to latent class models for binary data, monitoring a bootstrapped p-value for bivariate residuals. However, misspecification of the number of latent classes often makes the forward search fail to separate outliers from the rest of the data. Atkinson et al. (2013) use a large number of random-start forward searches for clustering continuous multivariate data. We apply this idea to detecting population heterogeneity not captured by misspecified models. The model and simulation settings considered here include latent class models and factor mixture analysis models for binary and continuous data.

Title: Factors influencing the price of a cat bond

Abstract: Estimation of expected loss and determination of risk load are prerequisites for forming the initial offering price of a catastrophe bond. Although expected loss quantification also involves a high degree of uncertainty, the focus of this work lies on the risk load aspect. This is because, unlike expected loss, there is no industry-wide accepted methodology for determining it. Its level depends on investors' perception of the riskiness of the cat bond deal in question, which varies with factors that we will examine. These factors are explored through the lens of investors' current portfolio state. The ultimate goal is to identify the most powerful risk load determinants among those presented and to find ways to incorporate them into the cat bond pricing formula. Given the important role of investors' personal preferences here, we will employ an entropic risk measure methodology to address our cat bond pricing problem.

Title: Predicting the last zero of a spectrally negative Lévy process

Abstract: Given a spectrally negative Lévy process drifting to infinity, we are interested in the last time g at which the process is below zero. At any time t, the value of g is unknown, and only once the whole process has been realised can we know when its last zero occurred. However, this is often too late: we are usually interested in knowing how close the process is to g at time t and in taking some action based on this information. We are interested in finding a stopping time which is as close as possible to g (in an Lp distance). We prove that solving this optimal prediction problem is equivalent to solving an optimal stopping problem in terms of a three-dimensional Markov process which incorporates the last zero process and the Lévy process. We state some basic properties of the last zero process and prove the existence of a solution to the optimal stopping problem. We then show that the solution of the optimal stopping problem (and therefore of the optimal prediction problem) is given by the first time that the process crosses above a non-increasing and non-negative curve which depends on the time spent above zero by the Lévy process.

Title: Cross-validation and ensemble methods for short time series

Abstract: Developments in machine learning have deeply impacted a broad range of scientific fields, including computational statistics, biology and medicine. The social sciences have been less influenced by these novelties, since they use relatively short time series with high autocorrelation. This paper attempts to overcome these data limitations by proposing a bootstrap method that increases the number of unique random subsamples that can be generated from a time series. This technique increases the randomness of ordered partitions indirectly, by imposing patterns of missing observations at random. The advantages for cross-validation and ensemble forecasting methods are discussed, and forecasting results are presented using macro-finance time series.

Title: American Multi-Asset option pricing under Lévy copulas and Ballotta-Bonfiglioli model

Abstract: Our aim is to price American multi-asset options using Lévy copulas and the Ballotta-Bonfiglioli model to capture the dependence between the n underlying Lévy processes. In particular, we focus on basket options on two equally weighted underlying assets, which are modelled as Variance-Gamma processes. In the case of Lévy copulas, our results are obtained via simulation: we use Tankov’s work for the simulation part and then use the Longstaff-Schwartz regression method to price the options. For the Ballotta-Bonfiglioli model, we extend the work of Hirsa and Madan, and then of Fiorani, to the multidimensional case, and we price our basket options via the generator, i.e. by solving the partial integro-differential equation, which we do numerically using a finite-difference method.

Title: Large volatility matrix estimation by non-parametric eigenvalue regularization for high-frequency data

Abstract: In financial practice, we often encounter a large number of assets. The availability of high-frequency financial data makes it possible to estimate the large volatility matrix of these assets. First, it is known that the extreme eigenvalues of a realized covariance matrix are biased when the dimension p is large relative to the sample size n. Second, due to non-synchronous trading and contamination by microstructure noise, the standard realized volatility estimator performs poorly. In this paper, we apply a nonparametric eigenvalue regularization, which does not assume specific structures for the underlying integrated covariance matrix, to multi-scale realized volatility, kernel realized volatility and pre-averaging realized volatility estimation. Jump removal (Fan and Wang, 2007) is also incorporated into our proposed method. In theory, without jumps, kernel realized volatility and pre-averaging realized volatility work well at the order n^{1/5}, and the multi-scale estimator has the order n^{1/6}. With jumps, however, only the multi-scale and pre-averaging estimators can theoretically converge to the true value. A simulated data analysis is included to check the finite sample performance of the proposed threshold estimators.

Title: A spectral EM algorithm for estimation of Dynamic Factor Models

Abstract: This year, I will discuss the second topic in my (planned) thesis, which is broadly on the development of statistical methodologies for Dynamic Factor Models (FMs). For my first topic, I proposed a method for sequential change-point detection in an approximate static FM. For the second, I present a spectral EM algorithm for the estimation of an approximate dynamic FM. In particular, I derive the E and M step equations in the frequency domain. The E step relies on the Wiener-Kolmogorov (WK) smoother. The M step is based on maximisation of the Whittle Likelihood with respect to the parameters of the model. I am currently working on providing proofs of N,T consistency (with rates) of my estimators. During my talk, I will discuss my model and assumptions, initialisation and estimation procedure, and sketches of my proofs. Time permitting, I may close with some preliminary simulation results which compare my procedure with competing time-domain techniques.

Title: Gaussian Process Regression and Classification

Abstract: The presentation provides an introduction to Gaussian process regression and classification. It compares full Bayesian methods to empirical Bayesian methods using various data sets and demonstrates the difference between a Gaussian Process prior and I-prior.

Title: High dimensional variable selection via median aggregation

Abstract: Variable selection in high dimensions has been widely discussed in the past decade. Many different variable selection methods have been proposed, and conditions providing performance guarantees have been presented. It is usually not practical, however, to use those conditions to select the best method; in practice, cross-validation is used instead. As it is difficult to select the best among all candidate methods, we instead propose to aggregate the fitted results from the candidate methods via the median. Numerical examples show that our method generally outperforms the popular 10-fold cross-validation method in terms of prediction, coefficient estimation and variable selection. Some variations of the median aggregation method are also discussed.
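The aggregation step can be illustrated with a minimal sketch. The abstract does not say exactly which fitted quantities are aggregated, so the coordinate-wise median of coefficient vectors used here is an assumption, and the function names and the three coefficient vectors are hypothetical.

```python
from statistics import median


def median_aggregate(coef_estimates):
    """Coordinate-wise median of coefficient vectors from several methods.

    Assumption: every candidate method returns a coefficient vector of the
    same length, with exact zeros for variables it did not select.
    """
    n_coef = len(coef_estimates[0])
    if any(len(c) != n_coef for c in coef_estimates):
        raise ValueError("all coefficient vectors must have the same length")
    return [median(c[j] for c in coef_estimates) for j in range(n_coef)]


def selected_variables(coefs, tol=1e-12):
    """Indices of variables whose aggregated coefficient is non-zero."""
    return [j for j, b in enumerate(coefs) if abs(b) > tol]


# Hypothetical fitted coefficients from three candidate methods.
fits = [
    [1.9, 0.0, 0.0, 0.7],
    [2.1, 0.3, 0.0, 0.0],
    [2.0, 0.0, 0.0, 0.9],
]
aggregated = median_aggregate(fits)
print(aggregated)                      # [2.0, 0.0, 0.0, 0.7]
print(selected_variables(aggregated))  # [0, 3]
```

With an odd number of methods, a variable survives only if a majority of methods select it with the same sign, which is one reason the median is a natural aggregator here.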

Title: A policy iteration algorithm for nonzero-sum stochastic impulse games

Abstract: This work presents a novel policy iteration algorithm to tackle nonzero-sum stochastic impulse games arising naturally in many applications. Despite the obvious impact of solving such problems, there are no suitable numerical methods available, to the best of our knowledge. Our method relies on the recently introduced characterisation of the value functions and Nash equilibrium via a system of quasi-variational inequalities. While our algorithm is heuristic and we do not provide a convergence analysis, numerical tests show that it performs convincingly in a wide range of situations, including the only analytically solvable example available in the literature at the time of writing.

Title: Weak Representation of Maximal Value of SDE via Brownian Bridge Approach

Abstract: A Brownian bridge representation approach for estimating the expectation of the maximal value of an SDE is proposed. It is shown that, for the same time discretisation, this approach improves on the performance of the Euler scheme. Examples are given for some SDEs whose expected maximal value is known explicitly, so that the performance can be compared.
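As a rough illustration, the sketch below uses the standard Brownian bridge correction for the running maximum of an Euler scheme, which may differ in detail from the speaker's representation. It relies on the known fact that, given the endpoints x and y of a step of length h with the diffusion coefficient frozen at sigma, the maximum over the step can be sampled as (x + y + sqrt((y - x)^2 - 2 sigma^2 h log U)) / 2 with U uniform on (0, 1). The function name and parameters are hypothetical.

```python
import math
import random


def expected_max_euler(a, b, x0, T, n_steps, n_paths, bridge=True, seed=0):
    """Monte Carlo estimate of E[max_{0<=t<=T} X_t] for dX = a(X)dt + b(X)dW.

    With bridge=False the maximum is taken over the Euler grid points only.
    With bridge=True the maximum inside each step is sampled from the
    Brownian bridge between the step's endpoints (coefficient frozen at b(x)).
    """
    rng = random.Random(seed)
    h = T / n_steps
    total = 0.0
    for _ in range(n_paths):
        x = x0
        running_max = x0
        for _ in range(n_steps):
            sigma = b(x)
            y = x + a(x) * h + sigma * math.sqrt(h) * rng.gauss(0.0, 1.0)
            if bridge:
                u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is finite
                step_max = 0.5 * (x + y + math.sqrt(
                    (y - x) ** 2 - 2.0 * sigma ** 2 * h * math.log(u)))
            else:
                step_max = max(x, y)
            running_max = max(running_max, step_max)
            x = y
        total += running_max
    return total / n_paths


if __name__ == "__main__":
    # Standard Brownian motion on [0, 1]: the exact value is sqrt(2 / pi).
    zero = lambda x: 0.0
    one = lambda x: 1.0
    print(expected_max_euler(zero, one, 0.0, 1.0, 20, 4000, bridge=True))
    print(expected_max_euler(zero, one, 0.0, 1.0, 20, 4000, bridge=False))
    print(math.sqrt(2.0 / math.pi))
```

For Brownian motion the bridge step is exact in distribution, so the remaining error is pure Monte Carlo noise, whereas the grid-only maximum is biased downwards by a term of order sqrt(h).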

Title: Limit Theorems for Occupation Time of Time Elapsed Process since Last Zero of Bessel Process of Negative Index

Abstract: We consider a renewal process U, defined as the time elapsed since the last zero of a Bessel process of index -1 < v < 0; in particular, v = -½ corresponds to Brownian motion. The work is built around a series of piecewise-deterministic Markov processes involving U, where we assume the jump arrivals are driven by a general function of U. Within this framework, distributional properties of the process are derived, along with results on the next jump time. For simulation, we develop accurate and efficient schemes for generating the jump time, the jump level and their joint distribution. Building on these results, we prove a range of limit theorems relating the normalised occupation time of a rescaled U and/or of a rescaled Bessel process to the local time of the Bessel process, and we show that in each case their difference converges at a rate of ½ to a Brownian motion with a stable subordinator.

Monday 8 and Tuesday 9 May 2017

Title: Optimal Transport and its connection to the Principal-Agent problem

Abstract: Interest in the Principal-Agent problem in mathematical economics has increased in recent years, owing to progress in Backward Stochastic Differential Equations. However, results are still limited to strong conditions on, or specific functional forms of, the utility functions. On the other hand, the settings of these problems lend themselves nicely to techniques from Optimal Transport, a rich and long-standing branch of mathematics. In this presentation we explore some fundamentals of Optimal Transport and its link to the Principal-Agent problem.

Title: Dynkin Games with Information Asymmetry

Abstract: Dynkin games have been studied extensively since they were introduced by Dynkin in 1967. However, few results are available for these games in a setting with asymmetric information. In particular, we are interested in the case in which one player has more information than the other in a discrete-time framework. To study it, we want to use an enlargement of filtration argument based on the results in the paper by Jeanblanc et al.

Title: Integrating Regularized Covariance Matrix Estimators

Abstract: Covariance regularization is important when the dimension p of a covariance matrix is close to or even larger than the sample size n. One branch of regularization assumes a specific structure for the true covariance matrix Σ0, such as sparsity, bandedness, or a factor structure. Another branch regularizes the eigenvalues directly without assuming such structures. For the practical scenario where one is not certain which regularization method to use, we introduce an integration covariance matrix estimator, which is a linear combination of a rotation-equivariant estimator and a regularized covariance matrix estimator that assumes a specific structure for Σ0. We estimate the weights in the linear combination and show that they converge asymptotically to the true underlying weights. More generally, we can put two regularized estimators into the linear combination, each assuming a specific structure for Σ0. Our estimated weights can then also be shown to converge to the true weights, and if one regularized estimator converges to Σ0 in the spectral norm, the corresponding weight tends to 1 and the others tend to 0 asymptotically. We demonstrate the performance of our estimator relative to other state-of-the-art estimators through extensive simulation studies and a real data analysis.

Title: Importance Sampling: Crossing Probabilities for Compound Poisson Barriers by the Running Maximum of Brownian Motion

Abstract: In previous work, the first hitting time of linear barriers by the running maximum of Brownian motion was simulated by an acceptance-rejection method. This time we look at the Poisson barrier case and at a simulation method for the crossing probabilities. An upper bound for the boundary crossing probability is obtained under the original measure. Importance sampling methods are then used to obtain a better bound and to reduce the variance of the simulation via exponential tilting and Poisson martingales.
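To show what exponential tilting does in the simplest possible setting, here is a minimal sketch for a constant barrier rather than the compound Poisson barrier of the talk. By the reflection principle, the probability that the running maximum of Brownian motion reaches level b by time T equals 2 P(Z > b / sqrt(T)) for standard normal Z, so it suffices to estimate a Gaussian tail. Sampling from N(theta, 1) and reweighting by the likelihood ratio exp(-theta z + theta^2 / 2) is the tilted estimator. The function names and the default choice theta = c are assumptions made for illustration.

```python
import math
import random


def tail_prob_tilted(c, n_samples, theta=None, seed=0):
    """Estimate P(Z > c) for Z ~ N(0, 1) by exponential tilting.

    Samples are drawn from N(theta, 1) and weighted by the likelihood ratio
    phi(z) / phi(z - theta) = exp(-theta * z + theta**2 / 2).
    """
    if theta is None:
        theta = c  # shift the sampling mean onto the barrier (an assumption)
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        z = rng.gauss(theta, 1.0)
        if z > c:
            total += math.exp(-theta * z + 0.5 * theta ** 2)
    return total / n_samples


def tail_prob_exact(c):
    """Exact Gaussian tail probability, used only as a check."""
    return 0.5 * math.erfc(c / math.sqrt(2.0))


def max_crossing_prob(b, T, n_samples, seed=0):
    """P(max_{s<=T} B_s >= b) for a constant barrier b > 0 (reflection principle)."""
    return 2.0 * tail_prob_tilted(b / math.sqrt(T), n_samples, seed=seed)


if __name__ == "__main__":
    print(max_crossing_prob(4.0, 1.0, 20000))
    print(2.0 * tail_prob_exact(4.0))
```

A plain Monte Carlo estimate of the same quantity with 20000 draws would typically see no crossings at all at this barrier level, which is the motivation for changing the sampling measure.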

Title: Binary probit regression with I-priors

Abstract: An extension of the I-prior methodology to binary response data is explored. Starting from a latent variable approach, it is assumed that there exist continuous, auxiliary random variables which decide the outcome of the binary responses. Fitting a classical linear regression model to these latent variables while assuming normality of the error terms leads to the well-known generalised linear model with a probit link. A more general regression approach is considered instead, in which an I-prior is assumed on the regression function, which lies in some reproducing kernel Hilbert space. An I-prior distribution is Gaussian with mean chosen a priori and covariance equal to the Fisher information for the regression function. By working with I-priors, the benefits of the methodology are brought over to the binary case, one of which is that it provides a unified model-fitting framework that includes additive models, multilevel models and models with one or more functional covariates. The challenge lies in the estimation, and a variational approximation is employed to overcome the intractable likelihood. Several real-world examples are presented from analyses conducted in R. Presentation materials are available at http://phd3.haziqj.ml.

Title: Dependency Elicitation using Expert Judgement

Abstract: Under the Solvency II regulatory framework, all insurance firms are required to specify the correlations between all the risks that may affect their solvency, as well as the method used to estimate these correlations. In order to comply with the Solvency II regulations, many insurance companies rely on expert judgement to calibrate the correlations between risks when there is insufficient data to quantify them numerically. A method currently used by firms to estimate the correlation between two risks is to ask a panel of experts to arrive at an agreed estimate. Each expert brings their own opinion to the panel. As a result, it is not known how to model their opinions robustly, nor how to calculate the sensitivity of the resulting output to their beliefs. We use principles from a specific kind of mathematical set theory called Fuzzy Logic to propose a model for the experts’ opinions and to calibrate the value of the correlation. We provide a framework which models the elicitation process and the sensitivity of the output to each expert’s opinion. We have implemented the new model in R and conducted a sensitivity analysis from the perspective of a theoretical multinational insurer interested in examining the correlation of a mass lapse of insurance policies between two of its business units.

Title: On the Closed-Form Approximation for the First Passage Time Density of the Ornstein-Uhlenbeck Process

Abstract: We consider the first passage time (FPT) problem for the Ornstein-Uhlenbeck (OU) process at an arbitrary constant crossing level. By applying the parametric perturbation technique we are able to translate the OU FPT into a drifted Brownian motion (DBM) representation. We show that the initial boundary value (BV) problem for the OU process can be expressed in terms of a series of BV problems for a DBM with path integrals. Recursive solutions to the BV problems are obtained, which can be inverted explicitly to give a closed-form density approximation for the OU FPT. Calibration schemes for the perturbation parameter α and numerical comparisons with other known methodologies are also discussed. At the end of the talk we will provide applied examples in financial modelling and process simulation.

Title: Bi-scale Linear Regression Models for functional data

Abstract: We introduce a way of modelling inherent statistical dependence in random functions X(t) in the framework of classical scalar-on-function regression. Based on the discretised curves {X(0), X(1),..., X(T)} observed on an equispaced fine grid {t0, t1,..., tT}, the final point X(T) is predicted from the past observations {X(0), X(1),..., X(T-1)}. The proposed model flexibly reflects the relative importance of predictors by combining discrete and continuous regimes. Specifically, the influential observations located close to X(T) are treated as scalar predictors, while less weight is given to the uninformative interval by treating it as a functional predictor. We place particular emphasis on the simplest case, with one functional and one or more scalar predictors, which has the structure of partial functional linear regression. The scalar predictors are obtained from consecutive grid points, and thus estimating the optimal number of scalar predictors amounts to detecting a breakpoint which splits the whole interval into two according to informativeness. Sequential estimation procedures for the breakpoint and the regression parameters are suggested, and the consistency of the estimated breakpoint is shown. We derive the theoretical properties of the approach and demonstrate its usefulness through simulations and two real data examples.

Title: Pricing American Multi-asset options under Lévy processes

Abstract: The aim of our research is to price American options depending on more than one underlying asset when the underlying assets are represented by Lévy processes. For this purpose, the dependence structure will be defined through Lévy copulas. Different types of American options will be considered, and the Lévy copulas used will therefore differ depending on the kind of dependence we need to emphasise. We will go through a description of the most important classes of Lévy copulas, such as Archimedean, elliptical, stable and extreme-value copulas, together with some general definitions and theorems. Once the appropriate dependence structure has been identified, we will price the option via the generator, that is, using the partial integro-differential equation.

Title: Predicting the Last Zero of a Spectrally Negative Lévy Process

Abstract: Given a spectrally negative Lévy process, we are interested in predicting the last time g (before an exponential time) at which the process is below zero. At any time t, the value of g is unknown, and only once the whole process has been realised can we know when its last zero occurred. However, this is often too late: we are usually interested in knowing how close the process is to g at time t and in taking some action based on this information. Stopping times are random times such that the decision whether or not to stop depends only on past and present information. For this reason, the aim of this work is to predict the last zero by a stopping time; that is, to find a stopping time which is as close as possible (in the L1 sense) to the last time that a spectrally negative Lévy process is equal to zero. We prove that the problem can be reduced to a standard optimal stopping problem, which can be solved using a direct method with the help of the general theory of optimal stopping.

Title: Spatial Lag Model with Time-lagged Effects and Spatial Weight Matrix Estimation

Abstract: We consider a spatial lag model with different spatial weight matrices for different time-lagged spatial effects, while allowing both the sample size T and the panel dimension N to grow to infinity together. To overcome potential misspecification of these spatial weight matrices, we estimate each one by a linear combination of a set of M specified spatial weight matrices, with M being finite. Moreover, by penalizing the coefficients of these linear combinations, oracle properties for the penalized coefficient estimators are proved, including their asymptotic normality and sign consistency. Other parameters of the model are estimated by profile least squares type estimators after introducing covariates which serve a similar function to instrumental variables. Asymptotic normality of our estimators is developed under the framework of functional dependence used in Wu (2011), which is a measure of time series dependence. The proposed methods are illustrated using both simulated and real financial data.

Title: Exact Simulation of Truncated Tempered Stable Subordinator

Abstract: We consider a special type of Lévy subordinator whose jump sizes are restricted, meaning that no jump made by the process can exceed a certain level. We establish a general expression for the joint distribution of the hitting time and the overshoot, and propose a general simulation framework. In particular, we use the truncated inverse Gaussian process to illustrate the idea, as the distributions of various quantities can be formulated neatly. The simulation algorithm is developed by breaking down the process at every hitting time at which the process hits the truncation level, until we reach the time t. An acceptance-rejection algorithm and a double-rejection algorithm are developed to generate the hitting time and overshoot of the truncated process. Numerical experiments, tests and comparisons are reported to demonstrate the accuracy and effectiveness of our algorithms.

Title: The Distribution of a Perpetuity

Abstract: We consider the problem of estimating the joint distribution of a perpetuity, e.g. a cumulative discounted loss process in the ruin problem, and the underlying factors driving the economy in an ergodic Markov model. One can identify the distribution in two manners: first, as the explosion probability of a certain locally elliptic (explosive) diffusion; second, using results regarding time reversal of diffusions, as the invariant measure associated to a certain (different) ergodic diffusion. These identifications enable efficient estimation of the distribution through both simulation and numerical solutions of partial differential equations. When the process representing the value of an economic factor is one-dimensional, or more generally reversing, the invariant distribution is given in an explicit form with respect to the model parameters. In continuous time and a general multi-dimensional setup, the lack of knowledge of the invariant distribution could pose an issue. In my talk I will show how one can amend the situation in the discrete-time case.

Title: Automatic model selection system

Abstract: Whether a statistical method performs well with the data and task in hand depends on the data structure and the purpose of fitting a model. An intelligent system which identifies the most suitable method for users, based on the data and purpose of research, would be very helpful. This would be particularly useful for researchers from other areas who may not have sufficient statistical knowledge to determine which method(s) they should use in their research. In this presentation I will focus on selecting variable selection methods in a linear setting. Variable selection consistency for different methods is guaranteed under different conditions and assumptions on the data structure, signal-to-noise ratio, size of parameters, etc. Unfortunately, these assumptions are difficult, if not impossible, to check. In practice, cross-validation is used to select a method. However, cross-validation is based on the prediction error, not the variable selection error. In order to provide an alternative way to select a method, I propose using the "model distance" between a method and a "median model" as a proxy for the variable selection error. The median model is created by combining the fitted results from different candidate methods. Empirically, the model distance is positively correlated with the variable selection error and hence it is helpful in choosing the best model for variable selection. I will also compare the performance of this newly proposed measure with the usual cross-validation method.

Title: Optimal market making under partial information

Abstract: We consider a market maker (MM), her clients and a given asset. The MM's role is to give bid-ask quotes for this asset in continuous time. To do this she observes its price (reference price) on an external dealer to dealer market, evolving as an arithmetic Brownian motion. Her clients can also see the reference price but with a small noise modeled as an independent Brownian motion. They give buy and sell market orders at random times according to point processes with stochastic intensities depending on the spreads between the MM's prices relative to the noisy reference price. The MM accumulates orders over a finite time horizon, at the end of which she hedges the remaining net inventory on the external market. We solve the filtering and optimization problem faced by the MM who wants to maximize the expected utility - either risk neutral or CARA - of her terminal P&L by controlling her quotes without directly observing the noisy reference price of her clients.

Title: Pricing American Option with Incomplete Information

Abstract: We study a perpetual American put option with the special feature that the option holder's exercise right is initiated at some unknown time: the first time at which the underlying process hits some barrier and remains below it for a predetermined amount of time. This time is known as the “Parisian stopping time”, originating from Parisian options. Using a semi-Markov model with a perturbed Brownian motion, we derive an explicit expression for the Laplace transform of the Parisian stopping time written on the excursions below some barrier. Relying on this result, closed-form pricing formulae for the option described above are derived, both when the initial price of the underlying asset is known and when it is unknown. Furthermore, we draw an analogy between the put option of interest and a standard American put and prove that the price of the former is always lower than that of the latter.

Title: An extension of the 3-step ML approach to event history models with multiple and possibly associated latent categorical predictors

Abstract: It is common in social research for more than one predictor variable to be a latent construct, and there are many applications of structural equation models (SEM) with multiple continuous latent variables as predictors of one or more distal outcomes. Researchers may wish to treat these latent constructs as categorical, but until recently have been limited to methods such as the modal class approach, which does not account for misclassification, or the 1-step approach, which creates a circular relationship in which the latent variable is also partly measured by the outcome. In this talk, we discuss an extension of the 3-step approach for one latent variable to a random effects event history model for recurrent events where the hazard function depends on multiple associated latent categorical variables. We describe maximum likelihood estimation of such a model and its potential to generalise to more flexible structural equation models that can handle longitudinal and other forms of clustering, measurement error and mixed response types.

Monday 9 and Tuesday 10 May 2016

Title: Effects of European Sovereign Debt Crisis on the Long Memory in Credit Default Swaps.

Abstract: We study the presence of long memory in sovereign credit default swaps (CDS) for a variety of maturities (1, 5, 10, 30 years) in the European Monetary Union using the time-varying average generalized Hurst exponent (TVA-GHE) over 2007-2014. We obtain daily TVA-GHE based on a 2-year moving time window and test for significance of long memory by employing (i) a pre-whitening and post-blackening bootstrap approach, and confirm the results using (ii) a random permutation shuffling procedure. We reveal that while numerous (peripheral) countries suffered from an unsustainable combination of overly high government structural deficits and accelerating debt levels, sovereign credit default risk increased and long memory decreased considerably during the European sovereign debt crisis. This behaviour lasted until the European Central Bank (ECB), international institutions (e.g. the International Monetary Fund (IMF)) as well as national governments implemented extraordinary policy interventions (e.g. financial assistance programmes) aiming to restore stability in financial markets in 2011, which were followed by increased persistence in sovereign CDS. Moreover, the degree of long memory decreases with CDS maturity, which is in contrast to economic theory. We conclude that changes in long memory might be associated with changes in predictability of default or non-default cases, where events in the far future are less predictable.

Title: First hitting time of the super-Maximum of a standard Brownian motion.

Abstract: We study the first hitting time of a super-maximum of standard Brownian motion. Explicit expressions for the Laplace transform and distribution function of this hitting time are obtained. The problem is solved by setting up the infinitesimal generator of a standard Brownian motion. A suitable martingale is obtained from the solution of the generator equation, followed by the Fourier transform of the solution. Further, we derive the joint density function of the first hitting time and the corresponding level of the reflected Brownian motion from the double Laplace transform. An acceptance-rejection algorithm is also developed to generate the pair of hitting time and associated reflected Brownian motion. The problem is motivated by contingent convertibles.

Title: Bayesian variable selection for linear models using I-priors.

Abstract: In last year’s presentation event, I showed that the use of I-priors in various linear models can be considered as a solution to the over-fitting problem. In that work, estimation was still done using maximum likelihood on the marginal likelihood (after integrating out the prior), so in a sense it was a type of empirical-Bayes approach. Switching over to a fully Bayesian framework, we now look at the problem of variable selection, specifically in an ordinary linear regression setting. The appeal of Bayesian methods is that they reduce the selection problem to one of estimation, rather than a true search of the variable space for the model that optimises a certain criterion. I will review several Bayesian variable selection methods currently present in the literature, and show how we can make use of I-priors in such methods. Simulation studies show that the I-prior performs well in the presence of multicollinearity. Research is still ongoing, with hopes that the I-prior is able to cope well under sparse linear regression, and can also be extended to generalised linear models such as binary response models.

Title: First Passage Time Problem for Ornstein-Uhlenbeck Process.

Abstract: In this project we analyze the first passage time (FPT) problem of the Ornstein-Uhlenbeck (OU) process to an arbitrary threshold. By applying perturbation expansions on the mean-reverting parameter we give an explicit solution of the inverse to the perturbed Laplace transform. Numerical examples in comparison with other known methods are provided to show the accuracy and computational efficiency of this new approach. Potentially this technique could be applied to similar problems from other (jump) diffusion processes.

Title: Survival probability of a risk process with variable premium income.

Abstract: Although the ruin problem for the classical collective risk process has been the centre of interest in a number of papers focusing on a constant premium rate, only a few publications consider a premium whose value depends on the current surplus. We begin with a risk process with this generalized non-constant premium rate. It is assumed that the wealth available is invested at some continuous interest rate, so that the premium income is a linear function of the surplus. We also assume that the aggregate loss is an inverse Gaussian process. Our purpose is to study the probability of survival of an insurance company over an infinite time horizon by applying the Laplace transform to an infinitesimal generator. An explicit formula for the survival probability and numerical results with different initial capitals and interest rates are given.

Title: A flexible model for prediction in functional time series.

Abstract: Functional data analysis has attracted great attention from scholars and is constantly growing due to recent developments in computer technology which enable the recording of dense datasets on arbitrarily fine grids. In financial markets, enormous amounts of high-frequency data occur every day, and it has become necessary to handle such huge volumes of information at the same time. In functional time series analysis, little research has been conducted on prediction models besides the functional autoregressive model of order one (ARH(1)), while the prediction problem has been investigated from many angles in typical time series analysis. In this talk, a new model for prediction in the functional framework will be discussed, which allows more smoothing on curves observed far from the point of prediction compared to the observations located close to the prediction point. It gives more flexibility to the ordinary model in the sense that we allow more weight on the interval considered more important than others.

Title: Spatial weight matrix estimation.

Abstract: Spatial econometrics focuses on cross-sectional interaction between physical or economic units. However, most studies rely on prior knowledge of the spatial weight matrix in the spatial econometric model, so misspecification of the spatial weight matrix can significantly affect the accuracy of model estimation. Lam (2014) provided an error upper bound for the spatial regression parameter estimators in a spatial autoregressive model, showing that misspecification can indeed introduce large bias in the final estimates. Meanwhile, recent research on spatial weight matrix estimation considers only static effects and does not include dynamic effects between spatial units. Our model uses different linear combinations of the same spatial weight matrix specifications for the responses at different time lags. To overcome the endogeneity arising from autoregression, instrumental variables are introduced. The model can also capture fixed effects and spillover effects. Finally, we develop asymptotic normality for our estimator under the framework of the functional dependence measure introduced in Wu (2011). The proposed methods are illustrated using simulated data.

Title: Exact Simulation of Point Process with Mean reverting Intensity driven by Levy subordinator.

Abstract: Mean reverting processes driven by Levy processes are widely applied in modelling the intensity of event arrivals in finance and economics. We aim to develop an efficient Monte Carlo scheme for exactly simulating point processes with mean reverting stochastic intensities driven by Levy subordinators. The simulation scheme for the point process is based on simulating the inter-arrival time and the associated intensity level; we use the joint Laplace transform of the intensity and the inter-arrival time of the process to derive these distributional properties. Our main work concentrates on intensities driven by the Inverse Gaussian process and the Gamma process. Instead of directly working out the Laplace transform of the joint distributions of the process to derive the transition densities, the main approach is based on a distributional decomposition of the process, obtained by cutting the Levy measure of the driving subordinator at a suitable value. Through this method, all the inter-arrival times and intensity levels at jump arrival times can be decomposed into familiar random variables, which allows us to simulate exactly without introducing bias or truncation error.

Title: Sequential Changepoint Detection in Factor Models for Time Series.

Abstract: We address the problem of detecting changepoints in a Static Approximate Factor Model (SAFM). In particular, we consider three different types of changes: (i) emerging factors, (ii) disappearing factors, and (iii) changes in loadings. We make two key contributions. First, we introduce a changepoint estimator based on eigenvalue ratios and prove consistency of this estimator in the offline setting. Second, we propose methodologies for adapting our estimator to the sequential setting.
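
The eigenvalue-ratio idea mentioned above can be illustrated with a minimal sketch. This is not the changepoint estimator from the talk; it only shows the common building block of estimating the number of factors in one window of data as the index that maximises the ratio of consecutive eigenvalues of the sample covariance matrix. The function name, the `k_max` cap and the simulated two-factor data are assumptions made for illustration.

```python
import numpy as np

def eigenvalue_ratio_num_factors(X, k_max=8):
    """Estimate the number of factors in X (shape T x p).

    Returns the k in 1..k_max maximising lambda_k / lambda_{k+1},
    where lambda_1 >= lambda_2 >= ... are eigenvalues of the
    sample covariance matrix of X.
    """
    Xc = X - X.mean(axis=0)                      # centre each series
    S = Xc.T @ Xc / X.shape[0]                   # p x p sample covariance
    eig = np.sort(np.linalg.eigvalsh(S))[::-1]   # descending eigenvalues
    ratios = eig[:k_max] / eig[1:k_max + 1]      # lambda_k / lambda_{k+1}
    return int(np.argmax(ratios)) + 1

# Hypothetical data: two strong factors plus idiosyncratic noise.
rng = np.random.default_rng(0)
T, p, r = 400, 30, 2
loadings = rng.normal(size=(p, r))
factors = rng.normal(size=(T, r))
X = factors @ loadings.T + 0.5 * rng.normal(size=(T, p))

num_factors = eigenvalue_ratio_num_factors(X)
```

Applying such an estimate on windows before and after a candidate time point is one way to see emerging or disappearing factors, although the estimator and its sequential adaptation in the talk may differ in detail.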

Title: The Distribution of a Perpetuity.

Abstract: We consider the problem of estimating the joint distribution of a perpetuity, e.g. a cumulative discounted loss process in the ruin problem, and the underlying factors driving the economy in an ergodic Markov model. One can identify the distribution in two manners: first, as the explosion probability of a certain locally elliptic (explosive) diffusion; second, using results regarding time reversal of diffusions, as the invariant measure associated to a certain (different) ergodic diffusion. These identifications enable efficient estimation of the distribution through both simulation and numerical solutions of partial differential equations. When the process representing the value of an economic factor is one-dimensional, or more generally reversing, the invariant distribution is given in an explicit form with respect to the model parameters. In continuous time and a general multi-dimensional setup, the lack of knowledge of the invariant distribution could pose an issue. In my talk I will show how one can amend the situation in the discrete-time case.

Title: The Implied Risk Aversion in Risk-Sharing Transactions.

Abstract: We consider a market of a given vector of securities and finitely many financial agents, who are heterogeneous with respect to their risky endowments and risk aversions. The market is assumed to be thin, meaning that each agent's actions could heavily influence the price and allocation of the securities. In contrast with the majority of related literature, we assume that agents' risk aversion is not public information, which implies that agents may strategically choose the risk aversion that they will implement in the trading. In this environment, equilibrium is modelled as the outcome of a Nash-type game, where the agents' sets of strategic choices are the demand functions on the traded securities.

Under the standard assumptions of exponential utility preferences and normally distributed pay-offs, we first show that the agents have a motive to declare risk aversions different from their true ones. The Nash equilibrium is then characterized as a solution to a system of quadratic equations, which is shown to have a unique solution in the market with two agents, or with multiple agents under the additional assumption that all but one of the endowments have beta less than one. Interestingly enough, it is shown that agents with sufficiently low (true) risk aversion profit more from the Nash equilibrium than from the one with no strategic behavior.

Title: Optimal Market Making in FX.

Abstract: We consider an FX market maker (MM) and his clients. The MM's role is to give bid-ask quotes in continuous time, and to do this he observes the price of the same pair on an external dealer-to-dealer market. This price evolves as a GBM. The clients also have access to that price but with a slight delay and they give buy and sell market orders at random times. We model the accumulated buy and sell orders with Cox processes whose intensities depend on the spreads and quadratic variation of the MM prices relative to the delayed external price. The MM accumulates orders over a small finite time horizon, at the end of which he hedges the remaining net inventory on the external market. The objective is to maximize the terminal expected P&L of the MM by controlling his quotes.

Title: Hitting Time Problem of Stochastic Process with Non-deterministic Drift.

Abstract: Stochastic processes with non-deterministic drift, in particular drift driven by the process itself, have been widely used in modelling the evolution of the short rate. We are interested in the hitting time of such a process to some constant or time-varying level. We start with the Ornstein-Uhlenbeck process as an example and obtain explicit expressions for the Laplace transform and distribution function of the first hitting time. Departing from this, we turn our attention to stochastic processes whose drift is driven by their maximum level. Instead of working with a fixed drift term, we set the drift as an undetermined function of the maximum level, so that it changes accordingly. We will discuss some findings on the drift function and present related results.

Title: A general three-step method for estimating the effect of multiple latent categorical predictors on a distal outcome.

Abstract: Latent class analysis (LCA) is widely used to derive categorical variables from multivariate data which are then included as predictors of a distal outcome. The traditional ‘modal class’ approach is to assign subjects to the latent class with the highest posterior probability. However, regression coefficients for the modal class will be biased due to potential misclassification and the unintended influence of the distal outcome on class membership. To address these problems, Asparouhov and Muthén (2014) proposed a 3-step method in which the modal class is treated as an imperfect measurement of the true class in the regression for the distal outcome, with measurement error determined by the misclassification probabilities. Our work extends their proposition to the multiple latent categorical variable case and assesses the relative performance of the 3-step method against the traditional modal class approach under settings of associated and independent latent class variables at different entropy levels. Current results show that the 3-step method is robust to unclear class separation and outperforms the modal class approach in most scenarios. The results are particularly useful for empirical studies that have more than one, possibly associated, latent constructs with unclear class separation.

Tuesday 19 and Wednesday 20 May 2015

Title: Multi-zoom autoregressive time series models.

Abstract: We consider the problem of modelling financial returns observed at a high or mid frequency, for example one minute. To this end, we adopt the so-called “multi-zoom” approach, in which the returns are assumed to depend on a few past values observed at (unknown) lower frequencies such as one day or one week. When the dependence is additionally assumed to be linear, the returns follow the Multi-Zoom Autoregressive (MZAR) time series model. We introduce an estimation procedure for fitting MZAR models to the data and present preliminary theoretical results justifying our methodology. Finally, in an extensive simulation study based on data from the New York Stock Exchange Trade and Quotes Database, we show that MZAR models can offer very good predictive power for forecasting high- and mid-frequency financial returns. (With Piotr Fryzlewicz)

Title: Models for Chinese micro-blog data.

Abstract: Before the arrival of modern information and communication technology, it was not easy to capture people's consumption and company-rating preferences; however, the prevalence of social-networking websites provides opportunities to capture those trends in order to predict social and economic changes. With the establishment of numerous text mining methods in statistical learning, valuable information can be derived by identifying patterns and trends in the textual content. Latent Dirichlet allocation (LDA), which can be regarded as an improvement of PLSI (probabilistic latent semantic indexing), is one of the most common models for discovering topics from large sets of textual data. In LDA, each document in the collection is modelled as a mixture over an underlying set of topics; meanwhile, explicit representations of documents are provided by topic probabilities. In this presentation, the fundamental concept and structure of LDA will be clarified, and variations of topic models that unveil the evolution of topics over time on Chinese micro-blog (Weibo) will be proposed. Approaches to reduce the disturbance caused by randomness are attempted in order to generate more stable topic distributions. Methods for topic evolution analysis are employed to measure the trend, strength, and variability of topics.

Title: A nonparametric eigenvalue-regularised integrated volatility matrix estimator using high-frequency data for portfolio allocation.

Abstract: In portfolio allocation of a large pool of assets, the use of high frequency data allows the corresponding high-dimensional integrated volatility matrix estimator to be more adaptive to local volatility features, while sample size is significantly increased. To ameliorate the bias contributed from the extreme eigenvalues of the sample covariance matrix when the dimension $p$ of the matrix is large relative to the sample size $n$, and the contamination by microstructure noise, various researchers attempted regularization with specific assumptions on the true matrix itself, like sparsity or factor structure, which can be restrictive at times. With non-synchronous trading and contamination of microstructure noise, we propose a nonparametrically eigenvalue-regularized integrated volatility matrix estimator (NERIVE) which does not assume specific structures for the underlying integrated volatility matrix. We show that NERIVE is almost surely positive definite, with extreme eigenvalues shrunk nonlinearly under the high dimensional framework $p/n \rightarrow c > 0$. We also prove that almost surely, the optimal weight vector constructed using NERIVE has maximum weight magnitude of order $p^{-1/2}$. The asymptotic risk of the constructed optimal portfolio is also theoretically analyzed. The practical performance of NERIVE is illustrated by comparing to the usual two-scale realized covariance matrix as well as some other nonparametric alternatives using different simulation settings and a real data set.

Title: Nonlinear forecasting with many predictors.

Abstract: Although there is a rapidly growing literature on forecasting with many predictors, only a few publications have appeared in recent years concerning possible nonlinear dynamics in high-dimensional time series. The aim of this study is to develop forecasting models capable of capturing nonlinearity and nonnormality in high-dimensional time series with complex patterns. This study is organized as follows. First, it is logical to ask if the use of such nonlinear techniques is justified by the data; therefore we applied different types of nonlinearity tests available in the literature to determine if complex real world time series like financial returns behave in a linear or nonlinear fashion. The experimental results indicate that the financial series are rarely purely linear. There is strong evidence of nonlinearity in and between financial series. Hence we proposed a two-stage forecasting procedure based on improved factor models with two neural network extensions. In the first stage, we perform a neural network PCA to estimate common factors, which allows the factors to have a nonlinear relationship to the input variables. In the second stage, we fit a nonlinear factor-augmented forecasting equation, which predicts the variable of interest from the common factors, based on neural network models. Out-of-sample forecast results show that the proposed neural network factor model significantly outperformed linear factor models and the random walk. Finally, we introduced a one-shot procedure to forecast high-dimensional time series with complex patterns, which is based on a neural network with skip-layer connections optimized by a novel learning algorithm including both L1 and L2 norms simultaneously. Both techniques introduced in this study include linear and nonlinear structures, and if there is no nonlinearity between variables, they converge to a linear model.

Title: On exact simulation algorithms for some distribution related to Brownian motion.

Abstract: I survey exact random variate generators for several distributions related to Brownian motion. These cover various quantities such as the extremes and locations of extremes of Brownian motion, the first exit time of Brownian motion, the supremum of reflected Brownian motion and its location, and the supremum of Brownian motion with drift. Exact simulation is important in financial modelling, for example for barrier options, in order to avoid bias.
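
As a small illustration of what exact simulation means here, the sketch below draws the endpoint and the running maximum of a standard Brownian motion on [0, T] jointly and without discretisation. It uses the classical reflection-principle result that, given W_T = x, P(M_T >= m) = exp(-2m(m - x)/T) for m >= max(x, 0), inverted in closed form. This is a textbook example rather than one of the generators surveyed in the talk, and the function name and sample size are assumptions.

```python
import numpy as np

def sample_endpoint_and_max(T, n, rng):
    """Draw n exact samples of (W_T, max_{0<=s<=T} W_s) for standard BM."""
    w_T = np.sqrt(T) * rng.standard_normal(n)   # endpoint W_T ~ N(0, T)
    u = 1.0 - rng.random(n)                     # uniform on (0, 1], avoids log(0)
    # Invert the conditional tail exp(-2 m (m - x) / T) = u for m.
    m = 0.5 * (w_T + np.sqrt(w_T**2 - 2.0 * T * np.log(u)))
    return w_T, m

rng = np.random.default_rng(1)
T = 1.0
w_T, m_T = sample_endpoint_and_max(T, 200_000, rng)

# Marginally the maximum is distributed as |N(0, T)|, with mean sqrt(2T/pi).
theoretical_mean = np.sqrt(2.0 * T / np.pi)
```

A path-discretisation scheme would systematically underestimate the maximum, which is the kind of bias that matters when pricing barrier options.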

Title: NERCOME estimator for integrated covariance matrices.

Abstract: Introduced by Lam (2014), the nonparametric eigenvalue-regularized covariance matrix estimator (NERCOME) is a novel method for estimating a covariance matrix through splitting of the data. It enjoys many nice properties. However, one of the key assumptions is that the data must be independent. We consider the estimation of integrated covariance (ICV) matrices of high dimensional diffusion processes based on high frequency observations. We extend the NERCOME method to allow a time-varying structure for a particular class $\mathcal{C}$ of diffusion processes which the data follow (Zheng and Li, 2011). We prove some asymptotic results. Finally, we use both simulated and real data examples to compare our estimator with the commonly used realized covariance matrix (RCV) and the time-variation adjusted realized covariance (TVARCV) matrix.
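
The data-splitting idea can be sketched for independent observations as follows. This is a simplified single-split version written under stated assumptions: eigenvectors are taken from the sample covariance of the first part of the data, and the eigenvalues are replaced by the diagonal of the second part's sample covariance expressed in that eigenbasis. The function name, the fixed split point and the simulated data are illustrative, and refinements such as averaging over permutations or choosing the split location are omitted; the extension to diffusion processes discussed in the talk is not covered.

```python
import numpy as np

def split_regularised_cov(X, n1):
    """Single-split eigenvalue-regularised covariance sketch.

    X has shape (n, p) with zero-mean rows; the first n1 rows give the
    eigenvectors, the remaining rows give the regularised eigenvalues.
    """
    X1, X2 = X[:n1], X[n1:]
    S1 = X1.T @ X1 / X1.shape[0]            # sample covariance, first part
    S2 = X2.T @ X2 / X2.shape[0]            # sample covariance, second part
    _, P1 = np.linalg.eigh(S1)              # columns are eigenvectors of S1
    d = np.diag(P1.T @ S2 @ P1)             # regularised eigenvalues
    return P1 @ np.diag(d) @ P1.T

# Hypothetical data: n = 200 independent zero-mean observations, p = 10.
rng = np.random.default_rng(2)
n, p = 200, 10
X = rng.normal(size=(n, p))
sigma_hat = split_regularised_cov(X, n1=100)
eigs = np.linalg.eigvalsh(sigma_hat)
```

Because each regularised eigenvalue is a quadratic form in the second-part sample covariance, the result is positive definite whenever that matrix has full rank.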

Title: NOVELIST estimator of large correlation and covariance matrices and their inverses.

Abstract: We propose a "NOVEL Integration of the Sample and Thresholded covariance estimators" (NOVELIST) to estimate the large covariance (correlation) and precision matrix. NOVELIST performs shrinkage of the sample covariance (correlation) towards its thresholded version. The sample covariance (correlation) component is non-sparse and can be low-rank in high dimensions. The thresholded sample covariance (correlation) component is sparse, and its addition ensures the stable invertibility of NOVELIST. The benefits of the NOVELIST estimator include simplicity, ease of implementation, computational efficiency and the fact that its application avoids eigenanalysis. We obtain an explicit convergence rate in the operator norm over a large class of covariance (correlation) matrices when the dimension $p$ and the sample size $n$ satisfy $\log p / n \to 0$. In empirical comparisons with several popular estimators, the NOVELIST estimator in which the amount of shrinkage and thresholding is chosen by cross-validation performs well in estimating covariance and precision matrices over a wide range of models and sparsity classes.
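
The shrinkage step described above can be written as a convex combination of the sample correlation matrix and its thresholded version. The sketch below is one possible reading under stated assumptions: soft thresholding is applied to the off-diagonal entries only, and the threshold `lam` and shrinkage weight `delta` are fixed by hand rather than chosen by cross-validation. The function names and the simulated data are illustrative and not taken from the talk.

```python
import numpy as np

def soft_threshold_offdiag(R, lam):
    """Soft-threshold the off-diagonal entries of R, keeping the diagonal."""
    T = np.sign(R) * np.maximum(np.abs(R) - lam, 0.0)
    np.fill_diagonal(T, np.diag(R))
    return T

def novelist_like(R, lam, delta):
    """Shrink R towards its thresholded version with weight delta in [0, 1]."""
    return (1.0 - delta) * R + delta * soft_threshold_offdiag(R, lam)

# Hypothetical data: n = 50 observations of p = 20 variables.
rng = np.random.default_rng(3)
X = rng.normal(size=(50, 20))
R = np.corrcoef(X, rowvar=False)             # sample correlation matrix

R_hat = novelist_like(R, lam=0.2, delta=0.5)
```

With `delta = 0` the sample correlation matrix is returned unchanged, and with `delta = 1` the result is the purely thresholded matrix, so the two tuning parameters interpolate between a non-sparse and a sparse estimate.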

Title: Regression modelling using I-priors.

Abstract: The I-prior methodology is a new modelling technique which aims to improve on maximum likelihood estimation of linear models when the dimensionality is large relative to the sample size. By putting a prior which is informed by the dataset (as opposed to a subjective prior), advantages such as model parsimony, fewer model assumptions, simpler estimation, and simpler hypothesis testing can be had. By way of introducing the I-prior methodology, we will give examples of linear models estimated using I-priors. These include multiple regression models, smoothing models, random effects models, and longitudinal models. Research in this area involves extending the I-prior methodology to generalised linear models (e.g. logistic regression), Structural Equation Models (SEM), and models with structured error covariances.

Title: Trading in limit order market with asymmetry information.

Abstract: We study a trading problem in a limit order market with asymmetric information. There are two types of agents, noise traders and an insider. Noise traders come to the market for liquidation purposes only. The insider knows the fundamental value of a risky asset before the trade and seeks to maximise her expected profit. In the Glosten and Milgrom [1985] model, aggregated demand is a point process and the insider is only allowed to place market orders. We borrow the main structure of the model from Glosten and Milgrom [1985]. At the same time, the insider is allowed to use a hybrid strategy, i.e. both limit and market orders, and maximises her expected profit through a trade-off between the two. This is formulated as a control problem that we characterise in terms of an HJB system.

Title: Joint law of classical Cramér-Lundberg risk model.

Abstract: The classical collective risk model, the Cramér–Lundberg risk model, focuses on the probability of ruin of an insurance company. We begin with this risk model with claim sizes following an identical inverse Gaussian distribution, and study the corresponding joint laws over a finite time horizon through Gerber-Shiu expected discounted penalty functions and the Laplace transform. The joint distribution of first passage time and overshoot with zero initial capital is derived. Particular attention is given to the asymptotic result for the joint distribution of first passage time, overshoot and any non-zero initial capital, which could provide us with the probability of ruin at any finite time with different initial capital. Numerical results for the probability of ruin are given for different ruin times and initial capital.

Title: Change-point detection in multichannel EEG data.

Abstract: We present a novel method for detecting frequency-specific change points in the spectral features of multi-channel electroencephalogram (EEG) recordings. Our method detects temporal changes in the spectral energy distribution at EEG channels and in the coherence between channel pairs. As opposed to existing methods for multi-channel change-point detection, our proposed method is able not only to localize change points in time and space, but also to attribute them to specific frequency bands (e.g., delta, alpha, beta and gamma). This feature is important and highly relevant in advancing our understanding of specific changes in neuronal activity. One such example is the design of early warning systems for epileptic seizure patients. Our proposed method is computationally fast and its results are easily interpretable. We illustrate this with an application to EEG seizure data that provides insights into spectral energy changes in pre-seizure brain activity.

Title: Real-world probabilistic modelling of El Niño.

Abstract: In this research I apply non-linear analysis methods, which I have been developing in my PhD thesis, to situations of economic interest such as El Niño forecasting. El Niño is a global climatic phenomenon with widespread climate and economic impacts. Prediction of El Niño behaviour would be of great value in many countries, but existing forecast methods are inadequate to provide useful information on timescales of interest. In part this is due to model error, which is the focus of my thesis and known to be important in climate simulation. In this study I first consider a perfect model scenario based on Columbia University’s model for El Niño, and present the results of an experiment tracking the decay of information due to sensitivity in initial conditions. This illustrates the use of the tools I have developed to interpret, value and apply probabilistic forecasts. I then explore the novel use of the information deficit in model development and forecast evaluation. Findings about predictability of the El Niño model are similar to my previous conclusions about the predictability of toy mathematical systems. Increasing the ensemble size, cutting down the noise level or choice of data assimilation technique can have practical implications for the real-world use of the forecast system.

Title: Proposing a new measure for detecting (latent variable model aberrant) semi-plausible response patterns.

Abstract: New challenges concerning bias to measurement error have arisen due to the increasing use of paid participants: semi-plausible response patterns (SpRPs). SpRPs result when participants only superficially process the information of (online) experiments or questionnaires and attempt only to respond in a plausible way. This is due to the fact that participants who are paid are generally motivated by fast cash, and try to efficiently overcome objective plausibility checks and process other items only superficially, if at all. The consequences are biased estimations, blurred or even covered true effect sizes, and contaminated valid models. A new measure developed for the identification of SpRPs in a latent variable framework is evaluated and future research outlined.

Tuesday 20 and Wednesday 21 May 2014

Title: Ranking-based subset selection for high-dimensional data.

Abstract: In this presentation, we consider the high-dimensional variable selection problem, where the number of predictors is much larger than the number of observations. Our goal is to identify those predictors which truly affect the response variable. To achieve this, we propose Ranking Based Subset Selection (RBSS), which combines subsampling with any variable selection algorithm that allows the “importance” of the explanatory variables to be ranked. Unlike existing competitors such as Stability Selection (Meinshausen and Bühlmann, 2010), RBSS can identify subsets of relevant predictors selected by the original procedure with relatively low yet significant probability. We provide a real data example which demonstrates that this issue arises in practice, and show that RBSS performs very well in that case. Moreover, we report results of an extensive simulation study and some of the theoretical results derived, which show that RBSS is a valid and powerful statistical procedure.

Title: Text mining and time series analysis on Chinese microblogs.

Abstract: This presentation will discuss some text mining and time series analysis results on Chinese micro-blogs (Weibo). First, it will give a brief review of social media and micro-blogs, techniques for micro-blog data acquisition, and some exploratory data analysis. The aim of using text mining is to understand the general public’s perspectives on certain keywords (e.g. specific companies). Useful information is typically derived by identifying patterns and trends through statistical pattern learning. Text mining methods such as clustering and support vector machines are applied. In addition, to discover the abstract “topics” that occur in a collection of posts, topic modelling was applied in the simulation study. Next, time series analysis of sentiment and of the correlation between the number of posts and stock price will be presented. Plans and problems for the next stage will be outlined at the end.

Title: Measuring the efficacy of the UK counterweight programme via g-computation algorithm.

Abstract: One of the purposes of longitudinal studies is to evaluate the impact of a sequence of treatments/exposures on an outcome measured at the final stage. When dealing with observational data, particular care is needed in stating the dependencies among the variables in play, in order to avoid a number of drawbacks that could affect the validity of the inference. Time-varying confounding is one of the most important. It arises naturally when the causal framework is adapted to a multi-temporal context, as there may be variables that at each time act as confounders for the treatment/outcome relation but are also influenced by previous treatments, and therefore lie on the causal paths under investigation. The g-computation algorithm (Robins 1986, Ryan et al. 2012) is probably the most popular method to overcome this issue. In order to handle informative drop-out, we propose an extension of the Heckman correction to deal with several occasions. The motivating example is a follow-up study implemented within the Counterweight Programme, one of the most relevant protocols adopted to tackle the problem of obesity in the UK in recent decades (Taubman et al. 2009), from which the dataset used for the application has been gathered.

Essential references:
Robins, J. (1986) - A new approach to causal inference in mortality studies with a sustained exposure period - application to control of the healthy worker survivor effect. Mathematical Modelling.
Daniel, R. M. et al. (2012) - Methods for dealing with time-dependent confounding. Statistics in Medicine.
Taubman, S. L. et al. (2009) - Intervening on risk factors for coronary heart disease: an application of the parametric g-formula. International Journal of Epidemiology.

Title: Data augmentation: simulating diffusion bridges using Bayesian filters.

Abstract: We propose a new approach to simulating diffusion bridges. We focus on bridges for nonlinear processes; however, our method is applicable to linear diffusion processes as well. The novelty of our data augmentation technique lies in the proposal, which is based on a Bayesian filter, in particular the Kalman filter or the unscented Kalman filter, applied to the Euler approximation of a given diffusion process. We thus follow multivariate normal regression theory, applying the unscented transformation whenever the diffusion process is nonlinear. The bridges we study are for mean-reverting processes, such as the linear Ornstein-Uhlenbeck process, the square root process with nonlinear diffusion coefficient, and the inverse square root process with nonlinear drift and diffusion coefficients. We introduce a correction to the approximation of the drift in the Euler scheme and generalize it to a class of mean-reverting processes with polynomial drift. Comparing our method with other techniques found in the literature, in the cases we study we obtain comparable acceptance rates for values of the mean-reversion parameter lying in the unit interval. However, unlike the other methods, our method leads to incomparably higher acceptance rates for values of this parameter greater than unity. We believe this result to be of interest especially when modelling term-structure dynamics or other phenomena with inverse square-root processes. Our next goal is to extend these results to a multidimensional setting and to simulate diffusion processes conditional on their integrals, followed by applications in stochastic volatility models.

Title: Financial forecasting with many predictors with neural network factor models.

Abstract: Modelling and forecasting financial returns has been an essential question in recent studies, in academia as well as in financial markets, for understanding market dynamics. Financial returns present special features which make this variable hard to forecast. This study proposes a non-linear forecasting technique based on an improved factor model with two neural network extensions. The first extension proposes an auto-associative neural network principal component analysis as an alternative for factor estimation, which allows the factors to have a non-linear relationship to the input variables. After finding the common factors, the next step proposes a non-linear factor-augmented forecasting equation based on a single-hidden-layer feed-forward neural network model. In this study, a statistical approach is used to show that the modelling procedure is not a black box. The proposed neural network factor model can capture both the non-linearity and the non-Gaussianity of a high-dimensional dataset. Therefore, this model can be more accurate in forecasting the complex behaviour of financial data.

Title: Nonparametric eigenvalue-regularized precision or covariance matrix estimator.

Abstract: There has recently been a great deal of work on the estimation of large covariance or precision matrices. The high-dimensional nature of the data means that the sample covariance matrix can be ill-conditioned. Without assuming a particular structure, much effort has been devoted to regularizing the eigenvalues of the sample covariance matrix. Lam (2014) proposes to regularize these eigenvalues through subsampling of the data. The method achieves asymptotically optimal nonlinear shrinkage of the eigenvalues with respect to the Frobenius error norm. Coincidentally, this nonlinear shrinkage is asymptotically the same as that introduced in Ledoit and Wolf (2012). One advantage of our estimator is its computational speed when the dimension p is not extremely large. Our estimator also allows p to be larger than the sample size n, and is always positive semi-definite.

Title: NOVELIST estimator for large covariance matrix.

Abstract: We propose a NOVEL Integration of the Sample and Thresholded covariance estimators (NOVELIST) to estimate a large covariance matrix. It is a shrinkage of the sample covariance towards a general thresholding target, in particular soft or hard thresholding estimators. The benefits of NOVELIST include simplicity, ease of implementation, and the fact that its application avoids eigenanalysis, which is unfamiliar to many practitioners. We obtain an explicit convergence rate in the operator norm over a large class of covariance matrices when the dimension p and sample size n satisfy log p/n→0. Further, we show that the rate is a trade-off between sparsity, shrinkage intensity, thresholding level, dimension and sample size under different covariance structures. Simulation results will be presented, along with a comparison with competing methods.
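
The shrinkage idea described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the choice to threshold the covariance (rather than, say, the correlation) matrix, the decision to leave the diagonal unthresholded, and the way the threshold `lam` and shrinkage intensity `delta` are supplied by hand are all assumptions made for illustration.

```python
import numpy as np

def soft_threshold(S, lam):
    """Soft-threshold the off-diagonal entries of S at level lam (assumed target)."""
    T = np.sign(S) * np.maximum(np.abs(S) - lam, 0.0)
    np.fill_diagonal(T, np.diag(S))  # assumption: variances are left untouched
    return T

def novelist_sketch(X, lam, delta):
    """Convex combination of the sample covariance and its thresholded version.

    X     : (n, p) data matrix, one observation per row
    lam   : thresholding level (hypothetical tuning parameter)
    delta : shrinkage intensity in [0, 1] (hypothetical tuning parameter)
    """
    S = np.cov(X, rowvar=False)
    return (1.0 - delta) * S + delta * soft_threshold(S, lam)
```

With `delta = 0` the sketch returns the sample covariance, and with `delta = 1` it returns the thresholded target, so intermediate values interpolate between the two without any eigendecomposition.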

Title: Limit convergence of BSDEs driven by a marked point process.

Abstract: We study backward stochastic differential equations (BSDEs) driven by a random measure or, equivalently, by a marked point process. Under suitable assumptions, the BSDE has a unique supersolution with a unique decomposition. Following the method in Peng’s 1999 paper, with appropriate modifications, we prove a limit theorem for BSDEs driven by a marked point process: if a sequence of supersolutions of BSDEs converges increasingly to a supersolution Y, then the corresponding decompositions also converge to Y’s unique decomposition. Moreover, we can apply this limit theorem to show the existence of the smallest supersolution of a BSDE with a constraint. Finally, we apply our results to the insider trading problem.

Title: Excursions of Lévy processes.

Abstract: We study the classical collective risk model, the Cramér-Lundberg risk model, driven by a compound Poisson process, which concerns the probability of ultimate ruin of an insurance company over both finite and infinite time horizons. Particular attention is given to Gerber-Shiu expected discounted penalty functions, which provide a method of calculating the probability of ruin. We derive the Laplace transforms for claim sizes following an inverse Gaussian distribution and a mixture of two exponential distributions, and we obtain asymptotic formulas for the probability of ruin in these two scenarios. The infinite divisibility of Lévy processes and the Lévy-Khintchine representation theorem are introduced as preliminaries to the study of excursions of Lévy processes, as well as applications in financial mathematics.

Title: Adaptive trend estimation in financial return data - recent findings and new challenges.

Abstract: Financial returns can be modelled as centred around piecewise-constant trend functions which change at certain points in time. We can capture this in a model using a hierarchically-ordered oscillatory basis of simple piecewise-constant functions which is uniquely defined through Binary Segmentation for change-point detection. The resulting interpretable decomposition of nonstationarity into short- and long-term components yields an adaptive moving-average estimator of the current trend, which beats comparable forecast estimators in applications on daily return data. In my presentation I discuss some challenges and interesting questions as well as potential paths to improve the existing framework. I also show some promising results for a multivariate extension of this model.
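
As background to the Binary Segmentation step mentioned in this abstract, here is a minimal Python sketch of generic binary segmentation for changes in the mean, using the standard CUSUM statistic. It is a textbook version rather than the speaker's model: the function names and the fixed, user-supplied `threshold` are assumptions, and no basis construction or forecasting step is included.

```python
import numpy as np

def cusum_split(x):
    """Best single split of x into x[:b] and x[b:], with its |CUSUM| value."""
    n = len(x)
    b = np.arange(1, n)                       # candidate split points
    left = np.cumsum(x)[:-1]                  # partial sums S_b
    stat = np.sqrt(n / (b * (n - b))) * np.abs(left - b * x.sum() / n)
    k = int(np.argmax(stat))
    return int(b[k]), float(stat[k])

def binary_segmentation(x, threshold, start=0):
    """Recursively split while the CUSUM statistic exceeds the threshold.

    Returns sorted change-point locations: index i means that the mean
    changes between x[i - 1] and x[i].
    """
    x = np.asarray(x, dtype=float)
    if len(x) < 2:
        return []
    b, stat = cusum_split(x)
    if stat <= threshold:
        return []
    left = binary_segmentation(x[:b], threshold, start)
    right = binary_segmentation(x[b:], threshold, start + b)
    return left + [start + b] + right
```

Because each split is nested inside the previous one, the detected change points form a hierarchy, which is the property the abstract relies on when it speaks of a hierarchically ordered basis.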

Title: How long in the future can you trust the forecast?

Abstract: In this research I quantify the predictability of a chaotic system, estimate how far into the future it is predictable, and identify the two main limitations. Sensitivity to initial conditions complicates the forecasting of chaotic dynamical systems, even when the model is perfect. Structural model inadequacy is a distinct source of forecast failure, and such failures are sometimes mistakenly attributed to chaos. These methods are demonstrated using a toy mathematical system (the Hénon map) as an illustration. Model inadequacy is shown to be important in real-world forecasting practice using the example of climate models. The research findings, based on the North American Regional Climate Change Assessment Program (NARCCAP) database, show significant divergence between Regional and Global Climate Model estimates of surface radiation, and I consider the implications for the reliability of such models.
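
Sensitivity to initial conditions under a perfect model can be illustrated with a short Python sketch on the Hénon map. The parameter values a = 1.4 and b = 0.3 are the classical choice and are an assumption here, as are the function names, the starting point, and the use of a fixed error tolerance as a crude notion of "how long the forecast can be trusted"; none of these come from the abstract.

```python
def henon_orbit(x0, y0, n_steps, a=1.4, b=0.3):
    """Iterate the Hénon map and return the sequence of x-coordinates."""
    xs = [x0]
    x, y = x0, y0
    for _ in range(n_steps):
        x, y = 1.0 - a * x * x + y, b * x
        xs.append(x)
    return xs

def steps_until_divergence(x0, y0, eps, tol, n_steps=200):
    """First step at which two orbits started eps apart differ by more than tol.

    Returns None if the orbits stay within tol for all n_steps.
    """
    ref = henon_orbit(x0, y0, n_steps)
    pert = henon_orbit(x0 + eps, y0, n_steps)
    for t, (u, v) in enumerate(zip(ref, pert)):
        if abs(u - v) > tol:
            return t
    return None

# Example: how long does an initial error of 1e-8 stay below 0.1?
horizon = steps_until_divergence(0.0, 0.0, eps=1e-8, tol=0.1)
```

Both orbits here come from the same (perfect) model, so any loss of predictability in this sketch is due to initial-condition uncertainty alone, not to model inadequacy.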

Title: Methods for the identification of semi-plausible response patterns (SpRPs)

Abstract: New challenges concerning bias from measurement error have arisen due to the increasing use of paid participants: semi-plausible response patterns (SpRPs). SpRPs result when participants only superficially process the information of (online) experiments or questionnaires and attempt only to respond in a plausible way. This is due to the fact that participants who are paid are generally motivated by fast cash, and try to efficiently overcome objective plausibility checks and process other items only superficially, if at all. Thus, those participants produce not only useless but detrimental data, because they attempt to conceal their malpractice from the researcher. The potential consequences are biased estimation and misleading statistical inference. The inferential objective is to derive identification statistics within latent models that detect these behavioural patterns (detection of error), by drawing knowledge from related fields of research (e.g., outlier analysis, person-fit indices, fraud detection).

Title: The joint distribution of excursion and hitting times of the Brownian motion with application to Parisian option pricing.

Abstract: We study the joint law of the excursion time and hitting time of a drifted Brownian motion by using a three-state semi-Markov model obtained through perturbation. We obtain a martingale to which we can apply the optional sampling theorem and derive the double Laplace transform. This general result is applied to address problems in option pricing. We introduce a new option related to Parisian options, which is triggered when the age of an excursion exceeds a certain time and/or a barrier is hit. We obtain an explicit expression for the Laplace transform of its fair price.

Tuesday 21 and Wednesday 22 May 2013

Title: Subset stability selection.

Abstract: In this presentation, we provide a brief introduction to the concepts behind recently developed variable screening procedures in a linear regression model. These techniques aim to remove a great number of unimportant variables from the analysed data set while preserving all relevant ones. In practice, however, the resulting set may not include any important variables at all! That is why there is a need for a tool which can assess the reliability and stability of a set of variables and use these assessments in further analysis. We introduce a new method, termed “subset stability selection”, which combines any variable screening procedure with resampling techniques in order to find significant variables only. Our method is fully nonparametric and easily applicable in a much wider context than linear regression, and it exhibits very promising finite-sample performance in the simulation study provided.

Title: Hedging of barrier options via a general self-duality.

Abstract: Classical put-call symmetry relates the prices of puts and calls under a suitable dual market transform. One well-known application is the semi-static hedging of path-dependent barrier options with European options. Nevertheless, one has to relax restrictions on the modelling of price processes in order to fit empirical stock price data. In this work, we establish a general self-duality theorem and use it to develop hedging schemes for barrier options in stochastic volatility models with correlation.

Title: Data analysis and text mining on micro-blogs.

Abstract: This presentation will discuss some data analysis and text mining on micro-blogs, especially the Chinese micro-blog Weibo. A brief introduction to social media and micro-blogs and a comparison between Twitter and Weibo will be presented. It will cover several techniques for micro-blog data acquisition, including downloading via the Application Programming Interface (API), web crawling tools, and web parsing applications. For initial data analysis, some work on posting pattern recognition and correlation with share price has been conducted. Further text mining of Weibo, including Chinese word segmentation, word frequency counting, and sentiment analysis, will be introduced. Plans and problems for the next stage will be outlined at the end.

Title: Sparse factor model for multivariate time series.

Abstract: In this work, we model multiple time series via common factors. In the stationary setting, we concentrate on the case where the factor loading matrix is sparse. We propose a method to estimate the factor loading matrix and to correctly identify its zeros. Two kinds of asymptotic results are investigated when the dimension p of the time series is fixed: (1) parameter consistency, i.e. the convergence rate of the new sparse estimator, and (2) sign consistency. We have obtained a necessary condition for sign consistency of the estimator. Future work will allow p to go to infinity.

Title: Forecasting with many predictors with a neural-based dynamic factor model.

Abstract: The contribution of this study is to propose a non-linear forecasting technique based on an improved dynamic factor model with two neural network extensions. The first extension proposes a bottleneck-type neural network principal component analysis as an alternative for factor estimation, which allows the factors to have a non-linear relationship to the input variables. After finding the common factors, the next step proposes a non-linear factor-augmented forecasting equation based on a multilayer feed-forward neural network. Neural networks, as a function approximation method, can capture both the non-linearity and the non-normality of the data. Therefore, this model can be more accurate in forecasting non-linear behaviour in high-dimensional macroeconomic and financial time series data.

Title: Multivariate longitudinal data subject to dropout and item non-response - a latent variable approach.

Abstract: Longitudinal data are collected for studying changes across time. Studying many variables simultaneously across time (e.g. items from a questionnaire) is common when the interest is in measuring unobserved constructs such as democracy, happiness, fear of crime, social status, etc. The observed variables are used as indicators for the unobserved constructs ("latent variables") of interest. Dropout, where subjects exit the study prematurely, is a common problem in longitudinal studies. Ignoring the dropout mechanism can lead to biased estimates, especially when the dropout is non-ignorable. Another possible type of missingness is item non-response, where an individual chooses not to respond to a specific question. Our proposed approach uses latent variable models to capture the evolution of the latent phenomenon over time while accounting for dropout (possibly non-random), together with item non-response.

Title: Factor modelling for high dimensional time series.

Abstract: Lam et al. (2011) propose an autocorrelation-based estimation method for high-dimensional time series using a factor model. When factors have different strengths, a two-step procedure which estimates strong factors and weak factors separately performs better than doing the estimation in one go. It is well known that the PCA method (Bai and Ng, 2002) is only valid for high-dimensional data (consistency comes from the dimension going to infinity). In contrast, we derive some convergence results which show that the autocorrelation-based method can take advantage of low-dimensional estimation and estimate weaker factors better, while being itself a high-dimensional data analysis procedure. This result can be applied to some macroeconomic data.
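
To make the autocorrelation-based idea concrete, here is a minimal Python sketch of one common form of such an estimator: sum the products of lagged sample autocovariance matrices with their transposes and take the leading eigenvectors as the estimated loading space. This is a one-step illustration only, not the two-step procedure discussed in the talk. The function name, the default number of lags `k0`, and the assumption that the number of factors `r` is known are all choices made for this example.

```python
import numpy as np

def autocov_factor_loadings(Y, r, k0=1):
    """Estimate a p x r orthonormal basis of the factor loading space.

    Y  : (n, p) array, one observation of the p-dimensional series per row
    r  : number of factors (assumed known here)
    k0 : number of lags included (hypothetical default)
    """
    Y = np.asarray(Y, dtype=float)
    n, p = Y.shape
    Yc = Y - Y.mean(axis=0)
    M = np.zeros((p, p))
    for k in range(1, k0 + 1):
        S_k = Yc[k:].T @ Yc[:-k] / (n - k)   # lag-k sample autocovariance
        M += S_k @ S_k.T                     # symmetric, non-negative definite
    eigvals, eigvecs = np.linalg.eigh(M)     # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :r]           # eigenvectors of the r largest
```

Because only lags k ≥ 1 enter the sum, serially uncorrelated noise contributes nothing to the population version of the matrix, which is why consistency does not need to come from the dimension growing.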

Title: Forecasting the probability of tropical cyclone formation - the reliability of NHC forecasts from the 2012 hurricane season.

Abstract: see poster

Title: Asymptotic equilibrium in the Glosten-Milgrom model.

Abstract: Kyle (1985) studied a market with asymmetric information and obtained the equilibrium in the market. Back (1992) generalized it to continuous time. In Back’s result, the fundamental value of the risky asset can take any continuous distribution. This general result is in contrast to studies of the Glosten-Milgrom equilibrium, such as Back and Baruch (2004), where the fundamental value of the risky asset is assumed to have a Bernoulli distribution. We have taken on this project to study the existence of the Glosten-Milgrom equilibrium when the fundamental value of the risky asset has a general discrete distribution. We also introduce a notion of asymptotic equilibrium for the Glosten-Milgrom model, which allows a sequence of Glosten-Milgrom equilibria to approximate the Kyle-Back equilibrium when the value of the risky asset has a general discrete distribution.

Title: How to quantify the predictability of a chaotic system.

Abstract: I present a new time series model for nonstationary data that is able to cope with a very low signal-to-noise ratio and time-varying volatility, both of which are typical features of financial time series. The core of our model is a set of data-adaptive basis functions and coefficients which specify the location and size of jumps in the mean of a time series. The set of these change points can be determined with a uniquely identifiable hierarchical structure, allowing for unambiguous reconstruction. By thresholding the estimated wavelet coefficients adequately, our model provides practitioners with a flexible forecasting method: only those change points of higher importance (in terms of jump size) are taken into account in forecasting returns.

Title: How to quantify the predictability of a chaotic system.

Abstract: Models are tools that describe reality in the form of mathematical equations. For example, General Circulation Models (GCMs) represent the actual climate system and are used to investigate major climate processes and help us better understand certain dependencies amongst climate variables. Global forecasts help foresee severe weather anywhere on the planet and save many lives, although meteorological forecasts are unreliable in the long run. A model is only an approximate representation of nature, which is reflected in model error. In addition, small uncertainties in the initial conditions usually lead to errors in the final forecasts. We can handle initial condition uncertainty but not model error. This study examines how to quantify the predictability of complex models with an eye towards experimental design.

Title: Estimation risk in asset allocation theory.

Abstract: Assuming that asset returns are normally distributed with a known covariance matrix, the paper derives a joint sampling distribution for the estimated efficient portfolio weights as well as for its mean return and risk. In addition, it shows that estimation error increases with the investor’s risk tolerance and the number of assets within the portfolio, while it decreases with the sample size. Large institutional investors allocate their funds over a number of asset classes; in practice, these allocation decisions are made in a hierarchical manner and involve adding constraints on the process. From a pure ex-ante perspective, such procedures are likely to result in sub-optimal decision making. Nevertheless, from an ex-post view, as my results confirm, the estimation risk incurred increases with the number of assets. Therefore, the loss of ex-ante welfare in the hierarchical approach can be outweighed by the lower estimation risk achieved by optimizing over a smaller number of assets.

Title: Will it rain tomorrow? Improving probabilistic forecasts.

Abstract: Chaos is the phenomenon of small differences in the initial conditions of a process causing large differences later in time, often colloquially referred to as the “butterfly effect”. Perhaps the most well-known example though is in meteorology where small differences in the current conditions can have large effects later on. The effect is famously summed up by the notion that “when a butterfly flutters its wings in one part of the world, it can eventually cause a hurricane in another.” Of course this is only a fictional example but let’s suppose that we know this is true but we don’t know whether the butterfly has flapped its wings or not. Do we accept that we can’t predict what’s going to happen? Or can we gain some insight? Now suppose that we know from experience that the probability of the butterfly flapping its wings is 0.05, i.e. 5 percent. With this information we might conclude that the probability of a hurricane occurring is 0.05 also. This is of course an oversimplified and unrealistic example, but it illustrates the concept of ensemble forecasting in that a degree of belief about uncertainty of the initial conditions can give us a better idea of the probability of a future event.
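
The butterfly example in this abstract can be written as a tiny ensemble forecast in Python. The sketch below is purely illustrative: the function names, the ensemble size, and the deliberately trivial "model" (a hurricane occurs exactly when the butterfly flapped its wings) are assumptions that mirror the abstract's simplified story, not a real forecasting system.

```python
import random

def ensemble_probability(model, sample_initial_condition, event,
                         n_members=10_000, seed=0):
    """Estimate P(event) as the fraction of ensemble members in which it occurs."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_members):
        x0 = sample_initial_condition(rng)   # draw an uncertain initial state
        if event(model(x0)):                 # run the model, check the outcome
            hits += 1
    return hits / n_members

def butterfly_flapped(rng):
    """Initial-condition uncertainty: the butterfly flaps with probability 0.05."""
    return rng.random() < 0.05

def toy_weather_model(flapped):
    """Toy deterministic model from the abstract: hurricane iff the wings flapped."""
    return flapped

p_hurricane = ensemble_probability(toy_weather_model, butterfly_flapped,
                                   event=lambda hurricane: hurricane)
```

With this toy model the estimate should be close to 0.05, matching the reasoning in the abstract; with a chaotic model in place of `toy_weather_model`, the same counting procedure would turn a belief about the initial conditions into a probability for a future event.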

Title: Efficient estimation of risk measures in a semiparametric GARCH model.

Abstract: This paper proposes efficient estimators of risk measures in a semiparametric GARCH model defined through moment constraints. Moment constraints are often used to identify and estimate the mean and variance parameters, but are discarded when estimating error quantiles. In order to prevent this efficiency loss in quantile estimation, we propose a quantile estimator based on inverting an empirical-likelihood-weighted distribution estimator. It is found that the new quantile estimator is uniformly more efficient than the simple empirical quantile and a quantile estimator based on normalized residuals. At the same time, the efficiency gain in error quantile estimation hinges on the efficiency of the estimators of the variance parameters.

Title: Last passage time processes.

Abstract: The study of last passage times plays an important role in financial mathematics. Since they look into the future and are not stopping times, the standard theorems of martingale theory cannot be applied, and they are therefore much harder to handle. Using time inversion, we relate last passage times of drifted Brownian motion to first hitting times. Using this argument, we derive the distribution of the increments. We extend this to general transient diffusions. Related work has been done by Profeta et al., making use of Tanaka’s formula. We introduce the concept of conditioned martingales and connect it to Girsanov’s theorem. Our main focus lies in relating the Brownian meander to the BES(3) process. This transformation proves to be useful in deriving the last passage time density of the Brownian meander.
