The speakers are ordered by surname.
5 June
Abstract: Measuring the strength of dependence and testing independence for a pair of random scalars/vectors $(X, Y)$ based on $n$ independent realizations $\{(X_i, Y_i)\}_{i=1}^n$ is a century-old problem. A natural and important extension of this problem is studying the strength of conditional dependence: for a triplet of random variables/vectors $(X, Y, Z)$, given $n$ independent realizations $\{(X_i, Y_i, Z_i)\}_{i=1}^n$, can one quantify the strength of dependence of $Y$ on $Z$ conditional on $X$? We explore some of the recent advances in providing more satisfactory answers for quantifying dependence and the challenges we face for conditional dependence.
Take a look at Mona's slides (PDF).
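As one concrete example of the kind of recent dependence measure the abstract above alludes to, here is a minimal sketch of Chatterjee's rank correlation coefficient; this particular choice is our illustration and is not necessarily the measure discussed in the talk.

```python
import numpy as np

def chatterjee_xi(x, y):
    """Chatterjee's rank correlation: close to 0 under independence and close
    to 1 when y is a noiseless function of x (assumes no ties in x or y)."""
    n = len(x)
    order = np.argsort(x)                      # sort the pairs by x
    r = np.argsort(np.argsort(y[order])) + 1   # ranks of y in that order
    return 1 - 3 * np.sum(np.abs(np.diff(r))) / (n**2 - 1)

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
print(chatterjee_xi(x, np.sin(3 * x) + 0.1 * rng.normal(size=1000)))  # strong dependence
print(chatterjee_xi(x, rng.normal(size=1000)))                        # near zero
```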
Abstract: This talk is based on joint work with José Pedraza.
The last time, $g$, that a stochastic process $X$ is negative is an important quantity of study in various applications, including bankruptcy in insurance. This time is clearly not a stopping time, as it depends on the whole path of $X$. In this talk we will see how the problem of finding stopping times closest to $g$ in the $L^p$ distance is substantially more difficult for $p>1$ than in the case $p=1$.
Inspired by early ideas of Shiryaev, we will see in particular that in the spectrally negative Lévy process setting (basically, Brownian motion with negative jumps), an optimal stopping time is given by the first time that $X$ exceeds a non-increasing and non-negative curve depending on the length of the current excursion away from zero. As examples, we consider a Brownian motion with drift and a Brownian motion with drift perturbed by a Poisson process with exponential jumps.
Take a look at Erik's slides (PDF).
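As a small illustration of the quantity $g$ from the abstract above, the following sketch simulates a Brownian motion with drift (toy parameters of our choosing) and locates the last time the path is negative; it does not implement the optimal stopping rule from the talk.

```python
import numpy as np

# Simulate a Brownian motion with positive drift on [0, T] and record
# g = the last time the discretized path is negative. Computing g needs the
# whole path, which is exactly why g is not a stopping time.
rng = np.random.default_rng(1)
T, n, mu, sigma = 50.0, 50_000, 0.5, 1.0
dt = T / n
t = np.linspace(0.0, T, n + 1)
increments = mu * dt + sigma * np.sqrt(dt) * rng.normal(size=n)
X = np.concatenate([[0.0], np.cumsum(increments)])

negative = np.flatnonzero(X < 0)
g = t[negative[-1]] if negative.size else 0.0
print(f"last time the path is negative: g = {g:.2f}")
```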
Abstract: Covariance function estimation is a fundamental task in multivariate functional data analysis and arises in many applications. In this paper, we consider estimating sparse covariance functions for high-dimensional functional data, where the number of random functions $p$ is comparable to, or even larger than, the sample size $n$. Aided by the Hilbert--Schmidt norm of functions, we introduce a new class of functional thresholding operators that combine functional versions of thresholding and shrinkage, and propose the adaptive functional thresholding estimator by incorporating the variance effects of individual entries of the sample covariance function into functional thresholding. To handle the practical scenario where curves are partially observed with errors, we also develop a nonparametric smoothing approach to obtain the smoothed adaptive functional thresholding estimator and its binned implementation to accelerate the computation. We investigate the theoretical properties of our proposals when $p$ grows exponentially with $n$ under both fully and partially observed functional scenarios. Finally, we demonstrate that the proposed adaptive functional thresholding estimators significantly outperform the competitors through extensive simulations and the functional connectivity analysis of two neuroimaging datasets.
Take a look at Qin's slides (PDF).
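For intuition only, the sketch below applies ordinary soft thresholding to the off-diagonal entries of a sample covariance matrix; this scalar analogue is our simplification and is not the adaptive functional thresholding estimator of the talk, which works with Hilbert-Schmidt norms of covariance functions and entry-specific thresholds.

```python
import numpy as np

def soft_threshold_cov(X, lam):
    # Soft-threshold the off-diagonal entries of the sample covariance matrix
    # at level lam, keeping the diagonal untouched.
    S = np.cov(X, rowvar=False)
    T = np.sign(S) * np.maximum(np.abs(S) - lam, 0.0)
    np.fill_diagonal(T, np.diag(S))
    return T

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 50))   # n = 100 observations of p = 50 variables
print(np.count_nonzero(soft_threshold_cov(X, lam=0.3)))
```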
Abstract: In some applications we are interested in how three time-related variables – a person's age in years, their year of birth (cohort), and the year (period) of observation – contribute separately to observed levels of some outcome, such as a person's level of interest in political matters. It is conceptually plausible that these three index distinct influences on the outcome, e.g. age the life stage of an individual, cohort their formative experiences, and period the events of the times that they live in. Empirically, however, these influences cannot be fully separated, because the three time variables are deterministically related: Period = Cohort + Age. It is well known that this creates an inherent unidentifiability for any age-period-cohort (APC) analysis of how an outcome depends on A, P and C. However, this does not mean that nothing can be usefully determined. First, only the separate contributions of A, P and C to the linear component (overall trend) of the relationship are unidentifiable; their contributions to its curvature can be estimated. Second, the lack of identification concerns just one parameter, so fixing the linear term of any one of A, P and C also determines the other two. This makes it possible to carry out a reasonably constructive analysis of which scenarios are consistent with the observed data, and to evaluate whether estimates obtained by making other kinds of assumptions appear plausible. I illustrate these ideas using data on political interest from the British Social Attitudes survey.
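The identification problem can be seen directly in a toy design matrix: because Period = Cohort + Age, the three linear terms are exactly collinear. The sketch below, with hypothetical data, shows the resulting rank deficiency.

```python
import numpy as np

# Hypothetical toy data: with Period = Cohort + Age the three linear terms are
# exactly collinear, so the design matrix of a linear model in A, P and C is
# rank-deficient (only 3 of its 4 columns are linearly independent).
age = np.repeat(np.arange(20, 60), 5)          # 40 ages x 5 cohorts
cohort = np.tile(np.arange(1950, 1955), 40)
period = cohort + age                          # the deterministic identity

X = np.column_stack([np.ones_like(age), age, period, cohort])
print("design matrix rank:", np.linalg.matrix_rank(X), "out of", X.shape[1], "columns")
```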
Abstract: In this talk we will discuss limited-information goodness-of-fit test statistics for latent variable models with categorical variables under simple random sampling and complex sample designs. We adopt pairwise likelihood estimation, so our estimators are limited-information maximum likelihood estimators that are consistent but not efficient. We derive results for chi-square-type test statistics based on the pairwise likelihood estimates. The performance of the proposed test statistics is studied through a simulation study for complex sample designs such as stratified and cluster sampling.
Co-authors: Haziq Jamil and Chris Skinner.
Take a look at Irini's slides (PDF).
Abstract: Maximum mean discrepancy (MMD, also called energy distance) and Hilbert-Schmidt independence criterion (HSIC, a.k.a. distance covariance) rely on the mean embedding of probability distributions and are among the most successful approaches in machine learning and statistics to quantify the difference and the independence of random variables, respectively. We present higher-order variants of MMD and HSIC by extending the notion of cumulants to reproducing kernel Hilbert spaces. The resulting kernelized cumulants have various benefits: (i) they are able to characterize the equality of distributions and independence under very mild conditions, (ii) they are easy to estimate with minimal computational overhead compared to their degree one (MMD and HSIC) counterparts, (iii) they achieve improved power when applied in two-sample and independence testing for environmental and traffic data analysis. [This is joint work with Patric Bonnier and Harald Oberhauser. Preprint: https://arxiv.org/abs/2301.12466.]
Take a look at Zoltan's slides (PDF).
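For readers less familiar with MMD, the sketch below computes the standard degree-one, biased (V-statistic) estimate of squared MMD with an RBF kernel on simulated data; the kernelized-cumulant extensions presented in the talk are not implemented here, and the bandwidth choice is arbitrary.

```python
import numpy as np

def rbf_mmd2(X, Y, gamma=1.0):
    """Biased (V-statistic) estimate of squared MMD with an RBF kernel."""
    def k(A, B):
        d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
        return np.exp(-gamma * d2)
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 2))
print(rbf_mmd2(X, rng.normal(size=(200, 2))))            # near zero: same distribution
print(rbf_mmd2(X, rng.normal(1.0, 1.0, size=(200, 2))))  # clearly positive: shifted mean
```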
Abstract: We propose a new method for high-dimensional semi-supervised learning problems based on the careful aggregation of the results of a low-dimensional procedure applied to many axis-aligned random projections of the data. Our primary goal is to identify important variables for distinguishing between the classes; existing low-dimensional methods can then be applied for final class assignment. Motivated by a generalized Rayleigh quotient, we score projections according to the traces of the estimated whitened between-class covariance matrices on the projected data. This enables us to assign an importance weight to each variable for a given projection, and to select our signal variables by aggregating these weights over high-scoring projections. Our theory shows that the resulting Sharp-SSL algorithm is able to recover the signal coordinates with high probability when we aggregate over sufficiently many random projections and when the base procedure estimates the whitened between-class covariance matrix sufficiently well. The Gaussian EM algorithm is a natural choice as a base procedure, and we provide a new analysis of its performance in semi-supervised settings that controls the parameter estimation error in terms of the proportion of labeled data in the sample. Numerical results on both simulated data and a real colon tumor dataset support the excellent empirical performance of the method.
Take a look at Tengyao's slides (PDF).
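The sketch below conveys the flavour of the projection-scoring-and-aggregation idea using only labelled data and a simple plug-in estimate of the whitened between-class covariance; it is our illustrative simplification, not the Sharp-SSL algorithm itself, whose base procedure is a Gaussian EM applied to labelled and unlabelled data.

```python
import numpy as np

def projection_scores(X_lab, y_lab, d=3, n_proj=2000, rng=None):
    """Score many axis-aligned random projections by a plug-in estimate of the
    trace of the whitened between-class covariance on the projected labelled
    data, and accumulate the scores as per-variable importance weights."""
    rng = rng or np.random.default_rng()
    p = X_lab.shape[1]
    weights = np.zeros(p)
    for _ in range(n_proj):
        S = rng.choice(p, size=d, replace=False)        # axis-aligned projection
        Z = X_lab[:, S]
        diff = Z[y_lab == 1].mean(axis=0) - Z[y_lab == 0].mean(axis=0)
        Sw = np.cov(Z[y_lab == 0], rowvar=False) + np.cov(Z[y_lab == 1], rowvar=False)
        score = diff @ np.linalg.solve(Sw + 1e-6 * np.eye(d), diff)
        weights[S] += score
    return weights

rng = np.random.default_rng(4)
n, p = 60, 100
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, p))
X[:, :3] += 2.0 * y[:, None]                              # signal in the first 3 coordinates
print(np.argsort(projection_scores(X, y, rng=rng))[-3:])  # ideally recovers {0, 1, 2}
```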
Abstract: AI & Data Science in Practice
Use cases in digital industry and oil & gas (sales forecasting and prediction of sand accumulation in the desert); depending on the time available, this may be reduced to a single use case.
6 June
Abstract: We study the mean change point detection problem for heavy-tailed high-dimensional data. Firstly, we show that when each component of the error vector follows an independent sub-Weibull distribution, a CUSUM-type statistic achieves the minimax testing rate in almost all sparsity regimes. Secondly, when the error distributions have polynomially decaying tails -- admitting bounded $\alpha$th moment for some $\alpha \geq 4$, we introduce a median-of-means-type statistic that achieves a near-optimal testing rate in both the dense and the sparse regime. A 'black-box' robust sparse mean estimator is then combined with the median-of-means-type statistic to achieve optimality in the sparse regime. Although such an estimator is usually computationally inefficient for its original purpose of mean estimation, our combined approach for change point detection is polynomial-time. Lastly, we investigate the even more challenging case when $2 \leq \alpha <4$ and unveil a new phenomenon that the (minimax) testing rate has no sparse regime, i.e. testing sparse changes is information-theoretically as hard as testing dense changes. We show that the dependence of the testing rate on the data dimension exhibits a phase transition at $\alpha = 4$.
Take a look at Yudong's slides (PDF).
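For reference, here is a minimal sketch of the plain multivariate CUSUM statistic for a single mean change on simulated data; the median-of-means and robust variants developed in the talk are not implemented.

```python
import numpy as np

def cusum_stat(X):
    """Maximum over candidate change points t of
    sqrt(t * (n - t) / n) * || mean(X[:t]) - mean(X[t:]) ||_2  (plain CUSUM)."""
    n = X.shape[0]
    best = 0.0
    for t in range(1, n):
        diff = X[:t].mean(axis=0) - X[t:].mean(axis=0)
        best = max(best, np.sqrt(t * (n - t) / n) * np.linalg.norm(diff))
    return best

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 20))
X[100:, :5] += 1.0   # sparse mean change at time 100 in the first 5 coordinates
print(cusum_stat(X))                              # large: change present
print(cusum_stat(rng.normal(size=(200, 20))))     # small: no change
```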
Abstract: We consider the problem of choosing an investment strategy that will maximise utility over distributions, under capital gains tax and constraints on the expected liquidation date. We show that the problem can be decomposed into two separate ones. The first involves choosing an optimal target distribution, while the second involves optimally realising this distribution via an investment strategy and a stopping time. The latter step may be regarded as a variant of the Skorokhod embedding problem. A solution is given explicitly in terms of the first time that the wealth of the growth optimal portfolio, properly taxed, crosses a moving stochastic level that depends on its minimum-to-date. The suggested solution has the additional optimality property of stochastically minimising maximal losses over the investment period.
Abstract: We present a model to describe sparse random graphs with spatial covariates, exploiting the so-called "graphex" setting embedded in a Bayesian nonparametric framework, which allows for flexibility and interpretable parameters. We provide a number of asymptotic results, namely that the model is able to describe both sparse and dense networks (with various levels of sparsity), is equipped with positive global and local clustering coefficients, and can have a power-law degree distribution whose exponent is easily tuned. We offer a way to perform posterior inference through an MCMC algorithm. We show the results of the estimation obtained on simulated data and on real data on airport connections.
Most of the talk will focus on spatial networks, but towards the end I will mention a different possibility for networks with dynamic communities.
Abstract: We consider a discrete-time voter model process on a set of nodes, each being in one of two states, either 0 or 1. In each time step, each node adopts the state of a randomly sampled neighbor according to sampling probabilities, referred to as node interaction parameters. We study the maximum likelihood estimation of the node interaction parameters from observed node states for a given number of realizations of the voter model process. In contrast to previous work on parameter estimation of network autoregressive processes, whose long-run behavior follows a stationary stochastic process, the voter model is an absorbing stochastic process that eventually reaches a consensus state. This requires developing a framework for deriving parameter estimation error bounds from observations consisting of several realizations of a voter model process. We present parameter estimation error bounds by interpreting the observation data as being generated according to an extended voter process that consists of cycles, each corresponding to a realization of the voter model process until absorption in a consensus state. In order to obtain these results, the consensus time of a voter model process plays an important role. We present new bounds for all moments of the consensus time, as well as a bound that holds with any given probability, which may be of independent interest. In contrast to most existing work, our results yield a consensus time bound that holds with high probability.
Joint work with Kaifang Zhou.
Take a look at Milan's slides (PDF).
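A toy simulation of the discrete-time voter model described above, run until absorption in a consensus state; the uniform interaction parameters are purely our illustrative choice, and the estimation procedure from the talk is not implemented.

```python
import numpy as np

def voter_until_consensus(W, s0, rng):
    """Run the synchronous discrete-time voter model with interaction matrix W
    (row i = sampling probabilities of node i) until all nodes agree."""
    s, t, n = s0.copy(), 0, len(s0)
    while 0 < s.sum() < n:                              # not yet all-0 or all-1
        sampled = [rng.choice(n, p=W[i]) for i in range(n)]
        s = s[sampled]                                  # every node adopts its sample's state
        t += 1
    return t, int(s[0])

rng = np.random.default_rng(6)
n = 20
W = np.full((n, n), 1.0 / n)                            # uniform interaction parameters (toy choice)
t, winner = voter_until_consensus(W, rng.integers(0, 2, size=n), rng)
print(f"consensus on state {winner} after {t} steps")
```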
Student Poster Session
Take a look at our poster session PDFs below:
Umut Cetin and Eduardo Ferioli - 'High Frequency Trading in Kyle-Back Model'.
Sixing Hao - 'Permutation Tests for Identifying Number of Factors for High-Dimensional Time Series'.
Binyan Jiang, Yutong Wang and Qiwei Yao - 'Estimation and Inference in Sparse Autoregressive Networks'.
Joshua Loftus, Sakina Hansen and Lucius Bynum - 'Explainable Machine Learning for Fairness: PDPs to Causal Dependence Plots'.
Pingfan Su, Joakin Andersen, Qingyuan Zhao and Chengchun Shi - 'Generalized gradient boosting for causal inference'.