10th World Congress in Probability and Statistics

Plenary Mon-1

IMS Presidential Address

Conference
9:00 AM — 10:00 AM KST
Local
Jul 18 Sun, 8:00 PM — 9:00 PM EDT

IMS Presidential Address

Regina Liu (Rutgers University), Susan Murphy (Harvard University)

This talk does not have an abstract.

Session Chair

Siva Athreya (Indian Statistical Institute) / Hee-Seok Oh (Seoul National University)

Invited 30

Functional Estimation, Testing and Clustering under Sparsity (Organizer: Jiashun Jin)

Conference
11:30 AM — 12:00 PM KST
Local
Jul 18 Sun, 10:30 PM — 11:00 PM EDT

Optimal Network Testing by the Signed-Polygon Statistics

Tracy Ke (Harvard University)

Given a symmetric social network, we are interested in testing whether it has only one community or multiple communities. The desired tests should (a) accommodate severe degree heterogeneity, (b) accommodate mixed memberships, (c) have a tractable null distribution, and (d) adapt automatically to different levels of sparsity and achieve the optimal phase transition. Finding such a test is a challenging problem: many existing tests do not allow for heterogeneity or mixed memberships and cannot achieve (a)-(d). We propose the Signed Polygon as a class of new tests. Fixing m ≥ 3, for each m-gon in the network, define a score using the centered adjacency matrix; the sum of such scores is the m-th order Signed Polygon statistic. The Signed Quadrilateral (SgnQ) is the special case of the Signed Polygon with m=4. We show that the SgnQ test satisfies (a)-(d) and, in particular, works well for both very sparse and less sparse networks. We derive the asymptotic null distribution and the power of the SgnQ test. For the matching lower bound, we use a phase transition framework, which is more informative than the standard minimax argument. The SgnQ test is applied to a coauthorship network constructed from research papers in 36 statistics journals over a 41-year time span. We demonstrate how the SgnQ test can be used to (a) measure coauthorship diversity and (b) build a multi-layer community tree.
This is joint work with Jiashun Jin and Shengming Luo.
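
For orientation, the statistic just described can be written schematically as follows (the notation here is illustrative and not taken from the abstract). With $A$ the adjacency matrix and $\hat{\eta} = A\mathbf{1}_n/\sqrt{\mathbf{1}_n' A \mathbf{1}_n}$ an estimate of the degree-heterogeneity vector, set $\hat{A} = A - \hat{\eta}\hat{\eta}'$ and, for $m \ge 3$,

$$U_n^{(m)} = \sum_{i_1, i_2, \ldots, i_m \ (\mathrm{distinct})} \hat{A}_{i_1 i_2}\, \hat{A}_{i_2 i_3} \cdots \hat{A}_{i_m i_1},$$

so that SgnQ corresponds to $m = 4$, suitably centered and scaled to obtain a tractable null limit.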

Statistical inference for linear mediation models with high-dimensional mediators

Runze Li (Penn State University)

Mediation analysis has drawn increasing attention in many scientific areas such as genomics, epidemiology and finance. In this paper, we propose new statistical inference procedures for high-dimensional mediation models, in which both the outcome model and the mediator model are linear with high-dimensional mediators. Traditional procedures for mediation analysis cannot be used to make statistical inference for high-dimensional linear mediation models due to the high dimensionality of the mediators. We propose an estimation procedure for the indirect effects of the models via a partial penalized least squares method, and further establish its theoretical properties. We then develop a partial penalized Wald test for the indirect effects, and prove that the proposed test has a $\chi^2$ limiting null distribution. We also propose an $F$-type test for direct effects and show that the proposed test asymptotically follows a $\chi^2$-distribution under the null hypothesis and a noncentral $\chi^2$-distribution under local alternatives. Monte Carlo simulations are conducted to examine the finite sample performance of the proposed tests and to compare their performance with existing ones. A real data example is used to illustrate the proposed methodology.
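
As a point of reference, a linear mediation model of the kind described above can be written schematically (notation illustrative, not taken from the paper) with exposure $X$, high-dimensional mediators $M \in \mathbb{R}^p$ and outcome $Y$ as

$$M = X\boldsymbol{\alpha} + \boldsymbol{\varepsilon}_1, \qquad Y = X\gamma + M'\boldsymbol{\beta} + \varepsilon_2,$$

where $\gamma$ is the direct effect and $\boldsymbol{\alpha}'\boldsymbol{\beta}$ collects the indirect (mediated) effect; the inference problems above concern these quantities when $p$ is large.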

Perturbation Bounds for Tensors and Their Applications in High Dimensional Data Analysis

Ming Yuan (Columbia University)

We develop deterministic perturbation bounds for singular values and vectors of orthogonally decomposable tensors, in a spirit similar to classical results for matrices. Our bounds exhibit intriguing differences between matrices and higher-order tensors. Most notably, they indicate that for higher-order tensors perturbation affects each singular value/vector in isolation. In particular, its effect on a singular vector does not depend on the multiplicity of its corresponding singular value or its distance from other singular values. Our results can be readily applied and provide a unified treatment to many different problems involving higher-order orthogonally decomposable tensors. In particular, we illustrate the implications of our bounds in several high dimensional data analysis problems.
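
For context, an orthogonally decomposable third-order tensor can be written (notation illustrative) as

$$\mathcal{T} = \sum_{k=1}^{r} \lambda_k\, u_k \otimes v_k \otimes w_k,$$

with orthonormal families $\{u_k\}$, $\{v_k\}$, $\{w_k\}$ and singular values $\lambda_k$. The contrast drawn in the abstract is with the matrix case, where Wedin/Davis-Kahan-type bounds on a perturbed singular vector degrade as the gap between its singular value and the others shrinks; the tensor bounds discussed here involve no such gaps.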

Q&A for Invited Session 30

This talk does not have an abstract.

Session Chair

Jiashun Jin (Carnegie Mellon University)

Invited 39

KSS Invited Session: Interactive Particle Systems and Urn Models (Organizer: Woncheol Jang)

Conference
11:30 AM — 12:00 PM KST
Local
Jul 18 Sun, 10:30 PM — 11:00 PM EDT

Convergence of randomized urn models with irreducible and reducible replacement

Li-Xin Zhang (Zhejiang University)

The generalized Friedman urn is a popular model in probability theory. Since Athreya and Ney (1972) showed the almost sure convergence of urn proportions in a randomized urn model with irreducible replacement matrix under the $L\log L$ moment assumption, this assumption has been regarded as the weakest moment assumption, but its necessity has never been established. In this talk, we will consider the strong and weak convergence of generalized Friedman urns. It is proved that, when the random replacement matrix is irreducible in probability, the sufficient and necessary moment assumption for the almost sure convergence of the urn proportions is that the expectation of the replacement matrix is finite, which is less stringent than the $L\log L$ moment assumption, and that when the replacement is reducible, the $L\log L$ moment assumption is the weakest sufficient condition. The rate of convergence and the strong and weak convergence of non-homogeneous generalized Friedman urns are also derived.
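
To fix ideas (notation illustrative, not from the abstract), a generalized Friedman urn with $K$ colours evolves as follows: writing $X_n \in \mathbb{R}^K$ for the ball counts after $n$ draws, a colour $I_n$ is drawn with probability proportional to its current count, $\mathbb{P}(I_n = i \mid \mathcal{F}_{n-1}) = X_{n-1,i}/\sum_j X_{n-1,j}$, and the corresponding (possibly random) row of the replacement matrix is added,

$$X_n = X_{n-1} + R_{n, I_n \cdot}.$$

The convergence discussed in the talk concerns the proportions $X_n/\sum_j X_{n,j}$; in the irreducible case the limit is typically the normalized left Perron eigenvector of the mean replacement matrix $\mathbb{E}[R]$.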

Condensation phenomenon and metastability in interacting particle systems

Insuk Seo (Seoul National University)

In this talk, we discuss recent developments in the study of condensation phenomena appearing in interacting particle systems such as the supercritical or critical zero-range process and the inclusion process. In particular, we will focus on the metastable behavior, i.e., the evolution of the condensate after it is formed.

This talk is based on joint works with S. Kim, C. Landim, and D. Marcondes.

Time correlation exponents in planar last passage percolation

Riddhipratim Basu (International Centre for Theoretical Sciences-TIFR)

Planar last passage percolation (LPP) models are canonical examples of stochastic growth in the Kardar-Parisi-Zhang universality class, in which one considers oriented paths (moving forward in the "time" direction) between points in a random environment, each path accruing as its weight the integral of the noise along it. The maximal weight of a path joining two points is called the last passage time between the points. Although these models are expected to exhibit universal features under mild conditions on the underlying i.i.d. noise, rigorous progress has mostly been limited to a handful of exactly solvable models. One question in this class of models that has drawn a lot of recent attention is that of two-time correlations, i.e., to understand the decay of correlations of the last passage times to a sequence of points varying along the time direction (i.e., the diagonal direction), starting from different initial data. I shall describe some results obtaining the exponents governing the short- and long-range correlations in the context of the exactly solvable model of planar exponential last passage percolation starting from flat and step initial data.

Based on joint works with Shirshendu Ganguly and Lingfu Zhang.
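
For concreteness, in the exactly solvable exponential model mentioned above (standard notation, not taken from the abstract), one attaches i.i.d. weights $\omega_v \sim \mathrm{Exp}(1)$ to the vertices of $\mathbb{Z}^2$ and sets

$$T(u, v) = \max_{\gamma : u \to v} \sum_{w \in \gamma} \omega_w,$$

the maximum running over up-right lattice paths from $u$ to $v$; the talk concerns correlations of these last passage times between points moving along the diagonal (time) direction.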

Q&A for Invited Session 39

This talk does not have an abstract.

Session Chair

Panki Kim (Seoul National University)

Organized 02

Nonlocal Operators Related to Probability (Organizer: Ildoo Kim)

Conference
11:30 AM — 12:00 PM KST
Local
Jul 18 Sun, 10:30 PM — 11:00 PM EDT

A Sobolev space theory for time-fractional stochastic PDEs driven by Lévy processes

Daehan Park (Korea Advanced Institute of Science and Technology (KAIST))

There have been many studies of time-fractional equations, both deterministic and stochastic. In this talk, the speaker will give a Sobolev space theory for time-fractional stochastic PDEs driven by Lévy processes. The existence of a kernel representing the solutions provides the means to control their derivatives. Precisely, we find a suitable decay of the Fourier transform of the kernel and use Littlewood-Paley theory. From this, we obtain a suitable condition that yields regularity of the solutions.

A regularity theory for stochastic modified Burgers' equation driven by multiplicative space-time white noise

Beom-Seok Han (Pohang University of Science and Technology)


General Law of iterated logarithm for Markov processes

Jaehun Lee (Korea Institute for Advanced Study)

In this talk, we discuss general criteria and forms of laws of the iterated logarithm (LIL) for continuous-time Markov processes. We consider minimal assumptions for LILs to hold at zero (respectively, at infinity) in general metric measure spaces. We establish LILs under local assumptions near zero (respectively, near infinity) on uniform bounds of the first exit time from balls and uniform bounds on the tails of the jumping measure. We provide a general formulation of liminf and limsup LILs, which covers a large class of subordinated diffusions, jump processes with mixed polynomial local growths, jump processes with singular jumping kernels and random conductance models with long-range jumps. We also introduce our recent results on laws of the iterated logarithm for occupation times of balls and local times of continuous-time Markov processes.

This talk is based on the joint work with Soobin Cho and Panki Kim.
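
As a benchmark for the general criteria above, the classical Khinchin law of the iterated logarithm for Brownian motion reads

$$\limsup_{t \downarrow 0} \frac{|B_t|}{\sqrt{2t \log\log(1/t)}} = 1 \quad \text{a.s.};$$

the talk concerns analogous liminf and limsup statements, at zero and at infinity, for general Markov processes, with the normalizing function $\sqrt{2t\log\log(1/t)}$ replaced by process-dependent scale functions.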

Heat kernel estimates for subordinate Markov processes and their applications

Soobin Cho (Seoul National University)

In this talk, we discuss sharp two-sided estimates for transition densities of a large class of subordinate Markov processes. As applications, we show that parabolic Harnack inequality and Hölder regularity hold for parabolic functions of such processes, and derive sharp two-sided Green function estimates.

A maximal $L_p$-regularity theory to initial value problems with time measurable nonlocal operators generated by additive processes

Jae-Hwan Choi (Korea University)


Q&A for Organized Contributed Session 02

This talk does not have an abstract.

Session Chair

Ildoo Kim (Korea University)

Organized 15

Network-related Statistical Methods and Analysis (Organizer: Donghyeon Yu)

Conference
11:30 AM — 12:00 PM KST
Local
Jul 18 Sun, 10:30 PM — 11:00 PM EDT

Estimation of particulate levels using deep dehazing network and temporal prior

SungHwan Kim (Konkuk University)

Particulate matter (PM) has become one of the most important pollutants deteriorating public health. Since PM is ubiquitous in the atmosphere, it is closely related to quality of life in many different ways. Thus, a system to accurately monitor PM in diverse environments is imperative. Previous studies using digital images have relied on individual atmospheric images, not benefiting from both the spatial and temporal effects of image sequences. This weakness undermined predictive power. To address this drawback, we propose a predictive model using a deep dehazing cascaded CNN and temporal priors. The temporal prior accommodates instantaneous visual movements and estimates PM concentration from residuals between the original and dehazed images. The present method also provides, as a by-product, high-quality dehazed image sequences superior to those of non-temporal methods. The improvements are supported by various experiments under a range of simulation scenarios and by assessments using standard metrics.

Graph-regularized contextual bandits with scalable Thompson sampling and semi-parametric reward models

Young-Geun Choi (Sookmyung Women's University)

Graph-based bandit algorithms for multiple users have received attention because the relationships among users captured by a social network or graph can improve personalized content recommendation. The graph-based Thompson sampling algorithm of Vaswani et al. (2017) is one of the state-of-the-art methods, in which the relationship between users is represented by a simple undirected graph; however, a large graph poses computational challenges. We propose a novel Thompson sampling algorithm for multiple users with a graph. We show that the proposed algorithm improves the regret bound by a factor of $\sqrt{n}$ over the algorithm of Vaswani et al. (2017), where $n$ is the number of users. Furthermore, we propose a method for a semi-parametric bandit problem with multiple users and a graph, the first algorithm proposed in this setting. We show that the upper bound of the cumulative regret has the same order as in the setting without the semi-parametric term. In establishing the proposed algorithms, novel local estimators play a crucial role in improving the bound at reduced computational cost.

INN: a stable method identifying clean-annotated samples via consistency effect in deep neural networks

Dongha Kim (Sungshin Women's University)

In classification problems with deep neural networks, collecting massive clean-annotated data is difficult, and much effort has been devoted to handling data with noisy labels. Many recent solutions to the noisy label problem share a key idea, the so-called memorization effect. While the memorization effect is a powerful tool, performance is sensitive to the choice of the training epoch needed to exploit it. In this paper, we introduce a new method called INN (Integration with the Nearest Neighborhoods) for refining noisy labels, which is both more stable and more powerful. Our method is based on a new finding, called the consistency effect: the discrepancies between predictions at neighboring regions of clean-labeled and noisy-labeled data are observed consistently regardless of the training epoch. By applying the INN to the DivideMix algorithm, we propose a new learning framework called INN-DivideMix which improves upon the INN. Through various experiments, including performance tests and an ablation study, we demonstrate the superiority and stability of our two proposed methods.

An efficient parallel block coordinate descent algorithm for large-scale precision matrix estimation using graphics processing units

Donghyeon Yu (Inha University)

Large-scale sparse precision matrix estimation has attracted wide interest from the statistics community. The convex partial correlation selection method (CONCORD) developed by Khare et al. (2015) has recently been credited with some theoretical properties for estimating sparse precision matrices. CONCORD obtains its solution by a coordinate descent algorithm (CONCORD-CD) based on the convexity of the objective function. However, since the coordinate-wise updates in CONCORD-CD are inherently serial, scaling it up is nontrivial. In this paper, we propose a novel parallelization of CONCORD-CD, namely CONCORD-PCD. CONCORD-PCD partitions the off-diagonal elements into several groups and updates each group simultaneously without harming the computational convergence of CONCORD-CD. We guarantee this by employing the notion of edge coloring from graph theory. Specifically, we establish a nontrivial correspondence between scheduling the updates of the off-diagonal elements in CONCORD-CD and coloring the edges of a complete graph. It turns out that CONCORD-PCD simultaneously updates the off-diagonal elements whose associated edges share the same color. As a result, the number of steps required to update the off-diagonal elements reduces from p(p-1)/2 to p-1 (for even p) or p (for odd p), where p denotes the number of variables. We prove that this number of steps is irreducible. In addition, CONCORD-PCD is tailored to single-instruction multiple-data (SIMD) parallelism. A numerical study shows that the SIMD-parallelized PCD algorithm implemented on graphics processing units (GPUs) accelerates the CONCORD-CD algorithm severalfold.
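
The scheduling idea above can be illustrated with the classical circle-method edge coloring of a complete graph. The sketch below (a plain illustration under my own naming, not the authors' implementation) produces p-1 rounds of pairwise disjoint index pairs for even p, and p rounds for odd p; the off-diagonal entries whose index pairs share a round could then, in principle, be updated in parallel.

def edge_coloring_rounds(p):
    # Partition the edges of the complete graph on vertices 0..p-1 into rounds of
    # pairwise disjoint edges (a proper edge coloring) via the circle method:
    # p-1 rounds if p is even, p rounds if p is odd.
    if p % 2 == 1:
        # Odd p: schedule p+1 vertices and drop the pair involving the dummy vertex p.
        return [[e for e in rnd if p not in e] for rnd in edge_coloring_rounds(p + 1)]
    n = p - 1
    rounds = []
    for r in range(n):
        rnd = [(p - 1, r)]          # the fixed vertex p-1 is paired with vertex r
        for k in range(1, p // 2):  # the remaining vertices pair up symmetrically around r
            rnd.append(((r + k) % n, (r - k) % n))
        rounds.append(rnd)
    return rounds

# Example: for p = 4 variables the three rounds are
# [(3, 0), (1, 2)], [(3, 1), (2, 0)], [(3, 2), (0, 1)].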

Q&A for Organized Contributed Session 15

This talk does not have an abstract.

Session Chair

Donghyeon Yu (Inha University)

Organized 23

Recent Advances in Statistical Methods for Large Scale Complex Data (Organizer: Seyoung Park)

Conference
11:30 AM — 12:00 PM KST
Local
Jul 18 Sun, 10:30 PM — 11:00 PM EDT

Multivariate responses quantile regression for regional quantiles with applications to CCLE data

Seyoung Park (Sungkyunkwan University)

The Cancer Cell Line Encyclopedia (CCLE) is a large-scale project that has generated resources with cancer cell lines characterized by high-dimensional molecular profiles along with pharmacological profiles. In the CCLE, identifying gene-drug interactions is important for elucidating mechanisms of drug action. Considering the interrelations between the pharmacological responses, multivariate response regression can be applied to identify meaningful gene-drug interactions. Quantile regression, as an alternative to classical linear regression, may better reveal the relationship between molecular profiles and pharmacological responses because it permits investigation of heterogeneity across quantiles. In this study, we propose a new multivariate response quantile regression framework that considers an interval of quantile levels. We aim to select variables relevant to the $\tau$-th conditional quantiles of the multiple responses for every $\tau \in \Delta$, where $\Delta$ is an interval of quantile levels of interest. We propose a penalized composite quantile regression framework with a double group Lasso penalty to estimate the quantile coefficient function. In theory, we show the oracle property of the proposed estimator, in combination with a novel information criterion with theoretical guarantees. Numerical examples and applications to CCLE data demonstrate the effectiveness of the proposed method.
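
To indicate the form such an objective typically takes (notation illustrative; the paper's criterion differs in details and handles multiple responses), with check loss $\rho_\tau(u) = u\{\tau - \mathbf{1}(u < 0)\}$ and a grid of levels $\tau_1, \ldots, \tau_K \in \Delta$, a penalized composite quantile criterion is

$$\sum_{k=1}^{K} \sum_{i=1}^{n} \rho_{\tau_k}\big(y_i - x_i'\beta(\tau_k)\big) + \lambda \sum_{j=1}^{p} \big\| \big(\beta_j(\tau_1), \ldots, \beta_j(\tau_K)\big) \big\|_2,$$

where the group penalty ties each covariate's coefficients together across quantile levels, so a covariate is selected jointly for the whole interval of quantiles.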

On sufficient graphical models

Kyongwon Kim (Ewha Womans University)

We introduce a Sufficient Graphical Model by applying the recently developed nonlinear sufficient dimension reduction techniques to the evaluation of conditional independence. The graphical model is nonparametric in nature, as it does not make distributional assumptions such as the Gaussian or copula Gaussian assumptions. However, unlike a fully nonparametric graphical model, which relies on the high-dimensional kernel to characterize conditional independence, our graphical model is based on conditional independence given a set of sufficient predictors with a substantially reduced dimension. In this way, we avoid the curse of dimensionality that comes with a high-dimensional kernel. We develop the population-level properties, convergence rate, and variable selection consistency of our estimate. By simulation comparisons and an analysis of the DREAM 4 Challenge data set, we demonstrate that our method outperforms the existing methods when the Gaussian or copula Gaussian assumptions are violated, and its performance remains excellent in the high-dimensional setting.

Principal component analysis in the wavelet domain

Yaeji Lim (Chung Ang University)

In this paper, we propose a new method of principal component analysis in the wavelet domain that is useful for dimension reduction of multiple non-stationary time series and for identifying important features. The proposed method is constructed by a novel combination of eigen-analysis and the local wavelet spectrum defined in the locally stationary wavelet process. Hence, the proposed method can be expected to handle a more general class of non-stationary time series than the limited types of signals addressed by existing methods. We investigate the theoretical properties of the estimated principal components and their loadings. Results from numerical examples, including analysis of real seismic data and financial data, show the promising empirical properties of the proposed approach.

Bayesian inference of evolutionary models from genomic data

Yujin Chung (Kyonggi University)

The evolutionary history of a group of organisms explains the process of their genetic variation over time. Due to recent sequencing and computing advances, statistical inference has become an essential discipline in the study of evolutionary history from genomic data. However, typical analyses are either limited to a small amount of data or fail to estimate complex and diverse evolutionary models. In this talk, I will present a Bayesian method for estimating population/species-level history, including population sizes, splitting time of two populations, and migration rates. The method resolves statistical limitations and overcomes major roadblocks to analyze genome-scale data. Using importance sampling and a Markov chain representation of genealogy, the method scales to genomic data without mixing difficulty in a Markov chain Monte Carlo simulation. The method also provides for the calculation of the joint posterior density for all model parameters, thus resolving the problem of high false positive rates that arises for the likelihood ratio tests for migration rates using other existing Bayesian approaches. I will demonstrate the method with simulated data and real DNA sequences.

Q&A for Organized Contributed Session 23

This talk does not have an abstract.

Session Chair

Seyoung Park (Sungkyunkwan University)

Plenary Mon-2

Wald Lecture 1 (Martin Barlow)

Conference
7:00 PM — 8:00 PM KST
Local
Jul 19 Mon, 6:00 AM — 7:00 AM EDT

Random walks and fractal graphs

Martin Barlow (University of British Columbia)

This series of talks will study random walks on graphs with irregular, random or fractal structure. The motivation goes back to a 1976 article by the physicist Pierre de Gennes on percolation. Calling the simple random walk on a percolation cluster ‘the ant in the labyrinth’, he asked about its properties. It was conjectured in 1976, and has been proved in a number of cases since, that critical models in statistical physics have fractal structure.

I will review de Gennes' questions. Since random fractals are hard, a first step was to look at deterministic exact fractals, and the graphs that can be associated naturally with them. The simplest of these is the Sierpinski gasket graph (SGG), and I will start with this example. Early work in this area used direct probabilistic methods, which were often very specific to the particular graph. The search for a more robust theory leads one to look for more flexible tools, the first of which is given by the connection between random walks and electrical networks.

Session Chair

Takashi Kumagai (Kyoto University)

Plenary Mon-3

Bernoulli Lecture (Alison Etheridge)

Conference
8:00 PM — 9:00 PM KST
Local
Jul 19 Mon, 7:00 AM — 8:00 AM EDT

Some models of spatially distributed populations: the effect of crowding

Alison Etheridge (University of Oxford)

We consider some models of spatially distributed populations in which we take account of the effects of local crowding on both the number of offspring produced by individuals, and the chance that those offspring survive to maturity. In particular, we would like to understand the way in which ancestry of individuals that survive in such populations is affected by different responses to crowding. A special case would be a birth-death process with an additional logistic term controlling local population growth, but the novelty here is that we can also see an influence on the way in which mature individuals disperse.

As time permits, we will touch on work with lots of people including Tom Kurtz (Madison), Peter Ralph (Oregon), Ian Letter, Aaron Smith, and Terence Tsui (all Oxford).

Session Chair

Ellen Baake (Bielefeld University)

Invited 17

Approximate Bayesian Computation (Organizer: Yanan Fan)

Conference
9:30 PM — 10:00 PM KST
Local
Jul 19 Mon, 8:30 AM — 9:00 AM EDT

Approximate inference for ordinal linear regression

Jean-Luc Dortet-Bernadet (Université de Strasbourg)

Ordinal regression remains one of the most useful methods for analysing data arising from ordered responses, such as those typically found in opinion surveys. We consider a flexible linear ordinal regression model which, unlike the majority of existing ordinal regression models, allows covariate effects to differ between levels of the response. For scalable inference, we develop a variational Bayes (VB) approach based on truncated Gaussian distributions. A real application to data arising from student satisfaction surveys is given.

(joint work with Yanan Fan)

Generalized Bayesian likelihood-free inference using scoring rules estimators

Ritabrata Dutta (University of Warwick)

We propose a framework for Bayesian Likelihood-Free Inference (LFI) based on Generalized Bayesian Inference using scoring rules (SRs). SRs are used to evaluate probabilistic models given an observation; a proper SR is minimised in expectation when the model corresponds to the true data-generating process for the observation. Use of a strictly proper SR, for which the above minimum is unique, ensures posterior consistency of our method. Further, we prove outlier robustness of our posterior for a specific SR. As the likelihood function is intractable in LFI, we employ consistent estimators of the SR based on model simulations in a pseudo-marginal Markov chain Monte Carlo setup; we show that the target of such a chain converges to the exact SR posterior as the number of simulations increases. Furthermore, we note that popular LFI techniques such as Bayesian Synthetic Likelihood (BSL) can be seen as special cases of our framework using proper (but not strictly proper) SRs. We empirically validate our consistency and outlier-robustness results and show how related approaches do not enjoy these properties. In practice, we use the Energy and Kernel Scores, but our general framework sets the stage for extensions with other scoring rules.
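
For reference, a generalized Bayesian posterior built from a scoring rule $S$ has the schematic form (the weight $w$ and scaling are stated here as assumptions, not taken from the abstract)

$$\pi_S(\theta \mid y_{1:n}) \propto \pi(\theta)\, \exp\Big\{ -w \sum_{i=1}^{n} S\big(P_\theta, y_i\big) \Big\},$$

and in the likelihood-free setting each $S(P_\theta, y_i)$ is replaced by a consistent estimate computed from model simulations, which is what the pseudo-marginal MCMC scheme described above targets.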

Q&A for Invited Session 17

This talk does not have an abstract.

Session Chair

Yanan Fan (University of New South Wales)

Invited 20

Heavy Tailed Phenomena (Organizer: Stilian A Stoev)

Conference
9:30 PM — 10:00 PM KST
Local
Jul 19 Mon, 8:30 AM — 9:00 AM EDT

Random linear functions of regularly varying vectors

Bikramjit Das (Singapore University of Technology and Design)

In various applications ranging from finance and insurance to network and environmental sciences, we encounter complex risk objects created from a combination of underlying risks which are heavy-tailed (or, under certain assumptions, regularly varying). A well-known result of Breiman says that the tail distribution of the product of a regularly varying random variable with another random variable remains regularly varying with the same index. We show that an extension of this result to a multivariate setting helps in quantifying a variety of extreme risks for linear combinations of heavy-tailed underlying objects. In particular, we give a characterization of regular variation on sub-cones of the d-dimensional non-negative orthant under random linear transformations. This allows us to compute probabilities of a variety of extreme events, which classical multivariate regularly varying models would report to be asymptotically negligible. Our findings are illustrated with applications to risk assessment in financial systems and reinsurance markets under a bipartite network structure. We also indicate applications of the result to computing multivariate risk measures, dimensionality reduction, and further extensions to stochastic processes.

This talk is based on joint work with Claudia Klüppelberg and Vicky Fasen-Hartmann.
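
The univariate result referred to above is Breiman's lemma: if $X \ge 0$ has a regularly varying tail with index $-\alpha$ and $Y \ge 0$ is independent of $X$ with $\mathbb{E}[Y^{\alpha + \varepsilon}] < \infty$ for some $\varepsilon > 0$, then

$$\mathbb{P}(XY > x) \sim \mathbb{E}[Y^{\alpha}]\, \mathbb{P}(X > x), \qquad x \to \infty,$$

so the product is again regularly varying with the same index; the talk concerns multivariate extensions of this statement under random linear transformations.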

Power laws and weak convergence of the Kingman coalescent

Henrik Hult (KTH Royal Institute of Technology)

The Kingman coalescent is an important and well-studied process in population genetics, modelling the ancestry of a sample of individuals. In this talk, weak convergence results are presented that characterise asymptotic properties of the Kingman coalescent under parent-dependent mutations, as the sample size grows to infinity. It is shown that the sampling probability satisfies a power law, and the asymptotic behaviour of the transition probabilities of the block-counting jump chain is derived. For the normalised jump chain and the number of mutations between types, a limiting process is derived, consisting of a deterministic component, describing the limit of the block-counting jump chain, and of independent Poisson processes with state-dependent intensities, exploding at the origin, describing the limit of the number of mutations. Finally, the results are extended to characterise the asymptotic performance of popular importance sampling algorithms, such as the Griffiths-Tavaré algorithm and the Stephens-Donnelly algorithm.

This is joint work with Martina Favero.

Limit theorems for topological invariants of extreme sample cloud

Takashi Owada (Purdue University)

The main objective of this work is to study topological crackle from the viewpoints of Topological Data Analysis (TDA) and Extreme Value Theory. TDA is a growing research area that broadly refers to the analysis of high-dimensional datasets, the main goal of which is to extract robust topological information from the data. Topological crackle typically appears in the statistical manifold learning problem, referring to the layered structure of homological cycles generated by “noisy” samples, when the underlying distribution has a heavy tail. We establish various limit theorems (e.g., central limit theorems and strong laws of large numbers) for topological objects, including Betti numbers, a basic quantifier of homological cycles, and the Euler characteristic.

Q&A for Invited Session 20

This talk does not have an abstract.

Session Chair

Stilian A Stoev (University of Michigan, Ann Arbor)

Invited 33

Integrable Probability (Organizer: Tomohiro Sasamoto)

Conference
9:30 PM — 10:00 PM KST
Local
Jul 19 Mon, 8:30 AM — 9:00 AM EDT

Reversing nonequilibrium systems

Leonid Petrov (University of Virginia)

A typical stochastic particle model for nonequilibrium thermodynamics starts from a densely packed initial configuration, and evolves by emanating particles into the “rarefaction fan”. Imagine having air and vacuum in two halves of a room, and removing the separating barrier. I will explain how for very special (integrable) stochastic particle systems one can explicitly “undo” the rarefaction, and construct another Markov chain which “puts the air back into its half of the room”. I will also discuss the corresponding stationary processes preserving each time-t nonequilibrium measure.

Mapping KPZ models to free fermions at positive temperature

Takashi Imamura (Chiba University)

We find a direct connection between solvable models in the Kardar-Parisi-Zhang (KPZ) universality class and free fermionic models at positive temperature. In studies of integrable probability during the last decade, Fredholm determinant formulas have been obtained for the one-dimensional KPZ equation and its integrable discretized models. However, it has been a long-standing problem to understand the origin of such determinantal structures. Although the final formulas are very simple, they are usually found through complicated calculations. The situation was quite different around 2000. Using the Robinson-Schensted-Knuth (RSK) algorithm, Johansson showed that the current of the totally asymmetric simple exclusion process (TASEP) corresponds to a marginal of a free fermionic system at zero temperature described by the Schur measure. In this talk, we will show that there exist more general connections between the discretized models of the KPZ equation and free fermions. On the KPZ side, the models we consider are solvable one-parameter deformations of the TASEP. On the free fermionic side, such deformations in fact correspond to bringing the system to positive temperature. The connection we find is enabled by a new fundamental identity between marginals of the q-Whittaker measures and the periodic Schur measures. It is obtained by comparing the Fredholm determinant formulas for the q-Whittaker measures by Imamura-Sasamoto (2019) and for the periodic Schur measures by Borodin (2007). We will also report briefly on further insights into this topic. One is the bijective combinatorics approach to the identity; there are deep mathematical structures behind it. The other concerns applications of this approach to a few KPZ models. Also reported are deformations of this identity connecting KPZ models in half-spaces with Pfaffian point processes. Details of these two topics will be explained by Matteo Mucciconi and Tomohiro Sasamoto, respectively.

Relaxation time limit of TASEP on a ring

Jinho Baik (University of Michigan)

TASEP, the totally asymmetric simple exclusion process, is a standard example of an interacting particle system that belongs to the KPZ (Kardar-Parisi-Zhang) universality class. The 2-dimensional random field defined by the height fluctuations of the TASEP on an infinite line converges to a universal random field, the KPZ fixed point. In this talk we discuss what happens if the space is changed to a ring and the ring size grows large together with the time in a certain critical way so that all particles are critically correlated. This talk is based on joint work with Zhipeng Liu.

Q&A for Invited Session 33

This talk does not have an abstract.

Session Chair

Tomohiro Sasamoto (Chiba University)

Organized 01

Coulomb Gases (Organizer: Paul Jung)

Conference
9:30 PM — 10:00 PM KST
Local
Jul 19 Mon, 8:30 AM — 9:00 AM EDT

Outliers for Coulomb gases

David Garcia-Zelada (Aix-Marseille University)

We will be interested in a two-dimensional model of n positively charged particles at equilibrium. Our systems will be attracted to a background with negative charge distribution, and it will be seen that, as n goes to infinity and in the regions where there is no background charge, an interesting phenomenon occurs.

It is based on a joint work with Raphael Butez, Alon Nishry and Aron Wennman (arXiv:1811.12225 and arXiv:2104.03959).

Edge behaviors of 2D Coulomb gases with boundary confinements

Seong-Mi Seo (Korea Institute for Advanced Study)

In this talk, we will consider the local statistics of a planar Coulomb gas system which is determinantal. In a suitable external field, the Coulomb particles tend to accumulate on a set called a droplet, which is the support of the equilibrium measure associated with the external field. The most well-known boundary condition for the gas is the “free boundary”, where the particles are admitted to be outside of the droplet. On the other hand, if a boundary confinement is imposed to force the particles to be completely confined to a set, the edge behavior may change. In the presence of a hard-wall constraint to change the equilibrium, the density of the equilibrium measure acquires a singular component at the hard wall and the Coulomb gas system properly rescaled at the hard wall converges to a determinantal point process which appears in the context of truncated unitary matrices. I will present the edge behaviors of the Coulomb gas under the different boundary confinements and explain two approaches using the asymptotics of orthogonal polynomials and the rescaled version of Ward’s equation from the field theory.

Large deviations in the quantum quasi-1D jellium

Christian Hirsch (University of Groningen)

Wigner's jellium is a model for a gas of electrons. The model consists of $N$ unit negatively charged particles lying in a sea of neutralizing homogeneous positive charge spread out according to Lebesgue measure, and interactions are governed by the Coulomb potential. In this work, we consider the quantum jellium on quasi-one-dimensional spaces with Maxwell-Boltzmann statistics. Using the Feynman-Kac representation, we replace particle locations with Brownian bridges. We then adapt the approach of Leblé and Serfaty (2017) to prove a process-level large deviation principle for the empirical fields of the Brownian bridges.

Lemniscate ensembles with spectral singularities

Sung-Soo Byun (Seoul National University)

In this talk, I will discuss a family of determinantal Coulomb gases, which tend to occupy lemniscate type droplets in the large system. For these lemniscate ensembles under the insertion of a point charge, I will present the scaling limits at the singular boundary point, which are expressed in terms of the solution to the Painlevé IV Riemann-Hilbert problem. I will also explain the main ingredients of the proof, which include a version of the Christoffel-Darboux identity and the strong asymptotic behaviour of the associated orthogonal polynomials.

Q&A for Organized Contributed Session 01

This talk does not have an abstract.

Session Chair

Paul Jung (Korea Advanced Institute of Science and Technology (KAIST))

Organized 04

Interacting Particle Systems and Inclusion Process (Organizer: Cristian Giardinà)

Conference
9:30 PM — 10:00 PM KST
Local
Jul 19 Mon, 8:30 AM — 9:00 AM EDT

Metastability in the reversible inclusion process

Sander Dommers (University of Hull)

In the reversible inclusion process with a fixed number of particles on a finite graph each particle at a site x jumps to site y at rate $(d+\eta_y)r(x,y)$, where d is a diffusion parameter, $\eta_y$ is the number of particles on site y and r(x,y) is the jump rate from x to y of an underlying reversible random walk. When the diffusion d tends to 0 as the number of particles tends to infinity, the particles cluster together to form a condensate. It turns out that these condensates only form on the sites where the underlying random walk spends the most time. Once such a condensate is formed the particles stick together and the condensate performs a random walk itself on much longer timescales, which can be seen as metastable (or tunnelling) behaviour. We study the rates at which the condensate jumps and characterize the behavior on the shortest time scale at which jumps occur. This generalizes work by Grosskinsky, Redig and Vafayi who study the symmetric case. Our analysis is based on the martingale approach by Beltrán and Landim.

This is joint work with Alessandra Bianchi and Cristian Giardinà.

Metastability in the reversible inclusion process II: multiple timescales

Alessandra Bianchi (Università di Padova)

The inclusion process (IP) is a stochastic lattice gas where particles perform random walks subjected to mutual attraction, thus providing the natural bosonic counterpart of the well-studied exclusion process. Due to the attractive interaction between particles, the IP can exhibit a condensation transition, where a positive fraction of all particles concentrates on a single site. In this talk, following the setting and results presented by S. Dommers, we consider the reversible IP on a finite set S in the limit of total number of particles going to infinity, and focus on the characterization of multiple timescales. Their presence will be related to some properties of the underlying random walk, and in particular to specific connectivity features of the underlying dynamics when restricted to points that maximize the reversible measure. We approach the problem starting from potential theoretic techniques and following some recent related ideas developed in a few papers we will refer to.

Joint work with S. Dommers and C. Giardinà.

Condensation and metastability of general Inclusion processes

Seonwoo Kim (Seoul National University)

In this talk, we present various results on the phenomena of condensation and metastability of inclusion processes on a wide class of underlying graphs. Condensation is the phenomenon in which a majority of the particles assemble on a single site. It occurs in various interacting particle systems, including the present one, due to the attractive behavior of the particles. On a longer time scale, because of the small randomness of the system, the formed condensate breaks up and forms again on a different site. This is a typical example of metastable behavior, which is ubiquitous in stochastic systems. The metastable behavior of the current model is known to exhibit multiple time scales; we first explain why such a scheme arises. Moreover, the fundamental behavior of the metastability differs dramatically between the reversible and non-reversible cases. Therefore, the main results are divided into two parts: the reversible case and the non-reversible case. In the reversible case, the results are known in fair detail; in the context of the multiple-time-scale scenario, the metastable behavior is fully characterized up to the second time scale. We present the known results along with conjectures for the unsolved regime. In the non-reversible case, a much more complicated scenario emerges even on the first time scale. The main difficulty is that we do not have an explicit formula for the invariant measure. We explain how to overcome this obstacle and characterize the condensation and metastability on the first time scale, generalizing the previous results for the reversible case.

Condensation of SIP particles and sticky Brownian motion

Gioia Carinci (Università di Modena e Reggio Emilia)

The symmetric inclusion process (SIP) is a particle system with attractive interaction. We study its behavior in the condensation regime attained for large values of the attraction intensity. Using Mosco convergence of Dirichlet forms, we prove convergence to sticky Brownian motion for the distance of two SIP particles. We use this result to obtain, via duality, an explicit scaling for the variance of the density field in this regime, for the SIP initially started from a homogeneous product measure. This provides relevant new information on the coarsening dynamics of condensing particle systems on the infinite lattice.
Joint work with M. Ayala, C. Giardinà and F. Redig.

Condensed phase structure in Inclusion processes

Watthanan Jatuviriyapornchai (Mahidol University)

We establish a complete picture of condensation in the inclusion process in the thermodynamic limit. The condensed phase structure is derived by size-biased sampling of the occupation numbers. Our results cover all scaling regimes of the diffusion parameter, including an interesting hierarchical structure characterized by the Poisson-Dirichlet distribution. While our results are rigorous, Monte Carlo simulations and recursive numerics for the partition functions are presented to illustrate the main points.

Q&A for Organized Contributed Session 04

This talk does not have an abstract.

Session Chair

Cristian Giardinà (University of Modena and Reggio Emilia)

Organized 22

Recent Progress of Statistical Inference for Economics and Social Science (Organizer: Eun Ryung Lee)

Conference
9:30 PM — 10:00 PM KST
Local
Jul 19 Mon, 8:30 AM — 9:00 AM EDT

A spline-based modeling approach for time-indexed multilevel data

Eun Ryung Lee (Sungkyunkwan University)

This paper introduces a spline-based multilevel approach for analyzing time-indexed data collected from an automated platform. The proposed method is computationally efficient and easy to implement, and is useful for analyzing data that have a hierarchical structure with varying complexity at different levels. An estimation procedure is developed combining the Expectation-Maximization algorithm with a nonparametric regression approach. The theoretical properties of the proposed methods are derived. The proposed estimator is shown to belong to the well-known class of linear smoothers, so further statistical inference can easily be adopted from the existing literature on linear smoothers. An R package for the proposed methodology is provided in an open repository. The effectiveness of the approach is illustrated using music concert event data collected in the United States over several years.

Impulse response analysis for sparse high-dimensional time series

Carsten Trenkler (University of Mannheim)

We consider structural impulse response analysis for sparse high-dimensional vector autoregressive (VAR) systems. First, we present a consistent estimation approach in the high-dimensional setting. Second, we suggest a valid inference procedure. Inference is more involved since standard procedures, like the delta method, do not lead to valid inference in our set-up. Therefore, using the local projection equations, we first construct a de-sparsified version of the regularized estimators of the moving average parameters associated with the VAR system. In order to obtain estimators of the structural impulse responses, we combine these de-sparsified estimators with a non-regularized estimator of the contemporaneous impact matrix in such a way that the high dimensionality is taken into account. We show that the estimators of the structural impulse responses have a Gaussian limiting distribution. Moreover, we also present a valid bootstrap procedure. Applications of the inference procedure are confidence intervals for the impulse responses as well as tests of forecast error variance decompositions, which are often used to construct connectedness measures. Our procedure is illustrated by means of simulations.
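
As background, in a structural VAR setting (standard notation, not specific to the paper) one has

$$y_t = \sum_{i=1}^{q} A_i\, y_{t-i} + u_t, \qquad u_t = B\,\varepsilon_t,$$

with moving-average representation $y_t = \sum_{h \ge 0} \Phi_h\, u_{t-h}$; the structural impulse responses are $\Theta_h = \Phi_h B$. Inference on them therefore has to combine (regularized) estimates of the $\Phi_h$ with an estimate of the contemporaneous impact matrix $B$, which is the combination described above.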

Revealing cluster structures based on mixed sampling frequencies: with an application to the state-level labor markets

Yeonwoo Rho (Michigan Technological University)

This paper proposes a new linearized mixed data sampling (MIDAS) model and develops a framework to infer clusters in a panel regression with mixed frequency data. The linearized MIDAS estimation method is more flexible and substantially simpler to implement than competing approaches. We show that the proposed clustering algorithm successfully recovers true membership in the cross-section, both in theory and in simulations, without requiring prior knowledge of the number of clusters. This methodology is applied to a mixed-frequency Okun’s law model for state-level data in the U.S. and uncovers four meaningful clusters based on the dynamic features of state-level labor markets.

Semiparametric efficient estimators in heteroscedastic error models

Mijeong Kim (Ewha Womans University)

In the mean regression context, this study considers several frequently encountered heteroscedastic error models in which the regression mean and variance functions are specified up to certain parameters. An important point we note through a series of analyses is that different assumptions on the standardized regression errors yield quite different efficiency bounds for the corresponding estimators. Consequently, all aspects of the assumptions need to be specifically taken into account in constructing the corresponding efficient estimators. This study clarifies the relation between the regression error assumptions and their respective efficiency bounds under the general regression framework with heteroscedastic errors. Our simulation results support our findings; we carry out a real data analysis using the proposed methods, in which the Cobb-Douglas cost model serves as the regression mean.

Q&A for Organized Contributed Session 22

This talk does not have an abstract.

Session Chair

Eun Ryung Lee (Sungkyunkwan University)

Contributed 01

Stochastic Partial Differential Equations

Conference
9:30 PM — 10:00 PM KST
Local
Jul 19 Mon, 8:30 AM — 9:00 AM EDT

A Sobolev space theory for SPDEs with space-time nonlocal operators

Junhee Ryu (Korea University)


Improved stability for linear SPDEs using mixed boundary/internal controls

Dan Goreac (Shandong University, Weihai / Université Gustave Eiffel)

This talk, based on joint work with I. Munteanu ("Al. I. Cuza" University, Iași), is motivated by the asymptotic stabilization of abstract stochastic PDEs of linear type. As a first step, we exhibit an abstract contribution to the exact controllability (in a general $L_p$ sense, $p > 1$) of a class of linear SDEs with a general (but time-invariant) rank control coefficient in the noise term. Second, we illustrate, on relevant frameworks of SPDEs, a way to drive their unstable part (of dimension $n \ge 1$) exactly to 0 by using M internal (respectively N boundary) controls such that max{M, N} < n. Some examples are presented, as is the minimal gain for judicious control dimensions.

Law of the large numbers and Central limit theorems for stochastic heat equations

Kunwoo Kim (Pohang University of Science and Technology)


The stochastic heat equation with Lévy noise: existence, moments and intermittency

Carsten Chong (Columbia University)

In this talk, we present results about existence, moments and large-time asymptotics of the solution to the stochastic heat equation driven by a Lévy space-time white noise, proved using a combination of decoupling techniques, point process methods and change-of-measure techniques. As one of the more surprising results, we show that the solution exhibits the phenomenon of intermittency for all exponents in all dimensions and for all non-Gaussian Lévy noises, which is fundamentally different to what is known in the Gaussian case. Moreover, we demonstrate that the behavior of the intermittency exponents in terms of a coupling constant depends critically on whether the Lévy noise is light- or heavy-tailed.

This is based on joint work with Quentin Berger (Sorbonne) and Hubert Lacoin (IMPA).
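
For orientation, intermittency here is usually quantified through the moment Lyapunov exponents (a standard definition, stated as background rather than taken from the abstract):

$$\gamma(p) := \limsup_{t \to \infty} \frac{1}{t} \log \mathbb{E}\,|u(t,x)|^p,$$

with the solution called intermittent when $p \mapsto \gamma(p)/p$ is strictly increasing; the result mentioned above is that this occurs for all exponents, in all dimensions, for the non-Gaussian Lévy noises considered.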

Q&A for Contributed Session 01

This talk does not have an abstract.

Session Chair

Kunwoo Kim (Pohang University of Science and Technology)

Contributed 06

Various Aspects of Diffusion Processes

Conference
9:30 PM — 10:00 PM KST
Local
Jul 19 Mon, 8:30 AM — 9:00 AM EDT

On nonlinear filtering of jump diffusions

Fabian Germ (University of Edinburgh)

We consider a multi-dimensional signal and observation model, $Z_t=(X_t,Y_t)$, which is a jump diffusion, i.e., the solution of an SDE driven by Wiener processes and Poisson martingale measures. The multi-dimensional “signal” $X_t$ is not observable, and we are interested in classic questions about the mean square estimate of $X_t$ for each time $t$, given the “observations” $Y_s$ for $s$ in $[0,t]$. These questions were intensively studied for partially observable diffusion processes $Z_t$ in various degrees of generality in the past, and a fairly complete filtering theory for diffusion processes was developed. Our aim is to contribute to recent studies extending the nonlinear filtering theory of diffusion processes to jump diffusions. We allow the signal and observation noises to be correlated and the infinitesimal generator of $Z_t$ to be degenerate in the coordinate directions of the signal. First, we present the filtering equations: the equations for the time evolution of the conditional distribution $P_t$ and of an unnormalised conditional distribution $Q_t$ of $X_t$, given the observations $Y_s$ for $s$ in $[0,t]$. These equations are (possibly) degenerate stochastic integro-differential equations. They are stochastic PDEs, often referred to as the Kushner-Shiryaev and Zakai equations, respectively, in the special case when $Z_t$ is a diffusion process. Next, we present new results on existence and uniqueness of the solutions to the filtering equations in $L_p$-spaces. Finally, using our results on the solutions to the filtering equations, we give conditions ensuring that the conditional density $p_t:=dP_t/dx$ with respect to Lebesgue measure exists and belongs to an $L_p$ space for each $t>0$. Moreover, under quite general regularity conditions on the initial conditional density $p_0$ and on the coefficients of the SDE for $Z_t$, we prove that $p_t$ is a càdlàg process with values in the Bessel potential spaces $H^s_p$ and in the Slobodeckij spaces $W^s_p$.
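
As a reference point, in the classical diffusion case with observation $dY_t = h(X_t)\,dt + dW_t$ and uncorrelated noises, the unnormalised conditional distribution solves the Zakai equation, stated here in its standard weak form as background (the talk's setting, with jumps and correlated noises, is more general):

$$dQ_t(\varphi) = Q_t(\mathcal{L}\varphi)\,dt + Q_t(h\,\varphi)\,dY_t,$$

for test functions $\varphi$, where $\mathcal{L}$ is the generator of the signal.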

Uniqueness and superposition of the distribution-dependent Zakai equations

Huijie Qiao (Southeast University)

This work concerns the Zakai equations arising from nonlinear filtering problems for McKean-Vlasov stochastic differential equations with correlated noises. First, we establish the Kushner-Stratonovich equations, the Zakai equations and the distribution-dependent Zakai equations. Then, the pathwise uniqueness, uniqueness in joint law and uniqueness in law of weak solutions to the distribution-dependent Zakai equations are shown. Finally, we prove a superposition principle between the distribution-dependent Zakai equations and distribution-dependent Fokker-Planck equations. As a by-product, we give some conditions under which distribution-dependent Fokker-Planck equations have weak solutions.

Quadratic variation and quadratic roughness

Purba Das (University of Oxford)

We study the concept of quadratic variation of a continuous path along a sequence of partitions and its dependence on the choice of the partition sequence. We define the concept of quadratic roughness of a path along a partition sequence and show that, for Hölder-continuous paths satisfying this roughness condition, the quadratic variation along balanced partitions is invariant with respect to the choice of the partition sequence. Typical paths of Brownian motion are shown to satisfy this quadratic roughness property almost surely along any partition sequence satisfying a step-size condition. Using these results we derive a formulation of Föllmer's pathwise integration along paths with finite quadratic variation which is invariant with respect to the partition sequence.
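
In Föllmer's sense (a standard definition, included here for context), a continuous path $x$ has quadratic variation along a partition sequence $(\pi_n)$ if, for every $t$,

$$[x]_{(\pi_n)}(t) = \lim_{n \to \infty} \sum_{[t_i, t_{i+1}] \in \pi_n} \big(x(t_{i+1} \wedge t) - x(t_i \wedge t)\big)^2$$

exists and defines a continuous increasing function of $t$; the point of the talk is that, without a roughness condition on the path, this quantity may genuinely depend on the choice of $(\pi_n)$.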

Eyring-Kramers formula for non-reversible metastable diffusion processes

Jungkyoung Lee (Seoul National University)

In this talk, we consider diffusion processes that admit a Gibbs invariant measure but are non-reversible. Such diffusion processes exhibit metastable behavior if the associated potential function has multiple local minima. For this model, we provide a proof of the Eyring-Kramers formula, which gives sharp asymptotics of the mean transition time from a local minimum to a deeper one. In particular, our work indicates that the metastable transitions of non-reversible processes are faster than those of reversible ones.
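
For comparison, the classical Eyring-Kramers formula in the reversible gradient case $dX_t = -\nabla V(X_t)\,dt + \sqrt{2\epsilon}\,dB_t$ gives, for the mean transition time from a local minimum $m$ across a saddle $\sigma$ into a deeper well,

$$\mathbb{E}_m[\tau] \simeq \frac{2\pi}{|\lambda^-(\sigma)|}\,\sqrt{\frac{|\det \nabla^2 V(\sigma)|}{\det \nabla^2 V(m)}}\; e^{(V(\sigma) - V(m))/\epsilon}, \qquad \epsilon \to 0,$$

where $\lambda^-(\sigma)$ is the unique negative eigenvalue of $\nabla^2 V(\sigma)$; the talk concerns the analogous sharp asymptotics when a non-gradient (non-reversible) drift is added while the Gibbs invariant measure is preserved.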

Q&A for Contributed Session 06

This talk does not have an abstract.

Session Chair

Insuk Seo (Seoul National University)

Contributed 18

Inference on Dependence

Conference
9:30 PM — 10:00 PM KST
Local
Jul 19 Mon, 8:30 AM — 9:00 AM EDT

Covariance networks for functional data on multidimensional domains

Soham Sarkar (Ecole Polytechnique Federale de Lausanne)

Covariance estimation is ubiquitous in functional data analysis. Yet, the case of functional observations over multidimensional domains introduces computational and statistical challenges, rendering the standard methods effectively inapplicable. To address this problem, we introduce Covariance Networks (CovNet) as a modeling and estimation tool. The CovNet model is universal — it can be used to approximate any covariance up to desired precision. Moreover, the model can be fitted efficiently to the data and its neural network architecture allows us to employ modern computational tools in the implementation. The CovNet model also admits a closed-form eigen-decomposition, which can be computed efficiently, without constructing the covariance itself. This facilitates easy storage and subsequent manipulation in the context of the CovNet. Moreover, we establish consistency of the proposed estimator and derive its rate of convergence. The usefulness of the proposed method is demonstrated using an extensive simulation study.

Large dimensional sample covariance matrices with independent columns and simultaneously diagonalizable population covariance matrices

Tianxing Mei (The University of Hong Kong)

We consider the limiting behavior of the empirical spectral distribution (ESD) of a sample covariance matrix formed from n independent but not necessarily identically distributed samples whose population covariance matrices are asymptotically simultaneously diagonalizable, when the dimension of the samples grows proportionally with the sample size. Existing results for different types of sample covariance matrices, including the weighted sample covariance matrix, the centered Gram matrix model, and linear time series models with simultaneously diagonalizable coefficient matrices, are covered by our approach. As applications, we obtain the existence and uniqueness of the limiting spectral distribution (LSD) of the realized covariance matrix for a multidimensional diffusion process whose co-volatility process is equipped with an anisotropic time-varying spectrum. Meanwhile, for a matrix-valued autoregressive model, we derive the common limiting spectral distribution of the sample covariance matrix of each matrix-valued observation when both the row and column dimensions are large.

Random surface covariance estimation by shifted partial tracing

Tomas Masak (École polytechnique fédérale de Lausanne)

4
The problem of covariance estimation for replicated surface-valued processes is examined from the functional data analysis perspective. Considerations of statistical and computational efficiency often compel the use of separability of the covariance, even though the assumption may fail in practice. We consider a setting where the covariance structure may fail to be separable locally - either due to noise contamination or due to the presence of a non-separable short-range dependent signal component. That is, the covariance is an additive perturbation of a separable component by a non-separable but banded component. We introduce non-parametric estimators hinging on the novel concept of shifted partial tracing, enabling computationally efficient estimation of the model under dense observation. Due to the denoising properties of shifted partial tracing, our methods are shown to yield consistent estimators even under noisy discrete observation, without the need for smoothing. Further to deriving the convergence rates and limit theorems, we also show that the implementation of our estimators, including for the purpose of prediction, comes at no computational overhead relative to a separable model. Finally, we demonstrate empirical performance and computational feasibility of our methods in an extensive simulation study and on a real data set.

This is a joint work with Victor M. Panaretos.

Q&A for Contributed Session 18

0
This talk does not have an abstract.

Session Chair

Minsun Song (Sookmyung Women’s University)

Contributed 24

Time Series Analysis I

Conference
9:30 PM — 10:00 PM KST
Local
Jul 19 Mon, 8:30 AM — 9:00 AM EDT

Statistical modelling of rainfall time series using ensemble empirical mode decomposition and generalised extreme value distribution

Willard Zvarevashe (North West University)

3
Extreme rainfall patterns have direct and indirect effects on all of Earth's spheres, particularly the hydrosphere, biosphere and lithosphere. An understanding of extreme rainfall patterns is therefore very important for future planning and management. In this study, using the Western Cape (a South African province) as a case study, the rainfall time series is decomposed into intrinsic mode functions (IMFs) using a data-adaptive method, ensemble empirical mode decomposition. The IMFs are modelled using the generalised extreme value distribution (GEVD). Model diagnostics and selection using Q-Q plots, P-P plots and the Akaike information criterion show that the decomposed IMFs yield better-fitting models than the original rainfall time series. Rainfall modelling using the decomposed data may assist future planning and further research by providing better predictions.
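A minimal sketch (not the authors' code) of the modelling step described above: fit a generalised extreme value distribution to block maxima of a series and compare fits via AIC. It assumes the IMFs have already been obtained from an EEMD implementation (e.g. a package such as PyEMD); the synthetic series, block length and variable names are illustrative assumptions.

import numpy as np
from scipy.stats import genextreme

def gev_aic(series, block_len=12):
    """Fit a GEV to block maxima of `series`; return the AIC and fitted parameters."""
    n_blocks = len(series) // block_len
    maxima = series[:n_blocks * block_len].reshape(n_blocks, block_len).max(axis=1)
    shape, loc, scale = genextreme.fit(maxima)             # maximum-likelihood fit
    loglik = genextreme.logpdf(maxima, shape, loc=loc, scale=scale).sum()
    return 2 * 3 - 2 * loglik, (shape, loc, scale)         # 3 estimated parameters

# synthetic monthly series standing in for a rainfall record or a single IMF;
# in the application one would call gev_aic on the raw series and on each IMF
rng = np.random.default_rng(1)
series = rng.gamma(shape=2.0, scale=30.0, size=12 * 40)    # 40 years of monthly values
aic, params = gev_aic(series)
print(f"AIC = {aic:.1f}, (shape, loc, scale) = {params}")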

Regularity of multifractional moving average processes with random Hurst exponent

Fabian Mies (RWTH Aachen University)

3
A recently proposed alternative to multifractional Brownian motion (mBm) with random Hurst exponent is studied, which we refer to as Itô-mBm. It is shown that Itô-mBm is locally self-similar. In contrast to mBm, its pathwise regularity is almost unaffected by the roughness of the functional Hurst parameter. The pathwise properties are established via a new polynomial moment condition similar to the Kolmogorov-Centsov theorem, allowing for random local Hölder exponents. Our results are applicable to a broad class of moving average processes where pathwise regularity and long memory properties may be decoupled, e.g. to a multifractional generalization of the Matérn process.

High-frequency instruments and identification-robust inference for stochastic volatility models

Md. Nazmul Ahsan (Concordia University)

3
We introduce a novel class of generalized stochastic volatility (GSV) models, which can utilize and relate many high-frequency realized volatility (RV) measures to the latent volatility. Instrumental variable methods are employed to provide a unified framework for the analysis (estimation and inference) of GSV models. We study parameter inference problems in GSV models with nonstationary volatility and exogenous predictors in the latent volatility process. We develop identification-robust methods for joint hypotheses involving the volatility persistence parameter and the composite error's autocorrelation parameter (or the noise ratio) and apply projection techniques for inference about the persistence parameter. The proposed tests include Anderson-Rubin-type tests, dynamic versions of the split-sample procedure, and point-optimal versions of these tests. For distributional theory, three sets of assumptions are considered: we provide exact tests and confidence sets for Gaussian errors, establish exact Monte Carlo test procedures for non-Gaussian errors (possibly heavy-tailed), and show asymptotic validity under weaker distributional assumptions. Simulation results show that the proposed tests outperform the asymptotic test in terms of size and exhibit excellent power in empirically realistic settings. We apply our inference methods to IBM's price and option data (2009-2013). We consider 175 different instruments (IVs) spanning 22 classes and analyze their ability to describe the low-frequency volatility. The IVs are compared based on the average length of the confidence intervals produced by the proposed tests. The superior instrument set mostly consists of 5-minute HF realized measures, and these IVs produce confidence sets where the volatility persistence parameter lies roughly between 0.85 and 1.0. We find that RVs with higher frequency produce wider confidence intervals compared to RVs with slightly lower frequency, showing that these confidence intervals adjust to absorb market microstructure noise or discretization error. Further, when we consider irrelevant or weak IVs (jumps and signed jumps), the proposed tests produce unbounded confidence intervals.

Q&A for Contributed Session 24

0
This talk does not have an abstract.

Session Chair

Kyongwon Kim (Ewha Womans University)

Contributed 30

Spatio-temporal Data Analysis

Conference
9:30 PM — 10:00 PM KST
Local
Jul 19 Mon, 8:30 AM — 9:00 AM EDT

Statistical inference for mean function of longitudinal imaging data over complicated domains

Jie Li (Tsinghua University)

5
Motivated by longitudinal imaging data, which possess inherent spatial and temporal correlation, we propose a novel procedure to estimate the mean function. A functional moving average is applied to depict the dependence among temporally ordered images, and flexible bivariate splines over triangulations are used to handle the irregular domain of the images, which is common in imaging studies. Both global and local asymptotic properties of the bivariate spline estimator for the mean function are established, with simultaneous confidence corridors (SCCs) as a theoretical byproduct. Under some mild conditions, the proposed estimator and its accompanying SCCs are shown to be consistent and oracle efficient, as if all images were entirely observed without errors. The finite-sample performance of the proposed method in Monte Carlo simulation experiments strongly corroborates the asymptotic theory. The proposed method is illustrated by analyzing two sea water potential temperature data sets.

Gaussian linear dynamic spatio-temporal models and time asymptotics

Suman Guha (Presidency University, Kolkata)

5
Gaussian linear dynamic spatio-temporal models (LDSTMs) are linear Gaussian state-space models for spatio-temporal data that contain deterministic and/or stochastic spatio-temporal covariates besides the spatio-temporal response. They are extensively used to model discrete-time spatial time series data. The model fitting is carried out either by the classical maximum likelihood approach or by calculating the Bayesian maximum a posteriori estimate of the unknown parameters. While their finite sample behaviour is well studied, the literature on their asymptotic properties is relatively scarce. Classical theory on the asymptotic properties of the maximum likelihood estimator for linear state-space models is not applicable, as it hinges on the assumption of asymptotic stationarity of the covariate processes, which is seldom satisfied by discrete-time spatial time series data. In this article, we consider a very general Gaussian LDSTM that can accommodate arbitrary spatio-temporal covariate processes growing like power functions with respect to time, in a deterministic and/or suitable stochastic sense. We show that, under very minimal assumptions, any approximate MLE and Bayesian approximate MAPE of some of the unknown parameters and parametric functions are strongly consistent. Furthermore, building upon the strong consistency theorems, we also establish rate of convergence results for both the approximate MLE and the approximate MAPE.

High-dimensional spectral analysis

Jonas Krampe (University of Mannheim)

7
An important part of multivariate time series analysis takes place in the spectral domain, where key quantities are the spectral density matrix and the partial coherence. In a high-dimensional setup, we present how inference can be derived for the partial coherence. Furthermore, we present a valid statistical test for the hypothesis that the partial coherence is almost everywhere smaller than a given bound, which includes testing whether the partial coherence is zero or not. Applications of this include, among others, the construction of graphical interaction models, which are helpful for analyzing functional connectivity among brain regions. We illustrate our procedure by means of simulations and a real data application.
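For readers unfamiliar with the quantity under test, the following sketch (not the speaker's procedure) computes partial coherences from an estimated spectral density matrix via its inverse, the standard rescaling used in graphical models for time series. The crude smoothed-periodogram estimate, the toy VAR-type data and the chosen frequency are illustrative assumptions; in the high-dimensional setting of the talk the plain inverse would be replaced by a regularised estimator.

import numpy as np

def partial_coherence(S):
    """Partial coherences |R_jk|^2 at one frequency from a spectral matrix S (d x d)."""
    G = np.linalg.inv(S)                      # inverse spectral density matrix
    sd = np.sqrt(np.real(np.diag(G)))
    R = -G / np.outer(sd, sd)                 # rescaled inverse
    PC = np.abs(R) ** 2
    np.fill_diagonal(PC, 1.0)
    return PC

# toy example: a 3-dimensional VAR(1)-type series and a band-smoothed periodogram
rng = np.random.default_rng(0)
n, d = 2048, 3
A = np.array([[0.5, 0.2, 0.0], [0.0, 0.4, 0.0], [0.0, 0.0, 0.3]])
X = np.zeros((n, d))
for t in range(1, n):
    X[t] = X[t - 1] @ A.T + rng.normal(size=d)
F = np.fft.rfft(X, axis=0)                    # discrete Fourier transform per component
j = 100                                       # pick one Fourier frequency
band = slice(j - 10, j + 11)                  # smooth the periodogram over a small band
S_hat = np.einsum('fj,fk->jk', F[band], F[band].conj()) / (2 * np.pi * n * 21)
print(np.round(partial_coherence(S_hat), 3))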

Extreme value analysis for mixture models with heavy-tailed impurity

Ekaterina Morozova (National Research University Higher School of Economics)

6
While there exists a well-established theory for the asymptotic behaviour of maxima of i.i.d. sequences, very few results are available for triangular arrays, in which the distribution can change over time. Typically, papers on this issue deal with convergence to the Gumbel law or to twice-differentiable distributions. This work contributes to the aforementioned problem by providing an extreme value analysis for mixture models with varying parameters, which can be viewed as triangular arrays. In particular, we consider the case of a heavy-tailed impurity, which appears when one of the components has a heavy-tailed distribution and the corresponding mixing parameter tends to zero as the number of observations grows. We analyse two ways of modelling this impurity, namely, by a non-truncated regularly varying law and by its upper-truncated version with an increasing truncation level. The set of possible limit distributions for maxima turns out to be much more diverse than in the classical setting, especially for a mixture with the truncated component, where it includes four discontinuous laws. In the latter case, the resulting limit depends on the asymptotic behaviour of the truncation point, which is shown to be related to the truncation regimes introduced in [1]. For practical purposes we describe the procedure for applying the considered model to the analysis of financial returns.

The current research is a joint work with Vladimir Panov, available as a preprint on arXiv.org [2].

References:
[1] Chakrabarty, A. and Samorodnitsky, G. (2012). Understanding heavy tails in a bounded world or, is a truncated heavy tail heavy or not? Stochastic models, 28(1), 109–143.
[2] Panov, V. and Morozova, E. (2021). Extreme value analysis for mixture models with heavy-tailed impurity. arXiv preprint arXiv:2103.07689.

Q&A for Contributed Session 30

0
This talk does not have an abstract.

Session Chair

Seoncheol Park (Chungbuk National University)

Contributed 34

Stochastic Process / Modeling

Conference
9:30 PM — 10:00 PM KST
Local
Jul 19 Mon, 8:30 AM — 9:00 AM EDT

Ruin probabilities in the presence of risky investments and random switching

Konstantin Borovkov (The University of Melbourne)

6
We consider a reserve process where claim times form a renewal process, while between the claim times the process has the dynamics of geometric Brownian motion-type Itô processes with time-dependent random coefficients that are “reset” after each jump. Following the approach of Pergamenshchikov and Zeitoni (2006), we use the implicit renewal theory to obtain power-function bounds for the eventual ruin probability. In the special case of the gamma-distributed claim inter-arrival times and geometric Brownian motions with random coefficients, we obtain necessary and sufficient conditions for existence of Lundberg’s exponent (ensuring the power function behaviour for the ruin probability). [Joint work with Roxanne He.]

Wasserstein convergence rates for random bit approximations of continuous Markov processes

Thomas Kruse (University of Giessen)

3
We determine the convergence speed of the EMCEL scheme for approximating one-dimensional continuous strong Markov processes. The scheme is based on the construction of certain Markov chains whose laws can be embedded into the process with a sequence of stopping times. Under a mild condition on the process’ speed measure we prove that the approximating Markov chains converge at fixed times at the rate of $1/4$ with respect to every $p$-th Wasserstein distance. For the convergence of paths, we prove any rate strictly smaller than $1/4$. These results apply, in particular, to processes with irregular behavior such as solutions of SDEs with irregular coefficients and processes with sticky points. Moreover, we present several further properties of the EMCEL scheme and discuss its differences from the Euler scheme.

The talk is based on joint works with Stefan Ankirchner, Wolfgang Löhr and Mikhail Urusov.

The discrete membrane model on trees

Biltu Dan (Indian Institute of Science)

4
The discrete membrane model (MM) is a random interface model for separating surfaces that tend to preserve curvature. It is similar to the discrete Gaussian free field (DGFF), for which the most likely interfaces are those preserving mean height. However, working with the two models presents some key differences. In particular, many tools (electrical networks, the random walk representation of the covariance) are available for the DGFF but are lacking for the MM. In this talk we will investigate a random walk representation for the covariance of the MM and, by means of it, define and study the MM on regular trees. In particular, we will study the scaling limit of the maxima of the MM on regular trees.

On a two-server queue with consultation by main server with protected phases of service

Resmi Thekkiniyedath (KKTM Government College)

3
This paper analyses a two-server queueing model with consultations given by the main server to the regular server. The main server not only serves customers but also provides consultation to the regular server, with pre-emptive priority over customers. Customers at the main server may undergo interruptions during their service. An interruption is not allowed for a customer at the main server if the service is in any one of the protected phases of service. There are upper bounds on the number of interruptions to a customer at the main server and on the number of consultations to the regular server during the service of a customer. A super clock also determines whether to allow further interruptions to a customer at the main server. A threshold clock decides the restart or resumption of the services at the main and regular servers after each consultation. The arrival process and the requests for consultation follow mutually independent Poisson processes. The service times at the main server and the regular server are assumed to follow mutually independent phase-type distributions. The stability condition is established and some performance measures are studied numerically.

Q&A for Contributed Session 34

0
This talk does not have an abstract.

Session Chair

Jaehong Jeong (Hanyang University)

Invited 13

Critical Phenomena in Statistical Mechanics Models (Organizer: Akira Sakai)

Conference
10:30 PM — 11:00 PM KST
Local
Jul 19 Mon, 9:30 AM — 10:00 AM EDT

Recent results for critical lattice models in high dimensions

Mark Holmes (University of Melbourne)

9
We’ll discuss recent results concerning the limiting behaviour of critical lattice models (e.g. lattice trees and the voter model) in high dimensions. In particular (i) results with Ed Perkins on the scaling limit of the range (the set of vertices ever visited/occupied by the model), and (ii) results with Cabezas, Fribergh, and Perkins on weak convergence of the historical processes and of random walks on lattice trees.

Near-critical avalanches in 2D frozen percolation and forest fires

Pierre Nolin (City University of Hong Kong)

8
We discuss two closely related processes in two dimensions: frozen percolation, where connected components of occupied vertices freeze (they stop growing) as soon as they reach a given size, and forest fire processes, where connected components are hit by lightning (and thus become entirely vacant) at a very small rate. When the density of occupied sites approaches the critical threshold for Bernoulli percolation, both processes display a striking phenomenon: the appearance of what we call "near-critical avalanches”. We study these avalanches, all the way up to the natural characteristic scale of each model, which constitutes an important step toward understanding the self-organized critical behavior of such processes. In the case of forest fires, it is crucial to analyze the effect of fires on the connectivity of the forest. For this purpose, a key tool is a percolation model where regions ("impurities") are removed from the lattice, in an independent fashion. The macroscopic behavior of this process is quite subtle, since the impurities are not only microscopic, but also allowed to be mesoscopic.

This talk is based on joint works with Rob van den Berg (CWI and VU, Amsterdam) and with Wai-Kit Lam (University of Minnesota).

Quenched and annealed Ising models on random graphs

Cristian Giardinà (Modena & Reggio Emilia University)

6
The ferromagnetic Ising model on a lattice is a paradigmatic model of statistical physics used to study phase transitions in lattice systems. In this talk, I shall consider the setting where the regular spatial structure of a lattice is replaced by a random graph, which is often used to model complex networks. I shall treat both the case where the graph is essentially frozen (quenched setting) and the case where instead it is rapidly changing (annealed setting). I shall prove that quenched and annealed may have different critical temperatures, provided the graph degrees are allowed to fluctuate. I shall also discuss how universal results (law of large numbers, central limit theorems, critical exponents) are affected by the disorder in the spatial structure.

The picture that I will present emerges from several joint works, involving V.H. Can, S. Dommers, C. Giberti, R. van der Hofstad and M.L. Prioriello.

Q&A for Invited Session 13

0
This talk does not have an abstract.

Session Chair

Akira Sakai (Hokkaido University)

Invited 15

Privacy (Organizer: Angelika Rohde)

Conference
10:30 PM — 11:00 PM KST
Local
Jul 19 Mon, 9:30 AM — 10:00 AM EDT

The Right Complexity Measure in Locally Private Estimation: It is not the Fisher Information

John Duchi (Stanford University)

2
We identify fundamental tradeoffs between statistical utility and privacy under local models of privacy in which data is kept private even from the statistician, providing instance-specific bounds for private estimation and learning problems by developing the local minimax risk. In contrast to approaches based on worst-case (minimax) error, which are conservative, this allows us to evaluate the difficulty of individual problem instances and delineate the possibilities for adaptation in private estimation and inference. Our main results show that the local modulus of continuity of the estimand with respect to the variation distance—as opposed to the Hellinger distance central to classical statistics—characterizes rates of convergence under locally private estimation for many notions of privacy, including differential privacy and its relaxations. As a consequence of these results, we identify an alternative to the Fisher information for private estimation, giving a more nuanced understanding of the challenges of adaptivity and optimality.

Sequentially interactive versus non-interactive local differential privacy: estimating the quadratic functional

Lukas Steinberger (University of Vienna)

2
We develop minimax rate optimal locally differentially private procedures for estimating the integrated square of the data generating density. A sequentially interactive two-step procedure is found to outperform the best possible non-interactive method even in terms of convergence rate. This is in stark contrast to many other private estimation problems (e.g., those where the estimand is a linear functional of the data generating distribution) where it is known that sequential interaction between data owners can not lead to faster rates of estimation than that of an optimal non-interactive method.

Gaussian differential privacy

Weijie Su (University of Pennsylvania)

3
Privacy-preserving data analysis has been put on a firm mathematical foundation since the introduction of differential privacy (DP) in 2006. This privacy definition, however, has some well-known weaknesses: notably, it does not tightly handle composition. In this talk, we propose a relaxation of DP that we term "f-DP", which has a number of appealing properties and avoids some of the difficulties associated with prior relaxations. First, f-DP preserves the hypothesis testing interpretation of differential privacy, which makes its guarantees easily interpretable. It allows for lossless reasoning about composition and post-processing, and notably, a direct way to analyze privacy amplification by subsampling. We define a canonical single-parameter family of definitions within our class that is termed "Gaussian Differential Privacy", based on hypothesis testing of two shifted normal distributions. We prove that this family is focal to f-DP by introducing a central limit theorem, which shows that the privacy guarantees of any hypothesis-testing based definition of privacy (including differential privacy) converge to Gaussian differential privacy in the limit under composition. This central limit theorem also gives a tractable analysis tool. We demonstrate the use of the tools we develop by giving an improved analysis of the privacy guarantees of noisy stochastic gradient descent.

This is joint work with Jinshuo Dong and Aaron Roth.
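As a small illustration of the central object in this abstract (a sketch based on the definitions in the f-DP framework, not code from the talk), the trade-off function of μ-Gaussian differential privacy and its behaviour under composition can be evaluated as follows; the grid of type-I error levels and the parameter values are illustrative choices.

import numpy as np
from scipy.stats import norm

def gdp_tradeoff(alpha, mu):
    """Trade-off function G_mu(alpha) = Phi(Phi^{-1}(1 - alpha) - mu):
    the smallest achievable type-II error at type-I error level alpha
    when distinguishing N(0, 1) from N(mu, 1)."""
    return norm.cdf(norm.ppf(1.0 - alpha) - mu)

alpha = np.linspace(0.01, 0.99, 5)
print(gdp_tradeoff(alpha, mu=1.0))

# composing k mechanisms that are each mu-GDP yields sqrt(k)*mu-GDP,
# i.e. the k-fold composition of G_mu equals G_{sqrt(k)*mu}
k, mu = 10, 0.3
print(gdp_tradeoff(alpha, mu=np.sqrt(k) * mu))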

Q&A for Invited Session 15

0
This talk does not have an abstract.

Session Chair

Angelika Rohde (University of Freiburg)

Invited 24

Random Planar Geometries (Organizer: Nina Holden)

Conference
10:30 PM — 11:00 PM KST
Local
Jul 19 Mon, 9:30 AM — 10:00 AM EDT

Markovian infinite triangulations

Thomas Budzinski (École normale supérieure de Lyon)

5
We say that a random infinite planar triangulation T is Markovian if, for any small triangulation t with boundaries, the probability of observing t around the root of T depends only on the boundaries and the total size of t. Such a property can be expected from the local limits of many natural models of random maps. An important example is the UIPT of Angel and Schramm, which is the local limit of large uniform triangulations of the sphere. We will completely classify infinite Markovian planar triangulations, without any assumption on the number of ends. In particular, we will see that there is (almost) no model of multi-ended Markovian triangulation. As an application, we will prove, without relying on enumerative combinatorics, that the convergence of uniform triangulations to the UIPT is robust under certain perturbations. One example of such a perturbation is to consider random maps with prescribed face degrees where almost all faces are triangles.

Rotational invariance in planar FK-percolation

Ioan Manolescu (Université de Fribourg)

4
We prove the asymptotic rotational invariance of the critical FK-percolation model on the square lattice with any cluster-weight between 1 and 4. These models are expected to exhibit conformally invariant scaling limits that depend on the cluster weight, thus covering a continuum of universality classes. The rotation invariance of the scaling limit is a strong indication of the wider conformal invariance, and may indeed serve as a stepping stone to the latter.
Our result is obtained via a universality theorem for FK-percolation on certain isoradial lattices. This in turn is proved via the star-triangle (or Yang-Baxter) transformation, which may be used to gradually change the square lattice into any of these isoradial lattices, while preserving certain features of the model. It was previously proved that throughout this transformation, the large scale geometry of the model is distorted by at most a limited amount. In the present work we argue that the distortion becomes insignificant as the scale increases. This hinges on the interplay between the inhomogeneity of isoradial models and their embeddings, which compensate each other at large scales.
As a byproduct, we obtain the asymptotic rotational invariance also for models related to FK-percolation, such as the Potts and six-vertex ones. Moreover, the approach described here is fairly generic and may be adapted to other systems which possess a Yang-Baxter transformation. Based on joint work with Hugo Duminil-Copin, Karol Kajetan Kozlowski, Dmitry Krachun and Mendes Oulamara.

Brownian half-plane excursions, CLE_4 and critical Liouville quantum gravity

Ellen Powell (Durham University)

8
I will discuss a coupling between a Brownian excursion in the upper half plane and an exploration of nested CLE_4 loops in the unit disk. In this coupling, the CLE_4 is drawn on top of an independent “critical Liouville quantum gravity surface” known as a quantum disk. It turns out that there is a correspondence between loops in the CLE and (sub) half-planar excursions above heights in the Brownian excursion, where the “width” of the sub-excursion corresponds to the “quantum boundary length” of the loop and the height encodes a certain “quantum distance” from the boundary.

This is based on a forthcoming joint work with Juhan Aru, Nina Holden and Xin Sun, and describes the analogue of Duplantier-Miller-Sheffield’s “mating-of-trees correspondence” in the critical regime.

Q&A for Invited Session 24

0
This talk does not have an abstract.

Session Chair

Nina Holden (Swiss Federal Institute of Technology Zürich)

Organized 14

Multivariate and Object-Oriented Data Analysis (Organizer: Cheolwoo Park)

Conference
10:30 PM — 11:00 PM KST
Local
Jul 19 Mon, 9:30 AM — 10:00 AM EDT

Bayesian spatial binary regression for label fusion in structural neuroimaging

Andrew Brown (Clemson University)

2
Alzheimer's disease is a neurodegenerative condition that accelerates cognitive decline relative to normal aging. It is of critical scientific importance to gain a better understanding of early disease mechanisms in the brain to facilitate effective, targeted therapies. The volume of the hippocampus can be used as an aid to diagnosis and disease monitoring. Measuring this volume via neuroimaging is difficult since each hippocampus must either be manually identified or automatically delineated, a task referred to as segmentation. Automatic hippocampal segmentation often involves mapping a previously manually segmented image to a new brain image and propagating the labels to obtain an estimate of where each hippocampus is located in the new image. A more recent approach to this problem is to propagate labels from multiple manually segmented atlases and combine the results using a process known as label fusion. To date, most label fusion algorithms either employ voting procedures or impose prior structure and subsequently find the maximum a posteriori estimator through optimization. We propose using a fully Bayesian spatial regression model for label fusion that facilitates direct incorporation of covariate information while making accessible the entire posterior distribution. Our results suggest that incorporating tissue classification (gray matter, white matter, etc.) into the label fusion procedure can greatly improve segmentation when relatively homogeneous, healthy brains are used as atlases for diseased brains. The fully Bayesian approach also allows quantification of the associated uncertainty, information which we show can be leveraged to detect significant differences between healthy and diseased populations that would otherwise be missed.

Convex clustering analysis for histogram-valued data

Cheolwoo Park (Korea Advanced Institute of Science and Technology (KAIST))

5
In recent years, there has been increased interest in symbolic data analysis, including for exploratory analysis, supervised and unsupervised learning, time series analysis, etc. Traditional statistical approaches designed to analyze single-valued data are not suitable because they cannot incorporate the additional information on data structure available in symbolic data, and new techniques have thus been proposed for symbolic data to bridge this gap. In this article, we develop a regularized convex clustering approach for grouping histogram-valued data. Convex clustering is a relaxation of hierarchical clustering methods in which prototypes are grouped by being forced to take exactly the same value within each group via penalization of the parameters. We apply two different distance metrics to measure (dis)similarity between histograms. Various numerical examples confirm that the proposed method shows better performance than other competitors.
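A minimal sketch of the convex clustering formulation referred to above (an illustration, not the authors' implementation): each histogram is represented by its vector of bin proportions, prototypes are fused through a pairwise norm penalty, and the resulting convex problem is handed to a generic solver. The squared Euclidean distance between bin vectors, the uniform fusion weights, the cvxpy dependency and the rounding-based cluster assignment are simplifying assumptions; the paper considers other histogram metrics.

import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, d = 20, 8                                      # 20 histograms with 8 bins each
X = rng.dirichlet(np.ones(d), size=n)             # synthetic histogram-valued data

U = cp.Variable((n, d))                           # one prototype per observation
lam = 0.5
fit = 0.5 * cp.sum_squares(U - X)                 # fidelity term
fuse = sum(cp.norm(U[i] - U[j], 2)                # fusion penalty pulls prototypes together
           for i in range(n) for j in range(i + 1, n))
cp.Problem(cp.Minimize(fit + lam * fuse)).solve()

# observations whose prototypes (nearly) coincide fall in the same cluster
labels = np.unique(np.round(U.value, 3), axis=0, return_inverse=True)[1]
print(labels)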

A geometric mean for multivariate functional data

Juhyun Park (ENSIIE)

6
The analysis of curves has routinely been dealt with using tools from functional data analysis. However, its extension to multi-dimensional curves poses a new challenge due to their inherent geometric features, which are difficult to capture with classical approaches that rely on linear approximations. We propose an alternative notion of mean that reflects shape variation of the curves. Based on a geometric representation of the curves through the Frenet-Serret ordinary differential equations, we introduce a new definition of mean curvature and mean shape through the mean ordinary differential equation. We formulate the estimation problem in a penalized regression and develop an efficient algorithm. We demonstrate our approach with both simulated data and a real data example.

A confidence region for the elastic shape mean of planar curves

Justin Strait (University of Georgia)

3
Visualization is an integral component of statistical shape analysis, where the goal is to perform inference on shapes of objects. When interested in identifying shape variation, one typically performs principal component analysis (PCA) to decompose total variation into orthogonal directions of variation. In many cases, shapes exhibit multiple sources of variation; using PCA for visualization requires decomposition into several plots displaying each mode of variation, without the ability to understand how these components work together. In this talk, I will discuss a constructive confidence region associated with the elastic shape mean, with a significant emphasis on producing a succinct visual summary of this region. The use of elastic shape representations allows for optimal matching of shape features, yielding more appropriate estimation of shape variation compared to other approaches within the shape analysis literature. The proposed region is demonstrated on simulated data, as well as on common shapes from the MPEG-7 dataset (popular in computer vision applications).

Q&A for Organized Contributed Session 14

0
This talk does not have an abstract.

Session Chair

Cheolwoo Park (Korea Advanced Institute of Science and Technology (KAIST))

Organized 26

Recent Advances in Network Learning: Theory and Practice (Organizer: Kyoungjae Lee)

Conference
10:30 PM — 11:00 PM KST
Local
Jul 19 Mon, 9:30 AM — 10:00 AM EDT

Scalable Bayesian high-dimensional local dependence learning

Kyoungjae Lee (Inha University)

6
In this work, we propose a scalable Bayesian procedure for learning the local dependence structure in a high-dimensional model where the variables possess a natural ordering. The ordering of variables can be indexed by time, the vicinities of spatial locations, and so on, with the natural assumption that variables far apart tend to have weak correlations. Applications of such models abound in a variety of fields such as finance, genome association analysis and spatial modeling. We adopt a flexible framework under which each variable is dependent on its neighbors or predecessors, and the neighborhood size can vary for each variable. It is of great interest to reveal this local dependence structure by estimating the covariance or precision matrix while yielding a consistent estimate of the varying neighborhood size for each variable. The existing literature on banded covariance matrix estimation, which assumes a fixed bandwidth, cannot be adapted to this general setup. We employ the modified Cholesky decomposition for the precision matrix and design a flexible prior for this model through appropriate priors on the neighborhood sizes and Cholesky factors. The posterior contraction rates of the Cholesky factor are derived, which are nearly or exactly minimax optimal, and our procedure leads to consistent estimates of the neighborhood size for all the variables. Another appealing feature of our procedure is its scalability to models with large numbers of variables due to efficient posterior inference without resorting to MCMC algorithms. Numerical comparisons are carried out with competitive methods, and applications are considered for some real datasets.
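To make the modelling device concrete, here is a small numerical sketch (not the authors' Bayesian procedure) of the modified Cholesky decomposition with a per-variable neighbourhood: each ordered variable is regressed on its most recent predecessors, and the precision matrix is reassembled from the Cholesky factor and the residual variances. The least-squares fits, the synthetic data and the illustrative neighbourhood sizes are assumptions standing in for the posterior quantities in the talk.

import numpy as np

def local_cholesky_precision(X, bandwidths):
    """Estimate a precision matrix via the modified Cholesky decomposition,
    regressing variable j on its `bandwidths[j]` immediate predecessors."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    T = np.eye(p)                        # unit lower-triangular Cholesky factor
    D = np.empty(p)                      # residual (innovation) variances
    for j in range(p):
        k = min(bandwidths[j], j)
        if k == 0:
            D[j] = Xc[:, j].var()
            continue
        Z = Xc[:, j - k:j]
        phi, *_ = np.linalg.lstsq(Z, Xc[:, j], rcond=None)
        T[j, j - k:j] = -phi
        D[j] = (Xc[:, j] - Z @ phi).var()
    return T.T @ np.diag(1.0 / D) @ T    # Omega = T' D^{-1} T

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
X[:, 1:] += 0.6 * X[:, :-1]              # each variable correlated with its predecessor
Omega = local_cholesky_precision(X, bandwidths=[0, 1, 1, 2, 1, 1])
print(np.round(Omega, 2))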

Fast and flexible estimation of effective migration surfaces

Wooseok Ha (University of California at Berkeley)

6
An important feature in spatial population genetic data is often “isolation-by-distance,” where genetic differentiation tends to increase as individuals become more geographically distant. Recently, Petkova et al. (2016) developed a statistical method called Estimating Effective Migration Surfaces (EEMS) for visualizing spatially heterogeneous isolation-by-distance on a geographic map. While EEMS is a powerful tool for depicting spatial population structure, it can suffer from slow runtimes. Here we develop a related method called Fast Estimation of Effective Migration Surfaces (FEEMS). FEEMS uses a Gaussian Markov Random Field in a penalized likelihood framework that allows for efficient optimization and output of effective migration surfaces. Further, the efficient optimization facilitates the inference of migration parameters per edge in the graph, rather than per node (as in EEMS). When tested with coalescent simulations, FEEMS accurately recovers effective migration surfaces with complex gene-flow histories, including those with anisotropy. Applications of FEEMS to population genetic data from North American gray wolves show it to perform comparably to EEMS, but with solutions obtained orders of magnitude faster. Overall, FEEMS expands the ability of users to quickly visualize and interpret spatial structure in their data.

Statistical inference for cluster trees

Jisu Kim (Inria)

8
A cluster tree provides a highly interpretable summary of a density function by representing the hierarchy of its high-density clusters. It is estimated using the empirical tree, which is the cluster tree constructed from a density estimator. This talk addresses how to quantify the uncertainty by assessing the statistical significance of topological features of an empirical cluster tree. To do this, I propose methods to construct and summarize confidence sets for the unknown true cluster tree. I then introduce how to prune some of the statistically insignificant features of the empirical tree, yielding interpretable and parsimonious cluster trees. Finally, I illustrate the proposed methods on a variety of example data sets.

Autologistic network model on binary data for disease progression study

Yei Eun Shin (National Cancer Institute)

7
We propose an autologistic network model on binary spatiotemporal data to study the spreading patterns of disease. The proposed model identifies an underlying network, without pre-specification of neighborhoods based on proximity, that can have varying effects depending on the previous states. The model parameters are estimated by maximizing a bias-corrected penalized pseudolikelihood, which can be adapted to the generalized linear model (GLM) framework, where we show the resulting estimators are asymptotically normal. We provide spatial-joint transition probabilities for predicting disease status in the next time interval. Simulation studies were conducted to evaluate the validity and performance of the proposed method. Examples are provided using data on amyotrophic lateral sclerosis (ALS) patients from the EMPOWER Study.

Q&A for Organized Contributed Session 26

0
This talk does not have an abstract.

Session Chair

Kyoungjae Lee (Inha University)

Contributed 05

Potential Theory in Probability Theory

Conference
10:30 PM — 11:00 PM KST
Local
Jul 19 Mon, 9:30 AM — 10:00 AM EDT

Heat contents for time-changed killed Brownian motions

Hyunchul Park (State University of New York at New Paltz)

11
In this talk, we study various heat contents with respect to time-changed killed Brownian motions. The time change is given either by subordinators from a large class or by their inverses. When the time change is given by inverse stable subordinators and the domain is smooth, we show that the spectral heat content has a complete asymptotic expansion, similar to the case of Brownian motions.

This is a joint work with Kei Kobayashi (Fordham University, USA).

Heat kernel bounds for nonlocal operators with singular kernels

Kyung-Youn Kim (National Chengchi University)

10
We prove sharp two-sided bounds on the fundamental solution for integro-differential operators of order $\alpha \in (0,2)$ that generate a d-dimensional Markov process. The corresponding Dirichlet form is comparable to that of d independent copies of one-dimensional jump processes, i.e., the jumping measure is singular with respect to the d-dimensional Lebesgue measure.

This is joint work with Moritz Kassmann and Takashi Kumagai.

The full characterization of the expected supremum of infinitely divisible processes

Rafal Martynek (University of Warsaw)

5
In this talk I will present the positive answer to the conjecture posed by M. Talagrand in "Regularity of Infinitely Divisible Processes" (1993) concerning a two-sided bound on the expected suprema of such processes which does not require any additional assumption on the Lévy measure associated with the process. It states that any infinitely divisible process can be decomposed into a part whose size is explained by the chaining method and another part which is a positive process.
The result relies heavily on the Bednorz-Latała theorem characterizing suprema of Bernoulli processes and its recent reformulation due to Talagrand, together with the series representation due to Rosiński.
I will also describe how the method of proof leads to the positive settlement of two other conjectures of Talagrand: namely, the Generalized Bernoulli Conjecture concerning selector processes and an analogous result for empirical processes. These three results complete an important chapter of Talagrand's program of understanding the suprema of random processes through chaining.
The part of the talk concerning infinitely divisible processes is based on joint work with W. Bednorz, while the part about selector and empirical processes was developed by M. Talagrand after we communicated the initial result to him.

The e-property of asymptotically stable Markov-Feller operators

Hanna Wojewódka-Ściążko (University of Silesia in Katowice)

4
We say that a regular Markov operator $P$, with dual operator $U$, has the e-property in the set $R$ of functions if the family of iterates $(U^nf)_{n\in\mathbb{N}}$ is equicontinuous for all $f\in R$. Most often, $R$ is assumed to be the set of all bounded Lipschitz functions, although it can be also the set of all bounded continuous functions, as in our paper [R. Kukulski and H. Wojewódka-Ściążko, Colloq. Math. 165, 269-283 (2021)]. In [S. Hille et al., Comptes Rendus Math. 355, 1247-1251 (2017)] it is shown that any asymptotically stable Markov-Feller operator with an invariant measure such that the interior of its support is non-empty has the e-property. We generalize this result. To be more precise, we prove that any asymptotically stable Markov-Feller operator has the e-property off a meagre set. Moreover, we propose an equivalent condition for the e-property of asymptotically stable Markov-Feller operators. Namely, we prove that an asymptotically stable Markov-Feller operator has the e-property if and only if it has the e-property at least at one point of the support of its invariant measure. Our results then naturally imply the main theorem of [S. Hille et al., Comptes Rendus Math. 355, 1247-1251 (2017)]. Indeed, if the interior of the support of an invariant measure of a Markov-Feller operator $P$ is non-empty, then there exists at least one point in this support at which $P$ has the e-property. This, in turn, implies that $P$ has the e-property at any point. We also provide the example of an asymptotically stable Markov-Feller operator such that the set of points at which the operator fails the e-property is dense. The example shows that the main result of our paper is tight.

Q&A for Contributed Session 05

0
This talk does not have an abstract.

Session Chair

Panki Kim (Seoul National University)

Contributed 15

Advanced Stochastic Processes

Conference
10:30 PM — 11:00 PM KST
Local
Jul 19 Mon, 9:30 AM — 10:00 AM EDT

A multi-species Ehrenfest process and its diffusion approximation

Serena Spina (University of Salerno)

4
The celebrated Ehrenfest model is a Markov chain proposed to describe the diffusion of gas molecules in a container. Our aim is to generalize this model by considering a multi-type Ehrenfest process on a star graph. The considered model is a continuous-time stochastic process describing the dynamics of an evolutionary system that can accommodate N particles and is characterized by d evolution classes, represented by d semiaxes joined at the origin. Over each semiaxis the process evolves as a classical Ehrenfest model with suitable linear transition rates; moreover, after visiting the origin, the process can move toward any semiaxis with different rates, depending on the elements of a stochastic matrix. We investigate the dynamics of this process making use of a probability generating function-based approach. This leads to the determination of the transient transition probabilities (in closed form for a particular choice of the parameter) and of the asymptotic distribution in general. In addition, we obtain some results on the asymptotic mean, variance and coefficient of variation of the process. We also consider a continuous approximation of the process, which leads to an Ornstein-Uhlenbeck diffusion process evolving on a spider-shaped continuous state space formed by d semiaxes of infinite length joined at the origin; the origin of the given domain constitutes the equilibrium point of the system. We determine the expression of the asymptotic probability distribution for each ray of the spider. Finally, we compare the discrete process with the diffusion process in order to show the goodness of the continuous approximation.

Limit theorems for the realised semicovariances of multivariate Brownian semistationary processes

Yuan Li (Imperial College London)

4
In this talk we will introduce the realised semicovariance, which results from decomposing the realised covariance matrix into components based on the signs of the returns, and study its in-fill asymptotic properties for multivariate Brownian semistationary (BSS) processes. The realised semicovariance was originally proposed in Bollerslev et al. (2020, Econometrica) in a semimartingale setting. We extend their work to BSS processes, which are not necessarily semimartingales. More precisely, a weak convergence result in the space of càdlàg functions endowed with the Skorohod topology is first proved for the realised semicovariance of a general Gaussian process with stationary increments. The methods are based on quantitative Breuer-Major theorems and on moment bounds for sums of products of functions of Gaussian vectors. Furthermore, we demonstrate the corresponding stable convergence. Finally, a weak law of large numbers and a central limit theorem for the realised semicovariance of multivariate BSS processes are established. These results extend the limit theorems for the realised covariation to non-linear functionals.
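For concreteness, the sign-based decomposition referred to above can be computed as follows (a sketch of the definition going back to Bollerslev et al. (2020), not of the limit theory in the talk); the simulated Gaussian returns are an illustrative stand-in for high-frequency increments of a BSS process.

import numpy as np

def realised_semicovariances(r):
    """Decompose the realised covariance of an (n x d) matrix of high-frequency
    returns into concordant (P, N) and discordant (M) components based on signs."""
    pos, neg = np.maximum(r, 0.0), np.minimum(r, 0.0)
    P = pos.T @ pos                        # both returns positive
    N = neg.T @ neg                        # both returns negative
    M = pos.T @ neg + neg.T @ pos          # mixed signs
    return P, N, M

rng = np.random.default_rng(0)
n = 1000
r = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=n) / np.sqrt(n)
P, N, M = realised_semicovariances(r)
print(np.round(P + N + M, 3))              # sums back to the realised covariance r'r
print(np.round(r.T @ r, 3))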

A Yaglom type asymptotic result for subcritical branching Brownian motion with absorption

Jiaqi Liu (University of California, San Diego)

5
In this talk, we will consider a slightly subcritical branching Brownian motion with absorption, where particles move as Brownian motion with drift $-\sqrt{2+2\epsilon}$, undergo dyadic fission at rate 1, and are killed upon hitting the origin. We are interested in the asymptotic behavior of the process conditioned on survival up to a large time t, as the process approaches criticality. Results of this kind are called Yaglom type results. Specifically, we will talk about the long-run expected number of particles conditioned on survival as the process approaches criticality.

Q&A for Contributed Session 15

0
This talk does not have an abstract.

Session Chair

Jaehun Lee (Korea Institute for Advanced Study)

Contributed 22

Bayesian Inference

Conference
10:30 PM — 11:00 PM KST
Local
Jul 19 Mon, 9:30 AM — 10:00 AM EDT

Bayesian and stochastic modeling of polysomnography data from children using pacifiers for improved estimation of the apnea-hypopnea index

Sujay Datta (University of Akron)

2
Polysomnography is an overnight systematic procedure to collect physiological parameters during sleep. It is considered the gold standard for diagnosing sleep-related disorders. It takes several days to score and interpret the raw data from this study and confirm a diagnosis of, say, Obstructive Sleep Apnea (OSA), a potentially dangerous disorder. The presence of artifacts (anomalies created by a malfunctioning sensor) makes scoring even more difficult, potentially resulting in misdiagnosis. It is common to see airflow signal artifacts in infants that use a pacifier during sleep. The act of sucking on the pacifier causes artifacts in the oro-nasal sensor (thermistor) used to monitor airflow during respiration. The resulting inaccurate scoring leads to an under-estimation of the Apnea Hypopnea Index (AHI), the basis for a formal OSA diagnosis. Researchers are therefore exploring two other information sources (blood oxygen saturation readings from a pulse-oximeter and the occurrence of arousal events) to supplement the artifact-corrupted thermistor data. They first look for statistical association between the thermistor and the pulse-oximeter/arousal data and then statistically predict a modified AHI score using the latter whenever the former is corrupt. To our knowledge, no attempt to statistically model these three data sources to bring out their association currently exists. This project aims at developing several competing probabilistic models for these data types and then checking how strongly they bring out the association by applying them to archived data from the Akron Children's Hospital. These modeling approaches are a significant statistical contribution to this important medical problem. After performing some statistical tests for association, the modeling approaches will include naïve Bayes, Beta-Binomial, correlated homogeneous Poisson processes and double-chain Markov models. The resulting improvement in AHI estimates is demonstrated using data sets from a sample of non-pacifier users after artificially discarding part of their thermistor data (as if they were artifact-corrupted).

Asymmetric prior in wavelet shrinkage

Alex Rodrigo dos Santos Sousa (University of São Paulo)

5
In Bayesian wavelet shrinkage, the priors proposed so far for wavelet coefficients are assumed to be symmetric around zero. Although this assumption is reasonable in many applications, it is not general. The present paper proposes an asymmetric shrinkage rule based on the discrete mixture of a point mass at zero and an asymmetric beta distribution as a prior for the wavelet coefficients in a non-parametric regression model. Statistical properties such as bias, variance, and classical and Bayesian risks of the associated asymmetric rule are provided, and the performance of the proposed rule is assessed in simulation studies involving artificial asymmetrically distributed coefficients and the Donoho-Johnstone test functions. An application to a real seismic dataset is also analyzed. In general, the asymmetric shrinkage rule outperformed classical symmetric rules both in the simulations and in the real data application.

Semiparametric Bayesian regression analysis of multi-typed matrix-variate responses

Inkoo Lee (Rice University)

4
Complex data such as tensors and multiple types of responses can be found in dental medicine. Dental hygienists measure three biomarkers at 28 teeth and 6 tooth-sites for each participant. These data have challenging characteristics: 1) binary and continuous responses, with skewness; 2) the matrix-variate responses for each biomarker have heavy tails; 3) the pattern of missing teeth is not random. To circumvent these difficulties, we propose a joint model for multiple types of matrix-variate responses via latent variables. The model accommodates skewness in the continuous responses. This statistical framework incorporates exponential factor copula models to capture heavy-tail dependence and asymmetry. Since the number of remaining teeth reflects the severity of periodontal disease (PD), we model the missingness mechanism. Our method also guarantees posterior consistency under suitable priors. We illustrate the substantial advantages of our method over alternatives through simulation studies and the analysis of PD data.

Bayesian phylogenetic inference of stochastic block models on infinite trees

Wenjian Liu (Queensborough Community College, City University of New York)

2
This talk concerns a classification problem on a deep network, approached by considering a broadcasting process on an infinite communication tree, where information is transmitted from the root of the tree to all the vertices with a certain probability of error. The information reconstruction problem on an infinite tree is to collect and analyze massive data samples at the nth level of the tree to identify whether there is non-vanishing information about the root, as n goes to infinity. Its connection to the clustering problem in the setting of the stochastic block model, which has wide applications in machine learning and data mining, has been well established. For the stochastic block model, an "information theoretically solvable but computationally hard" region, or "hybrid-hard phase", appears whenever the reconstruction bound is not tight for the corresponding reconstruction-on-the-tree problem. Inspired by the recently proposed $q_1+q_2$ stochastic block model, we extend the classical works on the Ising model and the Potts model by studying a general model which incorporates the characteristics of both Ising and Potts through different in-community and out-community transition probabilities, and we rigorously establish the exact conditions for reconstruction.

Order-restricted Bayesian inference for the simple step-stress accelerated life tests

David Han (The University of Texas at San Antonio)

3
In this work, we investigate order-restricted Bayesian estimation for a simple step-stress accelerated life test. Based on the three-parameter gamma distribution as a conditional prior, we ensure that the failure rates increase as the stress level increases. In addition, its conjugate-like structure enables us to derive the exact joint posterior distribution of the parameters without the need to perform expensive MCMC sampling. Based on these distributional results, several Bayesian estimators for the model parameters are suggested, along with their individual and joint credible intervals. Through Monte Carlo simulations, the performance of our proposed inferential methods is assessed and compared. Finally, a real engineering case study on the reliability of a solar lighting device is presented to illustrate the methods developed in this work.

Q&A for Contributed Session 22

0
This talk does not have an abstract.

Session Chair

Seongil Jo (Inha University)

Contributed 33

Novel Statistical Approaches In Genetic Association Analyses

Conference
10:30 PM — 11:00 PM KST
Local
Jul 19 Mon, 9:30 AM — 10:00 AM EDT

An extended model for phylogenetic maximum likelihood based on discrete morphological characters

David Spade (University of Wisconsin-Milwaukee)

5
Maximum likelihood is a common method of estimating a phylogenetic tree based on a set of genetic data. However, models of evolution for certain types of genetic data are highly flawed in their specification, and this misspecification can have an adverse impact on phylogenetic inference. Our attention here is focused on extending an existing class of models for estimating phylogenetic trees from discrete morphological characters. The main advance of this work is a model that allows unequal equilibrium frequencies in the estimation of phylogenetic trees from discrete morphological character data using likelihood methods. Possible extensions of the proposed model will also be discussed.

Combined linkage and association mapping integrating population-based and family-based designs using multinomial regression

Saurabh Ghosh (Indian Statistical Institute)

4
Genetic association analyses yield higher power than linkage analyses in identifying chromosomal regions harboring susceptibility genes modulating complex human disorders and correlated quantitative phenotypes. However, while population-based association designs suffer from the problem of population stratification, which often results in inflated type I errors, linkage designs are based on families and are protected against such inflation. The models suggested in MultiPhen (O'Reilly et al., 2012) and BAMP (Majumdar et al., 2015) provide an alternative for studying population-based genotype-phenotype association by exploring the dependence of genotype on phenotype instead of the naturally arising dependence of phenotype on genotype. This reversal of the regression model, while having no impact on the inference on association, provides the flexibility of incorporating multiple phenotypes without the requirement of making any a priori assumptions on the correlation structure of the vector of phenotypes. Our aim is to investigate whether family-based data can be included in addition to population-level data in the framework of the BAMP (Binomial regression-based Association of Multivariate Phenotypes) model so as to develop a combined test for genetic linkage and association. The family-based regression model involves the conditional distribution of identity-by-state (i.b.s.) scores given the squared sib-pair phenotype differences. However, since the marginal distribution of i.b.s. counts does not follow a binomial distribution, we propose a trinomial regression model for the linkage component of our combined test. Given that the marginal distributions of the response variables in the population-based and family-based designs are different, the combined test is constructed jointly on the estimated regression parameters corresponding to the two designs. The likelihood ratio test statistic asymptotically follows a mixture of two chi-squared distributions with one and two degrees of freedom, respectively, under the null. We carry out extensive simulations to evaluate the power of the proposed combined test.

An alternative to intersection-union test for the composite null hypothesis used to identify shared genetic risk of disease outcomes

Debashree Ray (Johns Hopkins University)

4
With a growing number of disease- and trait-associated genetic variants detected and replicated across genome-wide association studies (GWAS), scientists are increasingly noting the influence of individual variants on multiple seemingly unrelated traits, a phenomenon known as pleiotropy. Cross-phenotype association tests, applied to two or more traits, usually test the null hypothesis of no association between a variant and any trait. Rejection of this null can be due to association between the variant and a single trait, with no indication of whether the variant influences more than one trait. This problem can be formulated as a composite null hypothesis test for each variant. For two traits, a level-$\alpha$ two-parameter intersection-union test (IUT) can be used. However, for testing millions of variants at the genome-wide significance threshold ($\alpha=5\times10^{-8}$), the IUT is extremely conservative. In this talk, I will discuss a new statistical approach, PLACO (pleiotropic analysis under composite null hypothesis), for discovering variants influencing risk of two traits using GWAS summary statistics (i.e., the estimated effect size, its standard error, and the p-value for each variant). PLACO uses the product of Z-statistics across the two traits as the test statistic for pleiotropy, whose null distribution is derived as a mixture distribution that allows fractions of variants to be associated with none or only one of the traits. PLACO gives an approximate asymptotic p-value for association with both traits, avoiding estimation of nuisance parameters related to mixture proportions and variance components. Simulation studies demonstrate its well-controlled type I error and its large power gain over the IUT and the ad hoc methods typically used for testing pleiotropy. Finally, I will show an application of PLACO to type 2 diabetes and prostate cancer genetics that helps explain the inverse association between them reported in many previous epidemiologic studies.
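
As a rough illustration of the main ingredient, the sketch below computes the product-of-Z statistic for one variant from GWAS summary statistics and its tail probability under the simplest sub-null in which the variant is associated with neither trait; in that case the product of two independent standard normal Z-scores has density $K_0(|t|)/\pi$. The full PLACO p-value additionally accounts for the sub-nulls in which the variant is associated with exactly one trait, which this sketch omits, and the function names are illustrative only.

import numpy as np
from scipy.integrate import quad
from scipy.special import k0

def product_z_stat(beta1, se1, beta2, se2):
    # Z-score for each trait from its GWAS effect size and standard error,
    # then the product across the two traits
    z1, z2 = beta1 / se1, beta2 / se2
    return z1 * z2

def product_normal_pvalue(t):
    # Two-sided tail probability of Z1*Z2 when Z1 and Z2 are independent N(0, 1);
    # the density of the product is K_0(|x|) / pi (modified Bessel function)
    tail, _ = quad(lambda u: k0(u) / np.pi, abs(t), np.inf)
    return 2.0 * tail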

Efficient SNP-based heritability estimation using Gaussian predictive process in large-scale cohort studies

Saonli Basu (University of Minnesota)

4
For decades, linear mixed models (LMMs) have been widely used to estimate heritability in twin and family studies. Recently, with the advent of high-throughput genetic data, there have been attempts to estimate heritability from genome-wide SNP data on cohorts of distantly related individuals. Fitting such an LMM in large-scale cohort studies, however, is tremendously challenging due to the high-dimensional linear algebraic operations involved. In this paper, we simplify the LMM by unifying the concepts of genetic coalescence and the Gaussian predictive process, thereby greatly alleviating the computational burden. Our proposed approach, PredLMM, has much better computational complexity than most existing packages and thus provides an efficient alternative for estimating heritability in large-scale cohort studies. We illustrate our approach with extensive simulation studies and use it to estimate the heritability of multiple quantitative traits from the UK Biobank cohort.

This is joint work with Souvik Seal (Colorado School of Public Health) and Abhirup Datta (Johns Hopkins University).
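
As a rough sketch of the setup (the standard SNP-heritability LMM, with the low-rank step stated as our reading of the Gaussian predictive process idea rather than the exact PredLMM construction):

$$y = X\beta + g + \epsilon, \qquad g \sim N(0, \sigma_g^2 K), \qquad \epsilon \sim N(0, \sigma_e^2 I), \qquad h^2_{\mathrm{SNP}} = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2},$$

where $K$ is the $n \times n$ genetic relationship matrix computed from the SNPs. A Gaussian predictive process approximates $K$ by a low-rank form anchored at a subset of $s \ll n$ knot individuals, such as $K_{ns} K_{ss}^{-1} K_{sn}$ (possibly with a diagonal correction), so that likelihood evaluations avoid $O(n^3)$ operations on the full matrix.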

Data-adaptive groupwise test for genomic studies via the Yanai's generalized coefficient of determination

Masao Ueki (Nagasaki University)

4
In genomic studies, repeated univariate regression of the outcome on each variable is commonly used to screen for useful variables. However, signals that are detectable only jointly with other variables may be overlooked. Group-wise analyses of pre-defined groups have been developed, but their power is limited when prior knowledge of the grouping is insufficient. We therefore propose a flexible, data-adaptive test procedure for the conditional mean that applies to a variety of model sequences bridging low- and high-complexity models, as in penalized regression. The test is based on the model that maximizes a generalization of Yanai's generalized coefficient of determination, exploiting the tendency of the dimensionality to be large under the null hypothesis. The test does not require complicated computation of the null distribution, enabling large-scale testing applications. Numerical studies demonstrate that the proposed test, applied to the lasso and elastic net, has high power across the simulation scenarios considered. Applied to a group-wise analysis of real genome-wide association study data from the Alzheimer's Disease Neuroimaging Initiative, the proposed method gave stronger association signals than existing methods.
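
For reference, one standard form of Yanai's generalized coefficient of determination between the column spaces of two matrices $A$ and $B$, with orthogonal projection matrices $P_A$ and $P_B$, is

$$\mathrm{GCD}(A, B) = \frac{\operatorname{tr}(P_A P_B)}{\sqrt{\operatorname{tr}(P_A)\,\operatorname{tr}(P_B)}},$$

which reduces to the squared correlation when $A$ and $B$ are single centered columns. The test described above maximizes a generalization of this quantity over a sequence of fitted models, for example along a lasso or elastic-net path.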

Q&A for Contributed Session 33

0
This talk does not have an abstract.

Session Chair

Saurabh Ghosh (Indian Statistical Institute)

Contributed 36

Statistical Inference

Conference
10:30 PM — 11:00 PM KST
Local
Jul 19 Mon, 9:30 AM — 10:00 AM EDT

Density deconvolution with non-standard error distributions: rates of convergence and adaptive estimation

Taeho Kim (University of Haifa)

3
It is a standard assumption in the density deconvolution problem that the characteristic function of the measurement error distribution is non-zero on the real line. While this condition is assumed in the majority of existing works on the topic, there are many problem instances of interest where it is violated. In this paper, we focus on non-standard settings where the characteristic function of the measurement errors has zeros, and we study how the multiplicity of these zeros affects the estimation accuracy. For a prototypical problem of this type, we demonstrate that the best achievable estimation accuracy is determined by the multiplicity of the zeros, the rate of decay of the error characteristic function, and the smoothness and tail behavior of the estimated density. We derive lower bounds on the minimax risk and develop estimators that are optimal in the minimax sense. In addition, we consider the problem of adaptive estimation and propose a data-driven estimator that automatically adapts to the unknown smoothness and tail behavior of the density to be estimated.
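
For context, in the standard deconvolution setup (which the talk relaxes) one observes $Y_i = X_i + \varepsilon_i$ with known error density, so that the characteristic functions satisfy $\phi_Y = \phi_X \phi_\varepsilon$, and the classical kernel deconvolution estimator is

$$\hat f_X(x) = \frac{1}{2\pi} \int e^{-itx}\, \phi_W(th)\, \frac{\hat\phi_Y(t)}{\phi_\varepsilon(t)}\, dt, \qquad \hat\phi_Y(t) = \frac{1}{n}\sum_{j=1}^{n} e^{itY_j},$$

where $\phi_W$ is the Fourier transform of a kernel and $h$ is a bandwidth. The division by $\phi_\varepsilon(t)$ is exactly what breaks down when the error characteristic function has zeros, which is the regime studied here.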

Moments of the doubly truncated selection elliptical distributions: recurrence, existence and applications

Christian Galarza Morales (Escuela Superior Politécnica del Litoral)

3
We compute doubly truncated moments for the selection elliptical (SE) class of distributions, which includes multivariate asymmetric versions of well-known elliptical distributions such as the normal and Student's t. We address the moments of doubly truncated members of this family, establishing neat formulations for higher-order moments as well as for the first two moments, and we establish necessary and sufficient conditions for their existence. Further, we propose computationally efficient methods to deal with extreme settings of the parameters, partitions with almost zero volume, and the case of no truncation. Applications and simulation studies are presented to illustrate the usefulness of the proposed methods.
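
To fix notation (a generic statement of the objects involved, not the authors' specific formulas), for a $p$-dimensional random vector $X$ and truncation limits $a \le b$, the doubly truncated moments in question take the form

$$m_{\kappa} = E\!\left[ X_1^{\kappa_1} \cdots X_p^{\kappa_p} \,\middle|\, a \le X \le b \right], \qquad \kappa = (\kappa_1, \dots, \kappa_p),$$

with the conditioning event $\{a \le X \le b\}$ interpreted componentwise; the recurrence relations referred to above express higher-order $m_\kappa$ in terms of lower-order ones within the selection elliptical family.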

Characterization of probability distributions by a generalized notion of sufficiency and Fisher information

Atin Gayen (Indian Institute of Technology Palakkad)

4
The notion of sufficiency introduced by Fisher is based on the usual likelihood function and is particularly useful when the underlying model is exponential. We propose a generalized notion of the principle of sufficiency based on two generalized likelihood functions, namely the Basu et al. and Jones et al. likelihood functions, which arise in robust inference. We find the specific form of the family of probability distributions that have a fixed number of sufficient statistics (independent of sample size) with respect to these likelihood functions. These distributions have a power-law form and generalize the exponential family; Student distributions are a special case. We also extend the concept of minimal sufficiency to this generalized notion and find a minimal sufficient statistic for Student distributions. We observe that the generalized estimators of the parameters of Student distributions are functions of the minimal sufficient statistics derived from this generalized notion. Finally, we show that these estimators are also efficient in the sense that the variance of each estimator equals the variance given by the asymptotic normality result.
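
As one concrete example of the generalized likelihood functions mentioned above, a commonly cited form of the Basu et al. (1998) density power divergence objective, up to terms not depending on $\theta$, is minimized as

$$H_n(\theta) = \int f_\theta^{1+\alpha}(x)\, dx - \frac{1+\alpha}{\alpha} \cdot \frac{1}{n}\sum_{i=1}^{n} f_\theta(X_i)^{\alpha}, \qquad \alpha > 0,$$

which (after suitable centering) recovers ordinary maximum likelihood as $\alpha \to 0$; the Jones et al. objective is a logarithmic analogue of this criterion. The generalized sufficiency studied in the talk is defined with respect to such objectives rather than the usual log-likelihood.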

Q&A for Contributed Session 36

0
This talk does not have an abstract.

Session Chair

Mijeong Kim (Ewha Womans University)
