10th World Congress in Probability and Statistics

Plenary Wed-1

Laplace Lecture (Tony Cai)

Conference
9:00 AM — 10:00 AM KST
Local
Jul 20 Tue, 5:00 PM — 6:00 PM PDT

Transfer Learning: Optimality and adaptive algorithms

Tony Cai (University of Pennsylvania)

Human learners have the natural ability to use knowledge gained in one setting for learning in a different but related setting. This ability to transfer knowledge from one task to another is essential for effective learning. In this talk, we consider statistical transfer learning in various settings, with a focus on nonparametric classification based on observations from different distributions under the posterior drift model, a general framework that arises in many practical problems. We first establish the minimax rate of convergence and construct a rate-optimal weighted K-NN classifier. The results characterize precisely the contribution of the observations from the source distribution to the classification task under the target distribution. A data-driven adaptive classifier is then proposed and is shown to attain, simultaneously over a large collection of parameter spaces, the optimal rate up to a logarithmic factor.
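
For intuition, here is a minimal sketch of a weighted K-NN rule that pools source and target samples. The neighbor counts kP, kQ and weights wP, wQ are illustrative placeholders, not the optimal choices derived in the talk.

import numpy as np

def weighted_knn_classify(x, XP, yP, XQ, yQ, kP=10, kQ=5, wP=0.5, wQ=1.0):
    # Labels yP, yQ are 0/1. Take the kP nearest source neighbors and the
    # kQ nearest target neighbors of x, then take a weighted majority vote.
    nnP = yP[np.argsort(np.linalg.norm(XP - x, axis=1))[:kP]]
    nnQ = yQ[np.argsort(np.linalg.norm(XQ - x, axis=1))[:kQ]]
    score = wP * nnP.mean() + wQ * nnQ.mean()   # lies in [0, wP + wQ]
    return int(score > (wP + wQ) / 2)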

Session Chair

Runze Li (Pennsylvania State University)

Plenary Wed-2

Public Lecture (Young-Han Kim)

Conference
10:00 AM — 11:00 AM KST
Local
Jul 20 Tue, 6:00 PM — 7:00 PM PDT

Structure and Randomness in Data

Young-Han Kim (University of California at San Diego and Gauss Labs Inc.)

In many engineering applications ranging from communications and networking to compression and storage to artificial intelligence and machine learning, the main goal is to reveal, exploit, or even design structure in apparently random data. This talk illustrates the art and science of such information processing techniques through a variety of examples, with a special focus on data storage systems from memory chips to cloud storage platforms.
Keywords: Information theory, noise, manufacturing, computer vision, distributed computing, probability laws.

Session Chair

Joong-Ho Won (Seoul National University)

Invited 05

Recent Advances in Shape Constrained Inference (Organizer: Bodhisattva Sen)

Conference
11:30 AM — 12:00 PM KST
Local
Jul 20 Tue, 7:30 PM — 8:00 PM PDT

Global rates of convergence in mixture density estimation

Arlene Kyoung Hee Kim (Korea University)

In this talk, we consider estimating a monotone decreasing density f_0 represented by a scale mixture of uniform densities. We first derive a general bound on the Hellinger accuracy of the MLE over convex classes. Using this bound together with an entropy calculation, we provide a different proof of the convergence of the MLE for d = 1. Then we consider a possible multidimensional extension. For d ≥ 2, we can prove that the rate is as conjectured by Pavlides and Wellner under the assumption that the density is bounded from above and below and supported on a compact region. We are exploring strategies for weakening these assumptions.

Convex regression in multidimensions

Adityanand Guntuboyina (University of California Berkeley)

I will present results on the rates of convergence of the least squares estimator for multidimensional convex regression with polytopal domains. Our results imply that the least squares estimator is minimax suboptimal when the dimension exceeds 5.

This is joint work with Gil Kur, Frank Fuchang Gao and Bodhisattva Sen.

Multiple isotonic regression: limit distribution theory and confidence intervals

Qiyang Han (Rutgers University)

In the first part of the talk, we study limit distributions for the tuning-free max-min block estimators in multiple isotonic regression under both fixed lattice design and random design settings. We show that at a fixed interior point in the design space, the estimation error of the max-min block estimator converges in distribution to a non-Gaussian limit, at a rate depending on the number of vanishing derivatives and on an effective dimension and sample size that drive the asymptotic theory. The limiting distribution can be viewed as a generalization of the well-known Chernoff distribution in univariate problems. The convergence rate is optimal in a local asymptotic minimax sense.
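
In the univariate special case, the max-min block estimator reduces to the classical max-min formula for isotonic regression; here is a naive O(n^3) sketch of that formula, for intuition only (the talk concerns its multivariate block analogue).

import numpy as np

def max_min_isotonic(y):
    # Univariate max-min estimator: fhat[j] = max_{i<=j} min_{k>=j} mean(y[i..k]).
    # Block means are computed from prefix sums.
    n = len(y)
    csum = np.concatenate([[0.0], np.cumsum(y)])
    mean = lambda i, k: (csum[k + 1] - csum[i]) / (k + 1 - i)
    return np.array([max(min(mean(i, k) for k in range(j, n))
                         for i in range(j + 1))
                     for j in range(n)])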

In the second part of the talk, we demonstrate how to use this limiting distribution to construct tuning-free pointwise nonparametric confidence intervals in this model, despite the existence of an infinite-dimensional nuisance parameter in the limit distribution that involves multiple unknown partial derivatives of the true regression function. We show that this difficult nuisance parameter can be effectively eliminated by taking advantage of information beyond point estimates in the block max-min and min-max estimators through random weighting. Notably, the construction of the confidence intervals, new even in the univariate setting, requires no more effort than performing isotonic regression once using the block max-min and min-max estimators, and can easily be adapted to other common monotone models.

This talk is based on joint work with Hang Deng and Cun-Hui Zhang.

Q&A for Invited Session 05

This talk does not have an abstract.

Session Chair

Bodhisattva Sen (Columbia University)

Invited 06

Optimization in Statistical Learning (Organizer: Garvesh Raskutti)

Conference
11:30 AM — 12:00 PM KST
Local
Jul 20 Tue, 7:30 PM — 8:00 PM PDT

Statistical inference on latent network growth processes using the PAPER model

Min Xu (Rutgers University)

We introduce the PAPER (Preferential Attachment Plus Erdős–Rényi) model for random networks, in which we let a random network G be the union of a preferential attachment (PA) tree T and additional Erdős–Rényi (ER) random edges. The PA tree component captures the fact that real-world networks often have an underlying growth/recruitment process in which vertices and edges are added sequentially, while the ER component can be regarded as random noise. Given only a single snapshot of the final network G, we study the problem of constructing confidence sets for the root node of the unobserved growth process, which can be patient zero in a disease infection network or the source of fake news in a social media network. We propose an inference algorithm based on Gibbs sampling that scales to networks of millions of nodes, and we provide theoretical analysis showing that the expected size of the confidence set is small so long as the noise level of the ER edges is not too large. We also propose variations of the model in which multiple growth processes occur simultaneously, reflecting the growth of multiple communities, and we use these models to derive a new approach to community detection.
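
A minimal simulation of such a network, under my reading of the model (a linear preferential attachment tree overlaid with ER noise edges); networkx and all parameter names are illustrative choices, not the paper's notation.

import random
import networkx as nx

def sample_paper_graph(n=1000, q=0.001, seed=0):
    # G = T union E: a preferential attachment tree T on n nodes, plus
    # independent Erdős–Rényi noise edges added with probability q.
    rng = random.Random(seed)
    G = nx.Graph()
    G.add_node(0)                  # root, e.g. patient zero
    degrees = [1]                  # degree-proportional attachment weights
    for v in range(1, n):
        u = rng.choices(range(v), weights=degrees[:v])[0]
        G.add_edge(u, v)
        degrees[u] += 1
        degrees.append(1)
    for u in range(n):             # overlay the ER(q) noise edges
        for v in range(u + 1, n):
            if rng.random() < q:
                G.add_edge(u, v)
    return G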

Adversarial classification, optimal transport, and geometric flows

Nicolas Garcia Trillos (University of Wisconsin-Madison)

The purpose of this talk is to provide an explicit link between the three topics that form the talk's title, and to introduce a new, more dynamic and geometric perspective on robust classification problems. For concreteness, we will discuss a version of adversarial classification where an adversary is empowered to corrupt data inputs up to some distance $\epsilon$. We will first describe necessary conditions associated with the optimal classifier subject to such an adversary. Then, using these necessary conditions, we derive a geometric evolution equation which can be used to track the change in classification boundaries as $\epsilon$ varies. This evolution equation may be described as an uncoupled system of differential equations in one dimension, or as a mean curvature type equation in higher dimensions. In one dimension we rigorously prove that one can use the initial value problem starting from $\epsilon = 0$, which is simply the Bayes classifier, to solve for the global minimizer of the adversarial problem. Global optimality is certified using a duality principle between the original adversarial problem and an optimal transport problem. Several open questions and directions for further research will be discussed.

Capturing network effect via fused lasso penalty with application on shared-bike data

Yunjin Choi (University of Seoul)

Given a dataset with network structure, a common research interest is to model nodal features accounting for network effects. In this study, we investigate shared-bike data in Seoul under a spatial network framework, focusing on the rental counts of each station. Our proposed method models rental counts via a generalized linear model with regularization. The regularization uses a fused lasso penalty devised to capture the network effect. In this model, parameters are specified in a station-specific manner, and the fused lasso penalty terms are applied to the parameters associated with locationally nearby stations. This approach encourages parameters corresponding to neighboring stations to take the same value, accounting for the underlying network effect in a data-adaptive way. The proposed method shows promising results.
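
As a sketch of this kind of objective, here is a station-level Poisson model with a fused lasso penalty over the edges of the station network, written with cvxpy; covariates are omitted and the function and parameter names are illustrative, not the paper's specification.

import numpy as np
import cvxpy as cp

def fit_network_fused_poisson(y, E, lam=1.0):
    # y: rental counts per station; E: list of (i, j) pairs of neighboring
    # stations. Station-specific log-rates beta, with a fused lasso penalty
    # that pulls neighboring stations' parameters together.
    n = len(y)
    beta = cp.Variable(n)
    loglik = cp.sum(cp.multiply(y, beta) - cp.exp(beta))  # Poisson log-likelihood
    fuse = sum(cp.abs(beta[i] - beta[j]) for i, j in E)
    cp.Problem(cp.Maximize(loglik - lam * fuse)).solve()
    return beta.value

The penalty drives |beta_i - beta_j| toward zero on neighboring stations, so estimates fuse into locally constant blocks over the network.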

Q&A for Invited Session 06

This talk does not have an abstract.

Session Chair

Garvesh Raskutti (University of Wisconsin-Madison)

Organized 09

Random Matrices and Infinite Particle Systems (Organizer: Hirofumi Osada)

Conference
11:30 AM — 12:00 PM KST
Local
Jul 20 Tue, 7:30 PM — 8:00 PM PDT

Dynamical universality for random matrices

Hirofumi Osada (Kyushu University)

We establish an invariance principle corresponding to the universality of random matrices. More precisely, we prove dynamical universality of random matrices in the sense that, if random point fields $ \mu ^N $ of $ N $-particle systems describing eigenvalues of random matrices or log-gases with general self-interaction potentials $ V^N $ converge to some random point field $ \mu $, then the associated natural $ \mu ^N $-reversible diffusion processes, represented by solutions of stochastic differential equations (SDEs), converge to a $ \mu $-reversible diffusion process given by a solution of an infinite-dimensional stochastic differential equation (ISDE). Our results are general theorems and can be applied to various random point fields related to random matrices, such as the sine, Airy, Bessel, and Ginibre random point fields. The representations of the finite-dimensional SDEs describing the $ N $-particle systems are in general very complicated. Nevertheless, the limit ISDEs have simple and universal representations according to the class of random matrices (bulk, soft-edge, and hard-edge scaling). We thus prove that the infinite-dimensional Dyson model and the Airy, Bessel, and Ginibre interacting Brownian motions are universal dynamical objects. The key ingredients are: (1) local uniform convergence of correlation functions to those of the limit point process; (2) uniqueness of a weak solution of the limit ISDE, which yields the uniqueness of the associated Dirichlet forms. Concerning (2), we use the results in [1] and [2].

[1] Hirofumi Osada, Hideki Tanemura, Infinite-dimensional stochastic differential equations and tail $\sigma$-fields, Probability Theory and Related Fields 177, 1137-1242 (2020).
[2] Yosuke Kawamoto, Hirofumi Osada, Hideki Tanemura, Uniqueness of Dirichlet forms related to infinite systems of interacting Brownian motions, Potential Analysis (online).

Signal processing via the stochastic geometry of spectrogram level sets

Subhroshekhar Ghosh (National University of Singapore)

Spectrograms are fundamental tools in the detection, estimation and analysis of signals in the time-frequency analysis paradigm. The spectrogram of a signal (usually corrupted with noise) is the squared magnitude of its short-time Fourier transform (STFT), which in turn is a generalised version of the classical Fourier transform, augmented with a window in the time domain. Signal analysis via spectrograms has traditionally explored their peaks, i.e. their maxima, complemented by a recent interest in their zeros or minima. In particular, recent investigations have demonstrated connections between Gabor spectrograms of Gaussian white noise and Gaussian analytic functions (abbrv. GAFs) in different geometries. However, the zero sets (or the maxima or minima) of GAFs have a complicated stochastic structure, which makes a direct theoretical analysis of usual spectrogram-based techniques via GAFs a difficult proposition. These techniques, in turn, largely rely on statistical observables from the analysis of spatial data, whose distributional properties for spectrogram extrema are mostly understood only at an empirical level. In this work, we investigate spectrogram analysis via the stochastic, geometric and analytical properties of their level sets. We obtain theorems demonstrating the efficacy of a spectrogram level-set based approach to the detection and estimation of signals, framed in a concrete inferential set-up. Exploiting these ideas as theoretical underpinnings, we propose a level-set based algorithm for signal analysis that is intrinsic to given spectrogram data. We substantiate the effectiveness of the algorithm by extensive empirical studies. Our results also have theoretical implications for spectrogram-zero based approaches to signal analysis.

Based on joint work with Meixia Lin and Dongfang Sun.
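
To make the basic objects concrete, a short sketch computing a spectrogram and one of its super-level sets with scipy; the window, segment length, and quantile threshold are arbitrary illustrative choices, not the paper's.

import numpy as np
from scipy.signal import stft

def spectrogram_superlevel_set(x, fs, q=0.9):
    # Spectrogram = squared magnitude of the short-time Fourier transform;
    # return it together with the binary super-level set at the empirical
    # q-quantile threshold.
    f, t, Z = stft(x, fs=fs, window='hann', nperseg=256)
    S = np.abs(Z) ** 2
    return f, t, S, S >= np.quantile(S, q)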

Logarithmic derivatives and local densities of point processes arising from random matrices

Shota Osada (Kyushu University)

We discuss a distribution (generalized function) theory for point processes. We show that a logarithmic derivative in the distributional sense can encode the local density of the point process. This theory is especially effective for point processes appearing in random matrix theory. In particular, using this result, we solve infinite-dimensional stochastic differential equations associated with point processes given by de Branges spaces, so-called integrable kernels, and random matrices, such as the Airy, sine, and Bessel point processes [2]. Conventionally, the point process describing an infinite particle system is characterized by the Dobrushin-Lanford-Ruelle (DLR) equation. The point processes of infinite particle systems appearing in random matrix theory have a logarithmic interaction potential. Because the logarithmic potential is not integrable at infinity, the DLR equation cannot describe such point processes as it is. The logarithmic derivative of a point process is a concept introduced in [1] to settle this problem. To solve the infinite-dimensional stochastic differential equations, the point process must possess a logarithmic derivative and a local density [3]. With our result, the existence of a logarithmic derivative with suitable integrability suffices for the construction of the stochastic dynamics as a solution of infinite-dimensional stochastic differential equations.

[1] Hirofumi Osada, Infinite-dimensional stochastic differential equations related to random matrices, Probability Theory and Related Fields, 2012, 153(3-4), 471--509.
[2] Alexander I Bufetov, Andrey V Dymov, Hirofumi Osada, The logarithmic derivative for point processes with equivalent Palm measures, J. Math. Soc. Japan, 71(2), 2019, 451--469.
[3] Hirofumi Osada, Hideki Tanemura, Infinite-dimensional stochastic differential equations and tail $\sigma$-fields, Probability Theory and Related Fields 177, 1137-1242 (2020).

Stochastic differential equations for infinite particle systems of jump type with long range interactions

Hideki Tanemura (Keio University)

Infinite-dimensional stochastic differential equations (ISDEs) describing systems with an infinite number of particles are considered. Each particle undergoes a Lévy process, and the interaction between particles is given by a long-range interaction potential, which is not only of Ruelle's class but can also be logarithmic. We discuss the existence and uniqueness of strong solutions of the ISDEs.

This talk is based on a collaboration with Shota Esaki (Fukuoka University).


Q&A for Organized Contributed Session 09

This talk does not have an abstract.

Session Chair

Hirofumi Osada (Kyushu University)

Organized 18

Advanced Learning Methods for Complex Data Analysis (Organizer: Xinlei Wang)

Conference
11:30 AM — 12:00 PM KST
Local
Jul 20 Tue, 7:30 PM — 8:00 PM PDT

Peel learning for pathway-related outcome prediction

Rui Feng (University of Pennsylvania)

Traditional regression models are limited in outcome prediction due to their parametric nature. Current deep learning methods allow for various effects and interactions and have shown improved performance, but they typically need to be trained on a large amount of data to obtain reliable results. Gene expression studies often have small sample sizes but high-dimensional correlated predictors, so traditional deep learning methods are not readily applicable. In this talk, I present peel learning (PL), a novel neural network that incorporates the prior relationships among genes. In each layer of learning, the overall structure is peeled into multiple local substructures. Within each substructure, dependency among variables is reduced through linear projections. The overall structure is gradually simplified over layers, and the weight parameters are optimized through a revised backpropagation. We applied PL to a small lung transplantation study to predict recipients' post-surgery primary graft dysfunction using donors' gene expressions within several immunology pathways, where PL showed improved prediction accuracy compared to conventional penalized regression, classification trees, feed-forward neural networks, and a neural network assuming a prior network structure. Through simulation studies, we also demonstrated the advantage of incorporating specific structure among predictor variables in a neural network, over no or uniform group structure, which is especially favorable in smaller studies. The empirical evidence is consistent with our theoretical proof that PL has an improved upper bound on complexity over ordinary neural networks.

Principal boundary for data on manifolds

Zhigang Yao (National University of Singapore)

We will discuss the problem of finding principal components for multivariate datasets that lie on an embedded nonlinear Riemannian manifold within a higher-dimensional space. Our aim is to extend the geometric interpretation of PCA while capturing the non-geodesic variation in the data. We introduce the concept of a principal sub-manifold: a manifold passing through the center of the data that, at any point, moves in the direction of highest curvature in the space spanned by the eigenvectors of the local tangent-space PCA. We show that the principal sub-manifold yields the usual principal components in Euclidean space. We illustrate how to find, use and interpret the principal sub-manifold, with which a classification boundary can be defined for datasets on manifolds.

Probabilistic semi-supervised learning via sparse graph structure learning

Li Wang (University of Texas at Arlington)

We present a probabilistic semi-supervised learning (SSL) framework based on sparse graph structure learning. Unlike existing SSL methods, which use either a predefined weighted graph heuristically constructed from the input data or a graph learned under the locally linear embedding assumption, the proposed SSL model can learn a sparse weighted graph from unlabeled high-dimensional data together with a small amount of labeled data, while also handling noise in the input data. Our representation of the weighted graph is indirectly derived from a unified model of density estimation and pairwise distance preservation in terms of various distance measurements, where latent embeddings are assumed to be random variables following an unknown density function to be learned, and pairwise distances are calculated as expectations over the density to make the model robust to data noise. Moreover, the labeled data, through the same distance representations, are leveraged to guide the estimated density toward better class separation and sparse graph structure learning. A simple inference approach for the embeddings of unlabeled data, based on point estimation and kernel representation, is presented. Extensive experiments on various datasets show promising results in the SSL setting compared with many existing methods, and significant improvements with small amounts of labeled data.

Bayesian modeling for paired data in genome-wide association studies with application to breast cancer

Min Chen (University of Texas at Dallas)

Genome-wide association studies (GWAS) have emerged as a useful tool to identify common genetic variants linked to complex diseases. Conventional GWAS are based on the case-control design, where the individuals in cases and controls are independent. In cancer research, matched-pair designs, which compare tumor tissues with normal ones from the same subjects, are becoming increasingly popular. Such designs succeed in identifying somatic mutations in tumors while controlling both genetic and environmental factors. Somatic variation is one of the most important cancer risk factors and contributes to continuous monitoring and early detection of various cancers. However, most GWAS analysis methods, developed for unrelated samples in case-control studies, cannot be employed in matched-pair designs. A novel framework is proposed in this work to accommodate the particularities of matched data in association studies of somatic mutation effects. In addition, we develop a Bayesian model to combine multiple markers to further improve the power of mapping genome regions to cancer risks.

Q&A for Organized Contributed Session 18

This talk does not have an abstract.

Session Chair

Xinlei Wang (Southern Methodist University)

Organized 27

Bayesian Inference for Complex Models (Organizer: Joungyoun Kim)

Conference
11:30 AM — 12:00 PM KST
Local
Jul 20 Tue, 7:30 PM — 8:00 PM PDT

Nonparametric Bayesian latent factor model for multivariate functional data with covariate dependency

Yeonseung Chung (Korea Advanced Institute of Science and Technology (KAIST))

Nowadays, multivariate functional data are frequently encountered in many fields of science. While there exists a variety of methodologies for univariate functional clustering, approaches for multivariate functional clustering are less studied. Moreover, there is little research on functional clustering methods that incorporate additional covariate information. In this paper, we propose a Bayesian nonparametric sparse latent factor model for covariate-dependent multivariate functional clustering. Multiple functional curves are represented by basis coefficients for splines, which are reduced to latent factors. The factors and covariates are then jointly modeled using a Dirichlet process (DP) mixture of Gaussians to facilitate model-based, covariate-dependent multivariate functional clustering. The method is further extended to dynamic multivariate functional clustering to handle sequential multivariate functional data. The proposed methods are illustrated through a simulation study and applications to Canadian weather and air pollution data.

Bayesian model selection for ultrahigh-dimensional doubly-intractable distributions

Jaewoo Park (Yonsei University)

Doubly intractable distributions commonly arise in many complex statistical models in physics, epidemiology, ecology, social science, and other disciplines. With an increasing number of model parameters, they often result in ultrahigh-dimensional posterior distributions; this is a challenging problem, and developing a computationally feasible approach is crucial. A particularly important application of ultrahigh-dimensional doubly intractable models is network psychometrics, which has been gaining attention in item response analysis. However, the standard parameter estimation method, the maximum pseudo-likelihood estimator (MPLE) combined with the lasso, ignores the dependence structure and can therefore be inaccurate. To tackle this problem, we propose a novel Markov chain Monte Carlo method that uses Bayesian variable selection to identify strong interactions automatically. With our new algorithm, we address two inferential and computational challenges: (1) the likelihood functions involve doubly intractable normalizing functions, and (2) an increasing number of items can lead to ultrahigh dimensionality in the model. We illustrate the application of our approach on challenging simulated and real item response data examples for which studying local dependence is very difficult. The proposed algorithm shows significant inferential gains over existing methods in the presence of strong dependence among items.

Post-processed posteriors for banded covariances

Kwangmin Lee (Seoul National University)

We consider Bayesian inference for banded covariance matrices and propose a post-processed posterior. The post-processing of the posterior consists of two steps. In the first step, posterior samples are obtained from the conjugate inverse-Wishart posterior, which does not satisfy any structural restrictions. In the second step, the posterior samples are transformed to satisfy the structural restriction through a post-processing function. The conceptually straightforward procedure of the post-processed posterior makes its computation efficient and can render interval estimators of functionals of covariance matrices. We show that it has nearly optimal minimax rates for banded covariances among all possible pairs of priors and post-processing functions. Furthermore, we prove that the expected coverage probability of the $100(1-\alpha)\%$ highest posterior density region of the post-processed posterior is asymptotically $1-\alpha$ with respect to a conventional posterior distribution. This implies that the highest posterior density region of the post-processed posterior is, on average, a credible set of a conventional posterior. The advantages of the post-processed posterior are demonstrated by a simulation study and a real data analysis.
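
A bare-bones sketch of the two-step procedure as described, with an illustrative inverse-Wishart prior and banding as the post-processing step; note that naively banded samples need not remain positive definite, which a full post-processing function has to deal with.

import numpy as np
from scipy.stats import invwishart

def banded_post_processed_posterior(X, k, n_samples=1000):
    # Step 1: sample the conjugate inverse-Wishart posterior (mean-zero data,
    # illustrative prior IW(p + 2, I)), ignoring the banded structure.
    # Step 2: post-process each sample by k-banding, i.e. zero out entries
    # more than k off the diagonal.
    n, p = X.shape
    post = invwishart(df=p + 2 + n, scale=np.eye(p) + X.T @ X)
    mask = np.abs(np.subtract.outer(np.arange(p), np.arange(p))) <= k
    return [post.rvs() * mask for _ in range(n_samples)]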

Adaptive Bayesian inference for current status data on a grid

Minwoo Chae (Pohang University of Science and Technology)

We study a Bayesian approach to inference on the event time distribution in the current status model, where observation times are supported on a grid of potentially unknown sparsity and multiple subjects share the same observation time. The model leads to a very simple likelihood, but statistical inference is non-trivial due to the unknown sparsity of the grid. In particular, for an inference based on the maximum likelihood estimator, one needs to estimate the density of the event time distribution, which is challenging because the event time is not directly observed. We consider Bayes procedures with a Dirichlet prior on the event time distribution. With this prior, the Bayes estimator and credible sets can be easily computed via a Gibbs sampler algorithm. Our main contribution is a thorough investigation of the frequentist properties of the posterior distribution. Specifically, it is shown that the posterior convergence rate is adaptive to the unknown sparsity of the grid. If the grid is sufficiently sparse, we further prove a Bernstein-von Mises theorem which guarantees the frequentist validity of Bayesian credible sets. A numerical study is also conducted for illustration.

Q&A for Organized Contributed Session 27

This talk does not have an abstract.

Session Chair

Joungyoun Kim (Yonsei University)

Organized 28

Recent Advances in Time Series Analysis (Organizer: Changryong Baek)

Conference
11:30 AM — 12:00 PM KST
Local
Jul 20 Tue, 7:30 PM — 8:00 PM PDT

Resampling long-range dependent time series

Shuyang Bai (University of Georgia)

For time series exhibiting long-range dependence, inference through resampling is of particular interest since the asymptotic distributions are often difficult to determine. On the other hand, due to the strong dependence and the non-standard scaling, designing versatile resampling strategies and establishing their validity is challenging. We shall present some progress in this direction.

Robust test for structural instability in dynamic factor models

Changryong Baek (Sungkyunkwan University)

In this paper, we consider a robust test for structural breaks in dynamic factor models. Our framework considers structural changes when the underlying high-dimensional time series is contaminated by outlying observations, as is typical in many real applications such as fMRI, economics and finance. We propose a test based on robust estimation of a vector autoregressive model for the principal component factors using the minimum density power divergence estimator. A simulation study shows excellent finite-sample performance: higher power while maintaining good size in all cases considered. Our method is illustrated on resting-state fMRI series to detect changes in brain connectivity. It shows that brain connectivity indeed changes even in the resting state, and that this is not an artifact of outlier effects.

On scaling in high dimensions

Gustavo Didier (Tulane University)

Scaling relationships have been found in a wide range of phenomena including coastal landscapes, hydrodynamic turbulence, the metabolic rates of animals and Internet traffic. For scale-invariant systems, also called fractals, a continuum of time scales contributes to the observed dynamics, and the analyst's focus is on identifying mechanisms that relate the scales, often in the form of exponents. In this talk, we will look into the little-explored topic of scale invariance in high dimensions, which is especially important in the modern era of "Big Data". We will discuss the role played by wavelets in the analysis of self-similar stochastic processes and visit recent contributions to the wavelet modeling of high- and multidimensional scaling systems.

This is joint work with P. Abry (CNRS and ENS-Lyon), B.C. Boniece (Washington University in St Louis) and H. Wendt (CNRS and Université de Toulouse).

Thresholding and graphical local Whittle estimation

Marie Duker (Cornell University)

The long-run variance matrix and its inverse, the so-called precision matrix, give, respectively, information about correlations and partial correlations between the dependent component series of a multivariate time series around zero frequency. This talk will present non-asymptotic theory for estimation of the long-run variance and precision matrices for high-dimensional Gaussian time series under general assumptions on the dependence structure, including long-range dependence. The presented results for thresholded and penalized versions of the classical local Whittle estimator ensure consistent estimation in a possibly high-dimensional regime. The key technical result is a concentration inequality for the local Whittle estimator of the long-run variance matrix around the true model parameters. In particular, the theory simultaneously handles the estimation of the memory parameters which enter the underlying model.

Cotrending: testing for common deterministic trends in varying means model

Vladas Pipiras (University of North Carolina at Chapel Hill)

In a varying means model, the temporal evolution of a p-vector system is determined by p deterministic nonparametric functions superimposed by error terms, possibly dependent cross-sectionally. The basic interest is in linear combinations across the p dimensions that make the deterministic functions constant over time. The number of such linearly independent combinations is referred to as the cotrending dimension, and the space they span as the cotrending space. This work puts forward a framework for statistical tests of the cotrending dimension and space. Connections to principal component analysis and cointegration are also considered. Finally, a simulation study assessing the finite-sample performance of the proposed tests and applications to several real data sets are provided.

Q&A for Organized Contributed Session 28

This talk does not have an abstract.

Session Chair

Changryong Baek (Sungkyunkwan University)

Contributed 29

Spatial Data Analysis

Conference
11:30 AM — 12:00 PM KST
Local
Jul 20 Tue, 7:30 PM — 8:00 PM PDT

Wild bootstrap for high-dimensional spatial data

Daisuke Kurisu (Tokyo Institute of Technology)

This study establishes a high-dimensional CLT for the sample mean of p-dimensional spatial data observed over irregularly spaced sampling sites in R^d, allowing the dimension p to be much larger than the sample size n. We adopt a stochastic sampling scheme that can flexibly generate irregularly spaced sampling sites and include both pure increasing domain and mixed increasing domain frameworks. To facilitate statistical inference, we develop the spatially dependent wild bootstrap (SDWB) and justify its asymptotic validity in high dimensions by deriving error bounds that hold almost surely conditionally on the stochastic sampling sites. Our dependence conditions on the underlying random field cover a wide class of random fields such as Gaussian random fields and continuous autoregressive moving average random fields. Through numerical simulations and a real data analysis, we demonstrate the usefulness of our bootstrap-based inference in several applications, including joint confidence interval construction for high-dimensional spatial data and change-point detection for spatio-temporal data.
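
A rough sketch of one bootstrap replicate of the sample mean under my reading of the scheme: centered observations are multiplied by a smooth Gaussian multiplier field over the sampling sites. The exponential kernel and bandwidth b are illustrative assumptions, not necessarily the paper's construction.

import numpy as np

def sdwb_mean_replicate(X, sites, b, rng):
    # X: (n, p) observations; sites: (n, d) sampling locations. Draw Gaussian
    # multipliers w whose covariance decays with inter-site distance at
    # bandwidth b, then recompute the mean from the reweighted residuals.
    n = X.shape[0]
    D = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=2)
    C = np.exp(-D / b) + 1e-10 * np.eye(n)        # multiplier covariance
    w = np.linalg.cholesky(C) @ rng.standard_normal(n)
    Xbar = X.mean(axis=0)
    return Xbar + (w[:, None] * (X - Xbar)).mean(axis=0)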

Lifting scheme for streamflow data in river networks

Seoncheol Park (Chungbuk National University)

In this presentation, we suggest a new multiscale method for analyzing water pollutant data located in river networks. The main idea of the proposed method is to adapt the conventional lifting scheme to reflect the characteristics of streamflow data in the river network domain. Due to the complexity of the data domain structure, it is difficult to apply the lifting scheme to streamflow data directly. To solve this problem, we propose a new lifting scheme algorithm for streamflow data that incorporates flow-adaptive neighborhood selection, flow-proportional weight generation, and flow-length adaptive removal point selection. A nondecimated version of the proposed lifting scheme is also suggested. We will provide a simulation study and a real data analysis of water pollutant data observed in the Geum River basin in South Korea.

Optimal designs for some bivariate cokriging models

Subhadra Dasgupta (Indian Institute of Technology Bombay-Monash Research Academy)

This article focuses on the estimation and design aspects of a bivariate collocated cokriging experiment. For a large class of covariance matrices, a linear dependency criterion is identified which allows the best linear unbiased estimator of the primary variable in a bivariate collocated cokriging setup to reduce to a univariate kriging estimator. Exact optimal designs for efficient prediction are determined for such simple and ordinary reduced cokriging models with one-dimensional inputs. Designs are found by minimizing the maximum and integrated prediction variance, where the primary variable is an Ornstein-Uhlenbeck process. For simple and ordinary cokriging models with known covariance parameters, the equispaced design is shown to be optimal for both criterion functions. The more realistic scenario of unknown covariance parameters is addressed by assuming prior distributions on the parameter vector, thus adopting a Bayesian approach to the design problem. The equispaced design is proved to be Bayesian optimal for both criteria as well. The work is motivated by designing an optimal water monitoring system for an Indian river.

Q&A for Contributed Session 29

This talk does not have an abstract.

Session Chair

Yaeji Lim (Chung-Ang University)

Poster II-1

Poster Session II-1

Conference
11:30 AM — 12:00 PM KST
Local
Jul 20 Tue, 7:30 PM — 8:00 PM PDT

Nonconstant error variance in generalized propensity score model

Doyoung Kim (Sungkyunkwan University)

In observational studies, the most salient challenge is to adjust for confounders so as to mimic a randomized experiment. In settings with more than two treatment levels, several generalized propensity score (GPS) models have been proposed to balance covariates among treatment groups. Those models assume parametric forms for the treatment variable distributions, typically with a constant variance assumption. In the presence of heteroskedasticity, the constant variance assumption can affect existing propensity score methods and the causal effect estimates of interest. In this paper, we propose a novel GPS method that handles non-constant variance in the treatment model by extending Xiao et al. (2020) with a weighted least squares method. We conduct a set of simulation studies and show that the proposed method outperforms existing ones in terms of covariate balance and low bias in causal effect estimates.

Causal mediation analysis with multiple mediators of general structures

Youngho Bae (Sungkyunkwan University)

In assessing causal mediation effects, a challenge is that there can be more than one mediator on the pathways from treatment to outcome. More precisely, we do not know exactly how many mediators are on the causal path or how they relate to each other. A few approaches have been proposed to estimate direct and indirect effects in the presence of two causally independent or dependent mediators. However, those methods cannot be generalized to settings with more than two mediators in which causally independent and dependent mediators coexist. We propose a novel approach to identify direct and indirect effects in a general situation with multiple mediators: two causally dependent mediators (V, W) and one causally independent mediator (M). Under our proposed sequential ignorability assumption, the overall treatment effect can be decomposed into direct and mediator-specific indirect effects. A sensitivity analysis strategy is developed for testing the proposed identifying assumptions. We aim to apply this method to pollution data; that is, we may use this approach to estimate the effect of a particular emission control technology installed on power plants on ambient pollution, where power plant emissions are potential mediators.

A fuzzy clustering ensemble based Mapper algorithm

SungJin Kang (Chung-Ang University)

Mapper is a popular topological data analysis method for analyzing the structure of complex high-dimensional datasets. Since the Mapper algorithm can be applied to clustering and feature selection with visualization, it is used in various fields such as biology and chemistry. However, some resolution parameters must be chosen before applying the Mapper algorithm, and the results are sensitive to this selection. In this paper, we focus on the selection of the two resolution parameters: the number of intervals and the overlapping percentage. We propose a new ensemble-based parameter selection method for Mapper: we generate multiple Mapper results under various parameters and apply the fuzzy clustering ensemble method to combine the results (the role the two parameters play in the basic construction is sketched below). Three real datasets are considered to evaluate Mapper algorithms, including the proposed one, and the results demonstrate the superiority of the proposed ensemble Mapper method.
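
A bare-bones Mapper sketch showing where the two resolution parameters enter; DBSCAN as the clusterer and all default values are illustrative choices, not the paper's setup.

import numpy as np
from sklearn.cluster import DBSCAN

def mapper_graph(X, filter_vals, n_intervals=10, overlap=0.3, eps=0.5):
    # Cover the filter range with n_intervals overlapping intervals
    # (overlap = overlapping percentage), cluster each preimage with
    # DBSCAN, and connect clusters that share points.
    lo, hi = filter_vals.min(), filter_vals.max()
    length = (hi - lo) / (n_intervals * (1 - overlap) + overlap)
    nodes, edges = [], set()
    for i in range(n_intervals):
        a = lo + i * length * (1 - overlap)
        idx = np.where((filter_vals >= a) & (filter_vals <= a + length))[0]
        if len(idx) == 0:
            continue
        labels = DBSCAN(eps=eps).fit_predict(X[idx])
        for lab in set(labels) - {-1}:
            nodes.append(set(idx[labels == lab]))
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            if nodes[i] & nodes[j]:
                edges.add((i, j))
    return nodes, edges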

Analysis of the association between suicide attempts and meteorological factors

Seunghyeon Kim (Chonnam National University)

Several studies indicate an association between suicide and meteorological factors; in particular, an increase in ambient temperature increases the risk of suicide. Although suicide attempts are highly likely to lead to suicide in the future, little research has examined the relationship between suicide attempts and meteorological factors. We evaluated the association between suicide attempts and meteorological factors and examined gender and age differences. Method: We studied 30,012 people who attempted suicide and were hospitalized in the emergency room of medical institutions located in Seoul from January 1, 2014, to December 31, 2018. This information was provided by the National Emergency Department Information System data. Seven meteorological factors in Seoul during the same period were studied: daily lowest temperature, highest temperature, average temperature, daily temperature difference, average relative humidity, sunshine duration, and average cloud cover. Meteorological factors were categorized, and the daily age-standardized suicide attempt rate per 100,000 (ASDAR) was computed for each category. Subgroup analyses by gender and age explored the association between meteorological factors and suicide attempts. From 2014 to 2018, the ASDAR was 61.3 (69.3 for women and 52.8 for men), with suicide attempts highest among people in their 20s. Among the seven meteorological factors, suicide attempts increased as the lowest temperature, the highest temperature, the average temperature, and the relative humidity increased. Both genders showed this increasing trend, and the same trend held in all age groups except for women in their 20s. We found that the risk of suicide attempts increases as temperature and relative humidity increase. These results suggest that exposure to high temperatures can be a factor inducing suicide attempts.

Spectral clustering with the Wasserstein distance and its application

SangHun Jeong (Pusan National University)

Modern automatic devices can produce a massive number of samples from the population of an individual subject. Although this development gives access to the entire distributional structure for each individual subject, traditional approaches tend to focus on detecting local features to recognize patterns in the data. In this project, we consider the pattern recognition problem of classifying subject-specific distributions into a few categories after estimating those distributions. The suggested approach is a three-stage procedure: probability density estimation, dissimilarity computation, and clustering. Specifically, we use the kernel density estimator for the subject-specific distributions in the first stage. Then, we use the Wasserstein distance to account for the dissimilarity between these distributions, relying on the optimal transport map for the distance. Finally, we use this dissimilarity measure to build the Laplacian graph and conduct spectral clustering, which can handle distributions contained not in a Euclidean space but in some nonlinear space. We will demonstrate the benefit of spectral clustering with the Wasserstein distance through simulation studies and by applying our suggested method to real data.
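
A compact sketch of the pipeline for one-dimensional samples, skipping the kernel density estimation stage and working directly with the empirical samples; the Gaussian affinity bandwidth sigma is an illustrative choice.

import numpy as np
from scipy.stats import wasserstein_distance
from sklearn.cluster import SpectralClustering

def cluster_subjects(samples_per_subject, n_clusters=3, sigma=1.0):
    # Pairwise 1-Wasserstein distances between subjects' empirical
    # distributions, a Gaussian affinity, then spectral clustering on
    # the precomputed affinity matrix.
    n = len(samples_per_subject)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = wasserstein_distance(
                samples_per_subject[i], samples_per_subject[j])
    A = np.exp(-(D / sigma) ** 2)
    sc = SpectralClustering(n_clusters=n_clusters, affinity='precomputed')
    return sc.fit_predict(A)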

Robust covariance estimation for partially observed functional data

Hyunsung Kim (Chung-Ang University)

In recent years, applications have emerged that produce partially observed functional data, where each trajectory is collected over individual-specific subinterval(s) within the whole domain of interest. Robustness to atypical partially observed curves is a practical concern, especially in the dimension reduction step through functional principal component analysis (FPCA). Existing studies implement FPCA by applying smoothing techniques to estimate the mean and covariance functions under irregular functional data structures; however, the estimation is easily affected by outlying curves with heavy-tailed noise or spikes. In this study, we investigate a robust method for covariance estimation based on a bounded loss function, which enables us to obtain robust functional principal components for partially observed functional data. Using the functional principal scores, we reconstruct the missing parts of the trajectories. Numerical experiments show that our method provides a stable and robust estimate when the data contain atypical curves.

Fast Bayesian functional regression for non-Gaussian spatial data

Yeo Jin Jung (Yonsei University)

Functional generalized linear models (FGLM) have been widely used to study the relationship between non-Gaussian responses and functional covariates. However, most existing works assume independence among observations and therefore have limited applicability to correlated data. A particularly important example is functional data with spatial correlation, where we observe functions over spatial domains, such as the age population curve or temperature curve at each areal unit. In this paper, we extend FGLM by incorporating spatial random effects. Such models, however, pose computational and inferential challenges. The high-dimensional spatial random effects cause slow mixing of Markov chain Monte Carlo (MCMC) algorithms, and spatial confounding can bias parameter estimates and inflate their variances. To address these issues, we propose an efficient Bayesian method using a sparse reparameterization of the high-dimensional random effects. Furthermore, we study an often-overlooked challenge in functional spatial regression: practical issues in obtaining credible bands of functional parameters and assessing whether they provide nominal coverage. We apply our methods to simulated and real data examples, including malaria incidence data and US COVID-19 data. The proposed method is fast while providing accurate functional estimates.

Plenary Wed-3

Wald Lecture 2 (Martin Barlow)

Conference
7:00 PM — 8:00 PM KST
Local
Jul 21 Wed, 3:00 AM — 4:00 AM PDT

Low dimensional random fractals

Martin Barlow (University of British Columbia)

The behaviour of the random walk can often be described by two indices, called by physicists the 'fractal' and 'walk' dimensions and denoted d_f and d_w. This lecture will look at the tools which enable us to calculate these, and to obtain the associated transition probability or heat kernel bounds. Three kinds of estimate are needed: (1) control of the size of balls, (2) control of the resistance across annuli, and (3) a smoothness result (a Harnack inequality). In the low-dimensional case the Harnack inequality is not needed, and (2) can be replaced by easier bounds on the resistance between points. Many random fractals of interest are low dimensional: examples include critical branching processes, the incipient infinite cluster (IIC) for percolation in high dimensions, and the uniform spanning tree. Critical percolation in d = 2, however, remains a challenge.
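
For reference, the standard way these indices enter (a textbook formulation, not specific to this lecture): if the volume of balls grows as $\mu(B(x,r)) \asymp r^{d_f}$, then sub-Gaussian heat kernel bounds take the form

$$ p_t(x,y) \asymp \frac{c_1}{t^{d_f/d_w}} \exp\!\left( -c_2 \left( \frac{d(x,y)^{d_w}}{t} \right)^{1/(d_w-1)} \right), $$

so the walk typically moves distance $t^{1/d_w}$ in time $t$, and the spectral dimension is $d_s = 2 d_f / d_w$.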

Session Chair

Takashi Kumagai (Kyoto University)

Plenary Wed-4

IMS Medallion Lecture (Gerard Ben Arous)

Conference
8:00 PM — 9:00 PM KST
Local
Jul 21 Wed, 4:00 AM — 5:00 AM PDT

Random determinants and the elastic manifold

Gerard Ben Arous (New York University)

The elastic manifold is a paradigmatic representative of the class of disordered elastic systems. These are surfaces with rugged shapes resulting from a competition between random spatial impurities (preferring disordered configurations) on the one hand, and elastic self-interactions (preferring ordered configurations) on the other. The elastic manifold model is interesting because it displays a depinning phase transition and has a long history as a testing ground for new approaches in the statistical physics of disordered media, for example in fixed dimension by Fisher (1986) using functional renormalization group methods, and in the high-dimensional limit by Mézard and Parisi (1992) using the replica method. We study the energy landscape of this model, and compute the (annealed) topological complexity both of total critical points and of local minima, in the Mézard-Parisi high-dimensional limit. Our main result confirms the recent formulas by Fyodorov and Le Doussal (2020). It gives the phase diagram and identifies the boundary between the simple and glassy phases. Our approach relies on new exponential asymptotics of random determinants for non-invariant random matrices.

This is joint work with Paul Bourgade and Benjamin McKenna (Courant Institute, NYU).

Session Chair

Arup Bose (Indian Statistical Institute)

Invited 01

Conformal Invariance and Related Topics (Organizer: Hao Wu)

Conference
9:30 PM — 10:00 PM KST
Local
Jul 21 Wed, 5:30 AM — 6:00 AM PDT

Asymptotics of determinants of discrete Laplacians

Konstantin Izyurov (University of Helsinki)

The zeta-regularized determinants of Laplace-Beltrami operators play an important role in analysis and mathematical physics. We show that for Euclidean surfaces with conical singularities that are glued from finitely many equal equilateral triangles or squares, these determinants appear in the asymptotic expansions of the determinants of discrete Laplacians as the mesh size of a lattice discretization of the surface tends to zero. This establishes a particular case of a conjecture by Cardy and Peschel on the behavior of partition functions of critical lattice models and their relation to partition functions of the underlying conformal field theories. Joint work with Mikhail Khristoforov.

On Loewner evolutions with jumps

Eveliina Peltola (Rheinische Friedrich-Wilhelms-Universität Bonn)

I discuss the behavior of Loewner evolutions driven by a Lévy process. Schramm's celebrated version (Schramm-Loewner evolution), driven by standard Brownian motion, has been a great success in describing critical interfaces in statistical physics. Loewner evolutions with other random drivers have been proposed, for instance, as candidates for finding extremal multifractal spectra and for some tree-like growth processes in statistical physics. Questions about how the Loewner trace behaves, e.g., whether it is generated by a (discontinuous) curve and whether it is locally connected, tree-like, or forest-like, have been partially answered in the symmetric alpha-stable case. We shall consider the case of general Lévy drivers.

Joint work with Anne Schreuder (Cambridge).

Extremal distance and conformal radius of a CLE_4 loop

Titus Lupu (Centre National de la Recherche Scientifique / Sorbonne Université)

Consider CLE_4 in the unit disk and the loop of the CLE_4 surrounding the origin. Schramm, Sheffield and Wilson determined the law of the conformal radius, seen from the origin, of the domain surrounded by this loop. We complement their result by determining the law of the extremal distance between the loop and the boundary of the unit disk. More surprisingly, we also compute the joint law of this conformal radius and extremal distance. This law involves the first and last hitting times of a one-dimensional Brownian motion. Similar techniques also allow us to determine the joint laws of some extremal distances in a critical Brownian loop-soup cluster. This is joint work with Juhan Aru (EPFL) and Avelio Sepúlveda (Université Lyon 1 Claude Bernard).

Q&A for Invited Session 01

This talk does not have an abstract.

Session Chair

Hao Wu (Yau Mathematical Sciences Center, Tsinghua University)

Invited 14

Optimal Transport (Organizer: Philippe Rigollet)

Conference
9:30 PM — 10:00 PM KST
Local
Jul 21 Wed, 5:30 AM — 6:00 AM PDT

Density estimation and conditional simulation using triangular transport

Youssef Marzouk (Massachusetts Institute of Technology)

Triangular transformations of measures, such as the Knothe-Rosenblatt rearrangement, underlie many new computational approaches for density estimation and conditional simulation. This talk discusses two aspects of such constructions. First is the problem of estimating a triangular transformation given a sample from a distribution of interest, and hence transport-driven density estimation. We present a general functional framework for representing monotone triangular maps between distributions and analyze properties of maximum likelihood estimation in this framework. We demonstrate that the associated optimization problem is smooth and, under appropriate conditions, has no spurious local minima. This result provides a foundation for a greedy semi-parametric estimation procedure. Second, we discuss a conditional simulation method that employs a specific composition of maps, derived from the Knothe-Rosenblatt rearrangement, to push forward a joint distribution to any desired conditional. We show that this composed-map approach reduces variability in conditional density estimates and reduces the bias associated with any approximate map representation. Moreover, this approach motivates alternative estimation objectives that focus on the removal of dependence. For context, and as a pointer to an interesting application domain, we elucidate links between conditional simulation with composed maps and the ensemble Kalman filter.
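
For intuition, a naive empirical Knothe-Rosenblatt rearrangement in two dimensions, with crude binning for the conditional; this is purely illustrative, whereas the talk's framework uses smooth monotone parametrizations.

import numpy as np

def empirical_kr_map(X, n_bins=20):
    # Map 2D samples X approximately to Uniform([0,1]^2) with a triangular
    # transformation: first coordinate via its empirical CDF, second via
    # the empirical CDF conditional on the bin of the first coordinate.
    n = len(X)
    u1 = (np.argsort(np.argsort(X[:, 0])) + 0.5) / n     # empirical CDF ranks
    bins = np.minimum((u1 * n_bins).astype(int), n_bins - 1)
    u2 = np.empty(n)
    for b in range(n_bins):
        idx = np.where(bins == b)[0]
        if len(idx):
            u2[idx] = (np.argsort(np.argsort(X[idx, 1])) + 0.5) / len(idx)
    return np.column_stack([u1, u2])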

Estimation of Wasserstein distances in the spiked transport model

Jonathan Niles-Weed (Courant Institute of Mathematical Sciences, New York University)

We propose a new statistical model, the spiked transport model, which formalizes the assumption that two probability distributions differ only on a low-dimensional subspace. We study the minimax rate of estimation for the Wasserstein distance under this model and show that this low-dimensional structure can be exploited to avoid the curse of dimensionality. As a byproduct of our minimax analysis, we establish a lower bound showing that, in the absence of such structure, the plug-in estimator is nearly rate-optimal for estimating the Wasserstein distance in high dimension. We also give evidence for a statistical-computational gap and conjecture that any computationally efficient estimator is bound to suffer from the curse of dimensionality.

Statistical estimation of barycenters in metric spaces and the space of probability measures

Quentin Paris (National Research University Higher School of Economics)

The talk presents rates of convergence for empirical barycenters over a large class of geodesic spaces with curvature bounds in the sense of Alexandrov. We show that parametric rates of convergence are achievable under natural conditions that characterise the bi-extendibility of geodesics emanating from a barycenter. We also show that our results apply to infinite-dimensional spaces such as the 2-Wasserstein space, where bi-extendibility of geodesics translates into regularity of Kantorovich potentials.

Q&A for Invited Session 14

This talk does not have an abstract.

Session Chair

Philippe Rigollet (Massachusetts Institute of Technology)

Invited 21

Probabilistic Theory of Mean Field Games (Organizer: Xin Guo)

Conference
9:30 PM — 10:00 PM KST
Local
Jul 21 Wed, 5:30 AM — 6:00 AM PDT

Portfolio liquidation games with self-exciting order flow

Ulrich Horst (Humboldt University Berlin)

We analyze novel portfolio liquidation games with self-exciting order flow. Both the $N$-player game and the mean-field game are considered. We assume that players' trading activities have an impact on the dynamics of future market order arrivals, thereby generating an additional transient price impact. Given the strategies of her competitors, each player solves a mean-field control problem. We characterize open-loop Nash equilibria in both games in terms of a novel mean-field FBSDE system with unknown terminal condition. Under a weak interaction condition, we prove that the FBSDE systems have unique solutions. Using a novel sufficient maximum principle that does not require convexity of the cost function, we finally prove that the solutions of the FBSDE systems do indeed provide open-loop Nash equilibria.

This is joint work with Guanxing Fu and Xiaonyu Xia.

A mean-field game approach to equilibrium pricing in renewable energy certificate markets

Sebastian Jaimungal (University of Toronto)

Solar Renewable Energy Certificate (SREC) markets are a market-based system that incentivizes solar energy generation. A regulatory body imposes a lower bound on the amount of energy each regulated firm must generate via solar means, providing the firm with a tradeable certificate for each MWh generated. Firms seek to navigate the market optimally by modulating their SREC generation and trading rates. As such, the SREC market can be viewed as a stochastic game, where agents interact through the SREC price. We study this stochastic game by solving the mean-field game (MFG) limit with sub-populations of heterogeneous agents. Market participants optimize costs accounting for trading frictions, the cost of generation, non-linear non-compliance costs, and generation uncertainty. Moreover, we endogenize the SREC price through market clearing. We characterize firms' optimal controls as the solution of McKean-Vlasov (MV) FBSDEs and determine the equilibrium SREC price. We establish the existence and uniqueness of a solution to this MV-FBSDE, and prove that the MFG strategies form an $\epsilon$-Nash equilibrium for the finite player game. Finally, we develop a numerical scheme for solving the MV-FBSDEs and conduct a simulation study.

Entropic optimal transport

Marcel Nutz (Columbia University)

1
Applied optimal transport is flourishing after computational advances have enabled its use in real-world problems with large data sets. Entropic regularization is a key method to approximate optimal transport in high dimensions while retaining feasible computational complexity. In this talk we discuss the convergence of entropic optimal transport to the unregularized counterpart as the regularization parameter vanishes, as well as the stability of entropic optimal transport with respect to its marginals.

Based on joint works with Espen Bernton (Columbia), Promit Ghosal (MIT), Johannes Wiesel (Columbia).
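For readers new to the topic, a standard formulation of entropic optimal transport (in generic notation, not necessarily that of the talk) is

$$ \mathrm{OT}_\varepsilon(\mu, \nu) \;=\; \inf_{\pi \in \Pi(\mu,\nu)} \int c \, d\pi \;+\; \varepsilon\, H(\pi \mid \mu \otimes \nu), $$

where $\Pi(\mu,\nu)$ is the set of couplings of the marginals and $H$ denotes relative entropy; the talk concerns the behaviour of $\mathrm{OT}_\varepsilon$ and its optimizers as $\varepsilon \to 0$, and their stability under perturbations of $\mu$ and $\nu$.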

Session Chair

Xin Guo (University of California, Berkeley)

Invited 35

Stochastic Analysis in Mathematical Finance and Insurance (Organizer: Marie Kratz)

Conference
9:30 PM — 10:00 PM KST
Local
Jul 21 Wed, 5:30 AM — 6:00 AM PDT

From signature based models in finance to affine and polynomial processes and back

Christa Cuchiero (University of Vienna)

1
Modern universal classes of dynamic processes, based on neural networks or signature methods, have recently entered the field of stochastic modeling, in particular in Mathematical Finance. This has opened the door to more data-driven and thus more robust model selection mechanisms, while first principles like no arbitrage still apply. We focus here on signature-based models, i.e. (possibly Lévy-driven) stochastic processes whose characteristics are linear functions of an underlying process' signature, and present methods for learning these characteristics from data. From a more theoretical point of view, we show how these new models can be embedded in the framework of affine and polynomial processes, which have been -- due to their tractability -- the dominating process class prior to the new era of highly overparametrized dynamic models. Indeed, we prove that generic classes of models can be viewed as infinite-dimensional affine processes, which in this setup coincide with polynomial processes. A key ingredient in establishing this result is again the signature process. This then allows us to obtain power series expansions for expected values of analytic functions of the process' marginals.

The talk is based on joint works with Guido Gazzani, Francesca Primavera, Sara Svaluto-Ferro and Josef Teichmann.

Optimal dividends with capital injections at a level-dependent cost

Ronnie Loeffen (University of Manchester)

2
Assume the capital or surplus of an insurance company evolves randomly over time as in the Cramér-Lundberg model, but where in addition the company has the possibility to pay out dividends to shareholders and to inject capital at a cost from shareholders. We impose that when the resulting surplus becomes negative the company has to decide whether to inject capital to get to a positive surplus level in order for the company to survive, or to let ruin occur. The objective is to find the combined dividends and capital injections strategy that maximises the expected paid-out dividends minus the cost of injected capital, discounted at a constant rate, until ruin. Such optimal dividends and capital injections problems have been studied before, but in the case where the cost of capital (injections) is constant, whereas we consider the setting where the cost of capital is level-dependent in the sense that it is higher when the surplus is below 0 than when it is above 0. We investigate optimality of a 3-parameter strategy with parameters -r < 0 < c < b, where dividends are paid out to keep the surplus below b, and capital injections are made in order to keep the surplus above c, unless capital drops below the level -r, in which case the company decides to let ruin occur.

This is joint work with Zbigniew Palmowski.
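To make the (-r, c, b) band strategy concrete, here is a minimal Monte Carlo sketch of a discretised Cramér-Lundberg surplus controlled by such a strategy. All numerical parameters (premium rate, claim law, discount rate, injection cost) and the time discretisation are hypothetical illustration choices, not the authors' setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters for illustration only.
premium, lam, claim_mean = 1.5, 1.0, 1.0   # Cramer-Lundberg dynamics
r, c, b = 2.0, 1.0, 4.0                     # strategy parameters: -r < 0 < c < b
delta, phi = 0.05, 1.3                      # discount rate, cost per unit injected
T, dt = 50.0, 0.01

def value_one_path():
    x, t, value = c, 0.0, 0.0
    while t < T:
        x += premium * dt
        if rng.random() < lam * dt:          # claim arrival
            x -= rng.exponential(claim_mean)
        if x > b:                            # dividend barrier at b
            value += np.exp(-delta * t) * (x - b)
            x = b
        elif x < c:
            if x < -r:                       # too costly to rescue: let ruin occur
                return value
            value -= phi * np.exp(-delta * t) * (c - x)  # costly injection up to c
            x = c
        t += dt
    return value

est = np.mean([value_one_path() for _ in range(2000)])
print(f"Monte Carlo value of the (-r, c, b) strategy: {est:.3f}")
```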

Exponential Lévy-type change-point models in mathematical finance

Lioudmila Vostrikova (University of Angers)

1

Q&A for Invited Session 35

0
This talk does not have an abstract.

Session Chair

Marie Kratz (ESSEC Business School, CREAR)

Invited 40

KSS Invited Session: Nonparametric and Semi-parametric Approaches in Survival Analysis (Organizer: Woncheol Jang)

Conference
9:30 PM — 10:00 PM KST
Local
Jul 21 Wed, 5:30 AM — 6:00 AM PDT

Smoothed quantile regression for censored residual lifetime

Sangwook Kang (Yonsei University)

6
We consider a regression modeling of the quantiles of residual lifetime at a specific time given a set of covariates. For estimation of regression parameters, we propose an induced smoothed version of the existing non-smooth estimating equations approaches. The proposed estimating equations are smooth in regression parameters, so solutions can be readily obtained via standard numerical algorithms. Moreover, smoothness of the proposed estimating equations enables one to obtain a closed-form expression for the robust sandwich-type covariance estimator of the regression estimators. To handle data under right censoring, inverse probabilities of censoring are incorporated as weights. Consistency and asymptotic normality of the proposed estimator are established. Extensive simulation studies are conducted to assess the performance of the proposed estimator under various finite-sample settings. We apply the proposed method to data from a dental study evaluating the longevity of dental restorations.
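One common version of the induced-smoothing idea (sketched here in generic notation, not necessarily the exact form used in the talk) replaces the indicator inside a non-smooth estimating function by a normal distribution function,

$$ I\{ e_i(\beta) \le 0 \} \;\longrightarrow\; \Phi\!\big( -\,e_i(\beta)/h \big), \qquad h \asymp n^{-1/2}, $$

which makes the estimating equations differentiable in $\beta$ and is what yields a closed-form sandwich-type covariance estimator.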

Superefficient estimation of future conditional hazards based on marker information

Enno Mammen (Heidelberg University)

5
We introduce a new concept for forecasting future events based on marker information. The model is based on a nonparametric approach with counting processes featuring so-called high-quality markers. Despite the model having nonparametric parts, we show that we attain a parametric rate of uniform consistency and uniform asymptotic normality. In usual nonparametric scenarios reaching such a fast convergence rate is not possible, so one can say that our approach is superefficient. We then use these theoretical results to construct simultaneous confidence bands directly for the hazard rate.

On a semiparametric estimation method for AFT mixture cure models

Ingrid Van Keilegom (Katholieke Universiteit Leuven)

4
When studying survival data in the presence of right censoring, it often happens that a certain proportion of the individuals under study do not experience the event of interest and are considered as cured. The mixture cure model is one of the common models that take this feature into account. It depends on a model for the conditional probability of being cured (called the incidence) and a model for the conditional survival function of the uncured individuals (called the latency). This work considers a logistic model for the incidence and a semiparametric accelerated failure time model for the latency part. The estimation of this model is obtained via the maximization of the semiparametric likelihood, in which the unknown error density is replaced by a kernel estimator based on the Kaplan-Meier estimator of the error distribution. Asymptotic theory for consistency and asymptotic normality of the parameter estimators is provided. Moreover, the proposed estimation method is compared with several competitors. Finally, the new method is applied to data coming from a cancer clinical trial.
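For orientation, the incidence/latency combination described above can be written (in generic notation) as

$$ S(t \mid x, z) \;=\; 1 - \pi(z) + \pi(z)\, S_u(t \mid x), \qquad \pi(z) = \frac{\exp(\gamma^\top z)}{1 + \exp(\gamma^\top z)}, $$

with the latency $S_u$ generated by a semiparametric AFT model $\log T = \beta^\top x + \varepsilon$, whose unspecified error density enters the likelihood through a kernel estimator.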

Q&A for Invited Session 40

0
This talk does not have an abstract.

Session Chair

Woncheol Jang (Seoul National University)

Organized 03

Gaussian Processes (Organizer: Naomi Feldheim)

Conference
9:30 PM — 10:00 PM KST
Local
Jul 21 Wed, 5:30 AM — 6:00 AM PDT

Gaussian determinantal processes: a new model for directionality in data

Subhro Ghosh (National University of Singapore)

4
Determinantal point processes (DPPs) have recently become popular tools for modeling the phenomenon of negative dependence, or repulsion, in data. However, our understanding of an analogue of a classical parametric statistical theory is rather limited for this class of models. In this work, we investigate a parametric family of Gaussian DPPs with a clearly interpretable effect of parametric modulation on the observed points. We show that parameter modulation impacts the observed points by introducing directionality in their repulsion structure, and the principal directions correspond to the directions of maximal (i.e., the most long-ranged) dependency. This model readily yields a viable alternative to principal component analysis (PCA) as a dimension reduction tool that favors directions along which the data are most spread out. This methodological contribution is complemented by a statistical analysis of a spiked model similar to that employed for covariance matrices as a framework to study PCA. These theoretical investigations unveil intriguing questions for further examination in random matrix theory, stochastic geometry, and related topics.

Based on joint work with Philippe Rigollet.

Persistence exponents of Gaussian stationary functions

Ohad Noy Feldheim (Hebrew University of Jerusalem)

3
Let $f:R \to R$ be a Gaussian stationary process, that is, a random function, invariant to real shifts, whose marginals have multi-normal distribution. Persistence is the event that the process remains positive over the interval [0,T]. The asymptotics of this quantity as T tends to infinity has been studied since the early 1950s, with motivation stemming from probability theory, physics and electrical engineering. In recent years, it has been discovered that persistence is best characterized in spectral terms. This view was used to describe the decay rate of the persistence probability (up to a constant in the exponent). In this work we take this study one step further, showing mild conditions for the existence of persistence exponents, that is, a constant C such that the probability of persistence on [0,T] is $e^{-CT(1+o(1))}$. This we obtain by establishing an array of continuity properties of the persistence probability and relating the problem to small ball exponents. In particular, we show that the persistence exponent is independent of the singular component of the spectral measure away from the origin.

Joint work with N. Feldheim and S. Mukherjee.

Connectivity of the excursion sets of Gaussian fields with long-range correlations

Stephen Muirhead (University of Melbourne)

4
In recent years the global connectivity of the excursion sets of smooth Gaussian fields with rapidly decaying correlations has been fairly well understood (at least in the case of positively-correlated fields), and the general picture that emerges is that the connectivity undergoes a phase transition which is analogous to that of Bernoulli percolation. On the other hand, if the fields have long-range correlations then they are believed to lie outside the Bernoulli percolation universality class, with different scaling limits and critical exponents. The behaviour of the connectivity is not well-understood in this regime, and in this talk I will present some recent results and conjectures that shed some light on the behaviour.

Overcrowding estimates for the nodal volume of stationary Gaussian processes on R^d

Lakshmi Priya (Indian Institute of Science)

3
We consider centered stationary Gaussian processes (SGPs) on Euclidean spaces R^d and study an aspect of their nodal set: for T>0, we study the nodal volume in [0,T]^d. In earlier studies, under varying assumptions on the spectral measures of SGPs, the following statistics were obtained for the nodal volume in [0,T]^d: expectation, variance asymptotics, CLT, exponential concentration (only for d=1), and finiteness of moments.

We study the unlikely event of overcrowding of the nodal set in [0,T]^d; this is the event that the volume of the nodal set in [0,T]^d is much larger than its expected value. Under some mild assumptions on the spectral measure, we obtain estimates for the overcrowding event's probability. We first get overcrowding estimates for the zero count of SGPs on R. In higher dimensions, we consider Crofton's formula which gives the volume of the nodal set in terms of the number of intersections of the nodal set with all lines in R^d. We discretise this formula to get a more workable version of it; we use this and the ideas used to obtain the overcrowding estimates in one dimension to get the overcrowding estimates in higher dimensions.

Q&A for Organized Contributed Session 03

0
This talk does not have an abstract.

Session Chair

Naomi Feldheim (Bar-Ilan University)

Organized 20

Theories and Applications for Complex Data Analysis (Organizer: Arlene K.H. Kim)

Conference
9:30 PM — 10:00 PM KST
Local
Jul 21 Wed, 5:30 AM — 6:00 AM PDT

Partly interval-censored rank regression

Sangbum Choi (Korea University)

4
This paper studies estimation of the semiparametric accelerated failure time model for double and partly interval-censored data. A Gehan-type weighted estimating function is constructed by contrasting comparable rank cases under interval censoring. An extension to the general class of log-rank estimating functions can also be investigated, along with an efficient variance estimation procedure. Asymptotic behaviors of the proposed estimator are established under mild conditions by using empirical process theory. Simulation studies demonstrate that our method works well with samples of practical size. Two data examples are given to illustrate the practical usefulness of our method.

Two-sample testing of high-dimensional linear regression coefficients via complementary sketching

Tengyao Wang (University College London)

6
We introduce a new method for two-sample testing of high-dimensional linear regression coefficients without assuming that those coefficients are individually estimable. The procedure works by first projecting the matrices of covariates and response vectors along directions that are complementary in sign in a subset of the coordinates, a process which we call 'complementary sketching'. The resulting projected covariates and responses are aggregated to form two test statistics, which are shown to have essentially optimal asymptotic power under a Gaussian design when the difference between the two regression coefficients is sparse and dense respectively. Simulations confirm that our methods perform well in a broad class of settings.

Optimal rates for independence testing via U-statistic permutation tests

Tom Berrett (University of Warwick)

4
Independence testing is one of the most well-studied problems in statistics, and the use of procedures such as the chi-squared test is ubiquitous in the sciences. While tests have traditionally been calibrated through asymptotic theory, permutation tests are experiencing a growth in popularity due to their simplicity and exact Type I error control. In this talk I will present new, finite-sample results on the power of a new class of permutation tests, which show that their power is optimal in many interesting settings, including those with discrete, continuous, and functional data. A simulation study shows that our test for discrete data can significantly outperform the chi-squared for natural data-generating distributions. Defining a natural measure of dependence $D(f)$ to be the squared $L^2$-distance between a joint density $f$ and the product of its marginals, we first show that there is generally no valid test of independence that is uniformly consistent against alternatives of the form $\{f: D(f) \geq \rho^2 \}$. Motivated by this observation, we restrict attention to alternatives that satisfy additional Sobolev-type smoothness constraints, and consider as a test statistic a U-statistic estimator of $D(f)$. Using novel techniques for studying the behaviour of U-statistics calculated on permuted data sets, we prove that our tests can be minimax optimal. Finally, based on new normal approximations in the Wasserstein distance for such permuted statistics, we also provide an approximation to the power function of our permutation test in a canonical example, which offers several additional insights.

This is joint work with Ioannis Kontoyiannis and Richard Samworth.
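As a toy illustration of permutation calibration for an $L^2$-type dependence measure, here is a plug-in (V-statistic) version for discrete data; the paper's exact U-statistic estimator and its optimality theory are not reproduced, and all function names are ours.

```python
import numpy as np

rng = np.random.default_rng(1)

def l2_dependence(x, y):
    """Plug-in estimate of the squared L2 distance between the joint
    pmf and the product of the marginals, for discrete samples."""
    xs, ys = np.unique(x), np.unique(y)
    pxy = np.array([[np.mean((x == a) & (y == b)) for b in ys] for a in xs])
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    return np.sum((pxy - np.outer(px, py)) ** 2)

def permutation_test(x, y, n_perm=999):
    """Calibrate the statistic by permuting y; returns an exact p-value."""
    obs = l2_dependence(x, y)
    null = [l2_dependence(x, rng.permutation(y)) for _ in range(n_perm)]
    return (1 + sum(s >= obs for s in null)) / (n_perm + 1)

# Toy example: dependent discrete data.
x = rng.integers(0, 3, size=200)
y = (x + rng.integers(0, 2, size=200)) % 3
print("p-value:", permutation_test(x, y))
```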

Empirical Bayes PCA in high dimensions

Zhou Fan (Yale University)

4
When the dimension of data is comparable to or larger than the number of data samples, Principal Components Analysis (PCA) may exhibit problematic high-dimensional noise. In this work, we propose an Empirical Bayes PCA method that reduces this noise by estimating a joint prior distribution for the principal components. EB-PCA is based on the classical Kiefer-Wolfowitz nonparametric MLE for empirical Bayes estimation, distributional results derived from random matrix theory for the sample PCs, and iterative refinement using an Approximate Message Passing (AMP) algorithm. In theoretical “spiked” models, EB-PCA achieves Bayes-optimal estimation accuracy in the same settings as an oracle Bayes AMP procedure that knows the true priors. Empirically, EB-PCA significantly improves over PCA when there is strong prior structure, both in simulation and on quantitative benchmarks constructed from the 1000 Genomes Project and the International HapMap Project. An illustration is presented for analysis of gene expression data obtained by single-cell RNA-seq.

Q&A for Organized Contributed Session 20

0
This talk does not have an abstract.

Session Chair

Arlene K.H. Kim (Korea University)

Contributed 13

Random Structures

Conference
9:30 PM — 10:00 PM KST
Local
Jul 21 Wed, 5:30 AM — 6:00 AM PDT

Universal phenomena for random constrained permutations

Jacopo Borga (University of Zurich)

4
How do local/global constraints affect the limiting shape of random permutations? This is a classical question that has received considerable attention in the last 15 years. In this talk we give an overview of some recent results on this topic, mainly focusing on random pattern-avoiding permutations. We first introduce a notion of scaling limit for permutations, called permutons. Then we present some recent results that highlight certain universal phenomena for permuton limits of various families of pattern-avoiding permutations. These results will lead us to the definition of three remarkable new limiting random permutons: the “biased Brownian separable permuton”, the “Baxter permuton” and the “skew Brownian permuton”. We finally discuss some recent results that show how permuton limits are useful to investigate the behaviour of certain statistics on random pattern-avoiding permutations, such as the length of the longest increasing subsequence.

The scaling limit of the strongly connected components of a uniform directed graph with an i.i.d. degree sequence

Serte Donderwinkel (University of Oxford)

5

Spherical principal curves

Jongmin Lee (Seoul National University)

10
This paper presents a new approach for dimension reduction of data observed on spherical surfaces. Several dimension reduction techniques have been developed in recent years for non-Euclidean data analysis. As a pioneer work, Hauberg (2016) attempted to implement principal curves on Riemannian manifolds. However, this approach uses approximations to process data on Riemannian manifolds, resulting in distorted results. This study proposes a new approach to project data onto a continuous curve to construct principal curves on spherical surfaces. Our approach lies in the same line of Hastie and Stuetzle (1989) that proposed principal curves for data on Euclidean space. We further investigate the stationarity of the proposed principal curves that satisfy the self-consistency on spherical surfaces. The results on the real data analysis and simulation examples show promising empirical characteristics of the proposed approach.

Q&A for Contributed Session 13

0
This talk does not have an abstract.

Session Chair

Namgyu Kang (Korea Institute for Advanced Study)

Contributed 20

Copula Modeling

Conference
9:30 PM — 10:00 PM KST
Local
Jul 21 Wed, 5:30 AM — 6:00 AM PDT

Estimation of multivariate generalized gamma convolutions through Laguerre expansions

Oskar Laverny (Université Lyon 1)

3
The generalized gamma convolution class of distributions appeared in Thorin's work on the infinite divisibility of the log-normal and Pareto distributions. Although these distributions have been extensively studied in the univariate case, the multivariate case and the dependence structures that can arise from it have received little interest in the literature. Furthermore, only one projection procedure for the univariate case was recently constructed, and no estimation procedures are available. By expanding the densities of multivariate generalized gamma convolutions in a tensorized Laguerre basis, we bridge the gap and provide efficient estimation procedures for both the univariate and multivariate cases. We provide some insights into the performance of these procedures, and a convergent series for the density of multivariate gamma convolutions, which is shown to be more stable than Moschopoulos's and Mathai's univariate series. We furthermore discuss some examples.
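As a univariate toy version of the Laguerre-basis idea, the sketch below estimates a density on $(0,\infty)$ by projecting onto the orthonormal Laguerre functions $\ell_k(x) = \sqrt{2}\,L_k(2x)e^{-x}$, whose coefficients $a_k = E\,\ell_k(X)$ can be estimated by sample means. The Gamma example and the truncation level are arbitrary choices of ours; the paper's tensorized multivariate construction is not reproduced.

```python
import numpy as np
from numpy.polynomial.laguerre import lagval

def laguerre_fn(k, x):
    """Orthonormal Laguerre function l_k(x) = sqrt(2) L_k(2x) exp(-x) on (0, inf)."""
    coef = np.zeros(k + 1)
    coef[k] = 1.0
    return np.sqrt(2.0) * lagval(2.0 * x, coef) * np.exp(-x)

def fit_coefficients(sample, K=20):
    """Coefficients a_k = E l_k(X), estimated by sample means."""
    return np.array([laguerre_fn(k, sample).mean() for k in range(K)])

def density(x, a):
    return sum(ak * laguerre_fn(k, x) for k, ak in enumerate(a))

rng = np.random.default_rng(2)
sample = rng.gamma(shape=2.0, scale=1.0, size=5000)  # a simple GGC example
a = fit_coefficients(sample)
x = np.linspace(0.01, 8, 5)
print(np.round(density(x, a), 3))  # compare with the Gamma(2,1) density x*exp(-x)
```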

Copula-based Markov zero-inflated count time series models

Mohammed Alqawba (Qassim University)

3
Count time series data with excess zeros are observed in several applied disciplines. When these zero-inflated counts are sequentially recorded, they might exhibit serial dependence. Ignoring the zero-inflation and the serial dependence might produce inaccurate results. In this paper, Markov zero-inflated count time series models based on a joint distribution of consecutive observations are proposed. The joint distribution function of the consecutive observations is constructed through copula functions. First- and second-order Markov chains are considered with the univariate margins of zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB), or zero-inflated Conway-Maxwell-Poisson (ZICMP) distributions. Under the Markov models, bivariate copula functions such as the bivariate Gaussian, Frank, and Gumbel are chosen to construct a bivariate distribution of two consecutive observations. Moreover, the trivariate Gaussian and max-infinitely divisible copula functions are considered to build the joint distribution of three consecutive observations. Likelihood-based inference is performed and asymptotic properties are studied. To evaluate the estimation method and the asymptotic results, simulated examples are studied. The proposed class of models is applied to a sandstorm count example. The results suggest that the proposed models have some advantages over some of the models in the literature for modeling zero-inflated count time series data.

Bi-factor and second-order copula models for item response data

Sayed H. Kadhem (University of East Anglia)

3
Bi-factor and second-order models based on copulas are proposed for item response data, where the items can be split into non-overlapping groups such that there is a homogeneous dependence within each group. Our general models include the Gaussian bi-factor and second-order models as special cases and can lead to more probability in the joint upper or lower tail compared with the Gaussian bi-factor and second-order models. Details on maximum likelihood estimation of parameters for the bi-factor and second-order copula models are given, as well as model selection and goodness-of-fit techniques. Our general methodology is demonstrated with an extensive simulation study and illustrated for the Toronto Alexithymia Scale. Our studies suggest that there can be a substantial improvement over the Gaussian bi-factor and second-order models both conceptually, as the items can have interpretations of latent maxima/minima or mixtures of means in comparison with latent means, and in fit to data.

Q&A for Contributed Session 20

0
This talk does not have an abstract.

Session Chair

Daewoo Pak (Yonsei University)

Contributed 26

Multivariate Data Analysis

Conference
9:30 PM — 10:00 PM KST
Local
Jul 21 Wed, 5:30 AM — 6:00 AM PDT

A nonparametric test for paired data

Grzegorz Wyłupek (Institute of Mathematics, University of Wrocław)

2
The paper proposes a weighted Kolmogorov-Smirnov type test for the two-sample problem when the data are paired. We derive the asymptotic distribution of the test statistic under the null model and prove the consistency of the related test under general alternatives. The dependence of the asymptotic distribution of the test statistic on the dependence structure of the data forces the use of the wild bootstrap technique for inference. The bootstrap version of the test controls the Type I error under the null model and works very well under the alternative. In the proofs, the main role is played by empirical process tools.

Inference for Generalized Multivariate Analysis of Variance (GMANOVA) models under the multivariate skew t distribution for modelling skewed and heavy-tailed data

Sayantee Jana (Indian Institute of Management Nagpur)

3
The most extensively used statistical model, both in research and in practice, is the linear model, due to its simplicity and interpretability. Linear models are preferred, even when approximate, for both univariate and multivariate data, especially since multivariate skewed models come with their own added complexity. Hence, researchers would not prefer to deliberately add extra layers of complexity by considering non-linear models. The Generalized Multivariate Analysis of Variance (GMANOVA) model is one such linear model, useful for the analysis of longitudinal data, that is, repeated measurements of a continuous variable from several individuals across an ordered variable such as time, temperature, or pressure. It consists of a bilinear structure which allows for comparison between groups while maintaining the temporal structure of the data, unlike the Multivariate Analysis of Variance (MANOVA), which does not allow for any temporal ordering or temporal correlation in the model. GMANOVA models are widely used in economics, social and physical sciences, medical research and pharmaceutical studies. However, despite financial data being time-varying, the traditional GMANOVA model has limited to no applications in finance, due to the skewed and volatile nature of such data. This in turn makes financial data the right candidate for the Multivariate Skew t (MST) distribution, as it allows for outliers in the data to be modelled, due to its heavy tails. In fact, portfolio analysis including mutual funds and capital asset pricing are all modelled using elliptical distributions, especially the multivariate t distribution. The classical GMANOVA model assumes multivariate normality, and hence inferential tools developed for the classical GMANOVA model may not be appropriate for skewed and heavy-tailed data. In our study, we first explore the sensitivity of inferential tools developed under multivariate normality to skewed and volatile data, and then we develop inferential tools for the GMANOVA model under the MST distribution.

Multiscale representation of directional scattered data: use of anisotropic radial basis functions

Junhyeon Kwon (Seoul National University)

10
Spatial inhomogeneity along a one-dimensional curve makes two-dimensional data non-stationary. The curvelet transform, first proposed by Candes and Donoho (1999), is one of the most well-known multiscale methods for representing directional singularities, but it has the limitation that the data need to be observed at equally spaced sites. On the other hand, radial basis function interpolation is widely used to approximate an underlying function from scattered data. However, the isotropy of the radial basis functions lowers the efficiency of the directional representation. This research proposes a new multiscale method that uses anisotropic radial basis functions to efficiently represent direction from noisy scattered data in two-dimensional Euclidean space. Basis functions are orthogonalized across the scales so that each scale can represent global or local directional structure separately. Numerical experiments show that the proposed method represents directional scattered data remarkably well. Convergence properties and practical issues in implementation are discussed as well.

Q&A for Contributed Session 26

0
This talk does not have an abstract.

Session Chair

Yunjin Choi (University of Seoul)

Contributed 31

Statistical Prediction

Conference
9:30 PM — 10:00 PM KST
Local
Jul 21 Wed, 5:30 AM — 6:00 AM PDT

Robust geodesic regression

Ha-Young Shin (Seoul National University)

5
This study explores robust regression for data on Riemannian manifolds. Geodesic regression is the generalization of linear regression to a setting with a manifold-valued dependent variable and one or more real-valued independent variables. The existing work on geodesic regression uses the sum-of-squared errors to find the solution, but as in the classical Euclidean case, the least-squares method is highly sensitive to outliers. In this study, we use M-type estimators, including the L1, Huber and Tukey biweight estimators, to perform robust geodesic regression, and describe how to calculate the tuning parameters for the latter two. We show that, on compact symmetric spaces, all M-type estimators are maximum likelihood estimators, and argue for the overall superiority of the L1 estimator over the L2 and Huber estimators on high-dimensional manifolds and over the Tukey biweight estimator on compact high-dimensional manifolds. A derivation of the Riemannian Gaussian distribution on k-dimensional spheres is also included. Results from numerical examples, including analysis of real neuroimaging data, demonstrate the promising empirical properties of the proposed approach.
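For reference, here are the classical Euclidean $\rho$-functions named above; in geodesic regression they are applied to the residual distances $d(y_i, f(x_i))$, with $d$ the Riemannian distance. The tuning constants shown are the usual Euclidean defaults, whereas the talk describes how to compute manifold-appropriate values.

```python
import numpy as np

# Robust geodesic regression minimizes sum_i rho(d(y_i, f(x_i))) over curves f;
# below, d is the (nonnegative) residual distance.
def rho_l2(d):
    return d ** 2

def rho_l1(d):
    return np.abs(d)

def rho_huber(d, c=1.345):
    # Euclidean default tuning constant; quadratic near 0, linear in the tails.
    d = np.abs(d)
    return np.where(d <= c, 0.5 * d ** 2, c * d - 0.5 * c ** 2)

def rho_tukey(d, c=4.685):
    # Euclidean default tuning constant; redescending, bounded loss.
    d = np.abs(d)
    return np.where(d <= c, (c ** 2 / 6) * (1 - (1 - (d / c) ** 2) ** 3), c ** 2 / 6)
```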

A multi-sigmoidal logistic model: statistical analysis and first-passage-time application

Paola Paraggio (Università degli Studi di Salerno (UNISA))

2
Sigmoidal growth models are widely used in various applied fields, from biology to software reliability and economics. Usually, they describe dynamics in restricted environments.
However, many real phenomena exhibit different phases, each one following a sigmoidal-type pattern. Stimulated by these more complex dynamics, many researchers investigate generalized versions of classical sigmoidal models characterized by several inflection points.
Along these research lines, a generalization of the classical logistic growth model is considered in the present work, introducing in its expression a polynomial term. The model is described by a stochastic differential equation obtained from the deterministic counterpart by adding a multiplicative noise term. The resulting diffusion process, having a multi-sigmoidal mean, may be useful in the description of particular growth dynamics in which the evolution occurs by stages.
The problem of finding the maximum likelihood estimates of the parameters involved in the definition of the process is also addressed. Precisely, the maximization of the likelihood function will be performed by means of meta-heuristic optimization techniques. Moreover, various strategies for the selection of the optimal degree of the polynomial will be provided.
Further, the first-passage-time (FPT) problem is considered: an approximation of its density function will be obtained numerically by means of the fptdApprox R-package.
Finally, some simulated examples are presented.

Statistical inference for functional linear problems

Tim Kutta (Ruhr University Bochum)

3
In this talk we consider the linear regression model Y=SX+e with functional regressors and responses. This model has attracted much attention in terms of estimation and prediction, but less is known with regard to statistical inference for the unobservable slope operator S. In this talk we discuss new inference tools to detect relevant deviations of the parameter S from a hypothesized slope S'. As modes of comparison we consider the Hilbert-Schmidt norm || S-S'||^2 as well as the prediction error E || SX-S' X ||^2. Our theory is based on the novel technique of "smoothness shifting", which helps us to circumvent existing negative results on the weak convergence of estimators for S. In contrast to related works, the proposed test statistic converges at a rate of N^(-1/2), permitting fast detection of local alternatives. Furthermore, while most existing procedures rely on i.i.d. observations for Gaussian approximations, our test statistic converges even in the presence of dependence, quantified by phi- or strong mixing. Due to a self-normalization procedure, our approach is user-friendly, computationally inexpensive and robust.

Q&A for Contributed Session 31

0
This talk does not have an abstract.

Session Chair

Changwon Lim (Chung-Ang University)

Invited 03

Potential Theory for Non-local Operators and Jump Processes (Organizer: Panki Kim)

Conference
10:30 PM — 11:00 PM KST
Local
Jul 21 Wed, 6:30 AM — 7:00 AM PDT

SDEs driven by multiplicative stable-like Lévy processes

Zhen-Qing Chen (University of Washington)

6
In this talk, I will present weak as well as strong well-posedness results for solutions to time-inhomogeneous SDEs driven by stable-like Lévy processes with Hölder continuous coefficients. The Lévy measure of the Lévy process can be anisotropic and singular with respect to the Lebesgue measure on R^d, and its support can be a proper subset of R^d.
Based on joint work with Xicheng Zhang and Guohuan Zhao.

Periodic homogenization of non-symmetric Lévy-type processes

Takashi Kumagai (Kyoto University)

6

Optimal Hardy identities and inequalities for the fractional Laplacian on $L^p$

Krzysztof Bogdan (Wrocław University of Science and Technology)

6
We will present a route from symmetric Markovian semigroups to Hardy inequalities, and on to nonexplosion and contractivity results for Feynman-Kac semigroups on $L^p$. We will focus on the fractional Laplacian on $\mathbb{R}^d$, in which case the constants, estimates of the Feynman-Kac semigroups and thresholds for contractivity and explosion are sharp. Namely, we will discuss selected results from joint work with Bartłomiej Dyda, Tomasz Grzywny, Tomasz Jakubowski, Panki Kim, Julia Lenczewska, Katarzyna Pietruska-Pałuba and Dominika Pilarczyk (see arXiv).

Q&A for Invited Session 03

0
This talk does not have an abstract.

Session Chair

Panki Kim (Seoul National University)

Invited 10

Change-point Problems for Complex Data (Organizer: Claudia Kirch)

Conference
10:30 PM — 11:00 PM KST
Local
Jul 21 Wed, 6:30 AM — 7:00 AM PDT

Two-sample tests for relevant differences in the eigenfunctions of covariance operators

Alexander Aue (University of California at Davis)

4
This talk deals with two-sample tests for functional time series data, which have become widely available in conjunction with the advent of modern complex observation systems. Here, particular interest is in evaluating whether two sets of functional time series observations share the shape of their primary modes of variation as encoded by the eigenfunctions of the respective covariance operators. To this end, a novel testing approach is introduced that connects with, and extends, existing literature in two main ways. First, tests are set up in the relevant testing framework, where interest is not in testing an exact null hypothesis but rather in detecting deviations deemed sufficiently relevant, with relevance determined by the practitioner and perhaps guided by domain experts. Second, the proposed test statistics rely on a self-normalization principle that helps to avoid the notoriously difficult task of estimating the long-run covariance structure of the underlying functional time series. The main theoretical result of this paper is the derivation of the large-sample behavior of the proposed test statistics. Empirical evidence, indicating that the proposed procedures work well in finite samples and compare favorably with competing methods, is provided through a simulation study and an application to annual temperature data.

Multiple change point detection under serial dependence

Haeran Cho (University of Bristol)

5
We propose a methodology for detecting multiple change points in the mean of an otherwise stationary, autocorrelated, linear time series. It combines solution path generation based on the wild energy maximisation principle, and an information criterion-based model selection strategy termed gappy Schwarz criterion. The former is well-suited to separating shifts in the mean from fluctuations due to serial correlations, while the latter simultaneously estimates the dependence structure and the number of change points without performing the difficult task of estimating the level of the noise as quantified e.g. by the long-run variance. We provide modular investigation into their theoretical properties and show that the combined methodology, named WEM.gSC, achieves consistency in estimating both the total number and the locations of the change points. The good performance of WEM.gSC is demonstrated via extensive simulation studies, and we further illustrate its usefulness by applying the methodology to London air quality data.

An asymptotic test for constancy of the variance in a time series

Herold Dehling (Ruhr-University Bochum)

6
We present a novel approach to test for heteroscedasticity of a non-stationary time series that is based on Gini's mean difference of logarithmic local sample variances. In order to analyse the large-sample behaviour of our test statistic, we establish new limit theorems for U-statistics of dependent triangular arrays. We derive the asymptotic distribution of the test statistic under the null hypothesis of a constant variance and show that the test is consistent against a large class of alternatives, including multiple structural breaks in the variance. Our test is applicable even in the case of non-stationary processes, assuming a locally varying mean function. The performance of the test and its comparatively low computation time are illustrated in an extensive simulation study.
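A toy version of the building block may help fix ideas: split the series into blocks, take the log of each block's sample variance, and form Gini's mean difference of these values. The block length and normalization below are our illustration choices; there is no studentization, dependence adjustment, or critical value here.

```python
import numpy as np

def gini_log_local_vars(x, block_len=30):
    """Gini's mean difference of logarithmic local sample variances --
    a toy version of the statistic (no studentization or critical values)."""
    n_blocks = len(x) // block_len
    blocks = x[: n_blocks * block_len].reshape(n_blocks, block_len)
    v = np.log(blocks.var(axis=1, ddof=1))
    diffs = np.abs(v[:, None] - v[None, :])       # all pairwise |v_i - v_j|
    return diffs.sum() / (n_blocks * (n_blocks - 1))

rng = np.random.default_rng(3)
homosked = rng.normal(size=900)
heterosked = np.concatenate([rng.normal(size=450), 3 * rng.normal(size=450)])
print(gini_log_local_vars(homosked), gini_log_local_vars(heterosked))
```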

Q&A for Invited Session 10

0
This talk does not have an abstract.

Session Chair

Claudia Kirch (Otto von Guericke University Magdeburg)

Invited 12

Statistics for Data with Geometric Structure (Organizer: Sungkyu Jung)

Conference
10:30 PM — 11:00 PM KST
Local
Jul 21 Wed, 6:30 AM — 7:00 AM PDT

Wasserstein regression

Hans-Georg Müller (University of California, Davis)

4
The analysis of samples of random objects that do not lie in a vector space has found increasing attention in statistics in recent years. An important class of such object data are univariate probability measures defined on the real line. Adopting the Wasserstein metric, we develop a class of regression models for data that include random distributions as predictors and distributions or scalars as responses. To study these regression models, we utilize the geometry of tangent bundles of the metric space of random measures with the Wasserstein metric and derive asymptotic rates of convergence for estimators of the regression coefficient function and for predicted distributions. We also study an extension to autoregressive models for distribution-valued time series. The proposed methods are illustrated with data that include distributional components in various regression settings.

Finite sample smeariness for Fréchet means

Stephan Huckemann (Georg-August-Universitaet Goettingen)

4
It is well known in the Euclidean setting that a variety of statistical asymptotic tests, e.g. T-tests or MANOVA, are robust under nonnormality. It is much less known that this cannot be taken for granted for similar tests based on manifold data, in particular for data on compact spaces. The reason lies in a recently discovered phenomenon: smeariness lowers the classical square-root-of-n rate for Fréchet means. While true smeariness is only present for a null set of most parametric families, it surfaces in a finite sample regime for a large class of distributions: for instance, all nontrivial distributions on spheres are affected, as are all distributions on circles whose support extends beyond a half circle, e.g. all Fisher-von Mises distributions. We give finite sample smeariness a precise definition and illustrate some effects in theory and practice. In particular, the presence of finite sample smeariness renders tests based on quantiles of asymptotic distributions ineffective up to considerably high sample sizes. Suitably designed bootstrap tests remain valid, however.

Score matching for microbiome compositional data

Janice Scealy (Australian National University)

2
Compositional data and multivariate count data with known totals are challenging to analyse due to the non-negativity and sum constraint on the sample space. It is often the case with microbiome compositional data that many of the components are highly right-skewed, with large numbers of zeros. A major limitation of currently available estimators for compositional models is that they either cannot handle many zeros in the data or are not computationally feasible in moderate to high dimensions. We derive a new set of novel score matching estimators applicable to distributions on a Riemannian manifold with boundary, of which the standard simplex is a special case. The score matching method is applied to estimate the parameters in a new flexible model for compositional data and we show that the estimators are scalable and available in closed form. We apply the new model and estimators to real microbiome compositional data and show that the model provides a good fit to the data.

Q&A for Invited Session 12

0
This talk does not have an abstract.

Session Chair

Sungkyu Jung (Seoul National University)

Invited 25

Random Graphs (Organizer: Christina Goldschmidt)

Conference
10:30 PM — 11:00 PM KST
Local
Jul 21 Wed, 6:30 AM — 7:00 AM PDT

An unexpected phase transition for percolation on scale-free networks

Souvik Dhara (Massachusetts Institute of Technology)

5
The talk concerns the critical behavior of percolation on finite, inhomogeneous random networks, where the weights of the vertices follow a power-law distribution with exponent $\tau \in (2,3)$. Such networks, often referred to as scale-free networks, exhibit critical behavior when the percolation probability tends to zero as the network size becomes large. We identify the critical window for the percolation phase transition. Rather surprisingly, the critical window turns out to be of finite length, which is in sharp contrast with the previously studied critical behaviors for the $\tau \in (3,4)$ and $\tau >4$ regimes. The rescaled vector of maximum component sizes is shown to converge in distribution to an infinite vector of non-degenerate random variables that can be described in terms of components of a one-dimensional inhomogeneous percolation model studied in a seminal work by Durrett and Kesten (1990).

Based on joint work with Shankar Bhamidi, Remco van der Hofstad.

Recent results for the graph alignment problem

Marc Lelarge (INRIA)

3
Random graph alignment refers to recovering the underlying vertex correspondence between two random graphs with correlated edges. This can be viewed as an average-case and noisy version of the well-known NP-hard graph isomorphism problem. For the correlated Erdös-Rényi model, we give an impossibility result for partial recovery in the sparse regime. We also propose a machine learning approach to solve the problem and design a new graph neural network architecture that shows strong performance.

Local law and Tracy-Widom limit for sparse stochastic block models

Ji Oon Lee (Korea Advanced Institute of Science and Technology (KAIST))

5
We consider the spectral properties of sparse stochastic block models, where N vertices are partitioned into K balanced communities. Under an assumption that the intra-community probability and inter-community probability are of similar order, we prove a local semicircle law up to the spectral edges, with an explicit formula on the deterministic shift of the spectral edge. We also prove that the fluctuation of the extremal eigenvalues is given by the GOE Tracy-Widom law after rescaling and centering the entries of sparse stochastic block models. Applying the result to sparse stochastic block models, we rigorously prove that there is a large gap between the outliers and the spectral edge without centering.

Q&A for Invited Session 25

0
This talk does not have an abstract.

Session Chair

Christina Goldschmidt (University of Oxford)

Invited 36

Problems and Approaches in Multi-Armed Bandits (Organizer: Vianney Perchet)

Conference
10:30 PM — 11:00 PM KST
Local
Jul 21 Wed, 6:30 AM — 7:00 AM PDT

Dynamic pricing and learning under the Bass model

Shipra Agrawal (Columbia University)

1
We consider a novel formulation of the dynamic pricing and demand learning problem, where the evolution of demand in response to posted prices is governed by a stochastic variant of the popular Bass model with parameters (α, β) that are linked to the so-called "innovation" and "imitation" effects. Unlike the more commonly used i.i.d. demand models, in this model the price posted not only affects the demand and the revenue in the current round but also the evolution of demand, and hence the fraction of market potential that can be captured, in future rounds. Finding a revenue-maximizing dynamic pricing policy in this model is non-trivial even when model parameters are known, and requires solving for the optimal non-stationary policy of a continuous-time, continuous-state MDP. In this paper, we consider the problem where dynamic pricing is used in conjunction with learning the model parameters, with the objective of optimizing the cumulative revenues over a given selling horizon. Our main contribution is an algorithm with a regret guarantee of O(m^{2/3}), where m is mnemonic for the (known) market size. Moreover, we show that no algorithm can incur a smaller order of loss by deriving a matching lower bound. We observe that in this problem the market size m, and not the time horizon T, is the fundamental driver of the complexity; our lower bound in fact indicates that for any fixed α, β, most non-trivial instances of the problem have constant T and large m. This insight sets the problem setting considered here uniquely apart from the MAB-type formulations typically considered in the learning-to-price literature. Keywords: Dynamic Pricing, Multi-armed bandits, Bass model
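For background, the classical deterministic Bass model posits that the adopted fraction $F(t)$ of a market of size $m$ evolves as

$$ \frac{dF(t)}{dt} \;=\; \big(\alpha + \beta F(t)\big)\big(1 - F(t)\big), $$

with $\alpha$ the innovation and $\beta$ the imitation coefficient; the paper works with a stochastic, price-controlled variant of this dynamic.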

TensorPlan: A new, flexible, scalable and provably efficient local planner for huge MDPs

Csaba Szepesvari (Deepmind & University of Alberta)

1
In this talk I will consider provably efficient planning in huge MDPs when the planner is helped with a hint about the form of the optimal value function. In particular, a thoughtful oracle provides the planner with basis functions whose linear combination gives the optimal value function either exactly, or with small errors. The problem is to design a local planner, which, similarly to model-predictive control, is called to find a good action after every state transition, while it is given access to a simulator. We propose a new planner which, when used continuously, is guaranteed to induce a near-optimal policy. When the number of actions is kept constant, the planner is shown to require only polynomially many simulator queries as a function of the horizon and the number of basis functions. The planner does not use dynamic programming as we know it, but is based on optimism and the "tensorization" of the Bellman optimality equation.

On the importance of (linear) structure in contextual multi-armed bandit

Alessandro Lazaric (Facebook AI Research)

1
In this talk I will discuss how structural assumptions on the reward function impacts the regret performance of bandit algorithms. Notably, I will focus on linear contextual bandits and first review recent results showing how the structure of the arm set and reward function can be leveraged to achieve improved regret guarantees. Then, I will describe a novel incremental algorithm able to achieve asymptotic optimality, while ensuring finite-time worst-case optimality in the context-free case. Finally, I will discuss how stronger assumptions on context distribution and linear representation may be leveraged to achieve constant regret. This eventually leads to a representation-selection algorithm matching the regret of the best linear representation in a given set, up to a logarithmic factor in the number of representations.

Most relevant references:

T. Lattimore, Cs. Szepesvari. "The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits", 2016.

B. Hao, T. Lattimore, Cs. Szepesvari, "Adaptive Exploration in Linear Contextual Bandit", 2019.

A. Tirinzoni, M. Pirotta, M. Restelli, A. Lazaric, "An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits", 2020.

M. Papini, A. Tirinzoni, M. Restelli, A. Lazaric, M. Pirotta, "Leveraging Good Representations in Linear Contextual Bandits", 2021.

Q&A for Invited Session 36

0
This talk does not have an abstract.

Session Chair

Vianney Perchet (École nationale de la statistique et de l'administration économique Paris)

Organized 29

Sequential Analysis and Applications (Organizer: Alexander Tartakovsky)

Conference
10:30 PM — 11:00 PM KST
Local
Jul 21 Wed, 6:30 AM — 7:00 AM PDT

Asymptotically optimal control of FDR and related metrics for sequential multiple testing

Jay Bartroff (University of Southern California)

3
I will discuss asymptotically optimal multiple testing procedures for sequential data in the context of prior information on the number of false null hypotheses, for controlling FDR/FNR, pFDR/pFNR, and other metrics. These procedures are closely related to those proposed and shown by Song & Fellouris (2017) to be asymptotically optimal for controlling type 1 and 2 familywise error rates (FWEs). We show that by appropriately adjusting the critical values of the Song-Fellouris procedures, they can be made asymptotically optimal for controlling any multiple testing error metric that is bounded between multiples of FWE in a certain sense. In addition to FDR/FNR and pFDR/pFNR this includes other metrics like the per-comparison and per-family error rates, and the false positive rate. Our setup includes asymptotic regimes in which the number of null hypotheses approaches infinity.

Nearly optimal sequential detection of signals in correlated Gaussian noise

Grigory Sokolov (Xavier University)

3
Detecting an object in AR(p) noise when the signal intensity is not specified is a problem of interest to many practitioners.
To this end we examine three procedures: (i) an adaptive version of the sequential probability ratio test (SPRT) built upon one-stage delayed estimators of the unknown signal intensity; (ii) the generalized SPRT; and (iii) the non-adaptive double SPRT (2-SPRT). The generalized SPRT has certain drawbacks in selecting thresholds to guarantee the upper bounds on error probabilities, but may appear to be slightly more efficient than the adaptive SPRT.
However, simulations show that the loss in performance of the adaptive SPRT compared to the generalized SPRT is very minor, so—coupled with the error probability guarantee—the adaptive SPRT can be recommended for practical applications.
And although the non-adaptive 2-SPRT is not asymptotically optimal for all signal strength values, it does offer benefits at the worst point in the indifference zone.

Acknowledgement: The work of Alexander Tartakovsky was supported in part by the Russian Science Foundation Grant 18-19-00452 at the Moscow Institute of Physics and Technology.
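For orientation, here is the classic Wald SPRT that all three procedures build on, with the textbook threshold approximations; the adaptive, generalized, and 2-SPRT variants discussed in the talk are not reproduced here, and the example increment is our illustration choice.

```python
import numpy as np

def sprt(sample_stream, llr_increment, alpha=0.05, beta=0.05):
    """Classic Wald SPRT with textbook thresholds A, B; sample_stream is
    assumed non-empty. Returns the decision and the stopping time."""
    A, B = np.log((1 - beta) / alpha), np.log(beta / (1 - alpha))
    llr = 0.0
    for n, x in enumerate(sample_stream, start=1):
        llr += llr_increment(x)
        if llr >= A:
            return "reject H0", n
        if llr <= B:
            return "accept H0", n
    return "undecided", n

rng = np.random.default_rng(4)
stream = iter(rng.normal(loc=1.0, size=10_000))
print(sprt(stream, lambda x: x - 0.5))  # LLR step for testing N(0,1) vs N(1,1)
```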

A unified approach for solving sequential selection problems

Yaakov Malinovsky (University of Maryland)

2
In this work we develop a unified approach for solving a wide class of sequential selection problems. This class includes, but is not limited to, selection problems with no-information, rank-dependent rewards, and considers both fixed as well as random problem horizons. We demonstrate that our approach allows exact and efficient computation of optimal policies and various performance metrics thereof for a variety of sequential selection problems, several of which have not been solved to date.

Sequential change detection by optimal weighted l2 divergence

Yao Xie (Georgia Institute of Technology)

2
We present a new non-parametric statistic, called the weighted l2 divergence, based on empirical distributions for sequential change detection. We start by constructing the weighted l2 divergence as a fundamental building block for two-sample tests and change detection. The proposed statistic is proved to attain the optimal sample complexity in the offline setting. We then study sequential change detection using the weighted l2 divergence and characterize the fundamental performance metrics, including the average run length (ARL) and the expected detection delay (EDD). We also present practical algorithms to find the optimal projection to handle high-dimensional data and the optimal weights, which is critical to quick detection since, in such settings, there are not many post-change samples. Simulation results and real data examples are provided to validate the good performance of the proposed method.
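A generic (unoptimized) version of the building block may help fix ideas; the projection and the optimal weights studied in the talk are left generic here, and the window length and threshold in the usage example are hypothetical tuning choices.

```python
import numpy as np

def weighted_l2_divergence(x, y, bins, w=None):
    """Weighted L2 divergence between the empirical distributions of two
    samples, binned on a common grid; w defaults to uniform weights."""
    p, _ = np.histogram(x, bins=bins)
    q, _ = np.histogram(y, bins=bins)
    p, q = p / p.sum(), q / q.sum()
    w = np.ones_like(p, dtype=float) if w is None else np.asarray(w, float)
    return float(np.sum(w * (p - q) ** 2))

# Sequential use: compare a sliding window against a pre-change sample and
# raise an alarm once the divergence exceeds a threshold.
rng = np.random.default_rng(6)
pre = rng.normal(size=500)
stream = np.concatenate([rng.normal(size=300), rng.normal(loc=1.0, size=200)])
bins = np.linspace(-4, 5, 30)
alarms = [t for t in range(100, len(stream))
          if weighted_l2_divergence(pre, stream[t - 100:t], bins) > 0.025]
print("first alarm at t =", alarms[0] if alarms else None)
```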

Detection of temporary disorders

Michael Baron (American University)

2
Change-point detection methods are proposed for the case of temporary failures, or transient changes, when an unexpected disorder is ultimately followed by an adjustment and return to the initial state. A known base distribution of the in-control state changes to different unknown distributions for unknown periods of time. Sequential and retrospective methods are proposed for the detection and estimation of each pair of change-points. Examples of similar problems are shown in quality and process control, energy finance, and statistical genetics, although the meaning of disorder and adjustment change-points is quite different in these applications.

Q&A for Organized Contributed Session 29

1
This talk does not have an abstract.

Session Chair

Alexander Tartakovsky (Moscow Institute of Physics and Technology)

Contributed 03

Numerical Study of Stochastic Processes / Stochastic Interacting Systems

Conference
10:30 PM — 11:00 PM KST
Local
Jul 21 Wed, 6:30 AM — 7:00 AM PDT

Splitting methods for SDEs with locally Lipschitz drift. An illustration on the FitzHugh-Nagumo model

Massimiliano Tamborrino (University of Warwick)

3
In this talk, we construct and analyse explicit numerical splitting methods for a class of semilinear stochastic differential equations (SDEs) with additive noise, where the drift is allowed to grow polynomially and satisfies a global one-sided Lipschitz condition. The methods are proved to be mean-square convergent of order 1 and to preserve important structural properties of the SDE. In particular, first, they are hypoelliptic in every iteration step. Second, they are geometrically ergodic and have asymptotically bounded second moments. Third, they preserve oscillatory dynamics, such as amplitudes, frequencies and phases of oscillations, even for large time steps. Our results are illustrated on the stochastic FitzHugh-Nagumo model (a well-known neuronal model describing the generation of spikes of single neurons at the intracellular level) and compared with known mean-square convergent tamed/truncated variants of the Euler-Maruyama method. The capability of the proposed splitting methods to preserve the aforementioned properties makes them applicable within different statistical inference procedures. In contrast, known Euler-Maruyama type methods commonly fail in preserving such properties, yielding ill-conditioned likelihood-based estimation tools or computationally infeasible simulation-based inference algorithms.
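A one-dimensional toy version of the splitting idea (our simplification: a double-well-type drift $X - X^3$ rather than the full two-dimensional FitzHugh-Nagumo system) illustrates the mechanism: the drift is split into a cubic part whose ODE flow is available in closed form and a linear SDE part that can be sampled exactly.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy splitting scheme for dX = (X - X^3) dt + sigma dW
# (additive noise, polynomially growing one-sided Lipschitz drift).
sigma, dt, n_steps = 0.5, 0.01, 10_000

def step_nonlinear(x, h):
    # Exact flow of dx/dt = -x^3: x(h) = x / sqrt(1 + 2 x^2 h).
    return x / np.sqrt(1.0 + 2.0 * x * x * h)

def step_linear_sde(x, h):
    # Exact solution of dX = X dt + sigma dW over a step of length h.
    mean = np.exp(h) * x
    std = sigma * np.sqrt((np.exp(2 * h) - 1.0) / 2.0)
    return mean + std * rng.normal()

x = 1.0
for _ in range(n_steps):  # Lie-Trotter composition of the two exact sub-flows
    x = step_linear_sde(step_nonlinear(x, dt), dt)
print("X_T after splitting integration:", x)
```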

Simulation methods for trawl processes

Dan Leonte (Imperial College London)

3
Trawl processes are continuous-time, stationary and infinitely divisible processes which can describe a wide range of possible serial correlation patterns in data. This talk introduces a new algorithm for the efficient simulation of monotonic trawl processes. The algorithm accommodates any monotonic trawl shape and any infinitely divisible distribution described via the Lévy seed, requiring only access to samples from the distribution of the Lévy seed. Further, the computational complexity does not scale with the number of spatial dimensions of the trawl. We describe how the above method can be generalized to a simulation scheme for monotonic ambit fields via Monte Carlo methods.

Stochastic optimal control of SDEs and importance sampling

Han Cheng Lie (University of Potsdam)

2
In applications that involve rare events, a common problem is to estimate the statistics of a functional with respect to a reference measure, where the reference measure is the law of the solution to a specific SDE. The presence of rare events motivates the approach of importance sampling by the change of drift technique. This leads to a stochastic optimal control problem, where the objective consists of the sum of the expectation of the functional of interest and a regularisation term that is proportional to the relative entropy or Kullback-Leibler divergence between the reference measure and the importance sampling measure. We analyse a class of gradient-based numerical methods for solving these stochastic optimal control problems, by computing derivatives of the individual terms in the objective, and by using this derivative information to analyse the convexity properties of the terms in the objective.
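Schematically (in generic notation, and with one common ordering of the KL arguments), the control objective described above reads

$$ J(u) \;=\; \mathbb{E}\big[ F(X^u) \big] \;+\; \varepsilon\, \mathrm{KL}\big( \mathbb{P}^u \,\big\|\, \mathbb{P} \big), $$

where $\mathbb{P}$ is the law of the reference SDE, $\mathbb{P}^u$ is the law obtained by the change of drift $u$, and the gradient-based methods analysed in the talk differentiate each term with respect to $u$.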

Opinion dynamics with Lotka-Volterra type interactions

Michele Aleandri (Libera Università Internazionale degli Studi Sociali)

1
We investigate a class of models for opinion dynamics in a population with two interacting families of individuals. Each family has an intrinsic mean-field "Voter-like" dynamics which is influenced by interaction with the other family. The interaction terms describe a cooperative/conformist or competitive/nonconformist attitude of one family with respect to the other. We prove propagation of chaos, i.e., we show that on any time interval [0, T], as the size of the system goes to infinity, each individual behaves independently of the others, with transition rates driven by a macroscopic equation. We focus in particular on models with Lotka-Volterra type interactions, i.e., models with cooperative vs. competitive families. For these models, although the microscopic system is driven towards consensus within each family, a periodic behaviour arises on the macroscopic scale. In order to describe fluctuations between the limiting periodic orbits, we identify a slow variable in the microscopic system and, through an averaging principle, we find a diffusion which describes the macroscopic dynamics of this variable on a larger time scale.
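The talk's macroscopic equation is not reproduced here, but the mechanism behind the periodic behaviour can be recalled with the classical Lotka-Volterra system, whose interior trajectories are closed orbits around the equilibrium; the coefficients below are arbitrary illustrative values, not those of the model in the talk.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Classical Lotka-Volterra ODE (hypothetical coefficients); the macroscopic
# equation for the two opinion densities in the talk is of this broad type.
a, b, c, d = 1.0, 0.5, 0.5, 1.0

def lv(t, m):
    m1, m2 = m
    return [m1 * (a - b * m2), -m2 * (c - d * m1)]

sol = solve_ivp(lv, (0.0, 40.0), [0.8, 0.4], dense_output=True, rtol=1e-8)
ts = np.linspace(0, 40, 2000)
m1, m2 = sol.sol(ts)    # periodic orbits around the interior equilibrium
```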

Q&A for Contributed Session 03

0
This talk does not have an abstract.

Session Chair

Kyung-Youn Kim (National Chengchi University)

Contributed 08

Study of Various Distributions

Conference
10:30 PM — 11:00 PM KST
Local
Jul 21 Wed, 6:30 AM — 7:00 AM PDT

Orlicz norm and concentration inequalities for beta-heavy tailed distributions

Emmanuel Gobet (Ecole Polytechnique)

2
Understanding how sample statistical fluctuations impact prediction errors is crucial in learning algorithms. This is typically done by quantifying the probability that a sum of random variables deviates from its expectation by a certain threshold. The cases of sub-Gaussian and sub-exponential random variables, as well as that of alpha-exponential tails, are largely covered by the literature (for example, via the Bennett and Bernstein inequalities). In this work we focus on situations where the distributions have long tails (such as the log-normal or log-gamma distributions). In this setting, we establish a new Talagrand-type inequality for the Orlicz norm of a sum of independent random variables of this type, as well as a maximal inequality. The concentration inequalities then follow.
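For reference, the Orlicz norm in question is the standard one: for a Young function $\psi$,

$$\|X\|_{\psi} \;=\; \inf\bigl\{ c > 0 \;:\; \mathbb{E}\,\psi\bigl(|X|/c\bigr) \le 1 \bigr\}.$$

The sub-Gaussian and sub-exponential cases correspond to $\psi_\alpha(x) = e^{x^\alpha} - 1$ with $\alpha = 2$ and $\alpha = 1$ respectively, while the long-tailed regime of the talk concerns Young functions growing more slowly than any such exponential.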

The Dickman-Goncharov distribution

Vladimir Panov (National Research University Higher School of Economics)

3
In the 1930s and 40s, one and the same delay differential equation appeared in papers by two mathematicians, Karl Dickman and Vasily Goncharov, who dealt with completely different problems. Dickman investigated the limit value of the number of natural numbers free of large prime factors, while Goncharov examined the asymptotics of the maximum cycle length in decompositions of random permutations. The equation obtained in these papers defines, under a certain initial condition, the density of a probability distribution now called the Dickman-Goncharov distribution (this term was first proposed by A. Vershik in 1986). Recently, a number of completely new applications of the Dickman-Goncharov distribution have appeared in mathematics (random walks on solvable groups, random graph theory, and so on) and also in biology (models of growth and evolution of unicellular populations), finance (theory of extreme phenomena in finance and insurance), physics (the model of random energy levels), and other fields. Despite the extensive scope of applications of this distribution and of more general but related models, all the mathematical aspects of this topic (for example, infinite divisibility and absolute continuity) are little known even to specialists in limit theorems. My talk is mainly based on our survey [Molchanov, S. and Panov, V., The Dickman-Goncharov distribution, Russian Mathematical Surveys, 2020, Vol. 75, No. 6, pp. 1089-1132], which is intended to fill this gap. I will also discuss several new results for the generalised Dickman-Goncharov distribution, which in the discrete case are closely related to the solution of the well-known Erdős problem for Bernoulli convolutions.
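For orientation, two standard facts (not new results of the talk): the Dickman function $\rho$ solves the delay differential equation

$$u\,\rho'(u) + \rho(u-1) = 0 \quad (u > 1), \qquad \rho(u) = 1 \quad (0 \le u \le 1),$$

and the Dickman-Goncharov distribution has density $e^{-\gamma}\rho(x)$ on $(0,\infty)$, where $\gamma$ is the Euler-Mascheroni constant.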

Continuous scaled phase-type distributions

Jorge Yslas (University of Bern)

2
In this talk, we study random variables obtained as products of phase-type distributed random variables and continuous random variables. Under this construction, one can obtain closed-form formulas for various functionals of the resulting models. We provide new results on the tail behavior of these distributions and show how an EM algorithm can be employed for maximum-likelihood estimation. Finally, we present several numerical examples with real insurance data sets.
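A minimal simulation sketch of this construction: a phase-type variable is generated as the absorption time of a Markov jump process with sub-intensity matrix T and initial law alpha, then multiplied by an independent continuous scaling factor. The parameters and the log-normal scaling below are hypothetical choices, not those of the talk.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical phase-type parameters: initial distribution alpha and
# sub-intensity matrix T; absorption rates are t = -T @ 1.
alpha = np.array([1.0, 0.0])
T = np.array([[-3.0,  2.0],
              [ 0.0, -1.0]])
exit_rates = -T.sum(axis=1)

def sample_phase_type():
    """Absorption time of the underlying Markov jump process."""
    k = rng.choice(len(alpha), p=alpha)
    time = 0.0
    while True:
        total = -T[k, k]
        time += rng.exponential(1.0 / total)
        # jump to phase j w.p. T[k, j]/total, absorb w.p. exit_rates[k]/total
        probs = np.append(np.where(np.arange(len(alpha)) == k, 0.0, T[k]),
                          exit_rates[k]) / total
        nxt = rng.choice(len(alpha) + 1, p=probs)
        if nxt == len(alpha):
            return time
        k = nxt

# Continuous scaling factor, here log-normal (a hypothetical choice):
samples = np.array([sample_phase_type() * rng.lognormal(0.0, 0.5)
                    for _ in range(10_000)])
```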

Q&A for Contributed Session 08

0
This talk does not have an abstract.

Session Chair

Gunwoong Park (University of Seoul)

Contributed 12

Optimal Transport

Conference
10:30 PM — 11:00 PM KST
Local
Jul 21 Wed, 6:30 AM — 7:00 AM PDT

Stochastic uniform approximations of Wasserstein barycenters

Florian Heinemann (Georg-August-University Göttingen)

3
Recently, optimal transport, and more specifically the Wasserstein distance, has attracted renewed interest as an attractive tool in data analysis. This has also led to increasing interest in Fréchet means, or barycenters, with respect to that distance. These so-called Wasserstein barycenters offer favorable geometric properties which lend themselves well to many applications. However, even more than standard optimal transport, the barycenter problem suffers from a significant computational cost. To alleviate this issue, we propose a hybrid resampling method to approximate finitely supported Wasserstein barycenters on large-scale datasets, which can be combined with any exact solver. Nonasymptotic bounds on the expected error of the objective value, as well as of the barycenters themselves, allow one to calibrate computational cost against statistical accuracy. The rate of these upper bounds is shown to be optimal and independent of the underlying dimension, which appears only in the constants. Using a simple modification of the subgradient descent algorithm of Cuturi and Doucet, we showcase the applicability of our method on a myriad of simulated datasets, as well as on a real-data example, which are out of reach for state-of-the-art algorithms for computing Wasserstein barycenters.
This is joint work with Axel Munk and Yoav Zemel.
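A sketch of the hybrid resampling wrapper, under the abstract's premise that any exact solver can be plugged in: each input measure is replaced by the empirical measure of n i.i.d. draws from it, and the exact solver is run on the reduced inputs. The `solver` argument is a placeholder for such a routine (e.g. an LP-based barycenter solver); the talk's bounds are what calibrate the resampling size n against the solver's cost.

```python
import numpy as np

def resample_measure(support, weights, n, rng):
    """Empirical measure of n i.i.d. draws from a finitely supported measure."""
    idx = rng.choice(len(weights), size=n, p=weights)
    pts, cnt = np.unique(idx, return_counts=True)
    return support[pts], cnt / n

def hybrid_barycenter(measures, n, solver, seed=0):
    """Approximate a Wasserstein barycenter by running an exact solver on
    resampled inputs of size n each. `measures` is a list of (support, weights)
    pairs; `solver` is any exact barycenter routine on such lists (placeholder).
    """
    rng = np.random.default_rng(seed)
    reduced = [resample_measure(s, w, n, rng) for (s, w) in measures]
    return solver(reduced)
```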

Measuring dependence between random vectors via optimal transport

Johan Segers (Université catholique de Louvain)

1
To quantify the dependence between two random vectors of possibly different dimensions, we propose to rely on the properties of the 2-Wasserstein distance. We first propose two coefficients that are based on the Wasserstein distance between the actual distribution and a reference distribution with independent components. The coefficients are normalized to take values between 0 and 1, where 1 represents the maximal amount of dependence possible given the two multivariate margins. We then make a quasi-Gaussian assumption that yields two additional coefficients rooted in the same ideas as the first two. These different coefficients are more amenable to distributional results and admit attractive formulas in terms of the joint covariance or correlation matrix. Furthermore, maximal dependence is proved to occur at the covariance matrix with minimal von Neumann entropy given the covariance matrices of the two multivariate margins. This result also helps us revisit the RV coefficient by proposing a sharper normalisation. The two coefficients based on the quasi-Gaussian approach can be estimated easily via the empirical covariance matrix. The estimators are asymptotically normal and their asymptotic variances are explicit functions of the covariance matrix, which can thus be estimated consistently too. The results extend to the Gaussian copula case, in which case the estimators are rank-based. The results are illustrated through theoretical examples, Monte Carlo simulations, and a case study involving electroencephalography data.
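Under the quasi-Gaussian assumption, such coefficients reduce to explicit functions of covariance matrices via the Gaussian (Bures) formula for the 2-Wasserstein distance. The sketch below computes the unnormalized distance to the independence coupling; the paper's normalization to [0, 1] is omitted, so this illustrates the underlying formula rather than the proposed coefficient itself.

```python
import numpy as np
from scipy.linalg import sqrtm

def bures_w2_sq(S1, S2):
    """Squared 2-Wasserstein distance between centred Gaussians N(0,S1), N(0,S2)."""
    r = sqrtm(S1)
    cross = sqrtm(r @ S2 @ r)
    return np.trace(S1) + np.trace(S2) - 2.0 * np.real(np.trace(cross))

def dependence(S, p):
    """Unnormalized quasi-Gaussian dependence between X (first p coordinates)
    and Y (the rest): W2^2 between N(0, S) and the independence coupling."""
    S_ind = S.copy()
    S_ind[:p, p:] = 0.0
    S_ind[p:, :p] = 0.0
    return bures_w2_sq(S, S_ind)

# toy example: a bivariate pair with correlation rho (zero iff rho = 0)
rho = 0.7
S = np.array([[1.0, rho], [rho, 1.0]])
print(dependence(S, 1))
```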

Transportation duality and reverse functional inequalities for Markov kernels

Nathaniel Eldredge (University of Northern Colorado)

1
Functional inequalities for a Markov semigroup $P_t$, which may express its "smoothing" properties, can also be studied in terms of the dual action of $P_t$ on the space of probability measures. These can give rise to "contraction" inequalities in terms of various distances between probability measures, such as the Wasserstein or Hellinger distances. I will discuss results for the reverse Poincaré and reverse log-Sobolev inequalities, which turn out to have dual formulations to which they are actually equivalent. Applications to Markov processes include rates of convergence to equilibrium, smoothness of transition densities, and quasi-invariance properties.
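One standard formulation of a reverse Poincaré inequality of this kind, under a Bakry-Émery curvature lower bound $\rho$, is

$$|\nabla P_t f|^2 \;\le\; \frac{\rho}{e^{2\rho t} - 1}\,\bigl(P_t(f^2) - (P_t f)^2\bigr),$$

where the constant is read as $1/(2t)$ in the limit $\rho \to 0$ (the heat semigroup case). The abstract's point is that inequalities of this shape admit equivalent dual formulations as contraction estimates between probability measures.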

Q&A for Contributed Session 12

0
This talk does not have an abstract.

Session Chair

Yeonwoo Rho (Michigan Technological University)

Contributed 27

Machine Learning / Structural Equation

Conference
10:30 PM — 11:00 PM KST
Local
Jul 21 Wed, 6:30 AM — 7:00 AM PDT

Replicability of statistical findings under distributional shift

Suyash Gupta (Stanford University)

5
Common statistical measures of uncertainty like p-values and confidence intervals quantify the uncertainty due to sampling, i.e. the uncertainty due to not observing the full population. In practice, populations change between locations and across time. This makes it difficult to gather knowledge that replicates across data sets. We propose a measure of uncertainty that quantifies the distributional uncertainty of a statistical estimand, that is, the sensitivity of the parameter under general distributional perturbations within a Kullback-Leibler divergence ball. We also propose a measure of the stability of estimators with respect to directional or variable-specific shifts. The proposed measures help judge whether a statistical finding is replicable across data sets in the presence of distributional shifts. Further, we introduce a transfer learning technique that allows estimating statistical parameters under shifted distributions when only summary statistics about the new distribution are available. We evaluate the performance of the proposed measures in experiments and show that they can elucidate the replicability of statistical findings with respect to distributional shifts and give more accurate estimates of parameters under shifted distributions.
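The worst case over a Kullback-Leibler ball admits a classical dual representation (the Donsker-Varadhan/Gibbs variational formula), which is what makes divergence-ball sensitivity measures of this kind computable:

$$\sup_{Q\,:\,D_{\mathrm{KL}}(Q\,\|\,P)\le\rho}\ \mathbb{E}_Q[\ell] \;=\; \inf_{\lambda > 0}\ \Bigl\{\lambda\rho + \lambda\log\mathbb{E}_P\bigl[e^{\ell/\lambda}\bigr]\Bigr\}.$$

Whether the paper uses exactly this dual form is not stated in the abstract; the identity itself is standard.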

Selection of graphical continuous Lyapunov models with Lasso

Philipp Dettling (Technical University of Munich)

3
In some applications, multivariate data may be thought of as cross-sectional observations of temporal processes. The recently proposed graphical continuous Lyapunov models take this perspective in the context of a multi-dimensional Ornstein-Uhlenbeck process in equilibrium. Under a stability assumption, the equilibrium covariance matrix is determined by the continuous Lyapunov equation. Given a sample covariance matrix, a very natural approach to model selection is to obtain sparse solutions to the Lyapunov equation by means of $\ell_1$-regularization. We apply the primal-dual witness technique to give probabilistic guarantees for successful support recovery in this approach. The key assumption in this guarantee is an irrepresentability condition. As we demonstrate, the irrepresentability condition may be violated in subtle ways, particularly for models with feedback loops.
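A minimal sketch of the $\ell_1$-regularized approach: the Lyapunov equation $M\Sigma + \Sigma M^T + C = 0$ is linear in $M$, so vectorizing it turns sparse recovery of $M$ into an ordinary lasso problem. The choice $C = I$ is a common convention and an assumption here, as is the use of scikit-learn's solver.

```python
import numpy as np
from sklearn.linear_model import Lasso
from scipy.linalg import solve_continuous_lyapunov

def lyapunov_lasso(S, lam, C=None):
    """Sparse M solving M S + S M^T + C ~ 0 via l1-regularized least squares."""
    d = S.shape[0]
    C = np.eye(d) if C is None else C
    # commutation matrix K with K vec(M) = vec(M^T)   (column-major vec)
    K = np.zeros((d * d, d * d))
    for i in range(d):
        for j in range(d):
            K[i * d + j, j * d + i] = 1.0
    # vec(M S + S M^T) = [(S kron I) + (I kron S) K] vec(M), since S = S^T
    A = np.kron(S, np.eye(d)) + np.kron(np.eye(d), S) @ K
    fit = Lasso(alpha=lam, fit_intercept=False, max_iter=50_000)
    fit.fit(A, -C.flatten(order="F"))
    return fit.coef_.reshape(d, d, order="F")

# demo: recover a sparse stable drift from its equilibrium covariance
M_true = np.array([[-1.0, 0.5, 0.0],
                   [ 0.0, -1.0, 0.0],
                   [ 0.8, 0.0, -1.0]])
S = solve_continuous_lyapunov(M_true, -np.eye(3))   # M S + S M^T = -I
print(np.round(lyapunov_lasso(S, lam=0.01), 2))
```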

Identifiability of linear structural equation models with homoscedastic errors using algebraic matroids

Jun Wu (Technical University of Munich)

3
We consider structural equation models (SEMs), in which every variable is a function of a subset of the other variables and a stochastic error. Each such SEM is naturally associated with a directed graph describing the relationships between variables. For the case of homoscedastic errors, recent work has proposed methods for inferring the graph from observational data under the assumption that the graph is acyclic (i.e., the SEM is recursive). In this work we study the setting of homoscedastic errors but allow the graph to be cyclic (i.e., the SEM to be non-recursive). Using an algebraic approach that compares matroids derived from the parameterizations of the models, we derive sufficient conditions under which two simple directed graphs generically generate different distributions. Based on these conditions, we exhibit subclasses of graphs that allow for directed cycles, yet are generically identifiable. Our study is supplemented by computational experiments that provide a full classification of models given by simple graphs with up to 6 nodes.

Convergence of stochastic gradient descent for Łojasiewicz landscapes

Sebastian Kassing (Westfälische Wilhelms-Universität Münster)

4
In this talk we discuss almost sure convergence of stochastic gradient descent (SGD) $(X_n)_{n \in \mathbb{N}}$ and stochastic gradient flow (SGF) $(X_t)_{t \ge 0}$ for a given target function $F$. First, we give a simple proof of almost sure convergence of the target values $(F(X_n))$ (resp. $(F(X_t))$) assuming that $F$ admits a locally Hölder-continuous gradient $f = DF$. This result entails convergence of the iterates $(X_n)$ (resp. $(X_t)$) in the case where $F$ does not possess a continuum of critical points. In a general non-convex setting with $F$ possibly containing a rich set of critical points, convergence of the process itself is sometimes taken for granted, but it is actually a non-trivial issue, as there are solutions to the gradient flow ODE for $C^\infty$ loss functions that stay in a compact set but do not converge. Using the Łojasiewicz inequality, we derive bounds on the step sizes and the size of the perturbation that guarantee convergence of $(X_n)$ (resp. $(X_t)$) for analytic target functions. We also derive the convergence rate under the assumption that the loss function satisfies a particular Łojasiewicz inequality. Finally, we compare the results for SGD and SGF and discuss optimality of the assumptions.
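A small experiment in the spirit of these results, with hypothetical step sizes: the analytic loss $F(x,y) = (xy-1)^2$ has a whole curve of minimizers, so convergence of the iterates themselves, not just of the loss values, is exactly the delicate point.

```python
import numpy as np

rng = np.random.default_rng(4)

# F(x, y) = (x*y - 1)^2: analytic, with the curve {xy = 1} of minimizers.
def grad(z):
    x, y = z
    r = x * y - 1.0
    return np.array([2.0 * r * y, 2.0 * r * x])

z = np.array([2.0, 2.0])
for n in range(1, 200_001):
    eta = 0.05 / n**0.75             # square-summable but not summable steps
    noise = rng.standard_normal(2)   # unbiased gradient perturbation
    z -= eta * (grad(z) + noise)
print(z, (z[0] * z[1] - 1.0)**2)     # iterates settle at one point on the curve
```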

Q&A for Contributed Session 27

0
This talk does not have an abstract.

Session Chair

Yoonsuh Jung (Korea University)

Poster II-2

Poster Session II-2

Conference
10:30 PM — 11:00 PM KST
Local
Jul 21 Wed, 6:30 AM — 7:00 AM PDT

Busemann process and semi-infinite geodesics in Brownian last-passage percolation

Evan Sorensen (University of Wisconsin-Madison)

1
We prove the existence of semi-infinite geodesics for Brownian last-passage percolation (BLPP). Specifically, on a single event of probability one, there exist semi-infinite geodesics, started from every space-time point and traveling in every asymptotic direction. Properties of these geodesics include uniqueness for a fixed initial point and direction, non-uniqueness for fixed direction but random initial points, and coalescence of all geodesics traveling in a common, fixed direction. The semi-infinite geodesics are constructed from Busemann functions, whose existence was proved for fixed initial points and directions by Alberts, Rassoul-Agha, and Simper. We extend their result to a global process of Busemann functions and derive the joint distribution of Busemann functions for varying directions. From this joint distribution, we prove results about the geometry of the semi-infinite geodesics. More specifically, there exists a Hausdorff dimension 1/2 set of initial points, and to each point an associated direction, such that there are two semi-infinite geodesics in that direction whose only shared point is the initial point. Joint work with Timo Seppäläinen.

Application of kernel mean embeddings to functional data

George Wynne (Imperial College London)

2
Kernel mean embeddings (KMEs) have enjoyed wide success in statistical machine learning over the past fifteen years. They offer a non-parametric method of reasoning with probability measures by mapping measures into a reproducing kernel Hilbert space. Much of the existing theory and practice has revolved around Euclidean data, whereas functional data has received very little investigation. Likewise, in functional data analysis (FDA) the technique of KMEs has not been explored. This work proposes to bridge this gap in theory and practice. KMEs offer an alternative paradigm to the common practice in FDA of projecting data to finite dimensions. The KME framework can handle infinite-dimensional input spaces, offers an elegant theory and leverages the spectral structure of functional data. Empirically, KMEs provide competitive performance against existing functional two-sample and goodness-of-fit tests. Finally, we discuss connections to empirical characteristic function based testing and functional depth techniques currently used in FDA.
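A minimal sketch of a KME-based functional two-sample test: curves are discretized on a grid, a squared-exponential kernel is applied to approximate L2 distances, and the unbiased MMD^2 statistic is calibrated by permutation. The kernel bandwidth, grid, and toy data are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(5)

def mmd2_unbiased(X, Y, ell, dt):
    """Unbiased MMD^2 with k(f,g) = exp(-||f-g||_{L2}^2 / (2 ell^2)); the L2
    norm is a Riemann approximation on a grid with spacing dt."""
    def gram(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1) * dt
        return np.exp(-d2 / (2.0 * ell**2))
    Kxx, Kyy, Kxy = gram(X, X), gram(Y, Y), gram(X, Y)
    n, m = len(X), len(Y)
    np.fill_diagonal(Kxx, 0.0); np.fill_diagonal(Kyy, 0.0)
    return Kxx.sum()/(n*(n-1)) + Kyy.sum()/(m*(m-1)) - 2.0*Kxy.mean()

# toy functional data: Brownian-like paths vs. paths with a small drift
grid = np.linspace(0, 1, 100); dt = grid[1] - grid[0]
X = np.cumsum(rng.standard_normal((50, 100)) * np.sqrt(dt), axis=1)
Y = np.cumsum(rng.standard_normal((50, 100)) * np.sqrt(dt), axis=1) + 0.5 * grid
stat = mmd2_unbiased(X, Y, ell=1.0, dt=dt)

# permutation null: reshuffle the pooled sample to calibrate the test
pooled = np.vstack([X, Y]); null = []
for _ in range(200):
    perm = rng.permutation(len(pooled))
    null.append(mmd2_unbiased(pooled[perm[:50]], pooled[perm[50:]], 1.0, dt))
print(stat, np.mean(np.array(null) >= stat))   # statistic and permutation p-value
```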

SIR-based examination of the policy effects on the COVID-19 spread in U.S.

David Han (The University of Texas at San Antonio)

1
Since the global outbreak of the novel COVID-19, many research groups have studied the epidemiology of the virus for short-term forecasts and to formulate effective disease containment and mitigation strategies. The major challenge lies in the proper assessment of epidemiological parameters over time and of how they are modulated by the effect of any publicly announced interventions. Here we attempt to examine and quantify the effects of various (legal) policies/orders in place to mandate social distancing and to flatten the curve in each of the U.S. states. Through Bayesian inference on stochastic SIR models of the virus spread, the effectiveness of each policy in reducing the magnitude of the growth rate of new infections is investigated statistically. This will inform the public and policymakers, and help them understand the most effective actions to fight against the current and future pandemics. It will also aid policymakers in responding more rapidly (selecting, tightening, and/or loosening appropriate measures) to stop or mitigate the pandemic early on.
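A stripped-down version of the forward model underlying such an analysis: a chain-binomial stochastic SIR in which a single distancing order multiplies the contact rate from a given day. All parameters are hypothetical, and the Bayesian inference over policy effects described in the abstract is not performed here.

```python
import numpy as np

rng = np.random.default_rng(6)

N = 1_000_000                   # population size (hypothetical)
beta0, gamma = 0.35, 0.10       # pre-intervention contact and recovery rates
policy_day, effect = 40, 0.5    # a single order halving the contact rate
days = 160

S, I, R = N - 100, 100, 0
history = []
for day in range(days):
    beta = beta0 * (effect if day >= policy_day else 1.0)
    # chain-binomial transitions of the stochastic SIR model
    new_inf = rng.binomial(S, 1.0 - np.exp(-beta * I / N))
    new_rec = rng.binomial(I, 1.0 - np.exp(-gamma))
    S, I, R = S - new_inf, I + new_inf - new_rec, R + new_rec
    history.append((day, S, I, R, new_inf))
```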

Cross-validation confidence intervals for test error

Alexandre Bayle (Harvard University)

1
This work develops central limit theorems for cross-validation and consistent estimators of its asymptotic variance under weak stability conditions on the learning algorithm. Together, these results provide practical, asymptotically-exact confidence intervals for k-fold test error and valid, powerful hypothesis tests of whether one learning algorithm has smaller k-fold test error than another. These results are also the first of their kind for the popular choice of leave-one-out cross-validation. In our real-data experiments with diverse learning algorithms, the resulting intervals and tests outperform the most popular alternative methods from the literature.
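A minimal version of the resulting interval, assuming the 0-1 loss and scikit-learn utilities: pool the per-observation held-out losses across the k folds and form a normal interval from their mean and standard deviation, which is the shape of confidence interval that such central limit theorems justify.

```python
import numpy as np
from scipy.stats import norm
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=2000, random_state=0)
losses = np.empty(len(y))                 # one held-out loss per observation
for tr, te in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    model = LogisticRegression(max_iter=1000).fit(X[tr], y[tr])
    losses[te] = (model.predict(X[te]) != y[te]).astype(float)   # 0-1 loss

n = len(losses)
mean = losses.mean()
half = norm.ppf(0.975) * losses.std(ddof=1) / np.sqrt(n)
print(f"10-fold test error: {mean:.3f} +/- {half:.3f}")
```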

Comparison of quantile regression curves under different settings with censored data

Lorenzo Tedesco (Katholieke Universiteit Leuven)

2
The poster presents a new nonparametric test for the equality of conditional quantile curves when the outcome of interest, typically a duration, is subject to right censoring. The test is based on quantile regression estimation models and does not rely on distributional assumptions. Moreover, the proposed method holds for both dependent and independent samples. Consistency of the test and asymptotic results are provided, together with a bootstrap procedure intended to avoid density estimation in the case of small sample sizes. The poster also includes a comparison with other methods and examples of application in both the dependent and independent settings.
