10th World Congress in Probability and Statistics

Plenary Lectures

Plenary Thu-1

IMS Medallion Lecture (Daniela Witten)

Conference: 9:00 AM — 10:00 AM KST
Local: Jul 21 Wed, 8:00 PM — 9:00 PM EDT

Selective inference for trees

Daniela Witten (University of Washington)

As datasets grow in size, the focus of data collection has increasingly shifted away from testing pre-specified hypotheses, and towards hypothesis generation. Researchers are often interested in performing an exploratory data analysis to generate hypotheses, and then testing those hypotheses on the same data. Unfortunately, this type of 'double dipping' can lead to highly inflated Type I error rates. In this talk, I will consider double-dipping on trees. First, I will focus on trees generated via hierarchical clustering, and will consider testing the null hypothesis of equality of cluster means. I will propose a test for a difference in means between estimated clusters that accounts for the cluster estimation process, using a selective inference framework. Second, I'll consider trees generated using the CART procedure, and will again use selective inference to conduct inference on the means of the terminal nodes. Applications include single-cell RNA-sequencing data and the Box Lunch Study. This is collaborative work with Lucy Gao (U. Waterloo), Anna Neufeld (U. Washington), and Jacob Bien (USC).
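
For illustration only, here is a minimal simulation sketch (in Python, under assumed settings such as 100 standard Gaussian observations and Ward linkage) of the double-dipping problem described above: data with no true group structure are clustered, and a naive two-sample t-test is then applied to the estimated clusters, producing a Type I error rate far above the nominal level. This is not the selective test proposed in the talk.

    # Naive 'double dipping': test a difference in means between clusters
    # that were estimated from the same data (illustrative settings only).
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.stats import ttest_ind

    rng = np.random.default_rng(0)
    n_sim, n, p, alpha = 500, 100, 2, 0.05
    rejections = 0
    for _ in range(n_sim):
        X = rng.normal(size=(n, p))   # global null: one Gaussian, no clusters
        labels = fcluster(linkage(X, method="ward"), t=2, criterion="maxclust")
        pvals = [ttest_ind(X[labels == 1, j], X[labels == 2, j]).pvalue
                 for j in range(p)]
        rejections += min(pvals) < alpha / p   # Bonferroni over the p coordinates
    print("empirical Type I error:", rejections / n_sim)   # well above 0.05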

Session Chair

Ja-Yong Koo (Korea University)

Plenary Thu-2

IMS Medallion Lecture (Andrea Montanari)

Conference: 10:00 AM — 11:00 AM KST
Local: Jul 21 Wed, 9:00 PM — 10:00 PM EDT

High-dimensional interpolators: From linear regression to neural tangent models

Andrea Montanari (Stanford University)

Modern machine learning methods, most noticeably multi-layer neural networks, require fitting highly non-linear models comprising tens of thousands to millions of parameters. However, little attention is paid to the regularization mechanisms that control model complexity, and the resulting models are often so complex as to achieve vanishing training error. Despite this, these models generalize well to unseen data: they have small test error. I will discuss several examples of this phenomenon, leading to two-layer neural networks in the so-called lazy regime. For these examples precise asymptotics can be determined mathematically, using tools from random matrix theory, and a unifying picture is emerging. A common feature is the fact that a complex unregularized nonlinear model becomes essentially equivalent to a simpler model, which is however regularized in a non-trivial way.

[Based on joint papers with: Michael Celentano, Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, Feng Ruan, Youngtak Sohn, Jun Yan, Yiqiao Zhong]
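
One elementary instance of an unregularized interpolating model coinciding with the limit of a regularized one, sketched here with assumed toy dimensions: in overparametrized linear regression, the minimum-$\ell_2$-norm interpolator is the $\lambda \to 0^+$ limit of ridge regression. The talk concerns far richer models (neural tangent models), so the Python sketch below only conveys the flavor of the phenomenon.

    # Toy sketch: the minimum-l2-norm interpolator in overparametrized linear
    # regression equals the small-lambda limit of ridge regression.
    import numpy as np

    rng = np.random.default_rng(1)
    n, p = 50, 500                              # more parameters than observations
    X = rng.normal(size=(n, p))
    y = X @ (rng.normal(size=p) / np.sqrt(p)) + 0.5 * rng.normal(size=n)

    beta_min_norm = np.linalg.pinv(X) @ y       # interpolates: X beta = y exactly

    lam = 1e-8                                  # nearly-unregularized ridge, kernel form
    beta_ridge = X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)

    print("training error of interpolator:", np.max(np.abs(X @ beta_min_norm - y)))
    print("distance to the ridge limit:   ", np.max(np.abs(beta_min_norm - beta_ridge)))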

Session Chair

Myunghee Cho Paik (Seoul National University)

Plenary Thu-3

Blackwell Lecture (Gabor Lugosi)

Conference: 7:00 PM — 8:00 PM KST
Local: Jul 22 Thu, 6:00 AM — 7:00 AM EDT

Estimating the mean of a random vector

Gabor Lugosi (ICREA & Pompeu Fabra University)

One of the most basic problems in statistics is the estimation of the mean of a random vector, based on independent observations. This problem has received renewed attention in the last few years, from both statistical and computational points of view. In this talk we review some recent results on the statistical performance of mean estimators that allow heavy tails and adversarial contamination in the data. In particular, we are interested in estimators that have a near-optimal error in all directions in which the variance of the one-dimensional marginal of the random vector is not too small. The material of this talk is based on a series of joint papers with Shahar Mendelson.
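
For background, a hedged sketch (not necessarily one of the estimators analyzed in the talk) of a classical device for mean estimation under heavy tails: the median-of-means estimator splits the sample into blocks, averages within each block, and takes a median of the block means. The coordinate-wise version below is purely illustrative; the estimators discussed in the talk aim to control the error in all directions.

    # Median-of-means sketch (coordinate-wise, for illustration only).
    import numpy as np

    def median_of_means(X, n_blocks=10):
        """Average the rows of X within random blocks, then take the
        coordinate-wise median of the block means."""
        blocks = np.array_split(np.random.permutation(len(X)), n_blocks)
        block_means = np.array([X[b].mean(axis=0) for b in blocks])
        return np.median(block_means, axis=0)

    # Heavy-tailed example: Student-t observations (assumed setting)
    rng = np.random.default_rng(2)
    X = rng.standard_t(df=2.5, size=(1000, 5))
    print("empirical mean: ", X.mean(axis=0))
    print("median of means:", median_of_means(X))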

Session Chair

Byeong Uk Park (Seoul National University)

Plenary Thu-4

Tukey Lecture (Sara van de Geer)

Conference: 8:00 PM — 9:00 PM KST
Local: Jul 22 Thu, 7:00 AM — 8:00 AM EDT

Max-margin classification and other interpolation methods

Sara van de Geer (Swiss Federal Institute of Technology Zürich)

John Tukey writes that detective work is an essential part of statistical analysis (Tukey [1969]). In this talk we discuss methods that do the opposite of detective work: data interpolation. This was often considered forbidden, but then again, statistical paradigms are not to be sanctified. We consider basis pursuit and one-bit compressed sensing. We re-establish the $\ell_{2}$-rates of convergence for noisy basis pursuit of Wojtaszczyk [2010]. For one-bit compressed sensing we study the algorithm of Plan and Vershynin [2013] and re-derive $\ell_{2}$-rates as well. The techniques used also allow us to derive novel results for the max-margin classifier, related to the AdaBoost algorithm, as given in Liang and Sur [2020].

This is joint work with Geoffrey Chinot, Felix Kuchelmeister and Matthias Löffler.
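
As a hedged illustration of one of the methods named above, noiseless basis pursuit seeks the minimum-$\ell_1$-norm interpolant: minimize $\|\beta\|_1$ subject to $X\beta = y$. Writing $\beta = u - v$ with $u, v \ge 0$ turns this into a linear program; the dimensions, sparsity level and Gaussian design in the Python sketch below are assumptions made for illustration, not the setting of the talk.

    # Basis pursuit sketch: min ||beta||_1  s.t.  X beta = y, via a linear program.
    import numpy as np
    from scipy.optimize import linprog

    rng = np.random.default_rng(3)
    n, p, s = 40, 100, 5
    X = rng.normal(size=(n, p)) / np.sqrt(n)      # Gaussian measurements
    beta_true = np.zeros(p); beta_true[:s] = 1.0  # sparse signal (illustrative)
    y = X @ beta_true                             # noiseless observations

    # beta = u - v with u, v >= 0, so ||beta||_1 = sum(u) + sum(v).
    c = np.ones(2 * p)
    A_eq = np.hstack([X, -X])
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * p), method="highs")
    beta_hat = res.x[:p] - res.x[p:]

    print("max recovery error:", np.max(np.abs(beta_hat - beta_true)))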

References
T. Liang and P. Sur. A precise high-dimensional asymptotic theory for boosting and minimum-$\ell_1$-norm interpolated classifiers, 2020. arXiv:2002.01586.
Y. Plan and R. Vershynin. One-bit compressed sensing by linear programming. Communications on Pure and Applied Mathematics, 66(8):1275–1297, 2013.
J. Tukey. Analyzing data: Sanctification or detective work? American Psychologist, 24:83–91, 1969.
P. Wojtaszczyk. Stability and instance optimality for Gaussian measurements in compressed sensing. Foundations of Computational Mathematics, 10(1):1–13, 2010.

Session Chair

Adam Jakubowski (Nicolaus Copernicus University)
