Lei Wu (吴磊)

Assistant Professor
School of Mathematical Sciences
Center for Machine Learning Research
Peking University

Office: Room 205, Jingyuan Courtyard 6 (静园6院)
Email: leiwu (at) math (dot) pku (dot) edu (dot) cn

Google Scholar     GitHub     CV

About Me

I am an Assistant Professor in the School of Mathematical Sciences and Center for Machine Learning Research at Peking University.

Previously, I was a postdoctoral researcher at PACM, Princeton University, and at the Wharton School, University of Pennsylvania. I received my Ph.D. in Computational Mathematics from Peking University in 2018, advised by Prof. Weinan E, and my B.S. in Mathematics from Nankai University in 2012.

Research Vision

My research develops a mathematical understanding of modern machine learning, focusing on how representation, optimization, and scale jointly determine performance.

This perspective is organized around three themes:

  • Approximation and representation: what functions can neural networks efficiently represent?

  • Learning dynamics: how do optimization algorithms such as SGD and Adam behave?

  • Scaling laws: how does performance depend on data, model size, and compute?

Research Highlights

(See also: Full publication list)

Recruiting

I am actively seeking self-motivated postdocs, Ph.D. students, and undergraduate interns to join my group. If you are interested, please email me your CV, transcript, and a brief description of your background and research interests.

Recent News

  • 2026-02: Constant-depth networks with smooth activations released on arXiv.

    • Establishes that smooth activations (e.g., GELU, SiLU) enable smoothness adaptivity in constant-depth neural networks, achieving optimal approximation and statistical rates. (The two activations are recalled after this list.)

  • 2026-02: Fast catch-up, late switching accepted to ICLR 2026.

    • Studies batch-size scheduling through the lens of functional scaling laws (FSL; see the 2025-09 entry below), revealing a fast catch-up effect that holds from linear regression to LLM pre-training. (A toy schedule is sketched after this list.)

  • 2025-09: Functional Scaling Laws accepted to NeurIPS 2025 (Spotlight).

    • Introduces the functional scaling law (FSL) framework, which, in contrast to classical scaling laws that describe only final-step behavior, characterizes the entire loss trajectory, in settings spanning linear regression to LLM pre-training. (See the illustration after this list.)
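
For reference, the smooth activations named in the first item above have the standard definitions (not specific to the paper):

\[
\mathrm{GELU}(x) = x\,\Phi(x), \qquad \mathrm{SiLU}(x) = x\,\sigma(x) = \frac{x}{1 + e^{-x}},
\]

where $\Phi$ is the standard Gaussian CDF and $\sigma$ is the logistic sigmoid. Both are $C^\infty$, unlike ReLU, which is not differentiable at the origin; this is the sense of "smooth" that the adaptivity result refers to.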
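
The "fast catch-up, late switching" item concerns batch-size schedules. As a minimal sketch of what such a schedule looks like, here is a hypothetical two-phase schedule in Python; the switch step and batch sizes are illustrative placeholders, not values from the paper:

    # Hypothetical two-phase batch-size schedule ("late switching"):
    # train with a small batch first, then switch to a large batch late
    # in training. All constants are illustrative, not from the paper.
    def batch_size(step: int, switch_step: int = 8_000,
                   small: int = 64, large: int = 1_024) -> int:
        """Return the batch size to use at a given training step."""
        return small if step < switch_step else large

    # Example usage: the schedule changes exactly once, late in training.
    for step in (0, 4_000, 7_999, 8_000, 12_000):
        print(f"step {step:>6}: batch size {batch_size(step)}")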
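
Finally, to make the FSL contrast concrete: a classical scaling law predicts only the final loss as a function of model size $N$ and data size $D$, e.g. the standard Chinchilla-style form (shown here as a generic example, not necessarily the paper's parameterization):

\[
L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}},
\]

whereas a functional scaling law predicts the loss $L(t)$ at every training step $t$, i.e. the whole training curve, with the final-loss law recovered as the end point of that trajectory.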