Lei Wu (吴磊)

Assistant Professor
School of Mathematical Sciences
Center for Machine Learning Research
Peking University

Office: Room 205, Jingyuan Courtyard 6 (静园6院 205)
Email: leiwu (at) math (dot) pku (dot) edu (dot) cn

About Me

I am currently an Assistant Professor in the School of Mathematical Sciences and the Center for Machine Learning Research at Peking University.

Previously, I was a postdoc in the Program in Applied and Computational Mathematics (PACM) at Princeton University and in the Department of Statistics and Data Science at the Wharton School, University of Pennsylvania. I completed my Ph.D. in computational mathematics at Peking University in 2018, advised by Prof. Weinan E, and received my B.S. in mathematics from Nankai University in 2012.

My research aims to understand the mechanisms behind the success of deep learning, with a particular focus on:

- The approximation and representation power of neural networks
- The dynamical behavior of popular optimization algorithms such as SGD and Adam
- Emergent phenomena in the training of large language models (LLMs)

Recruiting

We are actively seeking self-motivated postdocs, PhD students, and interns to join our team. If you are interested in working with me, please email me your CV, your transcript, and a brief description of your research interests.

Selected Works

Understanding and improving LLM pre-training
- Optimizer design:
  - AdmIRE & Blockwise LR: Speed up convergence by amplifying the dynamics along flat directions.
  - GradPower: Make gradients more informative for faster convergence, requiring only a single-line code change.

A stability theory of implicit regularization
- SGD prefers flat minima via stability (introduced the stability-based view of the flatness bias): NeurIPS 2018
- Anisotropic SGD noise is crucial for sharpness control: NeurIPS 2022
- Flat minima provably generalize well for two-layer ReLU and diagonal linear nets: ICML 2023
- Stability-inspired algorithms for seeking flatter minima: NeurIPS 2024
- Discovery of the edge of stability (EoS) phenomenon: NeurIPS 2018 (Table 2)

Approximation theory for machine learning
- Approximation is the dual of estimation, and vice versa: AoS 2025
- Kernel and random features: arXiv 2024
- Barron space theory for neural networks: CMS 2019, Constr. Approx. 2021
  - A spectral analysis: JMLR 2022
  - Embedding theorems: JML 2023
- Deep CNNs: NeurIPS 2023