Lei Wu (吴磊)

Assistant Professor
School of Mathematical Sciences
Center for Machine Learning Research
Peking University

Office: Room 205, Jingyuan Courtyard 6 (静园6院)
Email: leiwu (at) math (dot) pku (dot) edu (dot) cn

Google Scholar     GitHub     CV

About Me

I am an Assistant Professor in the School of Mathematical Sciences and Center for Machine Learning Research at Peking University.

Previously, I was a postdoctoral researcher at PACM, Princeton University, and at the Wharton School, University of Pennsylvania. I received my Ph.D. in Computational Mathematics from Peking University in 2018, advised by Prof. Weinan E, and my B.S. in Mathematics from Nankai University in 2012.

Research Vision

My research develops a mathematical understanding of modern machine learning, focusing on how representation, optimization, and scale jointly determine performance.

This perspective is organized around three themes:

  • Approximation and representation: what functions can neural networks efficiently represent?

  • Learning dynamics: how do optimization algorithms such as SGD and Adam behave?

  • Scaling laws: how does performance depend on data, model size, and compute?

Research Highlights

(See also: Full publication list)

Recruiting

I am actively seeking self-motivated postdocs, Ph.D. students, and undergraduate interns to join my group. If you are interested, please email me your CV, transcript, and a brief description of your background and research interests.

Recent News

  • 2026-02: Constant-depth networks with smooth activations released on arXiv.

    • Establishes that smooth activations (e.g., GELU, SiLU) enable smoothness adaptivity in constant-depth neural networks, achieving optimal approximation and statistical rates. (The two activations are recalled after this list.)

  • 2026-02: Fast catch-up, late switching accepted to ICLR 2026.

    • Studies batch-size scheduling through the lens of functional scaling laws (FSL; see the 2025-09 entry below), revealing a fast catch-up effect that holds from linear regression to LLM pre-training. (A toy schedule is sketched after this list.)

  • 2025-09: Functional Scaling Laws accepted to NeurIPS 2025 (Spotlight).

    • Introduces the functional scaling law (FSL) framework, which, in contrast to classical scaling laws that describe only final-step behavior, characterizes the entire loss trajectory, in settings spanning linear regression to LLM pre-training. (See the illustration after this list.)
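
For reference, the smooth activations named in the first item above have the standard definitions (not specific to the paper):

\[
\mathrm{GELU}(x) = x\,\Phi(x), \qquad \mathrm{SiLU}(x) = x\,\sigma(x) = \frac{x}{1 + e^{-x}},
\]

where $\Phi$ is the standard Gaussian CDF and $\sigma$ is the logistic sigmoid. Both are $C^\infty$, unlike ReLU, which is not differentiable at the origin; this is the sense of "smooth" that the adaptivity result refers to.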
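
The "fast catch-up, late switching" item concerns batch-size schedules. As a minimal sketch of what such a schedule looks like, here is a hypothetical two-phase schedule in Python; the switch step and batch sizes are illustrative placeholders, not values from the paper:

    # Hypothetical two-phase batch-size schedule ("late switching"):
    # train with a small batch first, then switch to a large batch late
    # in training. All constants are illustrative, not from the paper.
    def batch_size(step: int, switch_step: int = 8_000,
                   small: int = 64, large: int = 1_024) -> int:
        """Return the batch size to use at a given training step."""
        return small if step < switch_step else large

    # Example usage: the schedule changes exactly once, late in training.
    for step in (0, 4_000, 7_999, 8_000, 12_000):
        print(f"step {step:>6}: batch size {batch_size(step)}")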
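
Finally, to make the FSL contrast concrete: a classical scaling law predicts only the final loss as a function of model size $N$ and data size $D$, e.g. the standard Chinchilla-style form (shown here as a generic example, not necessarily the paper's parameterization):

\[
L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}},
\]

whereas a functional scaling law predicts the loss $L(t)$ at every training step $t$, i.e. the whole training curve, with the final-loss law recovered as the end point of that trajectory.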