Lei Wu (吴磊)

Assistant Professor
School of Mathematical Sciences
Center for Machine Learning Research
Peking University

Office: Room 205, Jingyuan Courtyard 6 (静园6院)
Email: leiwu (at) math (dot) pku (dot) edu (dot) cn

Google Scholar | GitHub | CV

About Me

I am currently an Assistant Professor in the School of Mathematical Sciences and Center for Machine Learning Research at Peking University.

Previously, I was a postdoc in the Program in Applied and Computational Mathematics (PACM) at Princeton University and in the Department of Statistics and Data Science at the Wharton School, University of Pennsylvania. I received my Ph.D. in computational mathematics from Peking University in 2018, advised by Prof. Weinan E, and my B.S. in mathematics from Nankai University in 2012.

My research aims to understand the mechanisms behind the success of deep learning, with a particular focus on:

  • The approximation and representation power of neural networks

  • The dynamical behavior of popular optimization algorithms such as SGD and Adam

  • Emergent phenomena in the training of large language models (LLMs)

Recruiting

We are actively seeking self-motivated postdocs, PhD students, and interns to join our team. If you are interested in working with me, please email me your CV, your transcript, and a brief description of your research interests.

Selected Works

  • Understanding and improving LLM pre-training

    • Optimizer design:

      • AdmIRE & Blockwise LR: Speed up convergence by amplifying the dynamics along flat directions.

      • GradPower: Make gradients more informative for faster convergence, requiring only a single-line code change.

  • A stability theory of implicit regularization

    • SGD prefers flat minima via dynamical stability; introduced the stability-based view of the flatness bias: NeurIPS 2018

    • The anisotropy of SGD noise is crucial for sharpness control: NeurIPS 2022

    • Flat minima provably generalize well for two-layer ReLU and diagonal linear nets: ICML 2023

    • Stability-inspired algorithms for seeking flatter minima: NeurIPS 2024

    • Discovery of the edge of stability (EoS) phenomenon: NeurIPS 2018 (Table 2)
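As a minimal illustration of the stability view of flatness, here is a standard linear-stability calculation (not code from the papers above). Gradient descent with learning rate η on the quadratic f(x) = a·x²/2 updates x ← (1 − ηa)x. The iterates therefore stay bounded only when |1 − ηa| ≤ 1, that is, when the sharpness a satisfies a ≤ 2/η. The function names `gd_on_quadratic` and `is_linearly_stable` are hypothetical, chosen for this sketch.

```python
def gd_on_quadratic(sharpness: float, lr: float, x0: float = 1.0, steps: int = 100) -> float:
    """Run full-batch GD on f(x) = sharpness * x**2 / 2 and return |x| after `steps` updates."""
    x = x0
    for _ in range(steps):
        grad = sharpness * x  # f'(x) = a * x
        x = x - lr * grad     # equivalent to x <- (1 - lr * a) * x
    return abs(x)


def is_linearly_stable(sharpness: float, lr: float) -> bool:
    """Linear stability of GD at the minimum x = 0: |1 - lr * a| <= 1, i.e. 0 <= a <= 2 / lr."""
    return abs(1.0 - lr * sharpness) <= 1.0


if __name__ == "__main__":
    lr = 0.1  # stability threshold is 2 / lr = 20
    for a in (5.0, 19.0, 21.0):
        # Sharper minima (a > 20) cannot be stably reached by GD with this learning rate.
        print(f"sharpness={a}: stable={is_linearly_stable(a, lr)}, |x_100|={gd_on_quadratic(a, lr):.3g}")
```

With η = 0.1, a minimum of sharpness 19 attracts GD while one of sharpness 21 repels it, so GD can only settle into minima flatter than 2/η. The edge-of-stability regime refers to training in which the sharpness hovers near this 2/η threshold. This sketch covers full-batch GD only and does not model SGD noise.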