* indicates equal contribution; α-β indicates alphabetical author order.
How transformers implement induction heads: Approximation and optimization analysis Mingze Wang*, Ruoxi Yu*, Lei Wu arXiv preprint arXiv:2410.11474, 2024.
Improving generalization and convergence by enhancing implicit regularization Mingze Wang, Jinbo Wang, Haotian He, Zilin Wang, Guanhua Huang, Feiyu Xiong, Zhiyu Li, Lei Wu NeurIPS 2024.
Exploring neural network landscapes: star-shaped and geodesic connectivity Zhanran Lin*, Puheng Li*, Lei Wu arXiv preprint, 2024.
A duality analysis of kernel ridge regression in the noiseless regime (α-β) Jihao Long, Xiaojun Peng, Lei Wu arXiv preprint, 2024.
Parameter symmetry and noise equilibrium of stochastic gradient descent Liu Ziyin, Mingze Wang, Hongchao Li, Lei Wu NeurIPS 2024.
Why do you grok? A theoretical analysis of grokking modular addition Mohamad Amin Mohamadi, Zhiyuan Li, Lei Wu, Danica J. Sutherland ICML 2024.
The local landscape of phase retrieval under limited samples Kaizhao Liu*, Zihao Wang*, Lei Wu IEEE Trans. Inform. Theory, to appear.
Achieving margin maximization exponentially fast via progressive norm rescaling Mingze Wang, Zeping Min, Lei Wu ICML 2024.
The noise geometry of stochastic gradient descent: A quantitative and analytical characterization Mingze Wang, Lei Wu arXiv preprint, 2023.
The L^∞ learnability of reproducing kernel Hilbert spaces (α-β) Hongrui Chen, Jihao Long, Lei Wu arXiv preprint, 2023.
Embedding inequalities for Barron-type spaces Lei Wu Journal of Machine Learning, 2023.
A duality framework for analyzing random feature and two-layer neural networks (α-β) Hongrui Chen, Jihao Long, Lei Wu arXiv preprint, 2023.
Theoretical analysis of inductive biases in deep convolutional networks Zihao Wang, Lei Wu NeurIPS 2023.
The implicit regularization of dynamical stability in stochastic gradient descent Lei Wu, Weijie J. Su ICML 2023.
The alignment property of SGD noise and how it helps select flat minima: A stability analysis Lei Wu, Mingze Wang, Weijie J. Su NeurIPS 2022.
Beyond the quadratic approximation: the multiscale structure of neural network loss landscapes Chao Ma, Daniel Kunin, Lei Wu, Lexing Ying Journal of Machine Learning, 2022.
A spectral-based analysis of the separation between two-layer neural networks and linear methods Lei Wu, Jihao Long JMLR 2022.
Learning a single neuron for non-monotonic activation functions Lei Wu AISTATS 2022.
A qualitative study of the dynamic behavior of adaptive gradient algorithms Chao Ma*, Lei Wu*, Weinan E Mathematical and Scientific Machine Learning (MSML), 2021.
Towards a mathematical understanding of neural network-based machine learning: what we know and what we don't (α-β) Weinan E, Chao Ma, Stephan Wojtowytsch, Lei Wu CSIAM Trans. Appl. Math., 2020.
The quenching-activation behavior of the gradient descent dynamics for two-layer neural network models Chao Ma*, Lei Wu*, Weinan E arXiv preprint, 2020.
Machine learning based non-Newtonian fluid model with molecular fidelity Huan Lei, Lei Wu, Weinan E Physical Review E, 2020.
Machine learning from a continuous viewpoint, I (α-β) Weinan E, Chao Ma, Lei Wu Sci. China Math., 2020.
The slow deterioration of the generalization error of the random feature model Chao Ma*, Lei Wu*, Weinan E Mathematical and Scientific Machine Learning (MSML), 2020.
The Barron space and flow-induced function spaces for neural network models (α-β) Weinan E, Chao Ma, Lei Wu Constructive Approximation, 2021.
Complexity measures for neural networks with general activation functions using path-based norms (α-β) Zhong Li, Chao Ma, Lei Wu arXiv preprint, 2020.
A comparative analysis of the optimization and generalization property of two-layer neural network and random feature models under gradient descent dynamics (α-β) Weinan E, Chao Ma, Lei Wu Sci. China Math., 2020.
Analysis of the gradient descent algorithm for a deep neural network model with skip-connections (α-β) Weinan E, Chao Ma, Lei Wu arXiv preprint, 2019.
The generalization error of minimum-norm solutions for over-parameterized neural networks (α-β) Weinan E, Chao Ma, Lei Wu Journal of Pure and Applied Functional Analysis, 2020.
Global convergence of gradient descent for deep linear residual networks Lei Wu*, Qingcan Wang*, Chao Ma NeurIPS 2019.
A priori estimates of the population risk for two-layer neural networks (α-β) Weinan E, Chao Ma, Lei Wu Communications in Mathematical Sciences, 2019.
The anisotropic noise in stochastic gradient descent: Its behavior of escaping from minima and regularization effects Zhanxing Zhu, Jingfeng Wu, Bing Yu, Lei Wu, Jinwen Ma ICML 2019.
How SGD selects the global minima in over-parameterized learning: A dynamical stability perspective Lei Wu, Chao Ma, Weinan E NeurIPS 2018.
Towards understanding and improving the transferability of adversarial examples in deep neural networks Lei Wu, Zhanxing Zhu ACML, 2020 [arXiv version].
Irreversible samplers from jump and continuous Markov processes Yi-An Ma, Emily B Fox, Tianqi Chen, Lei Wu Statistics and Computing, 2018.
Towards understanding generalization of deep learning: perspective of loss landscapes Lei Wu, Zhanxing Zhu, Weinan E Workshop on Principled Approaches to Deep Learning, ICML 2017.
Smoothed dissipative particle dynamics model for mesoscopic multiphase flows in the presence of thermal fluctuations Huan Lei, Nathan A Baker, Lei Wu, Gregory K Schenter, Christopher J Mundy, Alexandre M Tartakovsky Physical Review E, 2016.