* indicates equal contribution; α-β indicates alphabetical author order.
How transformers implement induction heads: Approximation and optimization analysis Mingze Wang*, Ruoxi Yu*, Lei Wu arXiv preprint arXiv:2410.11474, 2024.
Improving generalization and convergence by enhancing implicit regularization Mingze Wang, Jinbo Wang, Haotian He, Zilin Wang, Guanhua Huang, Feiyu Xiong, Zhiyu Li, Lei Wu NeurIPS 2024.
Exploring neural network landscapes: star-shaped and geodesic connectivity Zhanran Lin*, Puheng Li*, Lei Wu arXiv preprint, 2024.
A duality analysis of kernel ridge regression in the noiseless regime (α-β) Jihao Long, Xiaojun Peng, Lei Wu arXiv preprint, 2024.
Parameter symmetry and noise equilibrium of stochastic gradient descent Liu Ziyin, Mingze Wang, Hongchao Li, Lei Wu NeurIPS 2024.
Why do you grok? A theoretical analysis of grokking modular addition Mohamad Amin Mohamadi, Zhiyuan Li, Lei Wu, Danica J. Sutherland ICML 2024.
The local landscape of phase retrieval under limited samples Kaizhao Liu*, Zihao Wang*, Lei Wu IEEE Trans. Inform. Theory, to appear.
Achieving margin maximization exponentially fast via progressive norm rescaling Mingze Wang, Zeping Min, Lei Wu ICML 2024.
The noise geometry of stochastic gradient descent: A quantitative and analytical characterization Mingze Wang, Lei Wu arXiv preprint, 2023.
The L^∞ learnability of reproducing kernel Hilbert spaces (α-β) Hongrui Chen, Jihao Long, Lei Wu arXiv preprint, 2023.
Embedding inequalities for Barron-type spaces Lei Wu Journal of Machine Learning, 2023.
A duality framework for analyzing random feature and two-layer neural networks (α-β) Hongrui Chen, Jihao Long, Lei Wu arXiv preprint, 2023.
Theoretical analysis of inductive biases in deep convolutional networks Zihao Wang, Lei Wu NeurIPS 2023.
The implicit regularization of dynamical stability in stochastic gradient descent Lei Wu, Weijie J. Su ICML 2023.
The alignment property of SGD noise and how it helps select flat minima: A stability analysis Lei Wu, Mingze Wang, Weijie J. Su NeurIPS 2022.
Beyond the quadratic approximation: the multiscale structure of neural network loss landscapes Chao Ma, Daniel Kunin, Lei Wu, Lexing Ying Journal of Machine Learning, 2022.
A spectral-based analysis of the separation between two-layer neural networks and linear methods Lei Wu, Jihao Long JMLR 2022.
Learning a single neuron for non-monotonic activation functions Lei Wu AISTATS 2022.
A qualitative study of the dynamic behavior of adaptive gradient algorithms Chao Ma*, Lei Wu*, Weinan E Mathematical and Scientific Machine Learning (MSML), 2021.
Towards a mathematical understanding of neural network-based machine learning: what we know and what we don't (α-β) Weinan E, Chao Ma, Stephan Wojtowytsch, Lei Wu CSIAM Trans. Appl. Math., 2020.
The quenching-activation behavior of the gradient descent dynamics for two-layer neural network models Chao Ma*, Lei Wu*, Weinan E arXiv preprint, 2020.
Machine learning based non-Newtonian fluid model with molecular fidelity Huan Lei, Lei Wu, Weinan E Physical Review E, 2020.
Machine learning from a continuous viewpoint, I (α-β) Weinan E, Chao Ma, Lei Wu Sci. China Math., 2020.
The slow deterioration of the generalization error of the random feature model Chao Ma*, Lei Wu*, Weinan E Mathematical and Scientific Machine Learning (MSML), 2020.
The Barron space and flow-induced function spaces for neural network models (α-β) Weinan E, Chao Ma, Lei Wu Constructive Approximation, 2021.
Complexity measures for neural networks with general activation functions using path-based norms (α-β) Zhong Li, Chao Ma, Lei Wu arXiv preprint, 2020.
A comparative analysis of the optimization and generalization property of two-layer neural network and random feature models under gradient descent dynamics (α-β) Weinan E, Chao Ma, Lei Wu Sci. China Math., 2020.
Analysis of the gradient descent algorithm for a deep neural network model with skip-connections (α-β) Weinan E, Chao Ma, Lei Wu arXiv preprint, 2019.
The generalization error of minimum-norm solutions for over-parameterized neural networks (α-β) Weinan E, Chao Ma, Lei Wu Journal of Pure and Applied Functional Analysis, 2020.
Global convergence of gradient descent for deep linear residual networks Lei Wu*, Qingcan Wang*, Chao Ma NeurIPS 2019.
A priori estimates of the population risk for two-layer neural networks (α-β) Weinan E, Chao Ma, Lei Wu Communications in Mathematical Sciences, 2019.
The anisotropic noise in stochastic gradient descent: Its behavior of escaping from minima and regularization effects Zhanxing Zhu, Jingfeng Wu, Bing Yu, Lei Wu, Jinwen Ma ICML 2019.
How SGD selects the global minima in over-parameterized learning: A dynamical stability perspective Lei Wu, Chao Ma, Weinan E NeurIPS 2018.
Towards understanding and improving the transferability of adversarial examples in deep neural networks Lei Wu, Zhanxing Zhu ACML, 2020 [arXiv version].
Irreversible samplers from jump and continuous Markov processes Yi-An Ma, Emily B Fox, Tianqi Chen, Lei Wu Statistics and Computing, 2018.
Towards understanding generalization of deep learning: perspective of loss landscapes Lei Wu, Zhanxing Zhu, Weinan E Workshop on Principled Approaches to Deep Learning, ICML 2017.
Smoothed dissipative particle dynamics model for mesoscopic multiphase flows in the presence of thermal fluctuations Huan Lei, Nathan A Baker, Lei Wu, Gregory K Schenter, Christopher J Mundy, Alexandre M Tartakovsky Physical Review E, 2016.