On the Banach Spaces Associated with Multi-Layer ReLU Networks: Function Representation, Approximation Theory and Gradient Descent Dynamics

Year:    2020

Authors:    Weinan E, Stephan Wojtowytsch

CSIAM Transactions on Applied Mathematics, Vol. 1 (2020), Iss. 3, pp. 387–440

Abstract

We develop Banach spaces for ReLU neural networks of finite depth $L$ and infinite width. The spaces contain all finite fully connected $L$-layer networks and their $L^2$-limiting objects under bounds on the natural path-norm. Under this norm, the unit ball in the space for $L$-layer networks has low Rademacher complexity and thus favorable generalization properties. Functions in these spaces can be approximated by multi-layer neural networks with dimension-independent convergence rates.
The key to this work is a new way of representing functions as a certain form of expectation, motivated by multi-layer neural networks. This representation allows us to define a new class of continuous models for machine learning. We show that the gradient flow defined in this way is the natural continuous analog of the gradient descent dynamics for the associated multi-layer neural networks, and that the path-norm increases at most polynomially along this continuous gradient flow.

Journal Article Details

Publisher Name:    Global Science Press

Language:    English

DOI:    https://doi.org/10.4208/csiam-am.20-211

Published online:    2020-01

Copyright:    © Global Science Press

Pages:    54

Keywords:    Barron space, multi-layer space, deep neural network, representations of functions, machine learning, infinitely wide network, ReLU activation, Banach space, path-norm, continuous gradient descent dynamics, index representation.

Author Details

Weinan E

Stephan Wojtowytsch
