Convergence Analysis for Over-Parameterized Deep Learning

Year:    2024

Author:    Yuling Jiao, Xiliang Lu, Peiying Wu, Jerry Zhijian Yang

Communications in Computational Physics, Vol. 36 (2024), Iss. 1 : pp. 71–103

Abstract

The success of deep learning in various applications has generated growing interest in understanding its theoretical foundations. This paper presents a theoretical framework that explains why over-parameterized neural networks can perform well. Our analysis begins from the perspective of approximation theory: we argue that over-parameterized deep neural networks with bounded norms can effectively approximate the target function. Additionally, we demonstrate that the metric entropy of such networks is independent of the number of network parameters. We use these findings to derive consistency results for over-parameterized deep regression and for the deep Ritz method, respectively. Furthermore, we prove convergence rates when the target has higher regularity; to our knowledge, these are the first convergence rates for over-parameterized deep learning.
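
For context, the deep Ritz method mentioned in the abstract trains a neural network u_θ to minimize a variational (Ritz) energy whose integrals are approximated by Monte Carlo sampling. The following is a minimal illustrative formulation for a Poisson-type problem with a boundary penalty; the symbols Ω, f, g, λ and the penalty form are generic textbook notation, not details taken from this paper:

    \mathcal{E}(u_\theta) \;=\; \int_{\Omega} \Big( \tfrac{1}{2}\,\lvert \nabla u_\theta(x) \rvert^{2} - f(x)\,u_\theta(x) \Big)\, dx \;+\; \lambda \int_{\partial\Omega} \lvert u_\theta(x) - g(x) \rvert^{2}\, ds,

    \widehat{\mathcal{E}}_{n,m}(u_\theta) \;=\; \frac{\lvert \Omega \rvert}{n} \sum_{i=1}^{n} \Big( \tfrac{1}{2}\,\lvert \nabla u_\theta(X_i) \rvert^{2} - f(X_i)\,u_\theta(X_i) \Big) \;+\; \frac{\lambda\,\lvert \partial\Omega \rvert}{m} \sum_{j=1}^{m} \lvert u_\theta(Y_j) - g(Y_j) \rvert^{2},

with X_i drawn uniformly from Ω and Y_j from ∂Ω. Consistency in this setting means that a minimizer of the empirical energy \widehat{\mathcal{E}}_{n,m} converges to the solution of the underlying variational problem as the sample sizes n and m grow.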

Journal Article Details

Publisher Name:    Global Science Press

Language:    English

DOI:    https://doi.org/10.4208/cicp.OA-2023-0264

Published online:    2024-01

Copyright:    © Global Science Press

Pages:    33

Keywords:    Over-parameterization, convergence rate, approximation, generalization.

Author Details

Yuling Jiao

Xiliang Lu

Peiying Wu

Jerry Zhijian Yang