Year: 2022
Author: Yuqing Li, Tao Luo, Nung Kwan Yip
CSIAM Transactions on Applied Mathematics, Vol. 3 (2022), Iss. 4 : pp. 692–760
Abstract
Gradient descent yields zero training loss in polynomial time for deep neural networks despite the non-convex nature of the objective function. The behavior of a network trained by gradient descent in the infinite-width limit can be described by the Neural Tangent Kernel (NTK) introduced in [25]. In this paper, we study the dynamics of the NTK for finite-width Deep Residual Networks (ResNets) using the neural tangent hierarchy (NTH) proposed in [24]. For a ResNet with a smooth and Lipschitz activation function, we reduce the requirement on the layer width $m$ with respect to the number of training samples $n$ from quartic to cubic. Our analysis strongly suggests that the particular skip-connection structure of ResNets is the main reason for their triumph over fully-connected networks.
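As a minimal sketch of the central object in the abstract (not the paper's construction), the snippet below computes the empirical NTK $\Theta(x, x') = \langle \partial_\theta f(x;\theta), \partial_\theta f(x';\theta)\rangle$ for a toy two-block residual network with a smooth (tanh) activation and skip connections, using JAX. The layer width $m$, parameter names, and network depth here are illustrative choices only.

```python
import jax
import jax.numpy as jnp

def init_params(key, d, m):
    # Illustrative initialization; not the scaling used in the paper.
    k1, k2, k3, k4 = jax.random.split(key, 4)
    return {
        "W_in": jax.random.normal(k1, (m, d)) / jnp.sqrt(d),
        "W1":   jax.random.normal(k2, (m, m)) / jnp.sqrt(m),
        "W2":   jax.random.normal(k3, (m, m)) / jnp.sqrt(m),
        "a":    jax.random.normal(k4, (m,)) / jnp.sqrt(m),
    }

def resnet(params, x):
    # Input layer, two residual blocks h <- h + tanh(W h), then a linear readout.
    h = params["W_in"] @ x
    h = h + jnp.tanh(params["W1"] @ h)   # skip connection, block 1
    h = h + jnp.tanh(params["W2"] @ h)   # skip connection, block 2
    return params["a"] @ h               # scalar output

def empirical_ntk(params, x1, x2):
    # Inner product of parameter gradients at two inputs.
    g1 = jax.grad(resnet)(params, x1)
    g2 = jax.grad(resnet)(params, x2)
    leaves1, _ = jax.tree_util.tree_flatten(g1)
    leaves2, _ = jax.tree_util.tree_flatten(g2)
    return sum(jnp.vdot(a, b) for a, b in zip(leaves1, leaves2))

key = jax.random.PRNGKey(0)
d, m = 5, 64
params = init_params(key, d, m)
x1 = jnp.ones(d)
x2 = jnp.arange(d, dtype=jnp.float32)
print(empirical_ntk(params, x1, x2))
```

At finite width this kernel changes during training; the NTH referenced in the abstract tracks that evolution through a hierarchy of higher-order kernels.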
Journal Article Details
Publisher Name: Global Science Press
Language: English
DOI: https://doi.org/10.4208/csiam-am.SO-2021-0053
Published online: 2022-01
Copyright: © Global Science Press
Pages: 69
Keywords: Residual networks, training process, neural tangent kernel, neural tangent hierarchy.