Loss Spike in Training Neural Networks

DOI:

https://doi.org/10.4208/jcm.2412-m2024-0083

Keywords:

Neural Network, Loss Spike, Frequency Principle, Maximum Eigenvalue, Flatness, Generalization, Condensation

Abstract

In this work, we investigate the mechanism underlying loss spikes observed during neural network training. When training enters a region with a lower-loss-as-sharper structure, it becomes unstable: once the loss landscape is too sharp, the loss increases exponentially, producing the rapid ascent of the spike. Training stabilizes when it finds a flat region. From a frequency perspective, we attribute the rapid descent in loss primarily to low-frequency components. We observe a deviation in the first eigendirection, which the frequency principle explains naturally: low-frequency information is captured rapidly, driving the rapid descent. Inspired by our analysis of loss spikes, we revisit the link between the maximum eigenvalue of the loss Hessian ($\lambda_{\max}$), flatness, and generalization. We suggest that $\lambda_{\max}$ is a good measure of sharpness but not a good measure of generalization. Furthermore, we experimentally observe that loss spikes can facilitate condensation, causing input weights to evolve towards the same direction, and our experiments show a correlation (similar trend) between $\lambda_{\max}$ and condensation. These observations may provide valuable insights for further theoretical research on the relationship between loss spikes, $\lambda_{\max}$, and generalization.
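The abstract's claim that the loss "increases exponentially once the loss landscape is too sharp" can be illustrated with the classical stability condition for gradient descent on a quadratic: with learning rate $\eta$ and Hessian eigenvalue $\lambda$, the update contracts when $\eta\lambda < 2$ and diverges when $\eta\lambda > 2$. The sketch below is not from the paper; the function name and parameter values are illustrative assumptions.

```python
# Toy illustration (not from the paper): gradient descent on the 1-D
# quadratic L(w) = 0.5 * lam * w^2, whose Hessian eigenvalue is lam.
# The update w <- w - lr * lam * w contracts when lr * lam < 2 and
# diverges (loss grows exponentially) once lr * lam > 2, mimicking the
# rapid loss ascent when training enters a region that is too sharp.
def run_gd(lam, lr=0.1, w0=1.0, steps=20):
    w = w0
    losses = []
    for _ in range(steps):
        losses.append(0.5 * lam * w * w)
        w -= lr * lam * w  # gradient of 0.5 * lam * w^2 is lam * w
    return losses

stable = run_gd(lam=10.0, lr=0.1)   # lr * lam = 1.0 < 2: loss decays
spiking = run_gd(lam=25.0, lr=0.1)  # lr * lam = 2.5 > 2: loss blows up
print(stable[-1] < stable[0], spiking[-1] > spiking[0])  # → True True
```

For a neural network, $\lambda_{\max}$ plays the role of `lam` along the leading eigendirection, which is why a sharp region combined with a fixed learning rate can trigger the spike's rapid ascent.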

Author Biographies

  • Xiaolong Li

    School of Mathematical Sciences, Institute of Natural Sciences, MOE-LSC, Shanghai Jiao Tong University, Shanghai 200240, China

  • Zhi-Qin John Xu

    School of Mathematical Sciences, Institute of Natural Sciences, MOE-LSC, Shanghai Jiao Tong University, Shanghai 200240, China; Key Laboratory of Marine Intelligent Equipment and System, Ministry of Education, Shanghai 200240, China

  • Zhongwang Zhang

    School of Mathematical Sciences, Institute of Natural Sciences, MOE-LSC, Shanghai Jiao Tong University, Shanghai 200240, China

Published

2025-01-13

Section

Articles

How to Cite

Loss Spike in Training Neural Networks. (2025). Journal of Computational Mathematics. https://doi.org/10.4208/jcm.2412-m2024-0083