An $L^2$ Analysis of Reinforcement Learning in High Dimensions with Kernel and Neural Network Approximation

Jihao Long; Jiequn Han; Weinan E

doi:10.4208/csiam-am.SO-2021-0026


Year: 2022

CSIAM Transactions on Applied Mathematics, Vol. 3 (2022), Iss. 2 : pp. 191–220

Abstract

Reinforcement learning (RL) algorithms based on high-dimensional function approximation have achieved tremendous empirical success in large-scale problems with an enormous number of states. However, most analysis of such algorithms gives rise to error bounds that involve either the number of states or the number of features. This paper considers the situation where the function approximation is made either using the kernel method or the two-layer neural network model, in the context of a fitted Q-iteration algorithm with explicit regularization. We establish an $\tilde{O}(H^3|\mathcal{A}|^{\frac{1}{4}} n^{-\frac{1}{4}})$ bound for the optimal policy with $Hn$ samples, where $H$ is the length of each episode and $|\mathcal{A}|$ is the size of the action space. Our analysis hinges on analyzing the $L^2$ error of the approximated Q-function using $n$ data points. Even though this result still requires a finite-sized action space, the error bound is independent of the dimensionality of the state space.
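
The abstract names fitted Q-iteration with explicit regularization over a kernel model. As a rough illustration only, the sketch below runs backward fitted Q-iteration over an episode of length H with n transitions per step (Hn in total), fitting one kernel ridge regressor per action. Everything specific here is an assumption, not the paper's estimator: the Gaussian kernel, the penalized form (1/m)·Σ(f(x_i) − y_i)² + λ‖f‖², the values of λ and the bandwidth, the toy reward and dynamics, and the hypothetical names `gaussian_kernel`, `KernelRidge` (a small local class, not scikit-learn's), `fitted_q_iteration` and `toy_reward`. The two-layer neural network variant is not shown.

```python
import numpy as np


def gaussian_kernel(X, Y, bandwidth=1.0):
    """Gaussian kernel matrix with entries exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


class KernelRidge:
    """Hypothetical helper: minimizes (1/m) * sum_i (f(x_i) - y_i)^2 + lam * ||f||_H^2."""

    def __init__(self, lam=1e-2, bandwidth=1.0):
        self.lam = lam
        self.bandwidth = bandwidth

    def fit(self, X, y):
        m = X.shape[0]
        K = gaussian_kernel(X, X, self.bandwidth)
        # Representer theorem: f = sum_i alpha_i k(x_i, .), alpha = (K + m*lam*I)^{-1} y.
        self.alpha = np.linalg.solve(K + m * self.lam * np.eye(m), y)
        self.X_train = X
        return self

    def predict(self, X_new):
        return gaussian_kernel(X_new, self.X_train, self.bandwidth) @ self.alpha


def fitted_q_iteration(data, n_actions, lam=1e-2, bandwidth=1.0):
    """Backward fitted Q-iteration for an episode of length H = len(data).

    data[h] = (S, A, R, S_next) holds n transitions collected at step h, so
    H * n transitions in total. Assumes every action occurs at every step.
    """
    H = len(data)
    q_models = [None] * H

    def q_values(h, S):
        # Q_h(s, a) for all actions; Q_H is identically zero because the episode has ended.
        if h == H:
            return np.zeros((S.shape[0], n_actions))
        return np.column_stack([q_models[h][a].predict(S) for a in range(n_actions)])

    for h in reversed(range(H)):
        S, A, R, S_next = data[h]
        # Bellman optimality target: r + max_a' Q_{h+1}(s', a').
        y = R + q_values(h + 1, S_next).max(axis=1)
        # One regularized regression per action (finite action space).
        q_models[h] = [
            KernelRidge(lam, bandwidth).fit(S[A == a], y[A == a]) for a in range(n_actions)
        ]
    return q_models, q_values


# Usage on a hypothetical toy problem with a d-dimensional state space.
rng = np.random.default_rng(0)
H, n, d, n_actions = 3, 200, 5, 2


def toy_reward(S, A):
    # Action 1 earns +s[0], action 0 earns -s[0]; the optimal action depends only on sign(s[0]).
    return np.where(A == 1, S[:, 0], -S[:, 0])


data = []
for h in range(H):
    S = rng.uniform(-1.0, 1.0, size=(n, d))
    A = rng.integers(0, n_actions, size=n)
    S_next = rng.uniform(-1.0, 1.0, size=(n, d))  # toy dynamics: s' independent of (s, a)
    data.append((S, A, toy_reward(S, A), S_next))

q_models, q_values = fitted_q_iteration(data, n_actions, lam=1e-2, bandwidth=1.0)
test_states = np.array([[0.8, 0.0, 0.0, 0.0, 0.0], [-0.8, 0.0, 0.0, 0.0, 0.0]])
greedy_actions = q_values(0, test_states).argmax(axis=1)  # greedy policy at step 0
print(greedy_actions)
```

Fitting a separate regressor for each action keeps the action space finite and explicit, which is the regime the stated bound covers; the penalty λ stands in for the explicit regularization, and its value here is arbitrary rather than taken from the paper.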


Journal Article Details

Publisher Name: Global Science Press

Language: English

Published online: 2022-01

Pages: 30

Keywords: Reinforcement learning, function approximation, neural networks, reproducing kernel Hilbert space.

Author Details

Jihao Long

Jiequn Han

Weinan E

Cited By

Duan, Yaqi; Wang, Mengdi; Wainwright, Martin J. "Optimal policy evaluation using kernel-based temporal difference methods." The Annals of Statistics, Vol. 52 (2024), Iss. 5. https://doi.org/10.1214/24-AOS2399