Conference papers

Boosting reinforcement learning with sparse and rare rewards using Fleming-Viot particle systems

Abstract : We consider reinforcement learning control problems under the average reward criterion in which non-zero rewards are both sparse and rare, that is, they occur in very few states and have a very small steady-state probability. Using Renewal Theory and Fleming-Viot particle systems, we propose a novel approach that exploits prior knowledge on the sparse structure of the environment to boost exploration of the non-zero rewards. We also demonstrate how to combine the methodology with a policy gradient algorithm to construct the FVRL method that is able to efficiently solve structured control problems under these scenarios. We provide theoretical guarantees of the convergence of both the steady-state probability estimator and the policy gradient learner. Finally, we illustrate the method on an M/M/1/K queue control problem where the objective is to determine the optimum blocking threshold K. Our results show that FVRL learns the optimum blocking threshold much more efficiently than vanilla Monte-Carlo reinforcement learning.
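To give a feel for what "rare" means in the M/M/1/K setting mentioned in the abstract, the following minimal sketch computes the textbook stationary distribution of an M/M/1/K queue and reads off the probability that the buffer is full. It is not the authors' FVRL method. The function name `mm1k_stationary` and the parameter values (arrival rate 0.7, service rate 1.0, capacity 40) are illustrative assumptions, not taken from the paper.

```python
def mm1k_stationary(lam: float, mu: float, K: int) -> list[float]:
    """Stationary distribution of an M/M/1/K queue over states 0..K.

    Uses the standard birth-death result pi_n proportional to (lam/mu)**n.
    Normalising by the explicit sum also covers the case lam == mu.
    """
    rho = lam / mu
    weights = [rho ** n for n in range(K + 1)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical parameters chosen only for illustration.
pi = mm1k_stationary(lam=0.7, mu=1.0, K=40)

# Probability that the system is full, i.e. that an arrival would be blocked.
blocking_probability = pi[-1]
print(f"P(queue full) = {blocking_probability:.3e}")
```

With a load well below 1 and a moderately large capacity, the full-buffer probability decays geometrically in K, so a plain Monte Carlo trajectory visits that state very rarely. This is the kind of regime in which an estimator targeted at rare states would be expected to help.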
Contributor : Daniel Mastropietro
Submitted on : Wednesday, September 7, 2022 - 7:20:31 PM
Last modification on : Wednesday, November 9, 2022 - 9:58:05 AM


Files produced by the author(s)


  • HAL Id : hal-03772025, version 1


Daniel Mastropietro, Szymon Majewski, Urtzi Ayesta, Matthieu Jonckheere. Boosting reinforcement learning with sparse and rare rewards using Fleming-Viot particle systems. 15th European Workshop on Reinforcement Learning (EWRL 2022), Sep 2022, Milano, Italy. ⟨hal-03772025⟩


