Conference papers

Tabular and Deep Learning of Whittle Index

Abstract : The Whittle index policy is a heuristic that has shown remarkably good performance (with guaranteed asymptotic optimality) when applied to the class of problems known as Restless Multi-Armed Bandit Problems (RMABP). In this paper we present QWI and QWINN, two algorithms capable of learning the Whittle indices for the total discounted criterion. The key feature is the use of two timescales: a faster one to update the state-action Q-values, and a relatively slower one to update the Whittle indices. In our main theoretical result we show that QWI, which is a tabular implementation, converges to the true Whittle indices. We then present QWINN, an adaptation of the QWI algorithm that uses neural networks to compute the Q-values on the faster timescale, which is able to extrapolate information from one state to another and scales naturally to large state-space environments. Numerical computations show that QWI and QWINN converge much faster than the standard Q-learning algorithm, neural-network-based approximate Q-learning, and other state-of-the-art algorithms.
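The two-timescale idea described in the abstract can be illustrated with a minimal tabular sketch. This is not the authors' implementation: the function name `qwi_sketch`, the toy three-state arm (`P`, `R`), the uniform-random exploration, and the step-size schedules are all assumptions chosen for illustration. The sketch keeps one Q-table per reference state, pays the current index estimate as a subsidy for the passive action, updates the Q-values with the larger step size, and nudges each index with the smaller step size toward the point where the active and passive actions are equally attractive in its reference state.

```python
import numpy as np

def qwi_sketch(P, R, gamma=0.9, n_steps=5000, seed=0):
    """Illustrative two-timescale index learning for a single arm.

    P[a] is an (S, S) transition matrix for action a (0 = passive, 1 = active).
    R[s, a] is the reward. All names and schedules here are assumptions.
    """
    rng = np.random.default_rng(seed)
    S = R.shape[0]
    Q = np.zeros((S, S, 2))   # Q[x_ref, s, a]: one Q-table per reference state
    lam = np.zeros(S)         # lam[x_ref]: index estimate for each reference state
    ref = np.arange(S)
    s = 0
    for n in range(1, n_steps + 1):
        alpha = 1.0 / n ** 0.6   # faster timescale (Q-values)
        beta = 1.0 / n           # slower timescale (indices); beta / alpha -> 0
        a = int(rng.integers(2))               # uniform exploration (assumption)
        s_next = int(rng.choice(S, p=P[a][s]))
        # The passive action (a = 0) earns the subsidy lam[x_ref] in table x_ref.
        target = R[s, a] + (1 - a) * lam + gamma * Q[:, s_next, :].max(axis=1)
        Q[:, s, a] += alpha * (target - Q[:, s, a])
        # Move each index toward Q(x_ref, active) == Q(x_ref, passive).
        lam += beta * (Q[ref, ref, 1] - Q[ref, ref, 0])
        s = s_next
    return lam, Q

# Hypothetical toy arm: three states, rewards only under the active action.
P = np.array([
    [[0.7, 0.3, 0.0], [0.0, 0.7, 0.3], [0.0, 0.0, 1.0]],   # passive
    [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],   # active: reset to state 0
])
R = np.array([[0.0, 0.1], [0.0, 0.5], [0.0, 1.0]])

lam, Q = qwi_sketch(P, R)
print("index estimates per state:", lam)
```

The only structural requirement the sketch tries to respect is that the index step size vanishes relative to the Q-value step size, so that the Q-tables see a nearly frozen subsidy. The estimates after a short run like this one should not be read as converged indices.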
Contributor : Francisco Robledo
Submitted on : Monday, September 5, 2022 - 11:57:15 AM
Last modification on : Thursday, November 24, 2022 - 3:39:03 AM


Files produced by the author(s)


Distributed under a Creative Commons Attribution 4.0 International License


  • HAL Id : hal-03767324, version 1


Francisco Robledo, Vivek S Borkar, Urtzi Ayesta, Konstantin Avrachenkov. Tabular and Deep Learning of Whittle Index. EWRL 2022 - 15th European Workshop of Reinforcement Learning, Sep 2022, Milan, Italy. ⟨hal-03767324⟩


