
Conference Paper: State-Aware Value Function Approximation with Attention Mechanism for Restless Multi-armed Bandits

Title: State-Aware Value Function Approximation with Attention Mechanism for Restless Multi-armed Bandits
Authors: ZHAO, J; WU, S; TIAN, G; WANG, J
Issue Date: 2021
Citation: The 30th International Joint Conference on Artificial Intelligence (IJCAI)
Abstract: The restless multi-armed bandit (RMAB) is a generalization of the multi-armed bandit with non-stationary rewards. Its optimal solution is intractable due to the exponentially large state and action spaces with respect to the number of arms. Existing approximation approaches, e.g., Whittle’s index policy, have difficulty capturing either temporal or spatial factors such as impacts from other arms. We propose considering both factors using the attention mechanism, which has achieved great success in deep learning. Our state-aware value function approximation solution comprises an attention-based value function approximator and a Bellman equation solver. The attention-based coordination module captures both spatial and temporal factors for arm coordination. The Bellman equation solver exploits the decoupling structure of RMABs to obtain solutions with significantly reduced computational overhead. In particular, the time complexity of our approximation is linear in the number of arms. Finally, we illustrate the effectiveness and investigate the properties of our proposed method with numerical experiments.
Persistent Identifier: http://hdl.handle.net/10722/299754
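
The abstract describes an attention-based value function approximator over per-arm states. As a rough illustration only, here is a minimal sketch (not the authors' code) of how per-arm states could be embedded, coordinated with self-attention so that each arm's value estimate can account for the other arms, and then combined into a decomposed value estimate. The class name ArmValueAttention and all dimensions are hypothetical assumptions, and the sketch omits the paper's Bellman equation solver.

```python
# Hypothetical sketch of an attention-based per-arm value approximator.
# Not the authors' implementation; names and dimensions are assumptions.
import torch
import torch.nn as nn

class ArmValueAttention(nn.Module):
    def __init__(self, state_dim: int, embed_dim: int = 64, num_heads: int = 4):
        super().__init__()
        self.embed = nn.Linear(state_dim, embed_dim)       # per-arm state embedding
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.value_head = nn.Linear(embed_dim, 1)          # per-arm value estimate

    def forward(self, arm_states: torch.Tensor) -> torch.Tensor:
        # arm_states: (batch, num_arms, state_dim)
        h = self.embed(arm_states)                         # (batch, num_arms, embed_dim)
        h, _ = self.attn(h, h, h)                          # arms attend to each other (spatial coordination)
        per_arm_values = self.value_head(h).squeeze(-1)    # (batch, num_arms)
        return per_arm_values.sum(dim=-1)                  # decomposed joint value estimate

# Example usage with random states for 10 arms
model = ArmValueAttention(state_dim=5)
states = torch.randn(2, 10, 5)
print(model(states).shape)  # torch.Size([2])
```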

 

DC Field: Value
dc.contributor.author: ZHAO, J
dc.contributor.author: WU, S
dc.contributor.author: TIAN, G
dc.contributor.author: WANG, J
dc.date.accessioned: 2021-05-26T03:28:35Z
dc.date.available: 2021-05-26T03:28:35Z
dc.date.issued: 2021
dc.identifier.citation: The 30th International Joint Conference on Artificial Intelligence (IJCAI)
dc.identifier.uri: http://hdl.handle.net/10722/299754
dc.description.abstract: The restless multi-armed bandit (RMAB) is a generalization of the multi-armed bandit with non-stationary rewards. Its optimal solution is intractable due to the exponentially large state and action spaces with respect to the number of arms. Existing approximation approaches, e.g., Whittle’s index policy, have difficulty capturing either temporal or spatial factors such as impacts from other arms. We propose considering both factors using the attention mechanism, which has achieved great success in deep learning. Our state-aware value function approximation solution comprises an attention-based value function approximator and a Bellman equation solver. The attention-based coordination module captures both spatial and temporal factors for arm coordination. The Bellman equation solver exploits the decoupling structure of RMABs to obtain solutions with significantly reduced computational overhead. In particular, the time complexity of our approximation is linear in the number of arms. Finally, we illustrate the effectiveness and investigate the properties of our proposed method with numerical experiments.
dc.language: eng
dc.relation.ispartof: The 30th International Joint Conference on Artificial Intelligence (IJCAI)
dc.title: State-Aware Value Function Approximation with Attention Mechanism for Restless Multi-armed Bandits
dc.type: Conference_Paper
dc.identifier.hkuros: 322587
