State-Aware Value Function Approximation with Attention Mechanism for Restless Multi-armed Bandits

Wu, S; Zhao, J; Tian, G; Wang, J

Publisher Website: 10.24963/ijcai.2021/64
Appears in Collections:
- Statistics & Actuarial Science: Conference papers

Conference Paper: State-Aware Value Function Approximation with Attention Mechanism for Restless Multi-armed Bandits

Title	State-Aware Value Function Approximation with Attention Mechanism for Restless Multi-armed Bandits
Authors	Wu, S; Zhao, J; Tian, G; Wang, J
Keywords	Agent-based and Multi-agent Systems: Multi-agent Planning; Agent-based and Multi-agent Systems: Resource Allocation; Planning and Scheduling: Planning and Scheduling; Planning and Scheduling: Markov Decisions Processes
Issue Date	2021
Publisher	International Joint Conference on Artificial Intelligence. The proceedings website is located at https://www.ijcai.org/past_proceedings
Citation	Proceedings of the 30th International Joint Conference on Artificial Intelligence (IJCAI-21), Virtual Meeting, Montreal, Canada, 19-27 August 2021, pp. 458-464. DOI: http://dx.doi.org/10.24963/ijcai.2021/64
Abstract	The restless multi-armed bandit (RMAB) problem is a generalization of the multi-armed bandit with non-stationary rewards. Its optimal solution is intractable because the state and action spaces grow exponentially with the number of arms. Existing approximation approaches, e.g., Whittle's index policy, have difficulty capturing either temporal or spatial factors, such as the impact of other arms. We propose capturing both factors with the attention mechanism, which has achieved great success in deep learning. Our state-aware value function approximation solution comprises an attention-based value function approximator and a Bellman equation solver. The attention-based coordination module captures both spatial and temporal factors for arm coordination. The Bellman equation solver exploits the decoupling structure of RMABs to obtain solutions with significantly reduced computational overhead. In particular, the time complexity of our approximation is linear in the number of arms. Finally, we illustrate the effectiveness and investigate the properties of the proposed method with numerical experiments.
Description	Main Track: Agent-based and Multi-agent Systems
Persistent Identifier	http://hdl.handle.net/10722/299754
ISSN	1045-0823 (2020 SCImago Journal Rankings: 0.649)
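The abstract's point that exploiting the decoupling structure of RMABs keeps the cost linear in the number of arms can be illustrated with the standard Lagrangian (Whittle-style) relaxation, which charges a price `lam` for each activation so that every arm becomes an independent two-action MDP. The Python sketch below is a minimal illustration under assumptions not taken from the paper: each arm's transition matrices `P[a]` and rewards `R[a]` are known, the discount factor is `gamma`, and `lam` is fixed rather than tuned to the activation budget. It does not implement the paper's attention-based approximator or its Bellman equation solver; `solve_arm` and `select_arms` are hypothetical names.

```python
import numpy as np


def solve_arm(P, R, lam, gamma=0.95, tol=1e-8, max_iter=10_000):
    """Value iteration for one arm of a Lagrangian-relaxed RMAB (hypothetical helper).

    P: shape (2, S, S), P[a, s, t] = Pr(next state t | state s, action a),
       with a = 0 (passive) and a = 1 (active).
    R: shape (2, S), immediate reward for action a in state s.
    lam: assumed fixed price charged each time the arm is activated.
    Returns the value vector V (S,) and action values Q (2, S).
    """
    cost = np.array([0.0, lam])[:, None]  # price applies only to the active action
    V = np.zeros(P.shape[1])
    for _ in range(max_iter):
        Q = R - cost + gamma * (P @ V)  # (2, S, S) @ (S,) -> (2, S)
        V_new = Q.max(axis=0)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    Q = R - cost + gamma * (P @ V)
    return V, Q


def select_arms(arms, states, lam, k, gamma=0.95):
    """Activate the k arms with the largest advantage Q[1, s] - Q[0, s].

    Each arm is solved independently, so the work grows linearly in len(arms).
    """
    adv = []
    for (P, R), s in zip(arms, states):
        _, Q = solve_arm(P, R, lam, gamma)
        adv.append(Q[1, s] - Q[0, s])
    return np.argsort(-np.array(adv), kind="stable")[:k]


# Toy example: two-state arms whose state never changes; activating arm i
# earns r_i per step, so its advantage is r_i - lam.
stay = np.stack([np.eye(2), np.eye(2)])
arms = [(stay, np.array([[0.0, 0.0], [r, r]])) for r in (0.2, 0.9, 0.5)]
print(select_arms(arms, states=[0, 1, 0], lam=0.1, k=2))  # [1 2]
```

Because each call to `solve_arm` touches only one arm's small MDP, the total work for fixed per-arm state size grows with the number of arms rather than with the exponentially large joint state space; coupling between arms enters only through the shared price `lam` and the top-k selection.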

DC Field	Value	Language
dc.contributor.author	Wu, S	-
dc.contributor.author	Zhao, J	-
dc.contributor.author	Tian, G	-
dc.contributor.author	Wang, J	-
dc.date.accessioned	2021-05-26T03:28:35Z	-
dc.date.available	2021-05-26T03:28:35Z	-
dc.date.issued	2021	-
dc.identifier.citation	Proceedings of the 30th International Joint Conference on Artificial Intelligence (IJCAI-21), Virtual Meeting, Montreal, Canada, 19-27 August 2021, pp. 458-464	-
dc.identifier.issn	1045-0823	-
dc.identifier.uri	http://hdl.handle.net/10722/299754	-
dc.description	Main Track: Agent-based and Multi-agent Systems	-
dc.language	eng	-
dc.publisher	International Joint Conference on Artificial Intelligence. The proceedings website is located at https://www.ijcai.org/past_proceedings	-
dc.relation.ispartof	The 30th International Joint Conference on Artificial Intelligence (IJCAI)	-
dc.subject	Agent-based and Multi-agent Systems: Multi-agent Planning	-
dc.subject	Agent-based and Multi-agent Systems: Resource Allocation	-
dc.subject	Planning and Scheduling: Planning and Scheduling	-
dc.subject	Planning and Scheduling: Markov Decisions Processes	-
dc.title	State-Aware Value Function Approximation with Attention Mechanism for Restless Multi-armed Bandits	-
dc.type	Conference_Paper	-
dc.identifier.doi	10.24963/ijcai.2021/64	-
dc.identifier.hkuros	322587	-
dc.identifier.spage	458	-
dc.identifier.epage	464	-
dc.publisher.place	United States	-
dc.identifier.eisbn	9780999241196	-
