Conference Paper: State-Aware Value Function Approximation with Attention Mechanism for Restless Multi-armed Bandits
Title | State-Aware Value Function Approximation with Attention Mechanism for Restless Multi-armed Bandits |
---|---|
Authors | Wu, S; Zhao, J; Tian, G; Wang, J |
Keywords | Agent-based and Multi-agent Systems: Multi-agent Planning; Agent-based and Multi-agent Systems: Resource Allocation; Planning and Scheduling: Planning and Scheduling; Planning and Scheduling: Markov Decision Processes |
Issue Date | 2021 |
Publisher | International Joint Conference on Artificial Intelligence. The Journal's web site is located at https://www.ijcai.org/past_proceedings |
Citation | Proceedings of the 30th International Joint Conference on Artificial Intelligence (IJCAI-21), Virtual Meeting, Montreal, Canada, 19-27 August 2021, p. 458-464 |
Abstract | The restless multi-armed bandit (RMAB) problem is a generalization of the multi-armed bandit with non-stationary rewards. Its optimal solution is intractable due to exponentially large state and action spaces with respect to the number of arms. Existing approximation approaches, e.g., Whittle's index policy, have difficulty in capturing either temporal or spatial factors such as impacts from other arms. We propose considering both factors using the attention mechanism, which has achieved great success in deep learning. Our state-aware value function approximation solution comprises an attention-based value function approximator and a Bellman equation solver. The attention-based coordination module captures both spatial and temporal factors for arm coordination. The Bellman equation solver utilizes the decoupling structure of RMABs to acquire solutions with significantly reduced computation overheads. In particular, the time complexity of our approximation is linear in the number of arms. Finally, we illustrate the effectiveness and investigate the properties of our proposed method with numerical experiments. |
Description | Main Track: Agent-based and Multi-agent Systems |
Persistent Identifier | http://hdl.handle.net/10722/299754 |
ISSN | 1045-0823 2020 SCImago Journal Rankings: 0.649 |
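The abstract's point about exponentially large joint spaces versus a linear decoupled treatment can be illustrated with a small counting sketch. It is not the paper's method. It assumes every arm has the same number of states and that exactly `budget` arms are activated per step, a common RMAB setup that the abstract does not state. The function names are hypothetical.

```python
from math import comb


def joint_sizes(num_arms: int, states_per_arm: int, budget: int) -> tuple[int, int]:
    """Count joint states and joint actions of an RMAB.

    Assumption (not from the abstract): each arm has `states_per_arm`
    states and exactly `budget` arms are activated at every step.
    """
    joint_states = states_per_arm ** num_arms   # one state per arm, combined
    joint_actions = comb(num_arms, budget)      # which arms to activate
    return joint_states, joint_actions


def decoupled_size(num_arms: int, states_per_arm: int) -> int:
    """Count per-arm state values when each arm is handled on its own."""
    return num_arms * states_per_arm


if __name__ == "__main__":
    for n in (5, 10, 20):
        states, actions = joint_sizes(n, states_per_arm=2, budget=n // 5)
        print(f"arms={n}: joint states={states}, joint actions={actions}, "
              f"decoupled values={decoupled_size(n, 2)}")
```

The joint state count grows as a power of the number of arms, while the decoupled count grows linearly, which is the scaling contrast the abstract describes.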
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Wu, S | - |
dc.contributor.author | Zhao, J | - |
dc.contributor.author | Tian, G | - |
dc.contributor.author | Wang, J | - |
dc.date.accessioned | 2021-05-26T03:28:35Z | - |
dc.date.available | 2021-05-26T03:28:35Z | - |
dc.date.issued | 2021 | - |
dc.identifier.citation | Proceedings of the 30th International Joint Conference on Artificial Intelligence (IJCAI-21), Virtual Meeting, Montreal, Canada, 19-27 August 2021, p. 458-464 | - |
dc.identifier.issn | 1045-0823 | - |
dc.identifier.uri | http://hdl.handle.net/10722/299754 | - |
dc.description | Main Track: Agent-based and Multi-agent Systems | - |
dc.description.abstract | The restless multi-armed bandit (RMAB) problem is a generalization of the multi-armed bandit with non-stationary rewards. Its optimal solution is intractable due to exponentially large state and action spaces with respect to the number of arms. Existing approximation approaches, e.g., Whittle's index policy, have difficulty in capturing either temporal or spatial factors such as impacts from other arms. We propose considering both factors using the attention mechanism, which has achieved great success in deep learning. Our state-aware value function approximation solution comprises an attention-based value function approximator and a Bellman equation solver. The attention-based coordination module captures both spatial and temporal factors for arm coordination. The Bellman equation solver utilizes the decoupling structure of RMABs to acquire solutions with significantly reduced computation overheads. In particular, the time complexity of our approximation is linear in the number of arms. Finally, we illustrate the effectiveness and investigate the properties of our proposed method with numerical experiments. | - |
dc.language | eng | - |
dc.publisher | International Joint Conference on Artificial Intelligence. The Journal's web site is located at https://www.ijcai.org/past_proceedings | - |
dc.relation.ispartof | The 30th International Joint Conference on Artificial Intelligence (IJCAI) | - |
dc.subject | Agent-based and Multi-agent Systems: Multi-agent Planning | - |
dc.subject | Agent-based and Multi-agent Systems: Resource Allocation | - |
dc.subject | Planning and Scheduling: Planning and Scheduling | - |
dc.subject | Planning and Scheduling: Markov Decision Processes | - |
dc.title | State-Aware Value Function Approximation with Attention Mechanism for Restless Multi-armed Bandits | - |
dc.type | Conference_Paper | - |
dc.identifier.doi | 10.24963/ijcai.2021/64 | - |
dc.identifier.hkuros | 322587 | - |
dc.identifier.spage | 458 | - |
dc.identifier.epage | 464 | - |
dc.publisher.place | United States | - |
dc.identifier.eisbn | 9780999241196 | - |