Links for fulltext
(May Require Subscription)
- Publisher Website: 10.1109/TPDS.2023.3313779
- Scopus: eid_2-s2.0-85171554225
Article: Task Placement and Resource Allocation for Edge Machine Learning: A GNN-based Multi-Agent Reinforcement Learning Paradigm
Title | Task Placement and Resource Allocation for Edge Machine Learning: A GNN-based Multi-Agent Reinforcement Learning Paradigm |
---|---|
Authors | Li, Yihong; Zhang, Xiaoxi; Zeng, Tianyu; Duan, Jingpu; Wu, Chuan; Wu, Di; Chen, Xu |
Keywords | Clustering algorithms; edge machine learning; graph neural networks; Graphics processing units; multi-agent reinforcement learning; Optimal scheduling; Reinforcement learning; resource allocation; Resource management; Task analysis; Task placement; Training |
Issue Date | 1-Jan-2023 |
Publisher | Institute of Electrical and Electronics Engineers |
Citation | IEEE Transactions on Parallel and Distributed Systems, 2023, p. 1-17 |
Abstract | Machine learning (ML) tasks are one of the major workloads in today's edge computing networks. Existing edge-cloud schedulers allocate the requested amounts of resources to each task, falling short of best utilizing the limited edge resources for ML tasks. This paper proposes TapFinger, a distributed scheduler for edge clusters that minimizes the total completion time of ML tasks through co-optimizing task placement and fine-grained multi-resource allocation. To learn the tasks' uncertain resource sensitivity and enable distributed scheduling, we adopt multi-agent reinforcement learning (MARL) and propose several techniques to make it efficient, including a heterogeneous graph attention network as the MARL backbone, a tailored task selection phase in the actor network, and the integration of Bayes' theorem and masking schemes. We first implement a single-task scheduling version, which schedules at most one task each time. Then we generalize to the multi-task scheduling case, in which a sequence of tasks is scheduled simultaneously. Our design can mitigate the expanded decision space and yield fast convergence to optimal scheduling solutions. Extensive experiments using synthetic and test-bed ML task traces show that TapFinger can achieve up to 54.9% reduction in the average task completion time and improve resource efficiency as compared to state-of-the-art schedulers. |
Persistent Identifier | http://hdl.handle.net/10722/331956 |
ISSN | 1045-9219 (2023 Impact Factor: 5.6; 2023 SCImago Journal Rankings: 2.340) |
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Li, Yihong | - |
dc.contributor.author | Zhang, Xiaoxi | - |
dc.contributor.author | Zeng, Tianyu | - |
dc.contributor.author | Duan, Jingpu | - |
dc.contributor.author | Wu, Chuan | - |
dc.contributor.author | Wu, Di | - |
dc.contributor.author | Chen, Xu | - |
dc.date.accessioned | 2023-09-28T04:59:51Z | - |
dc.date.available | 2023-09-28T04:59:51Z | - |
dc.date.issued | 2023-01-01 | - |
dc.identifier.citation | IEEE Transactions on Parallel and Distributed Systems, 2023, p. 1-17 | - |
dc.identifier.issn | 1045-9219 | - |
dc.identifier.uri | http://hdl.handle.net/10722/331956 | - |
dc.description.abstract | Machine learning (ML) tasks are one of the major workloads in today's edge computing networks. Existing edge-cloud schedulers allocate the requested amounts of resources to each task, falling short of best utilizing the limited edge resources for ML tasks. This paper proposes TapFinger, a distributed scheduler for edge clusters that minimizes the total completion time of ML tasks through co-optimizing task placement and fine-grained multi-resource allocation. To learn the tasks' uncertain resource sensitivity and enable distributed scheduling, we adopt multi-agent reinforcement learning (MARL) and propose several techniques to make it efficient, including a heterogeneous graph attention network as the MARL backbone, a tailored task selection phase in the actor network, and the integration of Bayes' theorem and masking schemes. We first implement a single-task scheduling version, which schedules at most one task each time. Then we generalize to the multi-task scheduling case, in which a sequence of tasks is scheduled simultaneously. Our design can mitigate the expanded decision space and yield fast convergence to optimal scheduling solutions. Extensive experiments using synthetic and test-bed ML task traces show that TapFinger can achieve up to 54.9% reduction in the average task completion time and improve resource efficiency as compared to state-of-the-art schedulers. | - |
dc.language | eng | - |
dc.publisher | Institute of Electrical and Electronics Engineers | - |
dc.relation.ispartof | IEEE Transactions on Parallel and Distributed Systems | - |
dc.subject | Clustering algorithms | - |
dc.subject | edge machine learning | - |
dc.subject | graph neural networks | - |
dc.subject | Graphics processing units | - |
dc.subject | multi-agent reinforcement learning | - |
dc.subject | Optimal scheduling | - |
dc.subject | Reinforcement learning | - |
dc.subject | resource allocation | - |
dc.subject | Resource management | - |
dc.subject | Task analysis | - |
dc.subject | Task placement | - |
dc.subject | Training | - |
dc.title | Task Placement and Resource Allocation for Edge Machine Learning: A GNN-based Multi-Agent Reinforcement Learning Paradigm | - |
dc.type | Article | - |
dc.identifier.doi | 10.1109/TPDS.2023.3313779 | - |
dc.identifier.scopus | eid_2-s2.0-85171554225 | - |
dc.identifier.spage | 1 | - |
dc.identifier.epage | 17 | - |
dc.identifier.eissn | 1558-2183 | - |
dc.identifier.issnl | 1045-9219 | - |