Task Placement and Resource Allocation for Edge Machine Learning: A GNN-based Multi-Agent Reinforcement Learning Paradigm

Li, Yihong; Zhang, Xiaoxi; Zeng, Tianyu; Duan, Jingpu; Wu, Chuan; Wu, Di; Chen, Xu

Publisher Website: 10.1109/TPDS.2023.3313779
Scopus: eid_2-s2.0-85171554225
Citations:
- Scopus: 0
Appears in Collections:
- Computer Science: Journal/Magazine Articles

Article: Task Placement and Resource Allocation for Edge Machine Learning: A GNN-based Multi-Agent Reinforcement Learning Paradigm

Title	Task Placement and Resource Allocation for Edge Machine Learning: A GNN-based Multi-Agent Reinforcement Learning Paradigm
Authors	Li, Yihong; Zhang, Xiaoxi; Zeng, Tianyu; Duan, Jingpu; Wu, Chuan; Wu, Di; Chen, Xu
Keywords	Clustering algorithms; edge machine learning; graph neural networks; Graphics processing units; multi-agent reinforcement learning; Optimal scheduling; Reinforcement learning; resource allocation; Resource management; Task analysis; Task placement; Training
Issue Date	1-Jan-2023
Publisher	Institute of Electrical and Electronics Engineers
Citation	IEEE Transactions on Parallel and Distributed Systems, 2023, p. 1-17. DOI: http://dx.doi.org/10.1109/TPDS.2023.3313779
Abstract	Machine learning (ML) tasks are one of the major workloads in today's edge computing networks. Existing edge-cloud schedulers allocate the requested amounts of resources to each task, falling short of best utilizing the limited edge resources for ML tasks. This paper proposes TapFinger, a distributed scheduler for edge clusters that minimizes the total completion time of ML tasks through co-optimizing task placement and fine-grained multi-resource allocation. To learn the tasks' uncertain resource sensitivity and enable distributed scheduling, we adopt multi-agent reinforcement learning (MARL) and propose several techniques to make it efficient, including a heterogeneous graph attention network as the MARL backbone, a tailored task selection phase in the actor network, and the integration of Bayes' theorem and masking schemes. We first implement a single-task scheduling version, which schedules at most one task each time. Then we generalize to the multi-task scheduling case, in which a sequence of tasks is scheduled simultaneously. Our design can mitigate the expanded decision space and yield fast convergence to optimal scheduling solutions. Extensive experiments using synthetic and test-bed ML task traces show that TapFinger can achieve up to 54.9% reduction in the average task completion time and improve resource efficiency as compared to state-of-the-art schedulers.
Persistent Identifier	http://hdl.handle.net/10722/331956
ISSN	1045-9219
2021 Impact Factor	3.757
2020 SCImago Journal Rankings	0.760

DC Field	Value	Language
dc.contributor.author	Li, Yihong	-
dc.contributor.author	Zhang, Xiaoxi	-
dc.contributor.author	Zeng, Tianyu	-
dc.contributor.author	Duan, Jingpu	-
dc.contributor.author	Wu, Chuan	-
dc.contributor.author	Wu, Di	-
dc.contributor.author	Chen, Xu	-
dc.date.accessioned	2023-09-28T04:59:51Z	-
dc.date.available	2023-09-28T04:59:51Z	-
dc.date.issued	2023-01-01	-
dc.identifier.citation	IEEE Transactions on Parallel and Distributed Systems, 2023, p. 1-17	-
dc.identifier.issn	1045-9219	-
dc.identifier.uri	http://hdl.handle.net/10722/331956	-
dc.description.abstract	Machine learning (ML) tasks are one of the major workloads in today's edge computing networks. Existing edge-cloud schedulers allocate the requested amounts of resources to each task, falling short of best utilizing the limited edge resources for ML tasks. This paper proposes TapFinger, a distributed scheduler for edge clusters that minimizes the total completion time of ML tasks through co-optimizing task placement and fine-grained multi-resource allocation. To learn the tasks' uncertain resource sensitivity and enable distributed scheduling, we adopt multi-agent reinforcement learning (MARL) and propose several techniques to make it efficient, including a heterogeneous graph attention network as the MARL backbone, a tailored task selection phase in the actor network, and the integration of Bayes' theorem and masking schemes. We first implement a single-task scheduling version, which schedules at most one task each time. Then we generalize to the multi-task scheduling case, in which a sequence of tasks is scheduled simultaneously. Our design can mitigate the expanded decision space and yield fast convergence to optimal scheduling solutions. Extensive experiments using synthetic and test-bed ML task traces show that TapFinger can achieve up to 54.9% reduction in the average task completion time and improve resource efficiency as compared to state-of-the-art schedulers.	-
dc.language	eng	-
dc.publisher	Institute of Electrical and Electronics Engineers	-
dc.relation.ispartof	IEEE Transactions on Parallel and Distributed Systems	-
dc.subject	Clustering algorithms	-
dc.subject	edge machine learning	-
dc.subject	graph neural networks	-
dc.subject	Graphics processing units	-
dc.subject	multi-agent reinforcement learning	-
dc.subject	Optimal scheduling	-
dc.subject	Reinforcement learning	-
dc.subject	resource allocation	-
dc.subject	Resource management	-
dc.subject	Task analysis	-
dc.subject	Task placement	-
dc.subject	Training	-
dc.title	Task Placement and Resource Allocation for Edge Machine Learning: A GNN-based Multi-Agent Reinforcement Learning Paradigm	-
dc.type	Article	-
dc.identifier.doi	10.1109/TPDS.2023.3313779	-
dc.identifier.scopus	eid_2-s2.0-85171554225	-
dc.identifier.spage	1	-
dc.identifier.epage	17	-
dc.identifier.eissn	1558-2183	-
dc.identifier.issnl	1045-9219	-
