
Article: Task Placement and Resource Allocation for Edge Machine Learning: A GNN-based Multi-Agent Reinforcement Learning Paradigm

Title: Task Placement and Resource Allocation for Edge Machine Learning: A GNN-based Multi-Agent Reinforcement Learning Paradigm
Authors: Li, Yihong; Zhang, Xiaoxi; Zeng, Tianyu; Duan, Jingpu; Wu, Chuan; Wu, Di; Chen, Xu
Keywords:
  Clustering algorithms
  edge machine learning
  graph neural networks
  Graphics processing units
  multi-agent reinforcement learning
  Optimal scheduling
  Reinforcement learning
  resource allocation
  Resource management
  Task analysis
  Task placement
  Training
Issue Date: 1-Jan-2023
Publisher: Institute of Electrical and Electronics Engineers
Citation: IEEE Transactions on Parallel and Distributed Systems, 2023, p. 1-17
Abstract

Machine learning (ML) tasks are one of the major workloads in today's edge computing networks. Existing edge-cloud schedulers allocate the requested amounts of resources to each task, falling short of best utilizing the limited edge resources for ML tasks. This paper proposes TapFinger, a distributed scheduler for edge clusters that minimizes the total completion time of ML tasks through co-optimizing task placement and fine-grained multi-resource allocation. To learn the tasks' uncertain resource sensitivity and enable distributed scheduling, we adopt multi-agent reinforcement learning (MARL) and propose several techniques to make it efficient, including a heterogeneous graph attention network as the MARL backbone, a tailored task selection phase in the actor network, and the integration of Bayes' theorem and masking schemes. We first implement a single-task scheduling version, which schedules at most one task each time. Then we generalize to the multi-task scheduling case, in which a sequence of tasks is scheduled simultaneously. Our design can mitigate the expanded decision space and yield fast convergence to optimal scheduling solutions. Extensive experiments using synthetic and test-bed ML task traces show that TapFinger can achieve up to 54.9% reduction in the average task completion time and improve resource efficiency as compared to state-of-the-art schedulers.


Persistent Identifier: http://hdl.handle.net/10722/331956
ISSN: 1045-9219
2021 Impact Factor: 3.757
2020 SCImago Journal Rankings: 0.760

 

DC Field | Value | Language
dc.contributor.author | Li, Yihong | -
dc.contributor.author | Zhang, Xiaoxi | -
dc.contributor.author | Zeng, Tianyu | -
dc.contributor.author | Duan, Jingpu | -
dc.contributor.author | Wu, Chuan | -
dc.contributor.author | Wu, Di | -
dc.contributor.author | Chen, Xu | -
dc.date.accessioned | 2023-09-28T04:59:51Z | -
dc.date.available | 2023-09-28T04:59:51Z | -
dc.date.issued | 2023-01-01 | -
dc.identifier.citation | IEEE Transactions on Parallel and Distributed Systems, 2023, p. 1-17 | -
dc.identifier.issn | 1045-9219 | -
dc.identifier.uri | http://hdl.handle.net/10722/331956 | -
dc.description.abstract | Machine learning (ML) tasks are one of the major workloads in today's edge computing networks. Existing edge-cloud schedulers allocate the requested amounts of resources to each task, falling short of best utilizing the limited edge resources for ML tasks. This paper proposes TapFinger, a distributed scheduler for edge clusters that minimizes the total completion time of ML tasks through co-optimizing task placement and fine-grained multi-resource allocation. To learn the tasks' uncertain resource sensitivity and enable distributed scheduling, we adopt multi-agent reinforcement learning (MARL) and propose several techniques to make it efficient, including a heterogeneous graph attention network as the MARL backbone, a tailored task selection phase in the actor network, and the integration of Bayes' theorem and masking schemes. We first implement a single-task scheduling version, which schedules at most one task each time. Then we generalize to the multi-task scheduling case, in which a sequence of tasks is scheduled simultaneously. Our design can mitigate the expanded decision space and yield fast convergence to optimal scheduling solutions. Extensive experiments using synthetic and test-bed ML task traces show that TapFinger can achieve up to 54.9% reduction in the average task completion time and improve resource efficiency as compared to state-of-the-art schedulers. | -
dc.language | eng | -
dc.publisher | Institute of Electrical and Electronics Engineers | -
dc.relation.ispartof | IEEE Transactions on Parallel and Distributed Systems | -
dc.subject | Clustering algorithms | -
dc.subject | edge machine learning | -
dc.subject | graph neural networks | -
dc.subject | Graphics processing units | -
dc.subject | multi-agent reinforcement learning | -
dc.subject | Optimal scheduling | -
dc.subject | Reinforcement learning | -
dc.subject | resource allocation | -
dc.subject | Resource management | -
dc.subject | Task analysis | -
dc.subject | Task placement | -
dc.subject | Training | -
dc.title | Task Placement and Resource Allocation for Edge Machine Learning: A GNN-based Multi-Agent Reinforcement Learning Paradigm | -
dc.type | Article | -
dc.identifier.doi | 10.1109/TPDS.2023.3313779 | -
dc.identifier.scopus | eid_2-s2.0-85171554225 | -
dc.identifier.spage | 1 | -
dc.identifier.epage | 17 | -
dc.identifier.eissn | 1558-2183 | -
dc.identifier.issnl | 1045-9219 | -
