File Download

There are no files associated with this item.

  Links for fulltext
     (May Require Subscription)
Supplementary

Conference Paper: Scheduling Placement-Sensitive BSP Jobs with Inaccurate Execution Time Estimation

TitleScheduling Placement-Sensitive BSP Jobs with Inaccurate Execution Time Estimation
Authors
Keywordsapproximation theory
computational complexity
graph theory
learning (artificial intelligence)
scheduling
Issue Date2020
PublisherIEEE Computer Society. The Journal's web site is located at http://ieeexplore.ieee.org/xpl/conhome.jsp?punumber=1000359
Citation
Proceedings of IEEE INFOCOM 2020 - IEEE Conference on Computer Communications, Toronto, ON, Canada, 6-9 July 2020, p. 1053-1062 How to Cite?
AbstractThe Bulk Synchronous Parallel (BSP) paradigm is gaining tremendous importance recently because of the pop-ularity of computations such as distributed machine learning and graph computation. In a typical BSP job, multiple workers concurrently conduct iterative computations, where frequent synchronization is required. Therefore, the workers should be scheduled simultaneously and their placement on different computing devices could significantly affect the performance. Simply retrofitting a traditional scheduling discipline will likely not yield the desired performance due to the unique characteristics of BSP jobs. In this work, we derive SPIN, a novel scheduling designed for BSP jobs with placement-sensitive execution to minimize the makespan of all jobs. We first prove the problem approximation hardness and then present how SPIN solves it with a rounding-based randomized approximation approach. Our analysis indicates SPIN achieves a good performance guarantee efficiently. Moreover, SPIN is robust against misestimation of job execution time by theoretically bounding its negative impact. We implement SPIN on a production-trace driven testbed with 40 GPUs. Our extensive experiments show that SPIN can reduce the job makespan and the average job completion time by up to 3× and 4.68×, respectively. Our approach also demonstrates better robustness to execution time misestimation compared with heuristic baselines.
Persistent Identifierhttp://hdl.handle.net/10722/293457
ISSN
2020 SCImago Journal Rankings: 1.183

 

DC FieldValueLanguage
dc.contributor.authorHan, Z-
dc.contributor.authorTan, H-
dc.contributor.authorJiang, S-
dc.contributor.authorFu, X-
dc.contributor.authorCao, W-
dc.contributor.authorLau, FCM-
dc.date.accessioned2020-11-23T08:17:02Z-
dc.date.available2020-11-23T08:17:02Z-
dc.date.issued2020-
dc.identifier.citationProceedings of IEEE INFOCOM 2020 - IEEE Conference on Computer Communications, Toronto, ON, Canada, 6-9 July 2020, p. 1053-1062-
dc.identifier.issn0743-166X-
dc.identifier.urihttp://hdl.handle.net/10722/293457-
dc.description.abstractThe Bulk Synchronous Parallel (BSP) paradigm is gaining tremendous importance recently because of the pop-ularity of computations such as distributed machine learning and graph computation. In a typical BSP job, multiple workers concurrently conduct iterative computations, where frequent synchronization is required. Therefore, the workers should be scheduled simultaneously and their placement on different computing devices could significantly affect the performance. Simply retrofitting a traditional scheduling discipline will likely not yield the desired performance due to the unique characteristics of BSP jobs. In this work, we derive SPIN, a novel scheduling designed for BSP jobs with placement-sensitive execution to minimize the makespan of all jobs. We first prove the problem approximation hardness and then present how SPIN solves it with a rounding-based randomized approximation approach. Our analysis indicates SPIN achieves a good performance guarantee efficiently. Moreover, SPIN is robust against misestimation of job execution time by theoretically bounding its negative impact. We implement SPIN on a production-trace driven testbed with 40 GPUs. Our extensive experiments show that SPIN can reduce the job makespan and the average job completion time by up to 3× and 4.68×, respectively. Our approach also demonstrates better robustness to execution time misestimation compared with heuristic baselines.-
dc.languageeng-
dc.publisherIEEE Computer Society. The Journal's web site is located at http://ieeexplore.ieee.org/xpl/conhome.jsp?punumber=1000359-
dc.relation.ispartofIEEE INFOCOM - IEEE Conference on Computer Communications-
dc.rightsIEEE INFOCOM - IEEE Conference on Computer Communications. Copyright © IEEE Computer Society.-
dc.rights©2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.-
dc.subjectapproximation theory-
dc.subjectcomputational complexity-
dc.subjectgraph theory-
dc.subjectlearning (artificial intelligence)-
dc.subjectscheduling-
dc.titleScheduling Placement-Sensitive BSP Jobs with Inaccurate Execution Time Estimation-
dc.typeConference_Paper-
dc.identifier.emailLau, FCM: fcmlau@cs.hku.hk-
dc.identifier.authorityLau, FCM=rp00221-
dc.description.naturelink_to_subscribed_fulltext-
dc.identifier.doi10.1109/INFOCOM41043.2020.9155445-
dc.identifier.scopuseid_2-s2.0-85090283333-
dc.identifier.hkuros319178-
dc.identifier.spage1053-
dc.identifier.epage1062-
dc.publisher.placeUnited States-
dc.identifier.issnl0743-166X-

Export via OAI-PMH Interface in XML Formats


OR


Export to Other Non-XML Formats