Conference Paper: Optimizing distributed training deployment in heterogeneous GPU clusters

Title: Optimizing distributed training deployment in heterogeneous GPU clusters
Authors: Yi, X; Zhang, S; Luo, Z; Long, G; Diao, L; Wu, C; Zheng, Z; Yang, J; Lin, W
Keywords: Distributed training; heterogeneous environment; deep learning
Issue Date: 2020
Publisher: Association for Computing Machinery (ACM)
Citation: Proceedings of the 16th International Conference on emerging Networking EXperiments and Technologies (CoNEXT '20), Barcelona, Spain, 2-4 December 2020, p. 93-107
Abstract: This paper proposes HeteroG, an automatic module to accelerate deep neural network training in heterogeneous GPU clusters. To train a deep learning model with large amounts of data, distributed training using data or model parallelism has been widely adopted, mostly over homogeneous devices (GPUs, network bandwidth). Heterogeneous training environments may often exist in shared clusters with GPUs of different models purchased in different batches and network connections of different bandwidth availability (e.g., due to contention). Classic data parallelism does not work well in a heterogeneous cluster, while model-parallel training is hard to plan. HeteroG enables highly efficient distributed training over heterogeneous devices by automatically converting a single-GPU training model to a distributed one according to the deep learning graph and available resources. HeteroG embraces operation-level hybrid parallelism, communication architecture selection, and execution scheduling, based on a carefully designed strategy framework exploiting both GNN-based learning and combinatorial optimization. We compare HeteroG with existing parallelism schemes and show that it achieves up to 222% training speed-up. HeteroG also enables efficient training of large models over a set of heterogeneous devices where simple parallelism is infeasible. (An illustrative placement sketch follows the record fields below.)
Persistent Identifier: http://hdl.handle.net/10722/301293
ISBN: 9781450379489
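The abstract above describes operation-level placement of training operations across heterogeneous devices. The sketch below is a minimal, hypothetical illustration of that general idea only (a greedy earliest-finish-time placement over devices of different speeds); it is not HeteroG's actual algorithm, and the names Device, Op, and place_ops are assumptions made for illustration.

# Toy illustration of operation-level placement over heterogeneous GPUs.
# NOT HeteroG's algorithm: a greedy heuristic that maps each operation of a
# training graph to the device that would finish it earliest, given per-device
# relative speed factors (the source of heterogeneity).

from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Device:
    name: str
    speed: float            # relative compute speed (higher is faster); assumed
    busy_until: float = 0.0  # time at which this device becomes free

@dataclass
class Op:
    name: str
    cost: float              # compute cost on a reference device; assumed

def place_ops(ops: List[Op], devices: List[Device]) -> Dict[str, str]:
    """Greedily assign each op to the device with the earliest finish time."""
    placement: Dict[str, str] = {}
    for op in ops:
        best = min(devices, key=lambda d: d.busy_until + op.cost / d.speed)
        best.busy_until += op.cost / best.speed
        placement[op.name] = best.name
    return placement

if __name__ == "__main__":
    cluster = [Device("V100:0", speed=2.0), Device("P100:0", speed=1.0)]
    graph = [Op(f"layer{i}", cost=10.0) for i in range(6)]
    print(place_ops(graph, cluster))                       # op -> device mapping
    print({d.name: round(d.busy_until, 1) for d in cluster})  # per-device load

In this toy setting the faster device simply receives proportionally more operations; the paper's framework additionally considers hybrid parallelism, communication architecture, and execution scheduling, which this sketch does not model.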

 

DC Field | Value | Language
dc.contributor.author | Yi, X | -
dc.contributor.author | Zhang, S | -
dc.contributor.author | Luo, Z | -
dc.contributor.author | Long, G | -
dc.contributor.author | Diao, L | -
dc.contributor.author | Wu, C | -
dc.contributor.author | Zheng, Z | -
dc.contributor.author | Yang, J | -
dc.contributor.author | Lin, W | -
dc.date.accessioned | 2021-07-27T08:08:58Z | -
dc.date.available | 2021-07-27T08:08:58Z | -
dc.date.issued | 2020 | -
dc.identifier.citation | Proceedings of the 16th International Conference on emerging Networking EXperiments and Technologies (CoNEXT '20), Barcelona, Spain, 2-4 December 2020, p. 93-107 | -
dc.identifier.isbn | 9781450379489 | -
dc.identifier.uri | http://hdl.handle.net/10722/301293 | -
dc.description.abstract | This paper proposes HeteroG, an automatic module to accelerate deep neural network training in heterogeneous GPU clusters. To train a deep learning model with large amounts of data, distributed training using data or model parallelism has been widely adopted, mostly over homogeneous devices (GPUs, network bandwidth). Heterogeneous training environments may often exist in shared clusters with GPUs of different models purchased in different batches and network connections of different bandwidth availability (e.g., due to contention). Classic data parallelism does not work well in a heterogeneous cluster, while model-parallel training is hard to plan. HeteroG enables highly-efficient distributed training over heterogeneous devices, by automatically converting a single-GPU training model to a distributed one according to the deep learning graph and available resources. HeteroG embraces operation-level hybrid parallelism, communication architecture selection and execution scheduling, based on a carefully designed strategy framework exploiting both GNN-based learning and combinatorial optimization. We compare HeteroG with existing parallelism schemes and show that it achieves up-to 222% training speed-up. HeteroG also enables efficient training of large models over a set of heterogeneous devices where simple parallelism is infeasible. | -
dc.language | eng | -
dc.publisher | Association for Computing Machinery (ACM) | -
dc.relation.ispartof | Proceedings of the 16th International Conference on emerging Networking EXperiments and Technologies | -
dc.subject | Distributed training | -
dc.subject | heterogeneous environment | -
dc.subject | deep learning | -
dc.title | Optimizing distributed training deployment in heterogeneous GPU clusters | -
dc.type | Conference_Paper | -
dc.identifier.email | Wu, C: cwu@cs.hku.hk | -
dc.identifier.authority | Wu, C=rp01397 | -
dc.description.nature | link_to_subscribed_fulltext | -
dc.identifier.doi | 10.1145/3386367.3432728 | -
dc.identifier.scopus | eid_2-s2.0-85097614872 | -
dc.identifier.hkuros | 323512 | -
dc.identifier.spage | 93 | -
dc.identifier.epage | 107 | -
dc.publisher.place | New York, NY | -
