File Download
There are no files associated with this item.
Links for fulltext
(May Require Subscription)
- Publisher Website: 10.1145/3423211.3425675
- Scopus: eid_2-s2.0-85098523720
- WOS: WOS:000684175200008
Conference Paper: Fast Training of Deep Learning Models over Multiple GPUs
Title | Fast Training of Deep Learning Models over Multiple GPUs |
---|---|
Authors | YI, X; Luo, Z; Meng, C; Wang, M; Long, G; Wu, C; Yang, J; Lin, W |
Keywords | Distributed training; data parallel; model parallel |
Issue Date | 2020 |
Publisher | Association for Computing Machinery (ACM). |
Citation | Proceedings of the 21st International Middleware Conference 2020 (Middleware '20), Virtual Conference, Delft, the Netherlands, 7-11 December 2020, p. 105-118 How to Cite? |
Abstract | This paper proposes FastT, a transparent module to work with the TensorFlow framework for automatically identifying a satisfying deployment and execution order of operations in DNN models over multiple GPUs, for expedited model training. We propose white-box algorithms to compute the strategies with small computing resource consumption in a short time. Recently, similar studies have been done to optimize device placement using reinforcement learning. Compared to those works which learn to optimize device placement of operations in several hours using large amounts of computing resources, our approach can find excellent device placement and execution order within minutes using the same computing node as for training. We design a list of scheduling algorithms to compute the device placement and execution order for each operation and also design an algorithm to split operations in the critical path to support fine-grained (mixed) data and model parallelism to further improve the training speed in each iteration. We compare FastT with representative strategies and obtain insights on the best strategies for training different types of DNN models based on extensive testbed experiments. |
Persistent Identifier | http://hdl.handle.net/10722/301416 |
ISBN | 9781450381536 |
ISI Accession Number ID | WOS:000684175200008 |
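The abstract describes white-box list-scheduling algorithms that jointly choose a device placement and an execution order for each operation. As a rough illustration of that idea (not FastT's actual algorithm; the op names, costs, and the fixed communication delay below are hypothetical), a greedy earliest-finish-time scheduler over an operation DAG might look like:

```python
# Illustrative greedy list scheduling for device placement
# (HEFT-style earliest-finish-time heuristic). This is a sketch,
# not FastT's published algorithm; all inputs are made up.

def schedule(ops, deps, cost, n_gpus, comm=1.0):
    """Assign each op to the GPU where it would finish earliest.

    ops  : op names in topological order
    deps : dict op -> list of predecessor ops
    cost : dict op -> compute time of the op
    comm : fixed transfer delay when an input crosses GPUs
    Returns (placement, finish) dicts keyed by op name.
    """
    gpu_free = [0.0] * n_gpus          # time each GPU next becomes idle
    finish, placement = {}, {}
    for op in ops:
        best = None
        for g in range(n_gpus):
            # op starts once GPU g is free and all inputs have arrived;
            # inputs placed on another GPU pay the communication delay
            ready = max([gpu_free[g]] +
                        [finish[d] + (comm if placement[d] != g else 0.0)
                         for d in deps.get(op, [])])
            f = ready + cost[op]
            if best is None or f < best[0]:
                best = (f, g)
        finish[op], placement[op] = best
        gpu_free[best[1]] = best[0]
    return placement, finish
```

On a diamond-shaped DAG (one op fanning out to two parallel branches that rejoin), this heuristic places the branches on different GPUs when the compute saved outweighs the transfer delay, which is the kind of trade-off the paper's scheduling and operation-splitting algorithms reason about in a much finer-grained way.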
DC Field | Value | Language |
---|---|---|
dc.contributor.author | YI, X | - |
dc.contributor.author | Luo, Z | - |
dc.contributor.author | Meng, C | - |
dc.contributor.author | Wang, M | - |
dc.contributor.author | Long, G | - |
dc.contributor.author | Wu, C | - |
dc.contributor.author | Yang, J | - |
dc.contributor.author | Lin, W | - |
dc.date.accessioned | 2021-07-27T08:10:44Z | - |
dc.date.available | 2021-07-27T08:10:44Z | - |
dc.date.issued | 2020 | - |
dc.identifier.citation | Proceedings of the 21st International Middleware Conference 2020 (Middleware '20), Virtual Conference, Delft, the Netherlands, 7-11 December 2020, p. 105-118 | - |
dc.identifier.isbn | 9781450381536 | - |
dc.identifier.uri | http://hdl.handle.net/10722/301416 | - |
dc.description.abstract | This paper proposes FastT, a transparent module to work with the TensorFlow framework for automatically identifying a satisfying deployment and execution order of operations in DNN models over multiple GPUs, for expedited model training. We propose white-box algorithms to compute the strategies with small computing resource consumption in a short time. Recently, similar studies have been done to optimize device placement using reinforcement learning. Compared to those works which learn to optimize device placement of operations in several hours using large amounts of computing resources, our approach can find excellent device placement and execution order within minutes using the same computing node as for training. We design a list of scheduling algorithms to compute the device placement and execution order for each operation and also design an algorithm to split operations in the critical path to support fine-grained (mixed) data and model parallelism to further improve the training speed in each iteration. We compare FastT with representative strategies and obtain insights on the best strategies for training different types of DNN models based on extensive testbed experiments. | - |
dc.language | eng | - |
dc.publisher | Association for Computing Machinery (ACM). | - |
dc.relation.ispartof | Proceedings of the 21st International Middleware Conference (Middleware '20) | - |
dc.subject | Distributed training | - |
dc.subject | data parallel | - |
dc.subject | model parallel | - |
dc.title | Fast Training of Deep Learning Models over Multiple GPUs | - |
dc.type | Conference_Paper | - |
dc.identifier.email | Wu, C: cwu@cs.hku.hk | - |
dc.identifier.authority | Wu, C=rp01397 | - |
dc.description.nature | link_to_subscribed_fulltext | - |
dc.identifier.doi | 10.1145/3423211.3425675 | - |
dc.identifier.scopus | eid_2-s2.0-85098523720 | - |
dc.identifier.hkuros | 323511 | - |
dc.identifier.spage | 105 | - |
dc.identifier.epage | 118 | - |
dc.identifier.isi | WOS:000684175200008 | - |
dc.publisher.place | New York, NY | - |