File Download
There are no files associated with this item.
Links for fulltext
(May Require Subscription)
- Publisher Website: 10.1145/3423211.3425675
- Scopus: eid_2-s2.0-85098523720
- WOS: WOS:000684175200008
Conference Paper: Fast Training of Deep Learning Models over Multiple GPUs
Title | Fast Training of Deep Learning Models over Multiple GPUs |
---|---|
Authors | YI, X; Luo, Z; Meng, C; Wang, M; Long, G; Wu, C; Yang, J; Lin, W |
Keywords | Distributed training; data parallel; model parallel |
Issue Date | 2020 |
Publisher | Association for Computing Machinery (ACM). |
Citation | Proceedings of the 21st International Middleware Conference 2020 (Middleware '20), Virtual Conference, Delft, the Netherlands, 7-11 December 2020, p. 105-118 How to Cite? |
Abstract | This paper proposes FastT, a transparent module to work with the TensorFlow framework for automatically identifying a satisfying deployment and execution order of operations in DNN models over multiple GPUs, for expedited model training. We propose white-box algorithms to compute the strategies with small computing resource consumption in a short time. Recently, similar studies have been done to optimize device placement using reinforcement learning. Compared to those works which learn to optimize device placement of operations in several hours using large amounts of computing resources, our approach can find excellent device placement and execution order within minutes using the same computing node as for training. We design a list of scheduling algorithms to compute the device placement and execution order for each operation and also design an algorithm to split operations in the critical path to support fine-grained (mixed) data and model parallelism to further improve the training speed in each iteration. We compare FastT with representative strategies and obtain insights on the best strategies for training different types of DNN models based on extensive testbed experiments. |
Persistent Identifier | http://hdl.handle.net/10722/301416 |
ISBN | 9781450381536 |
ISI Accession Number ID | WOS:000684175200008 |
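The abstract describes white-box list-scheduling algorithms that jointly choose a device placement and an execution order for each operation. As a rough illustration of that idea (not FastT's actual algorithm; the op names, costs, and the fixed communication delay below are hypothetical), a greedy earliest-finish-time scheduler over an operation DAG might look like:

```python
# Illustrative greedy list scheduling for device placement
# (HEFT-style earliest-finish-time heuristic). This is a sketch,
# not FastT's published algorithm; all inputs are made up.

def schedule(ops, deps, cost, n_gpus, comm=1.0):
    """Assign each op to the GPU where it would finish earliest.

    ops  : op names in topological order
    deps : dict op -> list of predecessor ops
    cost : dict op -> compute time of the op
    comm : fixed transfer delay when an input crosses GPUs
    Returns (placement, finish) dicts keyed by op name.
    """
    gpu_free = [0.0] * n_gpus          # time each GPU next becomes idle
    finish, placement = {}, {}
    for op in ops:
        best = None
        for g in range(n_gpus):
            # op starts once GPU g is free and all inputs have arrived;
            # inputs placed on another GPU pay the communication delay
            ready = max([gpu_free[g]] +
                        [finish[d] + (comm if placement[d] != g else 0.0)
                         for d in deps.get(op, [])])
            f = ready + cost[op]
            if best is None or f < best[0]:
                best = (f, g)
        finish[op], placement[op] = best
        gpu_free[best[1]] = best[0]
    return placement, finish
```

On a diamond-shaped DAG (one op fanning out to two parallel branches that rejoin), this heuristic places the branches on different GPUs when the compute saved outweighs the transfer delay, which is the kind of trade-off the paper's scheduling and operation-splitting algorithms reason about in a much finer-grained way.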
DC Field | Value | Language |
---|---|---|
dc.contributor.author | YI, X | - |
dc.contributor.author | Luo, Z | - |
dc.contributor.author | Meng, C | - |
dc.contributor.author | Wang, M | - |
dc.contributor.author | Long, G | - |
dc.contributor.author | Wu, C | - |
dc.contributor.author | Yang, J | - |
dc.contributor.author | Lin, W | - |
dc.date.accessioned | 2021-07-27T08:10:44Z | - |
dc.date.available | 2021-07-27T08:10:44Z | - |
dc.date.issued | 2020 | - |
dc.identifier.citation | Proceedings of the 21st International Middleware Conference 2020 (Middleware '20), Virtual Conference, Delft, the Netherlands, 7-11 December 2020, p. 105-118 | - |
dc.identifier.isbn | 9781450381536 | - |
dc.identifier.uri | http://hdl.handle.net/10722/301416 | - |
dc.description.abstract | This paper proposes FastT, a transparent module to work with the TensorFlow framework for automatically identifying a satisfying deployment and execution order of operations in DNN models over multiple GPUs, for expedited model training. We propose white-box algorithms to compute the strategies with small computing resource consumption in a short time. Recently, similar studies have been done to optimize device placement using reinforcement learning. Compared to those works which learn to optimize device placement of operations in several hours using large amounts of computing resources, our approach can find excellent device placement and execution order within minutes using the same computing node as for training. We design a list of scheduling algorithms to compute the device placement and execution order for each operation and also design an algorithm to split operations in the critical path to support fine-grained (mixed) data and model parallelism to further improve the training speed in each iteration. We compare FastT with representative strategies and obtain insights on the best strategies for training different types of DNN models based on extensive testbed experiments. | - |
dc.language | eng | - |
dc.publisher | Association for Computing Machinery (ACM). | - |
dc.relation.ispartof | Proceedings of the 21st International Middleware Conference (Middleware '20) | - |
dc.subject | Distributed training | - |
dc.subject | data parallel | - |
dc.subject | model parallel | - |
dc.title | Fast Training of Deep Learning Models over Multiple GPUs | - |
dc.type | Conference_Paper | - |
dc.identifier.email | Wu, C: cwu@cs.hku.hk | - |
dc.identifier.authority | Wu, C=rp01397 | - |
dc.description.nature | link_to_subscribed_fulltext | - |
dc.identifier.doi | 10.1145/3423211.3425675 | - |
dc.identifier.scopus | eid_2-s2.0-85098523720 | - |
dc.identifier.hkuros | 323511 | - |
dc.identifier.spage | 105 | - |
dc.identifier.epage | 118 | - |
dc.identifier.isi | WOS:000684175200008 | - |
dc.publisher.place | New York, NY | - |