Conference Paper: NASPipe: high performance and reproducible pipeline parallel supernet training via causal synchronous parallelism

Title: NASPipe: high performance and reproducible pipeline parallel supernet training via causal synchronous parallelism
Authors: Zhao, S; Li, F; Chen, X; Shen, T; Chen, L; Wang, S; Zhang, N; Li, C; Cui, H
Issue Date: 2022
Publisher: Association for Computing Machinery (ACM)
Citation: Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2022), Lausanne, Switzerland, 28 February - 4 March 2022, p. 374-387
Abstract: Supernet training, a prevalent and important paradigm in Neural Architecture Search, embeds the whole DNN architecture search space into one monolithic supernet, iteratively activates a subset of the supernet (i.e., a subnet) to fit each batch of data, and searches for a high-quality subnet that meets specific requirements. Although training subnets in parallel on multiple GPUs is desirable for acceleration, there is an inherent race hazard: concurrent subnets may access the same DNN layers. Existing systems neither efficiently parallelize subnets' training executions nor resolve the race hazard deterministically, leading to unreproducible training procedures and potentially non-trivial accuracy loss. We present NASPipe, the first high-performance and reproducible distributed supernet training system, built on a causal synchronous parallel (CSP) pipeline scheduling abstraction: NASPipe partitions a supernet across GPUs and concurrently executes multiple generated sub-tasks (subnets) in a pipelined manner; meanwhile, it oversees the correlations between the subnets and deterministically resolves any causal dependency caused by subnets' layer sharing. To obtain high performance, NASPipe's CSP scheduler exploits the fact that the larger a supernet spans, the fewer dependencies manifest between chronologically close subnets; therefore, it aggressively schedules subnets with larger chronological orders into execution, but only if they are not causally dependent on unfinished precedent subnets. Moreover, to relieve the excessive GPU memory burden of holding the whole supernet's parameters, NASPipe uses a context-switch technique that stashes the whole supernet in CPU memory, precisely predicts the subnets' schedule, and pre-fetches/evicts a subnet before/after its execution. The evaluation shows that NASPipe is the only system that retains supernet training reproducibility, while achieving comparable and even higher performance (up to 7.8X) than three recent pipeline training systems (e.g., GPipe).
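To illustrate the causal scheduling rule the abstract describes, the following is a minimal Python sketch, not NASPipe's actual implementation; the Subnet class and ready_subnets helper are hypothetical names. It encodes the stated rule that a subnet may be dispatched ahead of its chronological turn only if it shares no supernet layers with any still-unfinished, chronologically earlier subnet.

    from dataclasses import dataclass, field

    @dataclass
    class Subnet:
        order: int                                    # chronological (batch) order
        layer_ids: set = field(default_factory=set)   # supernet layers this subnet activates

    def ready_subnets(unfinished):
        """Return the unfinished subnets that can be dispatched now: a subnet is
        blocked while any chronologically earlier, still-unfinished subnet shares
        at least one supernet layer with it."""
        ready = []
        for s in unfinished:
            blocked = any(p.order < s.order and (p.layer_ids & s.layer_ids)
                          for p in unfinished)
            if not blocked:
                ready.append(s)
        return sorted(ready, key=lambda x: x.order)

    if __name__ == "__main__":
        # Hypothetical example: subnets 1 and 2 share layer 3, so subnet 2 must wait,
        # while subnet 3 touches disjoint layers and may run ahead of its predecessors.
        unfinished = [Subnet(1, {1, 2, 3}), Subnet(2, {3, 4}), Subnet(3, {7, 8})]
        print([s.order for s in ready_subnets(unfinished)])   # -> [1, 3]

In the full system, the abstract notes that this dependency resolution is combined with pipelined execution across GPU partitions and with CPU-side stashing, prefetching, and eviction of each subnet's parameters.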
Persistent Identifier: http://hdl.handle.net/10722/312764
ISBN: 9781450392051
ISI Accession Number ID: WOS:000810486300027

 

Dublin Core metadata (field: value)
dc.contributor.author: ZHAO, S
dc.contributor.author: LI, F
dc.contributor.author: CHEN, X
dc.contributor.author: SHEN, T
dc.contributor.author: Chen, L
dc.contributor.author: Wang, S
dc.contributor.author: Zhang, N
dc.contributor.author: Li, C
dc.contributor.author: Cui, H
dc.date.accessioned: 2022-05-12T10:55:15Z
dc.date.available: 2022-05-12T10:55:15Z
dc.date.issued: 2022
dc.identifier.citation: Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2022), Lausanne, Switzerland, 28 February - 4 March 2022, p. 374-387
dc.identifier.isbn: 9781450392051
dc.identifier.uri: http://hdl.handle.net/10722/312764
dc.description.abstract: (abstract as above)
dc.language: eng
dc.publisher: Association for Computing Machinery (ACM)
dc.relation.ispartof: Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2022)
dc.title: NASPipe: high performance and reproducible pipeline parallel supernet training via causal synchronous parallelism
dc.type: Conference_Paper
dc.identifier.email: Cui, H: heming@cs.hku.hk
dc.identifier.authority: Cui, H=rp02008
dc.description.nature: link_to_subscribed_fulltext
dc.identifier.doi: 10.1145/3503222.3507735
dc.identifier.hkuros: 333064
dc.identifier.spage: 374
dc.identifier.epage: 387
dc.identifier.isi: WOS:000810486300027
dc.publisher.place: New York, NY
