Conference Paper: DPACS: Hardware Accelerated Dynamic Neural Network Pruning through Algorithm-Architecture Co-design

Title: DPACS: Hardware Accelerated Dynamic Neural Network Pruning through Algorithm-Architecture Co-design
Authors: Gao, Y; Zhang, B; Qi, X; So, HKH
Keywords: Dynamic pruning; DPACS; FPGA; Hardware acceleration; Algorithm architecture co-design
Issue Date: 2023
Publisher: ACM
Citation: 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Vancouver, BC, Canada, March 25-29, 2023. In ASPLOS 2023: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, v. 2 n. Jan 2023, p. 237-251
Abstract: By eliminating compute operations intelligently based on the run-time input, dynamic pruning (DP) promises to improve deep neural network inference speed substantially without a major impact on accuracy. Although many DP algorithms with good pruning performance have been proposed, it remains a challenge to translate these theoretical reductions in compute operations into satisfactory end-to-end speedups in practical real-world implementations. The overhead of identifying operations to be pruned at run time, the need to efficiently process the resulting dynamic dataflow, and the non-trivial memory I/O bottleneck that emerges as the number of compute operations reduces have all contributed to the challenge of implementing practical DP systems. In this paper, the design and implementation of DPACS are presented to address these challenges. DPACS utilizes a hardware-aware dynamic spatial and channel pruning algorithm in conjunction with a dynamic dataflow engine in hardware to facilitate efficient processing of the pruned network. A channel mask precomputation scheme is designed to reduce memory I/O, and a dedicated inter-layer pipeline is used to achieve efficient indexing and dataflow of sparse activations. Extensive design space exploration has been performed using two architectural variations implemented on FPGA to accelerate multiple networks from the ResNet family on the ImageNet and CIFAR10 datasets across a wide range of pruning ratios. Across the spectrum of configurations, DPACS achieves 1.1× to 3.9× end-to-end speedup over a baseline hardware implementation without pruning. Analysis of the tradeoff among accuracy, compute, and memory I/O performance highlights the importance of algorithm-architecture co-design in developing DP systems.
Persistent Identifier: http://hdl.handle.net/10722/324825
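The abstract's core idea, input-dependent channel and spatial pruning driven by lightweight masks, can be illustrated with a short sketch. The PyTorch snippet below is a minimal illustration only, not the authors' DPACS code: the module name, the gating heads (channel_gate, spatial_gate), the top-k channel selection, and the threshold-based spatial mask are all assumed stand-ins for whatever the paper actually uses. On a CPU or GPU the masks merely zero out results; DPACS's FPGA dataflow engine instead skips the masked computation and the associated memory I/O, for example by precomputing channel masks ahead of the layer.

import torch
import torch.nn as nn

class DynamicPrunedConv(nn.Module):
    """Illustrative input-dependent pruning block (hypothetical, not DPACS)."""

    def __init__(self, in_ch, out_ch, keep_ratio=0.5):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False)
        # Channel gate: global average pool -> linear -> one score per output channel.
        self.channel_gate = nn.Linear(in_ch, out_ch)
        # Spatial gate: 1x1 conv -> one saliency score per pixel.
        self.spatial_gate = nn.Conv2d(in_ch, 1, kernel_size=1)
        self.keep_ratio = keep_ratio

    def forward(self, x):
        n = x.size(0)
        k = max(1, int(self.conv.out_channels * self.keep_ratio))

        # Channel mask: keep only the k highest-scoring output channels.
        scores = self.channel_gate(x.mean(dim=(2, 3)))       # (N, out_ch)
        topk = scores.topk(k, dim=1).indices
        ch_mask = torch.zeros_like(scores).scatter_(1, topk, 1.0)

        # Spatial mask: keep only pixels with positive predicted saliency.
        sp_mask = (self.spatial_gate(x) > 0).float()         # (N, 1, H, W)

        # Zeroing only emulates pruning; a real accelerator skips this work.
        y = self.conv(x)
        return y * ch_mask.view(n, -1, 1, 1) * sp_mask

As a quick sanity check of the sketch, block = DynamicPrunedConv(64, 128, keep_ratio=0.25) followed by block(torch.randn(1, 64, 56, 56)) leaves at most a quarter of the 128 output channels non-zero for that input. Note that hard top-k selection is not differentiable; dynamic pruning methods in the literature typically train such gates with Gumbel-softmax or straight-through estimators.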

 

DC Field | Value | Language
dc.contributor.author | Gao, Y | -
dc.contributor.author | Zhang, B | -
dc.contributor.author | Qi, X | -
dc.contributor.author | So, HKH | -
dc.date.accessioned | 2023-02-20T01:38:28Z | -
dc.date.available | 2023-02-20T01:38:28Z | -
dc.date.issued | 2023 | -
dc.identifier.citation | 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Vancouver, BC, Canada, March 25-29, 2023. In ASPLOS 2023: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, v. 2 n. Jan 2023, p. 237-251 | -
dc.identifier.uri | http://hdl.handle.net/10722/324825 | -
dc.description.abstract | By eliminating compute operations intelligently based on the run-time input, dynamic pruning (DP) promises to improve deep neural network inference speed substantially without a major impact on accuracy. Although many DP algorithms with good pruning performance have been proposed, it remains a challenge to translate these theoretical reductions in compute operations into satisfactory end-to-end speedups in practical real-world implementations. The overhead of identifying operations to be pruned at run time, the need to efficiently process the resulting dynamic dataflow, and the non-trivial memory I/O bottleneck that emerges as the number of compute operations reduces have all contributed to the challenge of implementing practical DP systems. In this paper, the design and implementation of DPACS are presented to address these challenges. DPACS utilizes a hardware-aware dynamic spatial and channel pruning algorithm in conjunction with a dynamic dataflow engine in hardware to facilitate efficient processing of the pruned network. A channel mask precomputation scheme is designed to reduce memory I/O, and a dedicated inter-layer pipeline is used to achieve efficient indexing and dataflow of sparse activations. Extensive design space exploration has been performed using two architectural variations implemented on FPGA to accelerate multiple networks from the ResNet family on the ImageNet and CIFAR10 datasets across a wide range of pruning ratios. Across the spectrum of configurations, DPACS achieves 1.1× to 3.9× end-to-end speedup over a baseline hardware implementation without pruning. Analysis of the tradeoff among accuracy, compute, and memory I/O performance highlights the importance of algorithm-architecture co-design in developing DP systems. | -
dc.language | eng | -
dc.publisher | ACM | -
dc.relation.ispartof | ASPLOS 2023: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems | -
dc.subject | Dynamic pruning | -
dc.subject | DPACS | -
dc.subject | FPGA | -
dc.subject | Hardware acceleration | -
dc.subject | Algorithm architecture co-design | -
dc.title | DPACS: Hardware Accelerated Dynamic Neural Network Pruning through Algorithm-Architecture Co-design | -
dc.type | Conference_Paper | -
dc.identifier.email | Qi, X: xjqi@eee.hku.hk | -
dc.identifier.email | So, HKH: hso@eee.hku.hk | -
dc.identifier.authority | Qi, X=rp02666 | -
dc.identifier.authority | So, HKH=rp00169 | -
dc.identifier.doi | 10.1145/3575693.3575728 | -
dc.identifier.hkuros | 344033 | -
dc.identifier.volume | 2 | -
dc.identifier.issue | Jan 2023 | -
dc.identifier.spage | 237 | -
dc.identifier.epage | 251 | -
dc.publisher.place | United States | -
