Conference Paper: DPACS: Hardware Accelerated Dynamic Neural Network Pruning through Algorithm-Architecture Co-design

Title: DPACS: Hardware Accelerated Dynamic Neural Network Pruning through Algorithm-Architecture Co-design
Authors: Gao, Y; Zhang, B; Qi, X; So, HKH
Keywords: Dynamic pruning; DPACS; FPGA; Hardware acceleration; Algorithm architecture co-design
Issue Date: 2023
Publisher: ACM
Citation: 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Vancouver, BC, Canada, March 25-29, 2023. In ASPLOS 2023: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, v. 2 n. Jan 2023, p. 237-251
Abstract: By eliminating compute operations intelligently based on the run-time input, dynamic pruning (DP) promises to improve deep neural network inference speed substantially without a major impact on accuracy. Although many DP algorithms with good pruning performance have been proposed, it remains a challenge to translate these theoretical reductions in compute operations into satisfactory end-to-end speedups in practical real-world implementations. The overhead of identifying operations to be pruned at run time, the need to efficiently process the resulting dynamic dataflow, and the non-trivial memory I/O bottleneck that emerges as the number of compute operations reduces have all contributed to the challenge of implementing practical DP systems. In this paper, the design and implementation of DPACS are presented to address these challenges. DPACS utilizes a hardware-aware dynamic spatial and channel pruning algorithm in conjunction with a dynamic dataflow engine in hardware to facilitate efficient processing of the pruned network. A channel mask precomputation scheme is designed to reduce memory I/O, and a dedicated inter-layer pipeline is used to achieve efficient indexing and dataflow of sparse activations. Extensive design space exploration has been performed using two architectural variations implemented on FPGA to accelerate multiple networks from the ResNet family on the ImageNet and CIFAR10 datasets across a wide range of pruning ratios. Across the spectrum of configurations, DPACS achieves 1.1× to 3.9× end-to-end speedup over a baseline hardware implementation without pruning. Analysis of the tradeoff among accuracy, compute, and memory I/O performance highlights the importance of algorithm-architecture co-design in developing DP systems.
Persistent Identifier: http://hdl.handle.net/10722/324825
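The abstract's core idea, input-dependent channel and spatial pruning driven by lightweight masks, can be illustrated with a short sketch. The PyTorch snippet below is a minimal illustration only, not the authors' DPACS code: the module name, the gating heads (channel_gate, spatial_gate), the top-k channel selection, and the threshold-based spatial mask are all assumed stand-ins for whatever the paper actually uses. On a CPU or GPU the masks merely zero out results; DPACS's FPGA dataflow engine instead skips the masked computation and the associated memory I/O, for example by precomputing channel masks ahead of the layer.

import torch
import torch.nn as nn

class DynamicPrunedConv(nn.Module):
    """Illustrative input-dependent pruning block (hypothetical, not DPACS)."""

    def __init__(self, in_ch, out_ch, keep_ratio=0.5):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False)
        # Channel gate: global average pool -> linear -> one score per output channel.
        self.channel_gate = nn.Linear(in_ch, out_ch)
        # Spatial gate: 1x1 conv -> one saliency score per pixel.
        self.spatial_gate = nn.Conv2d(in_ch, 1, kernel_size=1)
        self.keep_ratio = keep_ratio

    def forward(self, x):
        n = x.size(0)
        k = max(1, int(self.conv.out_channels * self.keep_ratio))

        # Channel mask: keep only the k highest-scoring output channels.
        scores = self.channel_gate(x.mean(dim=(2, 3)))       # (N, out_ch)
        topk = scores.topk(k, dim=1).indices
        ch_mask = torch.zeros_like(scores).scatter_(1, topk, 1.0)

        # Spatial mask: keep only pixels with positive predicted saliency.
        sp_mask = (self.spatial_gate(x) > 0).float()         # (N, 1, H, W)

        # Zeroing only emulates pruning; a real accelerator skips this work.
        y = self.conv(x)
        return y * ch_mask.view(n, -1, 1, 1) * sp_mask

As a quick sanity check of the sketch, block = DynamicPrunedConv(64, 128, keep_ratio=0.25) followed by block(torch.randn(1, 64, 56, 56)) leaves at most a quarter of the 128 output channels non-zero for that input. Note that hard top-k selection is not differentiable; dynamic pruning methods in the literature typically train such gates with Gumbel-softmax or straight-through estimators.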

 

DC Field | Value | Language
dc.contributor.author | Gao, Y | -
dc.contributor.author | Zhang, B | -
dc.contributor.author | Qi, X | -
dc.contributor.author | So, HKH | -
dc.date.accessioned | 2023-02-20T01:38:28Z | -
dc.date.available | 2023-02-20T01:38:28Z | -
dc.date.issued | 2023 | -
dc.identifier.citation | 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Vancouver, BC, Canada, March 25-29, 2023. In ASPLOS 2023: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, v. 2 n. Jan 2023, p. 237-251 | -
dc.identifier.uri | http://hdl.handle.net/10722/324825 | -
dc.description.abstract | By eliminating compute operations intelligently based on the run-time input, dynamic pruning (DP) promises to improve deep neural network inference speed substantially without a major impact on accuracy. Although many DP algorithms with good pruning performance have been proposed, it remains a challenge to translate these theoretical reductions in compute operations into satisfactory end-to-end speedups in practical real-world implementations. The overhead of identifying operations to be pruned at run time, the need to efficiently process the resulting dynamic dataflow, and the non-trivial memory I/O bottleneck that emerges as the number of compute operations reduces have all contributed to the challenge of implementing practical DP systems. In this paper, the design and implementation of DPACS are presented to address these challenges. DPACS utilizes a hardware-aware dynamic spatial and channel pruning algorithm in conjunction with a dynamic dataflow engine in hardware to facilitate efficient processing of the pruned network. A channel mask precomputation scheme is designed to reduce memory I/O, and a dedicated inter-layer pipeline is used to achieve efficient indexing and dataflow of sparse activations. Extensive design space exploration has been performed using two architectural variations implemented on FPGA to accelerate multiple networks from the ResNet family on the ImageNet and CIFAR10 datasets across a wide range of pruning ratios. Across the spectrum of configurations, DPACS achieves 1.1× to 3.9× end-to-end speedup over a baseline hardware implementation without pruning. Analysis of the tradeoff among accuracy, compute, and memory I/O performance highlights the importance of algorithm-architecture co-design in developing DP systems. | -
dc.language | eng | -
dc.publisher | ACM | -
dc.relation.ispartof | ASPLOS 2023: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems | -
dc.subject | Dynamic pruning | -
dc.subject | DPACS | -
dc.subject | FPGA | -
dc.subject | Hardware acceleration | -
dc.subject | Algorithm architecture co-design | -
dc.title | DPACS: Hardware Accelerated Dynamic Neural Network Pruning through Algorithm-Architecture Co-design | -
dc.type | Conference_Paper | -
dc.identifier.email | Qi, X: xjqi@eee.hku.hk | -
dc.identifier.email | So, HKH: hso@eee.hku.hk | -
dc.identifier.authority | Qi, X=rp02666 | -
dc.identifier.authority | So, HKH=rp00169 | -
dc.identifier.doi | 10.1145/3575693.3575728 | -
dc.identifier.hkuros | 344033 | -
dc.identifier.volume | 2 | -
dc.identifier.issue | Jan 2023 | -
dc.identifier.spage | 237 | -
dc.identifier.epage | 251 | -
dc.publisher.place | United States | -
