Conference Paper: Mix and Match: A Novel FPGA-Centric Deep Neural Network Quantization Framework

Title: Mix and Match: A Novel FPGA-Centric Deep Neural Network Quantization Framework
Authors: Chang, SE; Li, Y; Sun, M; Shi, R; So, HKH; Qian, X; Wang, Y; Lin, X
Keywords: Embedded systems; Field programmable gate arrays; Learning (artificial intelligence); Matrix multiplication; Neural nets
Issue Date: 2021
Publisher: IEEE Computer Society
Citation: IEEE International Symposium on High-Performance Computer Architecture (HPCA) (Virtual Event), Seoul, Korea, 27 February - 3 March 2021. In Proceedings: 27th IEEE International Symposium on High-Performance Computer Architecture, 27 February - 3 March 2021, p. 208-220
Abstract: Deep Neural Networks (DNNs) have achieved extraordinary performance in various application domains. To support diverse DNN models, efficient implementations of DNN inference on edge-computing platforms, e.g., ASICs, FPGAs, and embedded systems, have been extensively investigated. Due to the huge model size and computation cost, model compression is a critical step in deploying DNN models on edge devices. This paper focuses on weight quantization, a hardware-friendly model-compression approach that is complementary to weight pruning. Unlike existing methods that apply the same quantization scheme to all weights, we propose the first solution that applies different quantization schemes to different rows of the weight matrix. It is motivated by (1) the observation that the weight distributions of different rows are not the same, and (2) the potential for better utilization of heterogeneous FPGA hardware resources. To achieve this, we first propose a hardware-friendly quantization scheme named sum-of-power-of-2 (SP2), suited to Gaussian-like weight distributions, in which multiplication can be replaced with a logic shifter and adder, thereby enabling highly efficient implementations with FPGA LUT resources. In contrast, existing fixed-point quantization suits uniform-like weight distributions and can be implemented efficiently by DSPs. Then, to fully exploit both types of resources, we propose an FPGA-centric mixed-scheme quantization (MSQ) that ensembles the proposed SP2 and fixed-point schemes. Combining the two schemes can maintain, or even increase, accuracy thanks to better matching with the weight distributions.
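
The abstract's key arithmetic claim is easy to demonstrate: an SP2 weight is a signed sum of two powers of two, so multiplying by it costs two shifts and one add instead of a DSP multiply. Below is a minimal Python sketch of that idea, not the authors' implementation; the helper names (sp2_levels, nearest, msq_assign), the normalized-weight assumption, and the 2-bit-per-shift budget are illustrative assumptions, and the paper's exact exponent ranges and row-assignment rule may differ.

    import itertools

    def sp2_levels(shift_bits=2):
        # All magnitudes 2^-a + 2^-b with shift amounts a, b in 1..2^shift_bits
        # (an assumed range), plus their negatives and zero.
        shifts = range(1, 2 ** shift_bits + 1)
        mags = {2.0 ** -a + 2.0 ** -b
                for a, b in itertools.product(shifts, repeat=2)}
        return sorted(mags | {-m for m in mags} | {0.0})

    def nearest(w, levels):
        # Round one normalized weight to the closest level of a scheme.
        return min(levels, key=lambda q: abs(q - w))

    SP2 = sp2_levels()
    FIXED = [i / 8 for i in range(-8, 8)]        # plain 4-bit fixed point

    print(nearest(0.7, SP2))                     # 0.75 = 2^-1 + 2^-2

    # Why SP2 is LUT-friendly: for an integer activation x and the weight
    # 2^-1 + 2^-2, the product is just (x >> 1) + (x >> 2).
    x = 96
    assert (x >> 1) + (x >> 2) == int(x * 0.75)  # 48 + 24 == 72

    def msq_assign(row):
        # A plausible stand-in for MSQ's row rule (not the paper's exact
        # algorithm): send each weight-matrix row to whichever scheme
        # quantizes it with lower squared error, so Gaussian-like rows
        # land on SP2/LUTs and uniform-like rows on fixed-point/DSPs.
        err = lambda lv: sum((w - nearest(w, lv)) ** 2 for w in row)
        return "SP2" if err(SP2) <= err(FIXED) else "fixed-point"

Note that the real MSQ assignment also has to respect the device's actual mix of DSPs and LUTs, which this error-only stand-in ignores.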
Persistent Identifier: http://hdl.handle.net/10722/323349
ISI Accession Number ID: WOS:000671076000016

DC Field: Value
dc.contributor.author: Chang, SE
dc.contributor.author: Li, Y
dc.contributor.author: Sun, M
dc.contributor.author: Shi, R
dc.contributor.author: So, HKH
dc.contributor.author: Qian, X
dc.contributor.author: Wang, Y
dc.contributor.author: Lin, X
dc.date.accessioned: 2022-12-09T10:45:18Z
dc.date.available: 2022-12-09T10:45:18Z
dc.date.issued: 2021
dc.identifier.citation: IEEE International Symposium on High-Performance Computer Architecture (HPCA) (Virtual Event), Seoul, Korea, 27 February - 3 March 2021. In Proceedings: 27th IEEE International Symposium on High-Performance Computer Architecture, 27 February - 3 March 2021, p. 208-220
dc.identifier.uri: http://hdl.handle.net/10722/323349
dc.description.abstract: Deep Neural Networks (DNNs) have achieved extraordinary performance in various application domains. To support diverse DNN models, efficient implementations of DNN inference on edge-computing platforms, e.g., ASICs, FPGAs, and embedded systems, have been extensively investigated. Due to the huge model size and computation cost, model compression is a critical step in deploying DNN models on edge devices. This paper focuses on weight quantization, a hardware-friendly model-compression approach that is complementary to weight pruning. Unlike existing methods that apply the same quantization scheme to all weights, we propose the first solution that applies different quantization schemes to different rows of the weight matrix. It is motivated by (1) the observation that the weight distributions of different rows are not the same, and (2) the potential for better utilization of heterogeneous FPGA hardware resources. To achieve this, we first propose a hardware-friendly quantization scheme named sum-of-power-of-2 (SP2), suited to Gaussian-like weight distributions, in which multiplication can be replaced with a logic shifter and adder, thereby enabling highly efficient implementations with FPGA LUT resources. In contrast, existing fixed-point quantization suits uniform-like weight distributions and can be implemented efficiently by DSPs. Then, to fully exploit both types of resources, we propose an FPGA-centric mixed-scheme quantization (MSQ) that ensembles the proposed SP2 and fixed-point schemes. Combining the two schemes can maintain, or even increase, accuracy thanks to better matching with the weight distributions.
dc.language: eng
dc.publisher: IEEE Computer Society
dc.relation.ispartof: Proceedings: 27th IEEE International Symposium on High-Performance Computer Architecture, 27 February - 3 March 2021
dc.rights: Proceedings: 27th IEEE International Symposium on High-Performance Computer Architecture, 27 February - 3 March 2021. Copyright © IEEE Computer Society.
dc.subject: Embedded systems
dc.subject: Field programmable gate arrays
dc.subject: Learning (artificial intelligence)
dc.subject: Matrix multiplication
dc.subject: Neural nets
dc.title: Mix and Match: A Novel FPGA-Centric Deep Neural Network Quantization Framework
dc.type: Conference_Paper
dc.identifier.email: So, HKH: hso@eee.hku.hk
dc.identifier.authority: So, HKH=rp00169
dc.identifier.doi: 10.1109/HPCA51647.2021.00027
dc.identifier.hkuros: 342981
dc.identifier.spage: 208
dc.identifier.epage: 220
dc.identifier.isi: WOS:000671076000016
dc.publisher.place: United States
