Conference Paper: Mixed Precision Quantization for ReRAM-based DNN Inference Accelerators

Title: Mixed Precision Quantization for ReRAM-based DNN Inference Accelerators
Authors: Huang, S; Ankit, A; Silveira, P; Antunes, R; Chalamalasetti, SR; Hajj, IE; Kim, DE; Aguiar, G; Bruel, P; Serebryakov, G; Xu, C; Li, C; Faraboschi, P; Strachan, JP; Chen, D; Roy, K; Hwu, WW; Milojicic, D
Keywords: Mixed precision quantization; ReRAM; DNN inference accelerators
Issue Date: 2021
Publisher: Association for Computing Machinery. The proceedings' web site is located at https://dl.acm.org/conference/aspdac/proceedings (http://ieeexplore.ieee.org/xpl/conhome.jsp?punumber=1000194)
Citation: Proceedings of the 26th Asia and South Pacific Design Automation Conference (ASP-DAC), Virtual Conference, Tokyo, Japan, 18-21 January 2021, p. 372–377
Abstract: ReRAM-based accelerators have shown great potential for accelerating DNN inference because ReRAM crossbars can perform analog matrix-vector multiplication (MVM) operations with low latency and energy consumption. However, these crossbars require ADCs, which constitute a significant fraction of the cost of MVM operations. The ADC overhead can be mitigated via partial sum quantization; however, prior quantization flows for DNN inference accelerators do not consider partial sum quantization because it is not highly relevant to traditional digital architectures. To address this issue, we propose a mixed precision quantization scheme for ReRAM-based DNN inference accelerators in which weight quantization, input quantization, and partial sum quantization are jointly applied for each DNN layer. We also propose an automated quantization flow powered by deep reinforcement learning to search for the best quantization configuration in the large design space. Our evaluation shows that the proposed mixed precision quantization scheme and quantization flow reduce inference latency and energy consumption by up to 3.89× and 4.84×, respectively, while losing only 1.18% in DNN inference accuracy.
Persistent Identifier: http://hdl.handle.net/10722/305214
ISBN: 9781450379991
ISSN: 2153-6961
ISI Accession Number ID: WOS:000668583700069
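
The abstract describes two technical ideas: per-layer mixed precision quantization of weights, inputs, and crossbar partial sums, and an automated search over per-layer bit-widths. The Python sketch below illustrates both under stated assumptions: it is not the authors' implementation; the 128-row tile size, the default bit-widths, and the function names (quantize, crossbar_mvm, random_search) are hypothetical; and plain random search stands in for the deep reinforcement learning agent the paper actually uses.

import numpy as np

def quantize(x, bits, x_max):
    # Uniform symmetric quantization to the given bit-width; the epsilon
    # guards against an all-zero tensor.
    levels = 2 ** (bits - 1) - 1
    scale = max(float(x_max), 1e-12) / levels
    return np.clip(np.round(x / scale), -levels, levels) * scale

def crossbar_mvm(W, x, w_bits=4, in_bits=6, ps_bits=8, rows=128):
    # MVM split into crossbar-sized row tiles; each tile's analog partial
    # sum passes through an ADC, modeled here as partial-sum quantization
    # before digital accumulation. The bit-widths and 128-row tile size
    # are illustrative defaults, not the paper's values.
    Wq = quantize(W, w_bits, np.abs(W).max())
    xq = quantize(x, in_bits, np.abs(x).max())
    y = np.zeros(W.shape[0])
    for r in range(0, W.shape[1], rows):
        ps = Wq[:, r:r + rows] @ xq[r:r + rows]       # analog MVM on one tile
        y += quantize(ps, ps_bits, np.abs(ps).max())  # ADC quantizes the partial sum
    return y

def random_search(n_layers, evaluate, trials=100, seed=0):
    # Stand-in for the paper's deep reinforcement learning agent: sample
    # per-layer (w_bits, in_bits, ps_bits) configurations at random and
    # keep the best one under a user-supplied objective, e.g. accuracy
    # penalized by ADC latency/energy cost.
    rng = np.random.default_rng(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(trials):
        cfg = [tuple(rng.integers(2, 9, size=3)) for _ in range(n_layers)]
        score = evaluate(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg

# Example: quantization error of one 64x512 layer (512/128 = 4 crossbar tiles).
W, x = np.random.randn(64, 512), np.random.randn(512)
print(np.abs(crossbar_mvm(W, x) - W @ x).max())

Lowering ps_bits reduces ADC resolution and hence MVM cost, but adds partial-sum error; jointly choosing (w_bits, in_bits, ps_bits) per layer is exactly the accuracy/efficiency trade-off the paper's automated flow navigates.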

 

DC Field | Value | Language
dc.contributor.author | Huang, S | -
dc.contributor.author | Ankit, A | -
dc.contributor.author | Silveira, P | -
dc.contributor.author | Antunes, R | -
dc.contributor.author | Chalamalasetti, SR | -
dc.contributor.author | Hajj, IE | -
dc.contributor.author | Kim, DE | -
dc.contributor.author | Aguiar, G | -
dc.contributor.author | Bruel, P | -
dc.contributor.author | Serebryakov, G | -
dc.contributor.author | Xu, C | -
dc.contributor.author | Li, C | -
dc.contributor.author | Faraboschi, P | -
dc.contributor.author | Strachan, JP | -
dc.contributor.author | Chen, D | -
dc.contributor.author | Roy, K | -
dc.contributor.author | Hwu, WW | -
dc.contributor.author | Milojicic, D | -
dc.date.accessioned | 2021-10-20T10:06:15Z | -
dc.date.available | 2021-10-20T10:06:15Z | -
dc.date.issued | 2021 | -
dc.identifier.citation | Proceedings of the 26th Asia and South Pacific Design Automation Conference (ASP-DAC), Virtual Conference, Tokyo, Japan, 18-21 January 2021, p. 372–377 | -
dc.identifier.isbn | 9781450379991 | -
dc.identifier.issn | 2153-6961 | -
dc.identifier.uri | http://hdl.handle.net/10722/305214 | -
dc.description.abstract | ReRAM-based accelerators have shown great potential for accelerating DNN inference because ReRAM crossbars can perform analog matrix-vector multiplication (MVM) operations with low latency and energy consumption. However, these crossbars require ADCs, which constitute a significant fraction of the cost of MVM operations. The ADC overhead can be mitigated via partial sum quantization; however, prior quantization flows for DNN inference accelerators do not consider partial sum quantization because it is not highly relevant to traditional digital architectures. To address this issue, we propose a mixed precision quantization scheme for ReRAM-based DNN inference accelerators in which weight quantization, input quantization, and partial sum quantization are jointly applied for each DNN layer. We also propose an automated quantization flow powered by deep reinforcement learning to search for the best quantization configuration in the large design space. Our evaluation shows that the proposed mixed precision quantization scheme and quantization flow reduce inference latency and energy consumption by up to 3.89× and 4.84×, respectively, while losing only 1.18% in DNN inference accuracy. | -
dc.language | eng | -
dc.publisher | Association for Computing Machinery. The proceedings' web site is located at https://dl.acm.org/conference/aspdac/proceedings (http://ieeexplore.ieee.org/xpl/conhome.jsp?punumber=1000194) | -
dc.relation.ispartof | Asia and South Pacific Design Automation Conference Proceedings | -
dc.rights | Asia and South Pacific Design Automation Conference Proceedings. Copyright © Association for Computing Machinery. | -
dc.subject | Mixed precision quantization | -
dc.subject | ReRAM | -
dc.subject | DNN inference accelerators | -
dc.title | Mixed Precision Quantization for ReRAM-based DNN Inference Accelerators | -
dc.type | Conference_Paper | -
dc.identifier.email | Li, C: canl@hku.hk | -
dc.identifier.authority | Li, C=rp02706 | -
dc.description.nature | link_to_subscribed_fulltext | -
dc.identifier.doi | 10.1145/3394885.3431554 | -
dc.identifier.scopus | eid_2-s2.0-85100574970 | -
dc.identifier.hkuros | 328219 | -
dc.identifier.spage | 372 | -
dc.identifier.epage | 377 | -
dc.identifier.isi | WOS:000668583700069 | -
dc.publisher.place | United States | -
