Article: Binary Weight Multi-Bit Activation Quantization for Compute-in-Memory CNN Accelerators

Title: Binary Weight Multi-Bit Activation Quantization for Compute-in-Memory CNN Accelerators
Authors: Zhou, Wenyong; Liu, Zhengwu; Ren, Yuan; Wong, Ngai
Keywords: Compute-in-Memory; FeFET; Model Quantization; RRAM; SRAM
Issue Date: 1-Jan-2025
Publisher: Institute of Electrical and Electronics Engineers
Citation: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2025
Abstract: Compute-in-memory (CIM) accelerators have emerged as a promising way to enhance the energy efficiency of convolutional neural networks (CNNs). Deploying CNNs on CIM platforms generally requires quantization of network weights and activations to meet hardware constraints. However, existing approaches either prioritize hardware efficiency with binary weight and activation quantization at the cost of accuracy, or use multi-bit weights and activations for greater accuracy but limited efficiency. In this paper, we introduce a novel binary weight multi-bit activation (BWMA) method for CNNs on CIM-based accelerators. Our contributions include: deriving closed-form solutions for weight quantization in each layer, significantly improving the representational capability of binarized weights; and developing a differentiable function for activation quantization that approximates the ideal multi-bit function while bypassing an extensive search for optimal settings. Through comprehensive experiments on the CIFAR-10 and ImageNet datasets, we show that BWMA achieves notable accuracy improvements over existing methods, with gains of 1.44%-5.46% and 0.35%-5.37% on the respective datasets. Moreover, hardware simulation results indicate that 4-bit activation quantization strikes the optimal balance between hardware cost and model performance.
Persistent Identifier: http://hdl.handle.net/10722/362535
ISSN: 0278-0070
2023 Impact Factor: 2.7
2023 SCImago Journal Rankings: 0.957
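
The abstract above names two ingredients: a per-layer closed-form scale for binarized weights and a differentiable approximation of a multi-bit activation quantizer. The paper's exact formulations are not reproduced in this record, so the minimal sketch below uses common stand-ins as assumptions rather than the authors' method: a mean-|W| scaling factor (XNOR-Net style) for the weight scale, and a sum-of-sigmoids "soft staircase" with a hypothetical temperature parameter for the activation quantizer.

```python
# Minimal sketch of binary-weight / multi-bit-activation (BWMA-style) quantization.
# The per-layer mean-|W| scale and the sigmoid staircase are illustrative stand-ins,
# not the closed-form solution or differentiable function derived in the paper.
import torch


def binarize_weights(w: torch.Tensor) -> torch.Tensor:
    """Binarize a weight tensor to {-alpha, +alpha} with a per-layer scale.

    alpha = mean(|w|) is the classic closed-form scale (assumption); the paper
    derives its own layer-wise closed-form solution.
    """
    alpha = w.abs().mean()            # per-layer scaling factor
    return alpha * torch.sign(w)      # binary weights in {-alpha, +alpha}


def soft_multibit_activation(x: torch.Tensor, bits: int = 4,
                             temperature: float = 10.0) -> torch.Tensor:
    """Differentiable approximation of a uniform multi-bit quantizer on [0, 1].

    A sum of shifted sigmoids forms a smooth staircase; as `temperature` grows
    it approaches the ideal (non-differentiable) multi-bit step function.
    """
    levels = 2 ** bits - 1            # number of quantization steps
    x = x.clamp(0.0, 1.0)
    steps = torch.arange(1, levels + 1, dtype=x.dtype, device=x.device)
    thresholds = (steps - 0.5) / levels   # step positions of the ideal quantizer
    # Each sigmoid contributes one step of height 1/levels.
    y = torch.sigmoid(temperature * (x.unsqueeze(-1) - thresholds)).sum(-1) / levels
    return y


if __name__ == "__main__":
    w = torch.randn(64, 32, 3, 3)
    a = torch.rand(8, 32, 16, 16)
    print(binarize_weights(w).unique())                 # two values: -alpha, +alpha
    print(soft_multibit_activation(a, bits=4).shape)    # same shape as the input
```

The 4-bit default mirrors the activation width the abstract reports as the best trade-off between hardware cost and accuracy; a hard rounding step (with a straight-through estimator) would typically replace the soft staircase at inference time.
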

 

DC Field / Value
dc.contributor.author: Zhou, Wenyong
dc.contributor.author: Liu, Zhengwu
dc.contributor.author: Ren, Yuan
dc.contributor.author: Wong, Ngai
dc.date.accessioned: 2025-09-26T00:35:59Z
dc.date.available: 2025-09-26T00:35:59Z
dc.date.issued: 2025-01-01
dc.identifier.citation: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2025
dc.identifier.issn: 0278-0070
dc.identifier.uri: http://hdl.handle.net/10722/362535
dc.description.abstract: Compute-in-memory (CIM) accelerators have emerged as a promising way to enhance the energy efficiency of convolutional neural networks (CNNs). Deploying CNNs on CIM platforms generally requires quantization of network weights and activations to meet hardware constraints. However, existing approaches either prioritize hardware efficiency with binary weight and activation quantization at the cost of accuracy, or use multi-bit weights and activations for greater accuracy but limited efficiency. In this paper, we introduce a novel binary weight multi-bit activation (BWMA) method for CNNs on CIM-based accelerators. Our contributions include: deriving closed-form solutions for weight quantization in each layer, significantly improving the representational capability of binarized weights; and developing a differentiable function for activation quantization that approximates the ideal multi-bit function while bypassing an extensive search for optimal settings. Through comprehensive experiments on the CIFAR-10 and ImageNet datasets, we show that BWMA achieves notable accuracy improvements over existing methods, with gains of 1.44%-5.46% and 0.35%-5.37% on the respective datasets. Moreover, hardware simulation results indicate that 4-bit activation quantization strikes the optimal balance between hardware cost and model performance.
dc.language: eng
dc.publisher: Institute of Electrical and Electronics Engineers
dc.relation.ispartof: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
dc.subject: Compute-in-Memory
dc.subject: FeFET
dc.subject: Model Quantization
dc.subject: RRAM
dc.subject: SRAM
dc.title: Binary Weight Multi-Bit Activation Quantization for Compute-in-Memory CNN Accelerators
dc.type: Article
dc.identifier.doi: 10.1109/TCAD.2025.3595830
dc.identifier.scopus: eid_2-s2.0-105013344940
dc.identifier.eissn: 1937-4151
dc.identifier.issnl: 0278-0070
