Article: A Reconfigurable Processing Element for Multiple-Precision Floating/Fixed-Point HPC

Title: A Reconfigurable Processing Element for Multiple-Precision Floating/Fixed-Point HPC
Authors: Li, Boyu; Li, Kai; Zhou, Jiajun; Ren, Yuan; Mao, Wei; Yu, Hao; Wong, Ngai
Keywords: Artificial neural networks; Clocks; Deep learning; Energy efficiency; fixed-point; floating-point; Hardware; HPC; MAC; Multiple-precision; PE; Random access memory; Training
Issue Date: 5-Oct-2023
Publisher: Institute of Electrical and Electronics Engineers
Citation: IEEE Transactions on Circuits and Systems II: Express Briefs, 2023
Abstract

High-performance computing (HPC) can facilitate deep neural network (DNN) training and inference. Previous works have proposed multiple-precision floating- and fixed-point designs, but most can handle only one of the two. This brief proposes a novel reconfigurable processing element (PE) supporting both energy-efficient floating-point and fixed-point multiply-accumulate (MAC) operations. The PE supports 9×BFloat16 (BF16), 4×half-precision (FP16), 4×TensorFloat-32 (TF32) and 1×single-precision (FP32) MAC operations with 100% multiplication hardware utilization in one clock cycle. It also supports 72×INT2, 36×INT4 and 9×INT8 dot products plus one 32-bit addend. The design is realized in a 28 nm process at a 1.471 GHz slow-corner clock frequency. Compared with state-of-the-art (SOTA) multiple-precision PEs, the proposed work exhibits the best energy efficiency of 834.35 GFLOPS/W and 1761.41 GFLOPS/W at TF32 and BF16, with at least 10× and 4× improvement, respectively, for deep learning training. Meanwhile, the design supports energy-efficient fixed-point computing with a small hardware overhead for deep learning inference.
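
For readers skimming the record, the sketch below is a minimal, purely behavioral Python model of the per-cycle modes listed in the abstract: lane-parallel floating-point MACs and a fully reduced fixed-point dot product with a 32-bit addend. Only the mode names and lane counts (9×BF16, 4×FP16, 4×TF32, 1×FP32; 72×INT2, 36×INT4, 9×INT8) come from the abstract; the function names and plain-Python arithmetic are illustrative assumptions, not the paper's hardware design or its rounding behavior.

# Behavioral sketch only: models the per-cycle throughput shapes from the
# abstract, not the actual PE datapath, precision handling, or rounding.
FLOAT_MODES = {"BF16": 9, "FP16": 4, "TF32": 4, "FP32": 1}  # MAC lanes per clock cycle
INT_MODES = {"INT2": 72, "INT4": 36, "INT8": 9}             # dot-product lanes per clock cycle

def mac_cycle(mode, a, b, acc):
    # Floating-point mode: every lane performs an independent acc[i] += a[i] * b[i].
    lanes = FLOAT_MODES[mode]
    assert len(a) == len(b) == len(acc) == lanes
    return [acc[i] + a[i] * b[i] for i in range(lanes)]

def dot_cycle(mode, a, b, addend):
    # Fixed-point mode: all lane products reduce to one sum, plus a 32-bit addend.
    lanes = INT_MODES[mode]
    assert len(a) == len(b) == lanes
    return sum(x * y for x, y in zip(a, b)) + addend

# Example: the 9-lane INT8 mode with a 32-bit addend.
print(dot_cycle("INT8", [1] * 9, [2] * 9, 100))  # 9 * (1 * 2) + 100 = 118

A faithful model would additionally enforce each format's bit widths and the 100% multiplier-utilization claim; those details are intentionally left out of this sketch.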


Persistent Identifier: http://hdl.handle.net/10722/339469
ISSN: 1549-7747
2023 Impact Factor: 4.0
2023 SCImago Journal Rankings: 1.523

 

DC Field: Value
dc.contributor.author: Li, Boyu
dc.contributor.author: Li, Kai
dc.contributor.author: Zhou, Jiajun
dc.contributor.author: Ren, Yuan
dc.contributor.author: Mao, Wei
dc.contributor.author: Yu, Hao
dc.contributor.author: Wong, Ngai
dc.date.accessioned: 2024-03-11T10:36:53Z
dc.date.available: 2024-03-11T10:36:53Z
dc.date.issued: 2023-10-05
dc.identifier.citation: IEEE Transactions on Circuits and Systems II: Express Briefs, 2023
dc.identifier.issn: 1549-7747
dc.identifier.uri: http://hdl.handle.net/10722/339469
dc.description.abstract: High-performance computing (HPC) can facilitate deep neural network (DNN) training and inference. Previous works have proposed multiple-precision floating-and fixed-point designs, but most can only handle either one independently. This brief proposes a novel reconfigurable processing element (PE) supporting both energy-efficient floating-point and fixed-point multiply-accumulate (MAC) operations. This PE can support 9×BFloat16 (BF16), 4×half-precision (FP16), 4×TensorFloat-32 (TF32) and 1×single-precision (FP32) MAC operation with 100% multiplication hardware utilization in one clock cycle. Besides, it can also support 72×INT2, 36×INT4 and 9×INT8 dot product plus one 32-bit addend. The design is realized in a 28nm-process at a 1.471GHz slow-corner clock frequency. Compared with state-of-the-art (SOTA) multiple-precision PEs, the proposed work exhibits the best energy efficiency of 834.35GFLOPS/W and 1761.41GFLOPS/W at TF32 and BF16 with at least 10× and 4× improvement, respectively, for deep learning training. Meanwhile, this design supports energy-efficient fixed-point computing with a small hardware overhead for deep learning inference.
dc.language: eng
dc.publisher: Institute of Electrical and Electronics Engineers
dc.relation.ispartof: IEEE Transactions on Circuits and Systems II: Express Briefs
dc.subject: Artificial neural networks
dc.subject: Clocks
dc.subject: Deep learning
dc.subject: Energy efficiency
dc.subject: fixed-point
dc.subject: floating-point
dc.subject: Hardware
dc.subject: HPC
dc.subject: MAC
dc.subject: Multiple-precision
dc.subject: PE
dc.subject: Random access memory
dc.subject: Training
dc.title: A Reconfigurable Processing Element for Multiple-Precision Floating/Fixed-Point HPC
dc.type: Article
dc.identifier.doi: 10.1109/TCSII.2023.3322259
dc.identifier.scopus: eid_2-s2.0-85174812317
dc.identifier.eissn: 1558-3791
dc.identifier.issnl: 1549-7747
