Conference Paper: FTDL: An FPGA-tailored Architecture for Deep Learning Systems

Title: FTDL: An FPGA-tailored Architecture for Deep Learning Systems
Authors: Shi, R; Ding, Y; Wei, X; Liu, H; So, HKH; Ding, C
Issue Date: 2020
Publisher: Association for Computing Machinery (ACM)
Citation: Proceedings of the 28th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA 2020), Seaside, CA, USA, 23-25 February 2020, p. 320
Abstract: Hardware acceleration of deep learning (DL) systems has been increasingly studied to achieve desirable performance and energy efficiency. The FPGA strikes a balance between high energy efficiency and a fast development cycle, and is therefore widely used as a DNN accelerator. However, current designs suffer from an architecture-layout mismatch that introduces scalability and flexibility issues, leading to irregular routing and resource-imbalance problems. To address these limitations, we propose FTDL, an FPGA-tailored architecture with parameterized, hierarchical hardware that adapts to different FPGA devices. FTDL offers the following novelties: (i) at the architecture level, FTDL consists of Tiled Processing Elements (TPEs) and super blocks, achieving a near-theoretical digital signal processing (DSP) operating frequency of 650 MHz; more importantly, FTDL is configurable and delivers good scalability, i.e., timing remains stable even when the design is scaled up to 100% resource utilization for different deep learning systems. (ii) For workload compilation, FTDL provides a compiler that maps DL workloads onto the architecture optimally. Experimental results show that for most benchmark layers in MLPerf, FTDL achieves over 80% hardware efficiency.
Description: Poster Session II
Persistent Identifier: http://hdl.handle.net/10722/287980
ISBN: 9781450370998
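The abstract reports over 80% hardware efficiency on most MLPerf benchmark layers at a 650 MHz DSP clock. As a rough illustration of what such a figure means, the Python sketch below computes hardware efficiency as achieved MAC throughput divided by peak MAC throughput; this definition, the device size, the layer shape, and the cycle count are all illustrative assumptions, not values from the paper.

    # Illustrative-only estimate of hardware efficiency for a DSP-based
    # accelerator: achieved MAC throughput / peak MAC throughput.
    # Device size, layer shape, and cycle count below are hypothetical.

    def peak_macs_per_s(num_dsp: int, freq_hz: float) -> float:
        """Peak throughput if every DSP completes one MAC per cycle."""
        return num_dsp * freq_hz

    def conv_macs(h: int, w: int, cin: int, cout: int, k: int) -> int:
        """Total MACs in a stride-1, same-padded 2D convolution layer."""
        return h * w * cin * cout * k * k

    NUM_DSP, FREQ_HZ = 4096, 650e6         # hypothetical device at the reported 650 MHz clock
    macs = conv_macs(h=56, w=56, cin=64, cout=64, k=3)

    cycles = 32_000                        # hypothetical cycle count for the mapped layer
    achieved = macs / (cycles / FREQ_HZ)   # MACs per second actually delivered
    efficiency = achieved / peak_macs_per_s(NUM_DSP, FREQ_HZ)
    print(f"hardware efficiency ~ {efficiency:.1%}")  # ~88% with these numbers

With these made-up numbers the result is about 88%; equivalently, efficiency is the ideal cycle count (macs / NUM_DSP, about 28,224 here) divided by the actual cycle count, so any mapping overhead shows up directly as lost efficiency.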

 

DC Field                   Value
dc.contributor.author      Shi, R
dc.contributor.author      Ding, Y
dc.contributor.author      Wei, X
dc.contributor.author      Liu, H
dc.contributor.author      So, HKH
dc.contributor.author      Ding, C
dc.date.accessioned        2020-10-05T12:06:04Z
dc.date.available          2020-10-05T12:06:04Z
dc.date.issued             2020
dc.identifier.citation     Proceedings of the 28th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA 2020), Seaside, CA, USA, 23-25 February 2020, p. 320
dc.identifier.isbn         9781450370998
dc.identifier.uri          http://hdl.handle.net/10722/287980
dc.description             Poster Session II
dc.description.abstract    Hardware acceleration of deep learning (DL) systems has been increasingly studied to achieve desirable performance and energy efficiency. The FPGA strikes a balance between high energy efficiency and a fast development cycle, and is therefore widely used as a DNN accelerator. However, current designs suffer from an architecture-layout mismatch that introduces scalability and flexibility issues, leading to irregular routing and resource-imbalance problems. To address these limitations, we propose FTDL, an FPGA-tailored architecture with parameterized, hierarchical hardware that adapts to different FPGA devices. FTDL offers the following novelties: (i) at the architecture level, FTDL consists of Tiled Processing Elements (TPEs) and super blocks, achieving a near-theoretical digital signal processing (DSP) operating frequency of 650 MHz; more importantly, FTDL is configurable and delivers good scalability, i.e., timing remains stable even when the design is scaled up to 100% resource utilization for different deep learning systems. (ii) For workload compilation, FTDL provides a compiler that maps DL workloads onto the architecture optimally. Experimental results show that for most benchmark layers in MLPerf, FTDL achieves over 80% hardware efficiency.
dc.language                eng
dc.publisher               Association for Computing Machinery (ACM)
dc.relation.ispartof       The 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
dc.title                   FTDL: An FPGA-tailored Architecture for Deep Learning Systems
dc.type                    Conference_Paper
dc.identifier.email        So, HKH: hso@eee.hku.hk
dc.identifier.authority    So, HKH=rp00169
dc.description.nature      abstract
dc.identifier.doi          10.1145/3373087.3375384
dc.identifier.hkuros       315346
dc.identifier.spage        320
dc.identifier.epage        320
dc.publisher.place         New York, NY
