Conference Paper: FTDL: An FPGA-tailored Architecture for Deep Learning Systems

Title: FTDL: An FPGA-tailored Architecture for Deep Learning Systems
Authors: Shi, R; Ding, Y; Wei, X; Liu, H; So, HKH; Ding, C
Issue Date: 2020
Publisher: Association for Computing Machinery (ACM)
Citation: Proceedings of the 28th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA 2020), Seaside, CA, USA, 23-25 February 2020, p. 320
Abstract: Hardware acceleration of deep learning (DL) systems has been increasingly studied to achieve desirable performance and energy efficiency. The FPGA strikes a balance between high energy efficiency and a fast development cycle, and is therefore widely used as a DNN accelerator. However, current designs suffer from an architecture-layout mismatch that introduces scalability and flexibility issues, leading to irregular routing and resource-imbalance problems. To address these limitations, we propose FTDL, an FPGA-tailored architecture with parameterized, hierarchical hardware that adapts to different FPGA devices. FTDL offers the following novelties: (i) at the architecture level, FTDL consists of Tiled Processing Elements (TPEs) and super blocks, achieving a near-theoretical digital signal processing (DSP) operating frequency of 650 MHz; more importantly, FTDL is configurable and delivers good scalability, i.e., timing remains stable even when the design is scaled up to 100% resource utilization for different deep learning systems. (ii) For workload compilation, FTDL provides a compiler that maps DL workloads onto the architecture optimally. Experimental results show that for most benchmark layers in MLPerf, FTDL achieves over 80% hardware efficiency.
Description: Poster Session II
Persistent Identifier: http://hdl.handle.net/10722/287980
ISBN: 9781450370998
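The abstract reports over 80% hardware efficiency on most MLPerf benchmark layers at a 650 MHz DSP clock. As a rough illustration of what such a figure means, the Python sketch below computes hardware efficiency as achieved MAC throughput divided by peak MAC throughput; this definition, the device size, the layer shape, and the cycle count are all illustrative assumptions, not values from the paper.

    # Illustrative-only estimate of hardware efficiency for a DSP-based
    # accelerator: achieved MAC throughput / peak MAC throughput.
    # Device size, layer shape, and cycle count below are hypothetical.

    def peak_macs_per_s(num_dsp: int, freq_hz: float) -> float:
        """Peak throughput if every DSP completes one MAC per cycle."""
        return num_dsp * freq_hz

    def conv_macs(h: int, w: int, cin: int, cout: int, k: int) -> int:
        """Total MACs in a stride-1, same-padded 2D convolution layer."""
        return h * w * cin * cout * k * k

    NUM_DSP, FREQ_HZ = 4096, 650e6         # hypothetical device at the reported 650 MHz clock
    macs = conv_macs(h=56, w=56, cin=64, cout=64, k=3)

    cycles = 32_000                        # hypothetical cycle count for the mapped layer
    achieved = macs / (cycles / FREQ_HZ)   # MACs per second actually delivered
    efficiency = achieved / peak_macs_per_s(NUM_DSP, FREQ_HZ)
    print(f"hardware efficiency ~ {efficiency:.1%}")  # ~88% with these numbers

With these made-up numbers the result is about 88%; equivalently, efficiency is the ideal cycle count (macs / NUM_DSP, about 28,224 here) divided by the actual cycle count, so any mapping overhead shows up directly as lost efficiency.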

 

DC Field                   Value
dc.contributor.author      Shi, R
dc.contributor.author      Ding, Y
dc.contributor.author      Wei, X
dc.contributor.author      Liu, H
dc.contributor.author      So, HKH
dc.contributor.author      Ding, C
dc.date.accessioned        2020-10-05T12:06:04Z
dc.date.available          2020-10-05T12:06:04Z
dc.date.issued             2020
dc.identifier.citation     Proceedings of the 28th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA 2020), Seaside, CA, USA, 23-25 February 2020, p. 320
dc.identifier.isbn         9781450370998
dc.identifier.uri          http://hdl.handle.net/10722/287980
dc.description             Poster Session II
dc.description.abstract    Hardware acceleration of deep learning (DL) systems has been increasingly studied to achieve desirable performance and energy efficiency. The FPGA strikes a balance between high energy efficiency and a fast development cycle, and is therefore widely used as a DNN accelerator. However, current designs suffer from an architecture-layout mismatch that introduces scalability and flexibility issues, leading to irregular routing and resource-imbalance problems. To address these limitations, we propose FTDL, an FPGA-tailored architecture with parameterized, hierarchical hardware that adapts to different FPGA devices. FTDL offers the following novelties: (i) at the architecture level, FTDL consists of Tiled Processing Elements (TPEs) and super blocks, achieving a near-theoretical digital signal processing (DSP) operating frequency of 650 MHz; more importantly, FTDL is configurable and delivers good scalability, i.e., timing remains stable even when the design is scaled up to 100% resource utilization for different deep learning systems. (ii) For workload compilation, FTDL provides a compiler that maps DL workloads onto the architecture optimally. Experimental results show that for most benchmark layers in MLPerf, FTDL achieves over 80% hardware efficiency.
dc.language                eng
dc.publisher               Association for Computing Machinery (ACM)
dc.relation.ispartof       The 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
dc.title                   FTDL: An FPGA-tailored Architecture for Deep Learning Systems
dc.type                    Conference_Paper
dc.identifier.email        So, HKH: hso@eee.hku.hk
dc.identifier.authority    So, HKH=rp00169
dc.description.nature      abstract
dc.identifier.doi          10.1145/3373087.3375384
dc.identifier.hkuros       315346
dc.identifier.spage        320
dc.identifier.epage        320
dc.publisher.place         New York, NY
