
Postgraduate thesis: Efficient deep neural network inference on custom edge accelerators

Title: Efficient deep neural network inference on custom edge accelerators
Authors: Ding, Yuhao (丁昱昊)
Advisors: So, HKH; Shum, HC
Issue Date: 2025
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Ding, Y. [丁昱昊]. (2025). Efficient deep neural network inference on custom edge accelerators. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: Deep Neural Networks (DNNs) are the foundation of recent waves of artificial intelligence (AI), powering applications from computer vision to natural language processing. Due to increasing demands for low-latency, energy-efficient, and privacy-preserving DNN inference, developing efficient edge DNN accelerators is becoming a necessity. Although general-purpose edge accelerators, such as embedded Graphics Processing Units (GPUs) and edge Tensor Processing Units (TPUs), support the acceleration of a wide variety of DNNs, they often fall short of custom-designed accelerators in computational and energy efficiency on specific DNNs. Customized accelerators can be tailored to the specific requirements of DNNs and deployment scenarios to achieve higher performance and lower energy consumption. However, designing custom DNN accelerators poses significant challenges. First, the vast diversity of DNN architectures and configurations makes it time-consuming to develop a highly optimized design for each network. Second, the accelerator design is also influenced by the deployment scenario: for example, accelerators for wearable devices must prioritize ultra-low power, whereas those for autonomous vehicles demand high performance. Finally, to further improve the efficiency of DNN inference, an algorithm-architecture co-design approach is also desirable. To address these challenges, this thesis first proposes AGNA, an extensible design space exploration (DSE) framework for DNN accelerator generation. We demonstrate AGNA through the design of a highly parameterized DNN accelerator architecture template, which can be customized to meet the requirements of various DNN models. The DSE process is formulated as a mixed-integer geometric programming (MIGP) problem, which AGNA then decouples and solves efficiently. AGNA can quickly generate optimized DNN accelerators at different scales for various DNN models, with performance comparable to state-of-the-art designs. We then showcase the extensibility of AGNA by extending it to support ESDA, an event-based sparse dataflow architecture. Unlike traditional architectures that process frame-based dense inputs in a parallelized manner, ESDA is designed to process event-based sparse feature streams through a composable dataflow. Despite its fundamentally different design principle, ESDA can still be generated through the unified formulation framework in AGNA. The generated ESDA accelerators demonstrate significant performance improvements over embedded GPUs. In addition to accelerator design automation, we also investigate algorithm-architecture co-design to tailor DNNs for edge inference, which is critical for end-to-end deployment. We propose QUDA, a novel quantization-aware unsupervised domain adaptation framework for DNNs. With a three-stage training flow, QUDA can efficiently adapt DNNs to the target domain while simultaneously quantizing them to low precision. The resulting models are immediately deployable on edge accelerators. By tackling both edge DNN accelerator design automation and algorithm-architecture co-design, this thesis provides a comprehensive solution for highly efficient, end-to-end DNN deployment at the edge. The proposed frameworks significantly reduce the design effort and time required to deploy DNNs on edge accelerators, enabling efficient and effective deployment of DNNs in more edge scenarios.
Degree: Doctor of Philosophy
Subjects: Deep learning (Machine learning); Neural networks (Computer science); Edge computing
Dept/Program: Electrical and Electronic Engineering
Persistent Identifier: http://hdl.handle.net/10722/367392

 

DC Field: Value
dc.contributor.advisor: So, HKH
dc.contributor.advisor: Shum, HC
dc.contributor.author: Ding, Yuhao
dc.contributor.author: 丁昱昊
dc.date.accessioned: 2025-12-11T06:41:39Z
dc.date.available: 2025-12-11T06:41:39Z
dc.date.issued: 2025
dc.identifier.citation: Ding, Y. [丁昱昊]. (2025). Efficient deep neural network inference on custom edge accelerators. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
dc.identifier.uri: http://hdl.handle.net/10722/367392
dc.language: eng
dc.publisher: The University of Hong Kong (Pokfulam, Hong Kong)
dc.relation.ispartof: HKU Theses Online (HKUTO)
dc.rights: The author retains all proprietary rights (such as patent rights) and the right to use in future works.
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.subject.lcsh: Deep learning (Machine learning)
dc.subject.lcsh: Neural networks (Computer science)
dc.subject.lcsh: Edge computing
dc.title: Efficient deep neural network inference on custom edge accelerators
dc.type: PG_Thesis
dc.description.thesisname: Doctor of Philosophy
dc.description.thesislevel: Doctoral
dc.description.thesisdiscipline: Electrical and Electronic Engineering
dc.description.nature: published_or_final_version
dc.date.hkucongregation: 2025
dc.identifier.mmsid: 991045147149503414
