Appears in Collections: postgraduate thesis: Efficient deep neural network inference on custom edge accelerators
| Title | Efficient deep neural network inference on custom edge accelerators |
|---|---|
| Authors | Ding, Yuhao (丁昱昊) |
| Advisors | So, HKH; Shum, HC |
| Issue Date | 2025 |
| Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
| Citation | Ding, Y. [丁昱昊]. (2025). Efficient deep neural network inference on custom edge accelerators. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
| Abstract | Deep Neural Networks (DNNs) are the foundation of recent waves of artificial intelligence (AI), powering applications from computer vision to natural language processing. Due to increasing demands for low-latency, energy-efficient, and privacy-preserving DNN inference, developing efficient edge DNN accelerators is becoming a necessity. Although general-purpose edge accelerators, such as embedded Graphics Processing Units (GPUs) and edge Tensor Processing Units (TPUs), support the acceleration of a wide variety of DNNs, they often fall short in computational and energy efficiency on specific DNNs compared to custom-designed accelerators. Customized accelerators can be tailored to the specific requirements of DNNs and deployment scenarios to achieve higher performance and lower energy consumption. However, designing custom DNN accelerators poses significant challenges. First, the vast diversity of DNN architectures and configurations makes it time-consuming to develop a highly optimized design for each network. Second, the accelerator design is also influenced by the deployment scenario: for example, accelerators for wearable devices must prioritize ultra-low power, whereas those for autonomous vehicles demand high performance. Finally, to further improve the efficiency of DNN inference, an algorithm-architecture co-design approach is also desirable. To address these challenges, this thesis first proposes AGNA, an extensible design space exploration (DSE) framework for DNN accelerator generation. We demonstrate AGNA through the design of a highly parameterized DNN accelerator architecture template, which can be customized to meet the requirements of various DNN models. The DSE process is formulated as a mixed-integer geometric programming (MIGP) problem, which is then efficiently decoupled and solved by AGNA. AGNA can quickly generate optimized DNN accelerators at different scales for various DNN models, with performance comparable to state-of-the-art designs. We then showcase the extensibility of AGNA by extending it to support ESDA, an event-based sparse dataflow architecture. Unlike traditional architectures that process frame-based dense inputs in a parallelized manner, ESDA is designed to process event-based sparse feature streams through a composable dataflow. Despite its fundamentally different design principle, ESDA can still be generated through the unified formulation framework in AGNA. The generated ESDA accelerators demonstrate significant performance improvements over embedded GPUs. In addition to accelerator design automation, we also investigate algorithm-architecture co-design to tailor DNNs for edge inference, which is critical for end-to-end deployment. We propose QUDA, a novel quantization-aware unsupervised domain adaptation framework for DNNs. With a three-stage training flow, QUDA can efficiently adapt DNNs to the target domain and simultaneously quantize them to low precision. The resulting models are immediately deployable on edge accelerators. By tackling both edge DNN accelerator design automation and algorithm-architecture co-design, this thesis provides a comprehensive solution for highly efficient, end-to-end DNN deployment at the edge. The proposed frameworks significantly reduce design effort and time when deploying DNNs on edge accelerators, enabling efficient and effective DNN deployment in more edge scenarios. |
| Degree | Doctor of Philosophy |
| Subject | Deep learning (Machine learning); Neural networks (Computer science); Edge computing |
| Dept/Program | Electrical and Electronic Engineering |
| Persistent Identifier | http://hdl.handle.net/10722/367392 |
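The abstract above formulates accelerator design space exploration as a mixed-integer geometric program (MIGP). As a point of reference, the toy formulation below shows the general shape such a problem can take; the variables, cost model, and constraints are illustrative assumptions, not the actual AGNA formulation from the thesis.

```latex
% Toy MIGP-style DSE: choose integer parallelism factors p_l per layer
% to minimize latency under a resource budget (illustrative only).
\begin{align*}
\min_{p_1,\dots,p_L}\ \ & \sum_{l=1}^{L} \frac{W_l}{f \, p_l}
    && \text{total latency; a posynomial in the } p_l \\
\text{s.t.}\ \ & \frac{1}{R}\sum_{l=1}^{L} c_l \, p_l \le 1
    && \text{resource budget in standard GP form (posynomial} \le 1\text{)} \\
& p_l \in \mathbb{Z}_{\ge 1}, \quad l = 1,\dots,L
    && \text{integrality is what makes the GP mixed-integer}
\end{align*}
```

Here $W_l$ would be the work of layer $l$, $f$ the clock frequency, $c_l$ a per-unit resource cost, and $R$ the budget. Dropping the integrality constraints and substituting $p_l = e^{x_l}$ turns the relaxation into a convex problem, which is what makes GP-based DSE tractable.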
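ESDA's distinguishing trait, per the abstract, is processing event-based sparse feature streams rather than dense frames. The sketch below illustrates that computation pattern in plain NumPy: a convolution evaluated only around active event sites. The function name, data layout, and 3x3 kernel are hypothetical; this is a generic illustration of event-driven sparsity, not the thesis's dataflow architecture.

```python
import numpy as np

def event_sparse_conv3x3(events, weights, bias, h, w):
    """Evaluate a 3x3 convolution only where events are active.

    events : iterable of (y, x, feat) -- sparse input stream, feat: (c_in,)
    weights: (3, 3, c_in, c_out) kernel; bias: (c_out,)
    Returns a sparse output map {(y, x): (c_out,) vector}.

    A frame-based accelerator touches all h*w pixels; an event-driven
    dataflow visits only the neighborhoods of active events.
    """
    out = {}
    for y, x, feat in events:
        # The event at (y, x) contributes to the 3x3 patch of outputs around it.
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                oy, ox = y + dy, x + dx
                if 0 <= oy < h and 0 <= ox < w:
                    # Kernel tap (1-dy, 1-dx) maps input (y, x) to output (oy, ox).
                    acc = out.setdefault((oy, ox), bias.copy())
                    acc += feat @ weights[1 - dy, 1 - dx]
    return out

# Usage: two events on a 6x6 grid with 4 input and 8 output channels.
rng = np.random.default_rng(0)
w_k = rng.standard_normal((3, 3, 4, 8))
b = np.zeros(8)
evs = [(2, 2, rng.standard_normal(4)), (2, 3, rng.standard_normal(4))]
sparse_out = event_sparse_conv3x3(evs, w_k, b, 6, 6)
print(len(sparse_out), "active output sites instead of", 6 * 6)
```

With two adjacent events, only 12 of 36 output sites are ever touched; the work scales with event density rather than frame size, which is the efficiency argument behind event-driven dataflows.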
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | So, HKH | - |
| dc.contributor.advisor | Shum, HC | - |
| dc.contributor.author | Ding, Yuhao | - |
| dc.contributor.author | 丁昱昊 | - |
| dc.date.accessioned | 2025-12-11T06:41:39Z | - |
| dc.date.available | 2025-12-11T06:41:39Z | - |
| dc.date.issued | 2025 | - |
| dc.identifier.citation | Ding, Y. [丁昱昊]. (2025). Efficient deep neural network inference on custom edge accelerators. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
| dc.identifier.uri | http://hdl.handle.net/10722/367392 | - |
| dc.description.abstract | Deep Neural Networks (DNNs) are the foundation of recent waves of artificial intelligence (AI), powering applications from computer vision to natural language processing. Due to increasing demands for low-latency, energy-efficient, and privacy-preserving DNN inference, developing efficient edge DNN accelerators is becoming a necessity. Although general-purpose edge accelerators, such as embedded Graphics Processing Units (GPUs) and edge Tensor Processing Units (TPUs), support the acceleration of a wide variety of DNNs, they often fall short in computational and energy efficiency on specific DNNs compared to custom-designed accelerators. Customized accelerators can be tailored to the specific requirements of DNNs and deployment scenarios to achieve higher performance and lower energy consumption. However, designing custom DNN accelerators poses significant challenges. First, the vast diversity of DNN architectures and configurations makes it time-consuming to develop a highly optimized design for each network. Second, the accelerator design is also influenced by the deployment scenario: for example, accelerators for wearable devices must prioritize ultra-low power, whereas those for autonomous vehicles demand high performance. Finally, to further improve the efficiency of DNN inference, an algorithm-architecture co-design approach is also desirable. To address these challenges, this thesis first proposes AGNA, an extensible design space exploration (DSE) framework for DNN accelerator generation. We demonstrate AGNA through the design of a highly parameterized DNN accelerator architecture template, which can be customized to meet the requirements of various DNN models. The DSE process is formulated as a mixed-integer geometric programming (MIGP) problem, which is then efficiently decoupled and solved by AGNA. AGNA can quickly generate optimized DNN accelerators at different scales for various DNN models, with performance comparable to state-of-the-art designs. We then showcase the extensibility of AGNA by extending it to support ESDA, an event-based sparse dataflow architecture. Unlike traditional architectures that process frame-based dense inputs in a parallelized manner, ESDA is designed to process event-based sparse feature streams through a composable dataflow. Despite its fundamentally different design principle, ESDA can still be generated through the unified formulation framework in AGNA. The generated ESDA accelerators demonstrate significant performance improvements over embedded GPUs. In addition to accelerator design automation, we also investigate algorithm-architecture co-design to tailor DNNs for edge inference, which is critical for end-to-end deployment. We propose QUDA, a novel quantization-aware unsupervised domain adaptation framework for DNNs. With a three-stage training flow, QUDA can efficiently adapt DNNs to the target domain and simultaneously quantize them to low precision. The resulting models are immediately deployable on edge accelerators. By tackling both edge DNN accelerator design automation and algorithm-architecture co-design, this thesis provides a comprehensive solution for highly efficient, end-to-end DNN deployment at the edge. The proposed frameworks significantly reduce design effort and time when deploying DNNs on edge accelerators, enabling efficient and effective DNN deployment in more edge scenarios. | - |
| dc.language | eng | - |
| dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
| dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
| dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | - |
| dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
| dc.subject.lcsh | Deep learning (Machine learning) | - |
| dc.subject.lcsh | Neural networks (Computer science) | - |
| dc.subject.lcsh | Edge computing | - |
| dc.title | Efficient deep neural network inference on custom edge accelerators | - |
| dc.type | PG_Thesis | - |
| dc.description.thesisname | Doctor of Philosophy | - |
| dc.description.thesislevel | Doctoral | - |
| dc.description.thesisdiscipline | Electrical and Electronic Engineering | - |
| dc.description.nature | published_or_final_version | - |
| dc.date.hkucongregation | 2025 | - |
| dc.identifier.mmsid | 991045147149503414 | - |
