Appears in Collections: postgraduate thesis: Efficient deep neural network inference on custom edge accelerators
| Title | Efficient deep neural network inference on custom edge accelerators |
|---|---|
| Authors | Ding, Yuhao (丁昱昊) |
| Advisors | So, HKH; Shum, HC |
| Issue Date | 2025 |
| Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
| Citation | Ding, Y. [丁昱昊]. (2025). Efficient deep neural network inference on custom edge accelerators. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
| Abstract | Deep Neural Networks (DNNs) are the foundation of recent waves of artificial intelligence (AI), powering applications from computer vision to natural language processing. Due to increasing demands for low-latency, energy-efficient, and privacy-preserving DNN inference, developing efficient edge DNN accelerators is becoming a necessity. Although general-purpose edge accelerators, such as embedded Graphics Processing Units (GPUs) and edge Tensor Processing Units (TPUs), support the acceleration of a wide variety of DNNs, they often fall short in computational and energy efficiency on specific DNNs compared to custom-designed accelerators. Customized accelerators can be tailored to the specific requirements of DNNs and deployment scenarios to achieve higher performance and lower energy consumption. However, designing custom DNN accelerators poses significant challenges. First, the vast diversity of DNN architectures and configurations makes it time-consuming to develop a highly optimized design for each network. Second, the accelerator design is also influenced by the deployment scenario: for example, accelerators for wearable devices must prioritize ultra-low power, whereas those for autonomous vehicles demand high performance. Finally, to further improve the efficiency of DNN inference, an algorithm-architecture co-design approach is also desirable. To address these challenges, this thesis first proposes AGNA, an extensible design space exploration (DSE) framework for DNN accelerator generation. We demonstrate AGNA through the design of a highly parameterized DNN accelerator architecture template, which can be customized to meet the requirements of various DNN models. The DSE process is formulated as a mixed-integer geometric programming (MIGP) problem, which is then efficiently decoupled and solved by AGNA. AGNA can quickly generate optimized DNN accelerators at different scales for various DNN models, with performance comparable to state-of-the-art designs. We then showcase the extensibility of AGNA by extending it to support ESDA, an event-based sparse dataflow architecture. Unlike traditional architectures that process frame-based dense inputs in a parallelized manner, ESDA is designed to process event-based sparse feature streams through a composable dataflow. Despite its fundamentally different design principle, ESDA can still be generated through the unified formulation framework in AGNA. The generated ESDA accelerators demonstrate significant performance improvements over embedded GPUs. In addition to accelerator design automation, we also investigate algorithm-architecture co-design to tailor DNNs for edge inference, which is critical for end-to-end deployment. We propose QUDA, a novel quantization-aware unsupervised domain adaptation framework for DNNs. With a three-stage training flow, QUDA can efficiently adapt DNNs to the target domain and simultaneously quantize them to low precision. The resulting models are immediately deployable on edge accelerators. By tackling both edge DNN accelerator design automation and algorithm-architecture co-design, this thesis provides a comprehensive solution for highly efficient, end-to-end DNN deployment at the edge. The proposed frameworks significantly reduce design effort and time when deploying DNNs on edge accelerators, enabling efficient and effective DNN deployment in more edge scenarios. |
| Degree | Doctor of Philosophy |
| Subject | Deep learning (Machine learning); Neural networks (Computer science); Edge computing |
| Dept/Program | Electrical and Electronic Engineering |
| Persistent Identifier | http://hdl.handle.net/10722/367392 |
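The abstract above formulates accelerator design space exploration as a mixed-integer geometric program (MIGP). As a point of reference, the toy formulation below shows the general shape such a problem can take; the variables, cost model, and constraints are illustrative assumptions, not the actual AGNA formulation from the thesis.

```latex
% Toy MIGP-style DSE: choose integer parallelism factors p_l per layer
% to minimize latency under a resource budget (illustrative only).
\begin{align*}
\min_{p_1,\dots,p_L}\ \ & \sum_{l=1}^{L} \frac{W_l}{f \, p_l}
    && \text{total latency; a posynomial in the } p_l \\
\text{s.t.}\ \ & \frac{1}{R}\sum_{l=1}^{L} c_l \, p_l \le 1
    && \text{resource budget in standard GP form (posynomial} \le 1\text{)} \\
& p_l \in \mathbb{Z}_{\ge 1}, \quad l = 1,\dots,L
    && \text{integrality is what makes the GP mixed-integer}
\end{align*}
```

Here $W_l$ would be the work of layer $l$, $f$ the clock frequency, $c_l$ a per-unit resource cost, and $R$ the budget. Dropping the integrality constraints and substituting $p_l = e^{x_l}$ turns the relaxation into a convex problem, which is what makes GP-based DSE tractable.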
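ESDA's distinguishing trait, per the abstract, is processing event-based sparse feature streams rather than dense frames. The sketch below illustrates that computation pattern in plain NumPy: a convolution evaluated only around active event sites. The function name, data layout, and 3x3 kernel are hypothetical; this is a generic illustration of event-driven sparsity, not the thesis's dataflow architecture.

```python
import numpy as np

def event_sparse_conv3x3(events, weights, bias, h, w):
    """Evaluate a 3x3 convolution only where events are active.

    events : iterable of (y, x, feat) -- sparse input stream, feat: (c_in,)
    weights: (3, 3, c_in, c_out) kernel; bias: (c_out,)
    Returns a sparse output map {(y, x): (c_out,) vector}.

    A frame-based accelerator touches all h*w pixels; an event-driven
    dataflow visits only the neighborhoods of active events.
    """
    out = {}
    for y, x, feat in events:
        # The event at (y, x) contributes to the 3x3 patch of outputs around it.
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                oy, ox = y + dy, x + dx
                if 0 <= oy < h and 0 <= ox < w:
                    # Kernel tap (1-dy, 1-dx) maps input (y, x) to output (oy, ox).
                    acc = out.setdefault((oy, ox), bias.copy())
                    acc += feat @ weights[1 - dy, 1 - dx]
    return out

# Usage: two events on a 6x6 grid with 4 input and 8 output channels.
rng = np.random.default_rng(0)
w_k = rng.standard_normal((3, 3, 4, 8))
b = np.zeros(8)
evs = [(2, 2, rng.standard_normal(4)), (2, 3, rng.standard_normal(4))]
sparse_out = event_sparse_conv3x3(evs, w_k, b, 6, 6)
print(len(sparse_out), "active output sites instead of", 6 * 6)
```

With two adjacent events, only 12 of 36 output sites are ever touched; the work scales with event density rather than frame size, which is the efficiency argument behind event-driven dataflows.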
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | So, HKH | - |
| dc.contributor.advisor | Shum, HC | - |
| dc.contributor.author | Ding, Yuhao | - |
| dc.contributor.author | 丁昱昊 | - |
| dc.date.accessioned | 2025-12-11T06:41:39Z | - |
| dc.date.available | 2025-12-11T06:41:39Z | - |
| dc.date.issued | 2025 | - |
| dc.identifier.citation | Ding, Y. [丁昱昊]. (2025). Efficient deep neural network inference on custom edge accelerators. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
| dc.identifier.uri | http://hdl.handle.net/10722/367392 | - |
| dc.description.abstract | Deep Neural Networks (DNNs) are the foundation of recent waves of artificial intelligence (AI), powering applications from computer vision to natural language processing. Due to increasing demands for low-latency, energy-efficient, and privacy-preserving DNN inference, developing efficient edge DNN accelerators is becoming a necessity. Although general-purpose edge accelerators, such as embedded Graphics Processing Units (GPUs) and edge Tensor Processing Units (TPUs), support the acceleration of a wide variety of DNNs, they often fall short in computational and energy efficiency on specific DNNs compared to custom-designed accelerators. Customized accelerators can be tailored to the specific requirements of DNNs and deployment scenarios to achieve higher performance and lower energy consumption. However, designing custom DNN accelerators poses significant challenges. First, the vast diversity of DNN architectures and configurations makes it time-consuming to develop a highly optimized design for each network. Second, the accelerator design is also influenced by the deployment scenario: for example, accelerators for wearable devices must prioritize ultra-low power, whereas those for autonomous vehicles demand high performance. Finally, to further improve the efficiency of DNN inference, an algorithm-architecture co-design approach is also desirable. To address these challenges, this thesis first proposes AGNA, an extensible design space exploration (DSE) framework for DNN accelerator generation. We demonstrate AGNA through the design of a highly parameterized DNN accelerator architecture template, which can be customized to meet the requirements of various DNN models. The DSE process is formulated as a mixed-integer geometric programming (MIGP) problem, which is then efficiently decoupled and solved by AGNA. AGNA can quickly generate optimized DNN accelerators at different scales for various DNN models, with performance comparable to state-of-the-art designs. We then showcase the extensibility of AGNA by extending it to support ESDA, an event-based sparse dataflow architecture. Unlike traditional architectures that process frame-based dense inputs in a parallelized manner, ESDA is designed to process event-based sparse feature streams through a composable dataflow. Despite its fundamentally different design principle, ESDA can still be generated through the unified formulation framework in AGNA. The generated ESDA accelerators demonstrate significant performance improvements over embedded GPUs. In addition to accelerator design automation, we also investigate algorithm-architecture co-design to tailor DNNs for edge inference, which is critical for end-to-end deployment. We propose QUDA, a novel quantization-aware unsupervised domain adaptation framework for DNNs. With a three-stage training flow, QUDA can efficiently adapt DNNs to the target domain and simultaneously quantize them to low precision. The resulting models are immediately deployable on edge accelerators. By tackling both edge DNN accelerator design automation and algorithm-architecture co-design, this thesis provides a comprehensive solution for highly efficient, end-to-end DNN deployment at the edge. The proposed frameworks significantly reduce design effort and time when deploying DNNs on edge accelerators, enabling efficient and effective DNN deployment in more edge scenarios. | - |
| dc.language | eng | - |
| dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
| dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
| dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | - |
| dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
| dc.subject.lcsh | Deep learning (Machine learning) | - |
| dc.subject.lcsh | Neural networks (Computer science) | - |
| dc.subject.lcsh | Edge computing | - |
| dc.title | Efficient deep neural network inference on custom edge accelerators | - |
| dc.type | PG_Thesis | - |
| dc.description.thesisname | Doctor of Philosophy | - |
| dc.description.thesislevel | Doctoral | - |
| dc.description.thesisdiscipline | Electrical and Electronic Engineering | - |
| dc.description.nature | published_or_final_version | - |
| dc.date.hkucongregation | 2025 | - |
| dc.identifier.mmsid | 991045147149503414 | - |
