Appears in Collections: postgraduate thesis: Neural network compression & domain-specific efficient architecture design
Title | Neural network compression & domain-specific efficient architecture design |
---|---|
Authors | Li, Jason Chun Lok (李駿諾) |
Issue Date | 2024 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Li, J. C. L. [李駿諾]. (2024). Neural network compression & domain-specific efficient architecture design. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | Deep neural networks (DNNs) have transformed fields like computer vision, natural language processing, and reinforcement learning, with the scalable Transformer architecture at the forefront of these advances. However, the substantial computational resources needed to deploy these models present significant practical challenges. As AI technology becomes more embedded in everyday applications, developing efficient models that function within limited resource constraints is crucial.
This thesis addresses the development of efficient DNNs for deployment on resource-limited edge devices, which face stringent memory, energy, and latency constraints. It focuses on two pivotal areas: neural network compression and domain-specific architecture design, each presenting innovative approaches to enhance the efficiency and adaptability of DNNs across various applications.
The first part of the thesis focuses on neural network compression and introduces two advances. The All-Deformable-Butterfly (All-DeBut) network achieves profound compression by representing weight matrices as chained products of structured sparse Deformable Butterfly (DeBut) factors, applied across all layers and facilitated by an automated chain generation scheme. It maintains high performance by adopting contrastive knowledge distillation as the training framework, and its practicality is demonstrated through deployment on Field Programmable Gate Array (FPGA) platforms. The second contribution is a unifying tensor decomposition framework that encapsulates a variety of lightweight convolutional neural network (CNN) architectures. The framework reshapes CNN kernels into 3D tensors and applies different tensor decomposition methods to them. It also integrates efficient zero-parameter, zero-FLOP shift layers and introduces a shift-layer pruning technique that preserves accuracy while significantly reducing model size.
In the second part of the thesis, domain-specific efficient architecture design is explored. We introduce ultra-compact Hundred-Kilobyte Lookup Tables (HKLUTs) to address the storage limitations of traditional lookup-table (LUT)-based methods in Single-Image Super-Resolution (SISR). HKLUTs reduce the number of input pixels and employ an asymmetric parallel structure to dramatically decrease storage size. A multistage architecture with progressive upsampling further reduces LUT size while enhancing performance. HKLUTs occupy just 100 KB, ten times smaller than the nearest competitor, and offer superior energy efficiency and lower latency on edge devices, making them ideal for such applications. In addition, the Activation-Sharing Multi-Resolution (ASMR) coordinate network is introduced to improve the efficiency of implicit neural representations (INRs). By sharing activations across multi-resolution grids, ASMR drastically reduces inference costs, effectively decoupling them from network depth to achieve near-$O(1)$ inference complexity. Experiments show that ASMR reduces multiply-accumulate (MAC) operations by up to 500 times relative to baselines while also improving reconstruction quality across image, audio, video, and 3D-shape domains, making it an exceptional solution for cost-effective, high-performance INRs.
Together, this thesis provides practical solutions for deploying DNNs in resource-constrained settings across various applications, including high-level classification, low-level super-resolution, and signal-fitting tasks. It paves the way for future research into efficient AI technologies. |
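To make the butterfly-style compression idea in the abstract concrete, here is a minimal sketch (not the thesis code) of how an n×n dense weight matrix can be replaced by a product of log2(n) sparse factors, each mixing index pairs at one stride. The factor layout and random weights are illustrative assumptions; DeBut generalizes this to deformable, possibly rectangular factors.

```python
import numpy as np

def butterfly_factor(n, stride, rng):
    """One butterfly factor: a sparse matrix mixing index pairs (i, i ^ stride)."""
    f = np.zeros((n, n))
    for i in range(n):
        j = i ^ stride                # partner index at the given stride
        f[i, i] = rng.standard_normal()
        f[i, j] = rng.standard_normal()
    return f

def butterfly_product(n, rng):
    """Product of log2(n) butterfly factors standing in for a dense n x n map."""
    m = np.eye(n)
    params = 0
    stride = 1
    while stride < n:
        m = butterfly_factor(n, stride, rng) @ m
        params += 2 * n               # each factor stores 2 nonzeros per row
        stride *= 2
    return m, params

rng = np.random.default_rng(0)
n = 64
m, params = butterfly_product(n, rng)
print(m.shape, params, n * n)         # 2*n*log2(n) = 768 params vs 4096 dense
```

The parameter count drops from n^2 to 2n·log2(n), which is the kind of structured sparsity that makes "profound network compression" possible while keeping the map expressive.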
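The LUT-based super-resolution idea can also be sketched. This toy example is an assumption for illustration, not the HKLUT implementation: it indexes a precomputed table with two quantized input pixels and reads out a 2x2 high-resolution patch, so inference is pure table lookup with no multiplications. Here the table is filled with a trivial replication rule rather than a trained network's cached outputs.

```python
import numpy as np

BITS = 4                                   # quantize 8-bit pixels to 4 bits
LEVELS = 1 << BITS

# Precompute: table[p, q] -> 2x2 high-res patch. A real LUT would store
# the outputs of a trained SR network; here it just replicates pixel p.
table = np.zeros((LEVELS, LEVELS, 2, 2), dtype=np.uint8)
for p in range(LEVELS):
    table[p, :, :, :] = p << (8 - BITS)

def upscale2x(img):
    """2x upscale via pure table lookups (no arithmetic at inference)."""
    h, w = img.shape
    p = img >> (8 - BITS)                  # current pixel, quantized
    q = np.roll(p, -1, axis=1)             # right neighbour, quantized
    patches = table[p, q]                  # (h, w, 2, 2) gathered patches
    return patches.transpose(0, 2, 1, 3).reshape(2 * h, 2 * w)

lr = np.arange(16, dtype=np.uint8).reshape(4, 4) * 16
sr = upscale2x(lr)
print(sr.shape, table.nbytes)              # (8, 8) output, 1024-byte table
```

With only two 4-bit inputs the table holds 16 x 16 x 4 = 1024 bytes; naive 8-bit indexing over four input pixels would need 256^4 entries, which is why reducing input pixels and bit depth, as HKLUT does, is what shrinks LUT storage to the hundred-kilobyte range.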
Degree | Doctor of Philosophy |
Subject | Neural networks (Computer science); Deep learning (Machine learning) |
Dept/Program | Electrical and Electronic Engineering |
Persistent Identifier | http://hdl.handle.net/10722/352695 |
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Li, Jason Chun Lok | - |
dc.contributor.author | 李駿諾 | - |
dc.date.accessioned | 2024-12-19T09:27:24Z | - |
dc.date.available | 2024-12-19T09:27:24Z | - |
dc.date.issued | 2024 | - |
dc.identifier.citation | Li, J. C. L. [李駿諾]. (2024). Neural network compression & domain-specific efficient architecture design. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/352695 | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Neural networks (Computer science) | - |
dc.subject.lcsh | Deep learning (Machine learning) | - |
dc.title | Neural network compression & domain-specific efficient architecture design | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Electrical and Electronic Engineering | - |
dc.description.nature | published_or_final_version | - |
dc.date.hkucongregation | 2024 | - |
dc.identifier.mmsid | 991044891407203414 | - |