Appears in Collections: postgraduate thesis: Neural network compression & domain-specific efficient architecture design
Title | Neural network compression & domain-specific efficient architecture design |
---|---|
Authors | Li, Jason Chun Lok (李駿諾) |
Issue Date | 2024 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Li, J. C. L. [李駿諾]. (2024). Neural network compression & domain-specific efficient architecture design. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | Deep neural networks (DNNs) have transformed fields like computer vision, natural language processing, and reinforcement learning, with the scalable Transformer architecture at the forefront of these advances. However, the substantial computational resources needed to deploy these models present significant practical challenges. As AI technology becomes more embedded in everyday applications, developing efficient models that function within limited resource constraints is crucial.
This thesis addresses the development of efficient DNNs for deployment on resource-limited edge devices, which face stringent memory, energy, and latency constraints. It focuses on two pivotal areas: neural network compression and domain-specific architecture design, each presenting innovative approaches to enhance the efficiency and adaptability of DNNs across various applications.
The first part of the thesis focuses on neural network compression and introduces two advances. The All-Deformable-Butterfly (All-DeBut) network achieves profound compression by representing weight matrices as chained products of structured sparse Deformable Butterfly (DeBut) factors, applied across all layers and facilitated by an automated chain generation scheme. It maintains high performance by adopting contrastive knowledge distillation as the training framework, and its practicality is demonstrated through deployment on Field Programmable Gate Array (FPGA) platforms. The second contribution is a unifying tensor decomposition framework that encapsulates a variety of lightweight convolutional neural network (CNN) architectures. The framework reshapes CNN kernels into 3D tensors and applies different tensor decomposition methods to them. It also integrates efficient zero-parameter, zero-FLOP shift layers and introduces a shift-layer pruning technique that preserves accuracy while significantly reducing model size.
In the second part of the thesis, domain-specific efficient architecture design is explored. We introduce ultra-compact Hundred-Kilobyte Lookup Tables (HKLUTs) to address the storage limitations of traditional lookup-table (LUT)-based methods in Single-Image Super-Resolution (SISR). HKLUTs reduce the number of input pixels and employ an asymmetric parallel structure to dramatically decrease storage size. A multistage architecture with progressive upsampling further reduces LUT size while enhancing performance. HKLUTs occupy just 100 KB, ten times smaller than the nearest competitor, and offer superior energy efficiency and lower latency on edge devices, making them ideal for such applications. In addition, the Activation-Sharing Multi-Resolution (ASMR) coordinate network is introduced to improve the efficiency of implicit neural representations (INRs). By sharing activations across multi-resolution grids, ASMR drastically reduces inference costs, effectively decoupling them from network depth to achieve near-$O(1)$ inference complexity. Experiments show that ASMR reduces multiply-accumulate (MAC) operations by up to 500 times relative to baselines while also improving reconstruction quality across image, audio, video, and 3D-shape domains, making it an exceptional solution for cost-effective, high-performance INRs.
Together, this thesis provides practical solutions for deploying DNNs in resource-constrained settings across various applications, including high-level classification, low-level super-resolution, and signal-fitting tasks. It paves the way for future research into efficient AI technologies. |
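To make the butterfly-style compression idea in the abstract concrete, here is a minimal sketch (not the thesis code) of how an n×n dense weight matrix can be replaced by a product of log2(n) sparse factors, each mixing index pairs at one stride. The factor layout and random weights are illustrative assumptions; DeBut generalizes this to deformable, possibly rectangular factors.

```python
import numpy as np

def butterfly_factor(n, stride, rng):
    """One butterfly factor: a sparse matrix mixing index pairs (i, i ^ stride)."""
    f = np.zeros((n, n))
    for i in range(n):
        j = i ^ stride                # partner index at the given stride
        f[i, i] = rng.standard_normal()
        f[i, j] = rng.standard_normal()
    return f

def butterfly_product(n, rng):
    """Product of log2(n) butterfly factors standing in for a dense n x n map."""
    m = np.eye(n)
    params = 0
    stride = 1
    while stride < n:
        m = butterfly_factor(n, stride, rng) @ m
        params += 2 * n               # each factor stores 2 nonzeros per row
        stride *= 2
    return m, params

rng = np.random.default_rng(0)
n = 64
m, params = butterfly_product(n, rng)
print(m.shape, params, n * n)         # 2*n*log2(n) = 768 params vs 4096 dense
```

The parameter count drops from n^2 to 2n·log2(n), which is the kind of structured sparsity that makes "profound network compression" possible while keeping the map expressive.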
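The LUT-based super-resolution idea can also be sketched. This toy example is an assumption for illustration, not the HKLUT implementation: it indexes a precomputed table with two quantized input pixels and reads out a 2x2 high-resolution patch, so inference is pure table lookup with no multiplications. Here the table is filled with a trivial replication rule rather than a trained network's cached outputs.

```python
import numpy as np

BITS = 4                                   # quantize 8-bit pixels to 4 bits
LEVELS = 1 << BITS

# Precompute: table[p, q] -> 2x2 high-res patch. A real LUT would store
# the outputs of a trained SR network; here it just replicates pixel p.
table = np.zeros((LEVELS, LEVELS, 2, 2), dtype=np.uint8)
for p in range(LEVELS):
    table[p, :, :, :] = p << (8 - BITS)

def upscale2x(img):
    """2x upscale via pure table lookups (no arithmetic at inference)."""
    h, w = img.shape
    p = img >> (8 - BITS)                  # current pixel, quantized
    q = np.roll(p, -1, axis=1)             # right neighbour, quantized
    patches = table[p, q]                  # (h, w, 2, 2) gathered patches
    return patches.transpose(0, 2, 1, 3).reshape(2 * h, 2 * w)

lr = np.arange(16, dtype=np.uint8).reshape(4, 4) * 16
sr = upscale2x(lr)
print(sr.shape, table.nbytes)              # (8, 8) output, 1024-byte table
```

With only two 4-bit inputs the table holds 16 x 16 x 4 = 1024 bytes; naive 8-bit indexing over four input pixels would need 256^4 entries, which is why reducing input pixels and bit depth, as HKLUT does, is what shrinks LUT storage to the hundred-kilobyte range.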
Degree | Doctor of Philosophy |
Subject | Neural networks (Computer science); Deep learning (Machine learning) |
Dept/Program | Electrical and Electronic Engineering |
Persistent Identifier | http://hdl.handle.net/10722/352695 |
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Li, Jason Chun Lok | - |
dc.contributor.author | 李駿諾 | - |
dc.date.accessioned | 2024-12-19T09:27:24Z | - |
dc.date.available | 2024-12-19T09:27:24Z | - |
dc.date.issued | 2024 | - |
dc.identifier.citation | Li, J. C. L. [李駿諾]. (2024). Neural network compression & domain-specific efficient architecture design. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/352695 | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Neural networks (Computer science) | - |
dc.subject.lcsh | Deep learning (Machine learning) | - |
dc.title | Neural network compression & domain-specific efficient architecture design | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Electrical and Electronic Engineering | - |
dc.description.nature | published_or_final_version | - |
dc.date.hkucongregation | 2024 | - |
dc.identifier.mmsid | 991044891407203414 | - |