
Postgraduate thesis: Neural network compression & domain-specific efficient architecture design

Title: Neural network compression & domain-specific efficient architecture design
Authors: Li, Jason Chun Lok (李駿諾)
Issue Date: 2024
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Li, J. C. L. [李駿諾]. (2024). Neural network compression & domain-specific efficient architecture design. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: Deep neural networks (DNNs) have transformed fields like computer vision, natural language processing, and reinforcement learning, with the scalable Transformer architecture at the forefront of these advances. However, the substantial computational resources needed to deploy these models present significant practical challenges. As AI technology becomes more embedded in everyday applications, developing efficient models that function within limited resource constraints is crucial. This thesis addresses the development of efficient DNNs for deployment on resource-limited edge devices, which face stringent memory, energy, and latency constraints. It focuses on two pivotal areas: neural network compression and domain-specific architecture design, each presenting innovative approaches to enhance the efficiency and adaptability of DNNs across various applications. The first part of the thesis focuses on neural network compression and introduces two significant advancements. The All-Deformable-Butterfly (All-DeBut) network uses a novel approach with sparse diagonal matrices to achieve profound network compression. This network systematically applies Deformable Butterfly (DeBut) matrices across all layers, facilitated by an innovative automated chain generation scheme. It maintains high performance through contrastive knowledge distillation as the training framework and demonstrates practical deployment on Field Programmable Gate Array (FPGA) platforms. The second contribution is a unifying tensor decomposition framework encapsulating various lightweight convolutional neural network (CNN) architectures. This framework redefines CNN kernels by reshaping them into 3D tensors and applying various tensor decomposition methods. It also integrates efficient zero-parameter, zero-FLOP shift layers and introduces a novel shift layer pruning technique that preserves accuracy while significantly reducing model size.
In the second part of the thesis, domain-specific efficient architecture design is explored. We introduce ultra-compact Hundred-Kilobyte Lookup Tables (HKLUTs) to address the storage limitations of traditional lookup-table (LUT)-based methods in single-image super-resolution (SISR). These HKLUTs reduce the number of input pixels and employ an asymmetric parallel structure to dramatically decrease storage size. A multistage architecture with progressive upsampling is also implemented, further reducing LUT size and enhancing performance. HKLUTs achieve an impressively small size of just 100 KB, ten times smaller than the nearest competitor, and offer superior energy efficiency and reduced latency on edge devices, making them ideal for such applications. In addition, the Activation-Sharing Multi-Resolution (ASMR) coordinate network is introduced, significantly enhancing the efficiency of implicit neural representations (INRs). By leveraging activation sharing across multi-resolution grids, ASMR drastically reduces inference costs, effectively decoupling these costs from network depth to achieve near O(1) complexity. Experimental results show that ASMR can reduce multiply-accumulate (MAC) operations by up to 500 times compared to baseline methods, while also improving reconstruction quality across various domains, including image, audio, video, and 3D shapes. This makes ASMR an exceptional solution for cost-effective, high-performance INRs. Overall, this thesis provides practical solutions for deploying DNNs in resource-constrained settings across various applications, including high-level classification, low-level super-resolution, and signal-fitting tasks. It paves the way for future research into efficient AI technologies.
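The parameter savings behind butterfly-style factorizations such as DeBut can be illustrated with a generic sketch (this is not the thesis's DeBut chain construction; the fixed power-of-two strides and random weights below are illustrative assumptions): a dense n-by-n weight matrix is replaced by a product of log2(n) sparse factors, each holding only 2n nonzeros, so the parameter count grows as O(n log n) instead of O(n^2).

```python
import numpy as np

def butterfly_factor(n, block):
    """One butterfly factor: sparse, with 2x2 mixing between index
    pairs (i, i + block); all other entries are zero."""
    f = np.zeros((n, n))
    for start in range(0, n, 2 * block):
        for i in range(start, start + block):
            j = i + block
            a, b, c, d = np.random.randn(4)
            f[i, i], f[i, j] = a, b
            f[j, i], f[j, j] = c, d
    return f

n = 8
# log2(8) = 3 factors with strides 1, 2, 4; their product is fully dense.
factors = [butterfly_factor(n, 2 ** k) for k in range(3)]
dense = np.linalg.multi_dot(factors)

# Count stored parameters: nonzeros in the factors vs. a dense matrix.
params_butterfly = sum(int(np.count_nonzero(f)) for f in factors)
print(params_butterfly, n * n)  # 48 vs 64; at n=1024 it is ~20K vs ~1M
```

For n = 8 the saving is modest, but the gap widens quickly: the factored form needs 2n log2(n) parameters against n^2 for the dense matrix.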
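The storage trade-off that LUT-based super-resolution navigates can be sketched generically (this is not HKLUT itself; `tiny_model`, the 4-bit quantization, and the 2-pixel receptive field are hypothetical choices for illustration): a small model is exhaustively evaluated over every quantized input combination once, after which inference is pure table indexing, and the table size grows exponentially with the number of pixels used to index it — which is why reducing input pixels shrinks storage so dramatically.

```python
import numpy as np

BITS = 4                 # quantize each input pixel to 4 bits
LEVELS = 2 ** BITS       # 16 quantization levels per pixel
SCALE = 2                # 1D upsampling factor: 2 outputs per input pair

def tiny_model(p0, p1):
    """Stand-in 'network' (here just linear interpolation) whose
    outputs get baked into the table."""
    return np.array([p0, (p0 + p1) / 2.0])

# Precompute the lookup table over all quantized 2-pixel inputs.
lut = np.zeros((LEVELS, LEVELS, SCALE))
for a in range(LEVELS):
    for b in range(LEVELS):
        lut[a, b] = tiny_model(a / (LEVELS - 1), b / (LEVELS - 1))

def upscale_pair(p0, p1):
    """Inference is a single indexing operation, no arithmetic model."""
    q0 = int(round(p0 * (LEVELS - 1)))
    q1 = int(round(p1 * (LEVELS - 1)))
    return lut[q0, q1]

print(lut.size)  # 16 * 16 * 2 = 512 entries; LEVELS**k for k index pixels
```

With 2 indexing pixels the table has 16^2 rows; a 4-pixel index would need 16^4 rows, which is the exponential blow-up that compact LUT designs must avoid.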
Degree: Doctor of Philosophy
Subject: Neural networks (Computer science)
Subject: Deep learning (Machine learning)
Dept/Program: Electrical and Electronic Engineering
Persistent Identifier: http://hdl.handle.net/10722/352695


DC Field: Value
dc.contributor.author: Li, Jason Chun Lok
dc.contributor.author: 李駿諾
dc.date.accessioned: 2024-12-19T09:27:24Z
dc.date.available: 2024-12-19T09:27:24Z
dc.date.issued: 2024
dc.identifier.citation: Li, J. C. L. [李駿諾]. (2024). Neural network compression & domain-specific efficient architecture design. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
dc.identifier.uri: http://hdl.handle.net/10722/352695
dc.language: eng
dc.publisher: The University of Hong Kong (Pokfulam, Hong Kong)
dc.relation.ispartof: HKU Theses Online (HKUTO)
dc.rights: The author retains all proprietary rights (such as patent rights) and the right to use in future works.
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.subject.lcsh: Neural networks (Computer science)
dc.subject.lcsh: Deep learning (Machine learning)
dc.title: Neural network compression & domain-specific efficient architecture design
dc.type: PG_Thesis
dc.description.thesisname: Doctor of Philosophy
dc.description.thesislevel: Doctoral
dc.description.thesisdiscipline: Electrical and Electronic Engineering
dc.description.nature: published_or_final_version
dc.date.hkucongregation: 2024
dc.identifier.mmsid: 991044891407203414
