Field | Value |
---|---|
Title | Novel compression techniques for deep neural networks in vision tasks |
Authors | Ran, Jie [冉婕] |
Issue Date | 2023 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Ran, J. [冉婕]. (2023). Novel compression techniques for deep neural networks in vision tasks. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Degree | Doctor of Philosophy |
Subject | Computer vision; Deep learning (Machine learning); Neural networks (Computer science) |
Dept/Program | Electrical and Electronic Engineering |
Persistent Identifier | http://hdl.handle.net/10722/335121 |

Abstract

Deep neural networks (DNNs) have made immense progress in fields such as classification and object detection. Although DNNs typically gain performance by adding parameters and deepening their architectures, the resulting models are hard to deploy on edge devices with limited hardware resources. This has motivated research into DNN compression: obtaining compact models that are storage-friendly and fast at inference without compromising accuracy. This thesis tackles the problem along four directions: two established techniques, low-rank factorization and pruning, and two promising but under-explored ones, sparse linear transforms and product quantization.

Low-rank factorization reduces the complexity of fully connected and convolutional layers by replacing their weights with products of low-rank factors. Most existing techniques, however, fix the ranks in advance and treat each kernel tensor in isolation. This thesis exploits the untapped freedom in tensor ranks and proposes a regularizer that searches for the decomposed tensor ranks dynamically and globally during training, striking a finer balance between model size and performance.
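
As a rough illustration of rank search during training (not the thesis's actual regularizer), the PyTorch sketch below gates each rank-1 component of a factorized layer with a learnable scalar and applies an L1 penalty, so superfluous components shrink toward zero and the effective rank is selected by the optimizer. The class name, initialization, and penalty weight are hypothetical.

```python
import torch
import torch.nn as nn

class GatedLowRankLinear(nn.Module):
    """Linear layer factorized as U @ diag(gate) @ V with a searchable rank."""

    def __init__(self, in_features: int, out_features: int, max_rank: int):
        super().__init__()
        self.U = nn.Parameter(0.02 * torch.randn(out_features, max_rank))
        self.V = nn.Parameter(0.02 * torch.randn(max_rank, in_features))
        self.gate = nn.Parameter(torch.ones(max_rank))  # one gate per rank-1 term

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = x @ self.V.t()       # (batch, max_rank)
        h = h * self.gate        # scale each rank-1 component
        return h @ self.U.t()    # (batch, out_features)

    def rank_penalty(self) -> torch.Tensor:
        # L1 on the gates: gates driven to zero can be dropped, lowering the rank.
        return self.gate.abs().sum()

# Hypothetical training step: the penalty is summed over all such layers
# and added to the task loss with a tunable weight lam.
#   loss = criterion(model(x), y) + lam * sum(m.rank_penalty()
#              for m in model.modules() if isinstance(m, GatedLowRankLinear))
```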

Pruning is a well-established technique for cutting the computational cost of neural networks. Most existing pruning schemes operate in the spatial domain, while the frequency domain remains relatively unexplored. This thesis bridges the gap by connecting a previously opaque rank-based metric in the spatial domain to an analytical view in the frequency domain, yielding a more complete picture of filter importance. Along this route, an efficient energy-zone metric based on the fast Fourier transform (FFT) is proposed to evaluate filter importance from a spectral perspective.
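
The abstract does not define the energy zone, so the sketch below is one plausible reading rather than the thesis's metric: each filter's output feature maps are transformed with a 2D FFT and the filter is scored by the fraction of spectral energy falling in a centered low-frequency zone. The function name, zone shape, and pruning rule are assumptions for illustration.

```python
import torch

def energy_zone_scores(feature_maps: torch.Tensor,
                       zone_frac: float = 0.25) -> torch.Tensor:
    """Score each conv filter by its low-frequency spectral energy.

    feature_maps: (batch, channels, H, W) activations produced by the layer.
    Returns one score per channel (i.e., per filter).
    """
    spec = torch.fft.fftshift(torch.fft.fft2(feature_maps), dim=(-2, -1))
    energy = spec.abs() ** 2                          # power spectrum
    _, _, H, W = energy.shape
    h, w = max(1, int(H * zone_frac)), max(1, int(W * zone_frac))
    top, left = (H - h) // 2, (W - w) // 2
    zone = energy[..., top:top + h, left:left + w].sum(dim=(-2, -1))
    total = energy.sum(dim=(-2, -1)).clamp_min(1e-12)
    return (zone / total).mean(dim=0)                 # (channels,)

# Hypothetical pruning rule: drop the filters whose feature maps carry
# the least energy inside the chosen zone.
#   prune_idx = energy_zone_scores(acts).argsort()[:num_to_prune]
```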

Sparse and structured matrix factorization is a newer compression strategy with untapped potential. Existing works in this area restrict the admissible matrix shapes and may not deliver substantial compression. This thesis proposes a flexible sparse linear transform built on traditional butterfly matrices; by inheriting the learnable hierarchy of butterflies, it yields lighter networks without sacrificing accuracy.
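
For intuition, a plain butterfly transform replaces an n x n dense matrix (n a power of two) with log2(n) stages of learnable 2x2 mixing blocks, shrinking the parameter count from n^2 to about 2n·log2(n). The sketch below implements this generic layer, not the thesis's flexible variant; the class name and initialization are illustrative.

```python
import math
import torch
import torch.nn as nn

class ButterflyLinear(nn.Module):
    """n x n transform as log2(n) stages of learnable 2x2 mixing blocks."""

    def __init__(self, n: int):
        super().__init__()
        assert n & (n - 1) == 0 and n > 1, "n must be a power of two"
        self.n, self.log_n = n, int(math.log2(n))
        # log2(n) stages with n/2 blocks each: 2n*log2(n) weights vs n^2 dense.
        init = torch.eye(2).expand(self.log_n, n // 2, 2, 2).clone()
        self.blocks = nn.Parameter(init + 0.01 * torch.randn_like(init))

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (batch, n)
        batch = x.shape[0]
        for s in range(self.log_n):
            stride = 1 << s
            groups = self.n // (2 * stride)
            xs = x.view(batch, groups, 2, stride)
            a, b = xs[:, :, 0, :], xs[:, :, 1, :]          # coords paired at `stride`
            t = self.blocks[s].view(groups, stride, 2, 2)
            na = t[..., 0, 0] * a + t[..., 0, 1] * b       # 2x2 mix per pair
            nb = t[..., 1, 0] * a + t[..., 1, 1] * b
            x = torch.stack((na, nb), dim=2).reshape(batch, self.n)
        return x

# Example: ButterflyLinear(256) stores 2*256*8 = 4096 weights instead of 65536.
```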

Finally, moving beyond linear transforms, a novel DNN based on product quantization is developed. It offers both angle- and distance-based similarity measures for matching inputs against learned prototypes, so the complexity-accuracy tradeoff can be tuned; the distance-based variant uses only adders and omits multipliers entirely. The structure is trained end-to-end, and inference proceeds through a similarity-search protocol resembling a content-addressable memory (CAM). Lightweight and hardware-generic, it is well suited to edge AI, achieving accuracy comparable to multi-bit networks without any multiplications.
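
A minimal sketch of such prototype matching is given below, assuming a product-quantization layer that splits its input into subvectors and softly assigns each to prototypes in a per-subvector codebook; the L1 branch needs only additions and subtractions at inference, while the cosine branch gives the angle-based measure. The class name, codebook sizes, softmax relaxation, and temperature are illustrative choices, not the thesis's design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PQMatchLayer(nn.Module):
    """Match subvectors of the input against learned prototype codebooks."""

    def __init__(self, dim: int, num_subvectors: int, num_prototypes: int,
                 metric: str = "l1"):
        super().__init__()
        assert dim % num_subvectors == 0
        self.m, self.d = num_subvectors, dim // num_subvectors
        self.metric = metric
        self.codebook = nn.Parameter(torch.randn(self.m, num_prototypes, self.d))

    def forward(self, x: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
        x = x.view(-1, self.m, 1, self.d)                 # (batch, m, 1, d)
        if self.metric == "l1":
            # Distance-based matching: only adds/subtracts at inference.
            sim = -(x - self.codebook).abs().sum(dim=-1)  # (batch, m, K)
        else:
            # Angle-based matching via cosine similarity.
            sim = F.cosine_similarity(x, self.codebook, dim=-1)
        # Softmax keeps training end-to-end differentiable; at inference,
        # argmax over prototypes acts like a CAM-style nearest-match lookup.
        return F.softmax(sim / tau, dim=-1)               # (batch, m, K)

# Example: scores = PQMatchLayer(dim=64, num_subvectors=8, num_prototypes=16)(x)
```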

DC Field | Value | Language |
---|---|---|
dc.contributor.author | Ran, Jie | - |
dc.contributor.author | 冉婕 | - |
dc.date.accessioned | 2023-11-13T07:44:41Z | - |
dc.date.available | 2023-11-13T07:44:41Z | - |
dc.date.issued | 2023 | - |
dc.identifier.citation | Ran, J. [冉婕]. (2023). Novel compression techniques for deep neural networks in vision tasks. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/335121 | - |
dc.description.abstract | Deep neural networks (DNNs) have made immense progress in fields such as classification and object detection. Although DNNs typically gain performance by adding parameters and deepening their architectures, the resulting models are hard to deploy on edge devices with limited hardware resources. This has motivated research into DNN compression: obtaining compact models that are storage-friendly and fast at inference without compromising accuracy. This thesis tackles the problem along four directions: two established techniques, low-rank factorization and pruning, and two promising but under-explored ones, sparse linear transforms and product quantization. Low-rank factorization reduces the complexity of fully connected and convolutional layers by replacing their weights with products of low-rank factors. Most existing techniques, however, fix the ranks in advance and treat each kernel tensor in isolation. This thesis exploits the untapped freedom in tensor ranks and proposes a regularizer that searches for the decomposed tensor ranks dynamically and globally during training, striking a finer balance between model size and performance. Pruning is a well-established technique for cutting the computational cost of neural networks. Most existing pruning schemes operate in the spatial domain, while the frequency domain remains relatively unexplored. This thesis bridges the gap by connecting a previously opaque rank-based metric in the spatial domain to an analytical view in the frequency domain, yielding a more complete picture of filter importance. Along this route, an efficient energy-zone metric based on the fast Fourier transform (FFT) is proposed to evaluate filter importance from a spectral perspective. Sparse and structured matrix factorization is a newer compression strategy with untapped potential. Existing works in this area restrict the admissible matrix shapes and may not deliver substantial compression. This thesis proposes a flexible sparse linear transform built on traditional butterfly matrices; by inheriting the learnable hierarchy of butterflies, it yields lighter networks without sacrificing accuracy. Finally, moving beyond linear transforms, a novel DNN based on product quantization is developed. It offers both angle- and distance-based similarity measures for matching inputs against learned prototypes, so the complexity-accuracy tradeoff can be tuned; the distance-based variant uses only adders and omits multipliers entirely. The structure is trained end-to-end, and inference proceeds through a similarity-search protocol resembling a content-addressable memory (CAM). Lightweight and hardware-generic, it is well suited to edge AI, achieving accuracy comparable to multi-bit networks without any multiplications. | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Computer vision | - |
dc.subject.lcsh | Deep learning (Machine learning) | - |
dc.subject.lcsh | Neural networks (Computer science) | - |
dc.title | Novel compression techniques for deep neural networks in vision tasks | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Electrical and Electronic Engineering | - |
dc.description.nature | published_or_final_version | - |
dc.date.hkucongregation | 2024 | - |
dc.identifier.mmsid | 991044736607503414 | - |