Field | Value |
---|---|
Title | Novel compression techniques for deep neural networks in vision tasks |
Authors | Ran, Jie [冉婕] |
Issue Date | 2023 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Ran, J. [冉婕]. (2023). Novel compression techniques for deep neural networks in vision tasks. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Degree | Doctor of Philosophy |
Subject | Computer vision; Deep learning (Machine learning); Neural networks (Computer science) |
Dept/Program | Electrical and Electronic Engineering |
Persistent Identifier | http://hdl.handle.net/10722/335121 |

Abstract

Deep neural networks (DNNs) have made immense progress in fields such as classification and object detection. Although DNNs typically gain performance by adding parameters and deepening their architectures, the resulting models are hard to deploy on edge devices with limited hardware resources. This has motivated research into DNN compression: obtaining compact models that are storage-friendly and fast at inference without compromising accuracy. This thesis tackles the problem along four directions: two established techniques, low-rank factorization and pruning, and two promising but under-explored ones, sparse linear transforms and product quantization.

Low-rank factorization reduces the complexity of fully connected and convolutional layers by replacing their weights with products of low-rank factors. Most existing techniques, however, fix the ranks in advance and treat each kernel tensor in isolation. This thesis exploits the untapped freedom in tensor ranks and proposes a regularizer that searches for the decomposed tensor ranks dynamically and globally during training, striking a finer balance between model size and performance.
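
As a rough illustration of rank search during training (not the thesis's actual regularizer), the PyTorch sketch below gates each rank-1 component of a factorized layer with a learnable scalar and applies an L1 penalty, so superfluous components shrink toward zero and the effective rank is selected by the optimizer. The class name, initialization, and penalty weight are hypothetical.

```python
import torch
import torch.nn as nn

class GatedLowRankLinear(nn.Module):
    """Linear layer factorized as U @ diag(gate) @ V with a searchable rank."""

    def __init__(self, in_features: int, out_features: int, max_rank: int):
        super().__init__()
        self.U = nn.Parameter(0.02 * torch.randn(out_features, max_rank))
        self.V = nn.Parameter(0.02 * torch.randn(max_rank, in_features))
        self.gate = nn.Parameter(torch.ones(max_rank))  # one gate per rank-1 term

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = x @ self.V.t()       # (batch, max_rank)
        h = h * self.gate        # scale each rank-1 component
        return h @ self.U.t()    # (batch, out_features)

    def rank_penalty(self) -> torch.Tensor:
        # L1 on the gates: gates driven to zero can be dropped, lowering the rank.
        return self.gate.abs().sum()

# Hypothetical training step: the penalty is summed over all such layers
# and added to the task loss with a tunable weight lam.
#   loss = criterion(model(x), y) + lam * sum(m.rank_penalty()
#              for m in model.modules() if isinstance(m, GatedLowRankLinear))
```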

Pruning is a well-established technique for cutting the computational cost of neural networks. Most existing pruning schemes operate in the spatial domain, while the frequency domain remains relatively unexplored. This thesis bridges the gap by connecting a previously opaque rank-based metric in the spatial domain to an analytical view in the frequency domain, yielding a more complete picture of filter importance. Along this route, an efficient energy-zone metric based on the fast Fourier transform (FFT) is proposed to evaluate filter importance from a spectral perspective.
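
The abstract does not define the energy zone, so the sketch below is one plausible reading rather than the thesis's metric: each filter's output feature maps are transformed with a 2D FFT and the filter is scored by the fraction of spectral energy falling in a centered low-frequency zone. The function name, zone shape, and pruning rule are assumptions for illustration.

```python
import torch

def energy_zone_scores(feature_maps: torch.Tensor,
                       zone_frac: float = 0.25) -> torch.Tensor:
    """Score each conv filter by its low-frequency spectral energy.

    feature_maps: (batch, channels, H, W) activations produced by the layer.
    Returns one score per channel (i.e., per filter).
    """
    spec = torch.fft.fftshift(torch.fft.fft2(feature_maps), dim=(-2, -1))
    energy = spec.abs() ** 2                          # power spectrum
    _, _, H, W = energy.shape
    h, w = max(1, int(H * zone_frac)), max(1, int(W * zone_frac))
    top, left = (H - h) // 2, (W - w) // 2
    zone = energy[..., top:top + h, left:left + w].sum(dim=(-2, -1))
    total = energy.sum(dim=(-2, -1)).clamp_min(1e-12)
    return (zone / total).mean(dim=0)                 # (channels,)

# Hypothetical pruning rule: drop the filters whose feature maps carry
# the least energy inside the chosen zone.
#   prune_idx = energy_zone_scores(acts).argsort()[:num_to_prune]
```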

Sparse and structured matrix factorization is a newer compression strategy with untapped potential. Existing works in this area restrict the admissible matrix shapes and may not deliver substantial compression. This thesis proposes a flexible sparse linear transform built on traditional butterfly matrices; by inheriting the learnable hierarchy of butterflies, it yields lighter networks without sacrificing accuracy.
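
For intuition, a plain butterfly transform replaces an n x n dense matrix (n a power of two) with log2(n) stages of learnable 2x2 mixing blocks, shrinking the parameter count from n^2 to about 2n·log2(n). The sketch below implements this generic layer, not the thesis's flexible variant; the class name and initialization are illustrative.

```python
import math
import torch
import torch.nn as nn

class ButterflyLinear(nn.Module):
    """n x n transform as log2(n) stages of learnable 2x2 mixing blocks."""

    def __init__(self, n: int):
        super().__init__()
        assert n & (n - 1) == 0 and n > 1, "n must be a power of two"
        self.n, self.log_n = n, int(math.log2(n))
        # log2(n) stages with n/2 blocks each: 2n*log2(n) weights vs n^2 dense.
        init = torch.eye(2).expand(self.log_n, n // 2, 2, 2).clone()
        self.blocks = nn.Parameter(init + 0.01 * torch.randn_like(init))

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (batch, n)
        batch = x.shape[0]
        for s in range(self.log_n):
            stride = 1 << s
            groups = self.n // (2 * stride)
            xs = x.view(batch, groups, 2, stride)
            a, b = xs[:, :, 0, :], xs[:, :, 1, :]          # coords paired at `stride`
            t = self.blocks[s].view(groups, stride, 2, 2)
            na = t[..., 0, 0] * a + t[..., 0, 1] * b       # 2x2 mix per pair
            nb = t[..., 1, 0] * a + t[..., 1, 1] * b
            x = torch.stack((na, nb), dim=2).reshape(batch, self.n)
        return x

# Example: ButterflyLinear(256) stores 2*256*8 = 4096 weights instead of 65536.
```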

Finally, moving beyond linear transforms, a novel DNN based on product quantization is developed. It offers both angle- and distance-based similarity measures for matching inputs against learned prototypes, so the complexity-accuracy tradeoff can be tuned; the distance-based variant uses only adders and omits multipliers entirely. The structure is trained end-to-end, and inference proceeds through a similarity-search protocol resembling a content-addressable memory (CAM). Lightweight and hardware-generic, it is well suited to edge AI, achieving accuracy comparable to multi-bit networks without any multiplications.
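
A minimal sketch of such prototype matching is given below, assuming a product-quantization layer that splits its input into subvectors and softly assigns each to prototypes in a per-subvector codebook; the L1 branch needs only additions and subtractions at inference, while the cosine branch gives the angle-based measure. The class name, codebook sizes, softmax relaxation, and temperature are illustrative choices, not the thesis's design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PQMatchLayer(nn.Module):
    """Match subvectors of the input against learned prototype codebooks."""

    def __init__(self, dim: int, num_subvectors: int, num_prototypes: int,
                 metric: str = "l1"):
        super().__init__()
        assert dim % num_subvectors == 0
        self.m, self.d = num_subvectors, dim // num_subvectors
        self.metric = metric
        self.codebook = nn.Parameter(torch.randn(self.m, num_prototypes, self.d))

    def forward(self, x: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
        x = x.view(-1, self.m, 1, self.d)                 # (batch, m, 1, d)
        if self.metric == "l1":
            # Distance-based matching: only adds/subtracts at inference.
            sim = -(x - self.codebook).abs().sum(dim=-1)  # (batch, m, K)
        else:
            # Angle-based matching via cosine similarity.
            sim = F.cosine_similarity(x, self.codebook, dim=-1)
        # Softmax keeps training end-to-end differentiable; at inference,
        # argmax over prototypes acts like a CAM-style nearest-match lookup.
        return F.softmax(sim / tau, dim=-1)               # (batch, m, K)

# Example: scores = PQMatchLayer(dim=64, num_subvectors=8, num_prototypes=16)(x)
```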

DC Field | Value | Language |
---|---|---|
dc.contributor.author | Ran, Jie | - |
dc.contributor.author | 冉婕 | - |
dc.date.accessioned | 2023-11-13T07:44:41Z | - |
dc.date.available | 2023-11-13T07:44:41Z | - |
dc.date.issued | 2023 | - |
dc.identifier.citation | Ran, J. [冉婕]. (2023). Novel compression techniques for deep neural networks in vision tasks. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/335121 | - |
dc.description.abstract | Deep neural networks (DNNs) have made immense progress in fields such as classification and object detection. Although DNNs typically gain performance by adding parameters and deepening their architectures, the resulting models are hard to deploy on edge devices with limited hardware resources. This has motivated research into DNN compression: obtaining compact models that are storage-friendly and fast at inference without compromising accuracy. This thesis tackles the problem along four directions: two established techniques, low-rank factorization and pruning, and two promising but under-explored ones, sparse linear transforms and product quantization. Low-rank factorization reduces the complexity of fully connected and convolutional layers by replacing their weights with products of low-rank factors. Most existing techniques, however, fix the ranks in advance and treat each kernel tensor in isolation. This thesis exploits the untapped freedom in tensor ranks and proposes a regularizer that searches for the decomposed tensor ranks dynamically and globally during training, striking a finer balance between model size and performance. Pruning is a well-established technique for cutting the computational cost of neural networks. Most existing pruning schemes operate in the spatial domain, while the frequency domain remains relatively unexplored. This thesis bridges the gap by connecting a previously opaque rank-based metric in the spatial domain to an analytical view in the frequency domain, yielding a more complete picture of filter importance. Along this route, an efficient energy-zone metric based on the fast Fourier transform (FFT) is proposed to evaluate filter importance from a spectral perspective. Sparse and structured matrix factorization is a newer compression strategy with untapped potential. Existing works in this area restrict the admissible matrix shapes and may not deliver substantial compression. This thesis proposes a flexible sparse linear transform built on traditional butterfly matrices; by inheriting the learnable hierarchy of butterflies, it yields lighter networks without sacrificing accuracy. Finally, moving beyond linear transforms, a novel DNN based on product quantization is developed. It offers both angle- and distance-based similarity measures for matching inputs against learned prototypes, so the complexity-accuracy tradeoff can be tuned; the distance-based variant uses only adders and omits multipliers entirely. The structure is trained end-to-end, and inference proceeds through a similarity-search protocol resembling a content-addressable memory (CAM). Lightweight and hardware-generic, it is well suited to edge AI, achieving accuracy comparable to multi-bit networks without any multiplications. | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Computer vision | - |
dc.subject.lcsh | Deep learning (Machine learning) | - |
dc.subject.lcsh | Neural networks (Computer science) | - |
dc.title | Novel compression techniques for deep neural networks in vision tasks | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Electrical and Electronic Engineering | - |
dc.description.nature | published_or_final_version | - |
dc.date.hkucongregation | 2024 | - |
dc.identifier.mmsid | 991044736607503414 | - |