
Postgraduate thesis: Neural network pruning : the applications in inference acceleration and more efficient pruning algorithms

Title: Neural network pruning : the applications in inference acceleration and more efficient pruning algorithms
Authors: Liu, Junjie (刘俊杰)
Advisors: So, HKH; Wong Lui, KS
Issue Date: 2020
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Liu, J. [刘俊杰]. (2020). Neural network pruning : the applications in inference acceleration and more efficient pruning algorithms. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: Deep neural networks (DNNs) have achieved remarkable success in many challenging tasks. However, the inference process of DNN models is highly memory- and computation-intensive due to their over-parameterization, which impedes the deployment of DNN models in resource-limited and latency-sensitive scenarios. Model compression has been considered a remedy to improve the storage and computation efficiency of deep neural networks. Among all approaches to model compression, network pruning can remove over 90% of the model parameters with little loss of performance. Network pruning also helps to avoid over-fitting, so better generalization performance can be achieved with properly pruned neural networks. Typical pruning methods adopt a three-stage pipeline: 1) training a dense over-parameterized model, 2) pruning a portion of the less important parameters in the pre-trained dense model, and 3) fine-tuning the pruned sparse model to regain performance. However, this traditional three-stage pipeline has two critical problems. Firstly, the expensive pruning and fine-tuning iterations require many additional training epochs. Secondly, the process of updating network parameters and the process of finding the optimal sparse structure are decoupled, which prevents the optimal sparse subnetwork from being found. The contributions of this thesis consist of two parts. Firstly, we extend the traditional three-stage pipeline to recurrent neural networks, especially the long short-term memory (LSTM) network, and propose hidden-state pruning, which achieves higher compression and acceleration ratios. Secondly, we propose a novel sparse training algorithm that seamlessly combines the training and pruning processes. Thus, the expensive pruning and fine-tuning iterations are circumvented, and the optimal sparse structure can be revealed with the same budget as training a dense model. In addition, the potential contribution of network pruning to network architecture design is studied with the proposed dynamic pruning methods.
Degree: Master of Philosophy
Subject: Neural networks (Computer science)
Dept/Program: Electrical and Electronic Engineering
Persistent Identifier: http://hdl.handle.net/10722/286784
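
The three-stage pipeline summarized in the abstract (train a dense model, prune the smallest-magnitude parameters, then fine-tune the sparse model) can be illustrated with a short, generic sketch. Note that this is not the thesis's hidden-state pruning or sparse training method; the tiny model, the random stand-in data, the 90% sparsity target, and the helper names train and magnitude_prune are all illustrative assumptions.

    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader, TensorDataset

    def train(model, loader, epochs, masks=None):
        """Stages 1 and 3: ordinary SGD training; if masks are given, pruned
        weights are reset to zero after every step (masked fine-tuning)."""
        opt = torch.optim.SGD(model.parameters(), lr=0.01)
        loss_fn = nn.CrossEntropyLoss()
        for _ in range(epochs):
            for x, y in loader:
                opt.zero_grad()
                loss_fn(model(x), y).backward()
                opt.step()
                if masks is not None:
                    with torch.no_grad():
                        for name, p in model.named_parameters():
                            if name in masks:
                                p.mul_(masks[name])

    def magnitude_prune(model, sparsity=0.9):
        """Stage 2: zero out the given fraction of smallest-magnitude weights
        in each weight matrix and return the binary masks."""
        masks = {}
        with torch.no_grad():
            for name, p in model.named_parameters():
                if p.dim() < 2:  # skip biases
                    continue
                k = max(1, int(sparsity * p.numel()))
                threshold = p.abs().flatten().kthvalue(k).values
                masks[name] = (p.abs() > threshold).float()
                p.mul_(masks[name])
        return masks

    # Random stand-in data; a real dataset would be used in practice.
    data = TensorDataset(torch.randn(512, 784), torch.randint(0, 10, (512,)))
    loader = DataLoader(data, batch_size=64, shuffle=True)
    model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))

    train(model, loader, epochs=5)                    # 1) dense training
    masks = magnitude_prune(model, sparsity=0.9)      # 2) one-shot magnitude pruning
    train(model, loader, epochs=2, masks=masks)       # 3) fine-tune the sparse model

The sparse training algorithm proposed in the thesis differs from this sketch in that pruning is combined with training in a single run, so the separate prune-and-fine-tune stages above, and their additional training epochs, are avoided.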

 

DC Field / Value
dc.contributor.advisor: So, HKH
dc.contributor.advisor: Wong Lui, KS
dc.contributor.author: Liu, Junjie
dc.contributor.author: 刘俊杰
dc.date.accessioned: 2020-09-05T01:20:55Z
dc.date.available: 2020-09-05T01:20:55Z
dc.date.issued: 2020
dc.identifier.citation: Liu, J. [刘俊杰]. (2020). Neural network pruning : the applications in inference acceleration and more efficient pruning algorithms. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
dc.identifier.uri: http://hdl.handle.net/10722/286784
dc.language: eng
dc.publisher: The University of Hong Kong (Pokfulam, Hong Kong)
dc.relation.ispartof: HKU Theses Online (HKUTO)
dc.rights: The author retains all proprietary rights (such as patent rights) and the right to use in future works.
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.subject.lcsh: Neural networks (Computer science)
dc.title: Neural network pruning : the applications in inference acceleration and more efficient pruning algorithms
dc.type: PG_Thesis
dc.description.thesisname: Master of Philosophy
dc.description.thesislevel: Master
dc.description.thesisdiscipline: Electrical and Electronic Engineering
dc.description.nature: published_or_final_version
dc.date.hkucongregation: 2020
dc.identifier.mmsid: 991044268205703414
