Appears in Collections: postgraduate thesis: Language models in NLP : from architecture design to downstream application
Title | Language models in NLP : from architecture design to downstream application |
---|---|
Authors | Gao, Jiahui (高佳慧) |
Advisors | Lee, SMS |
Issue Date | 2023 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Gao, J. [高佳慧]. (2023). Language models in NLP : from architecture design to downstream application. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | Natural language processing (NLP) is a rapidly evolving field, and language models (LMs) play a critical role in advancing research in various NLP tasks, such as language generation, machine translation, sentiment analysis, and question-answering. This thesis presents our contributions toward advancing the research in language models from two perspectives: the design of language model architecture and downstream application.
In the first part of the thesis, we aim to enhance the ability of pre-trained language models by discovering an efficient and powerful architecture. Instead of resorting to manual design, we pioneer an approach to automatically discover a novel pre-trained language model (PLM) backbone within a flexible search space. To this end, we introduce an efficient Neural Architecture Search (NAS) method, termed OP-NAS, which concurrently optimizes the search algorithm and the evaluation of prospective models. The architecture discovered through this process, referred to as AutoBERT-Zero, significantly surpasses the performance of BERT and its variants across various downstream tasks, while also exhibiting exceptional transfer and scaling abilities.
In the second part of this thesis, we explore the practical applications of language models, drawing upon their recent success in the field. Specifically, we examine two primary directions: effective downstream adaptation and the extension of language models to broader domains beyond natural language processing (NLP). In particular, we first introduce SunGen, a novel framework that enables the efficient adaptation of pre-trained language models (PLMs) to downstream tasks. SunGen enhances the quality of PLM-generated data, allowing for the training of a compact task-specific model with substantially fewer parameters. This approach not only achieves superior performance to that of the original PLM but also offers greater efficiency during training and inference. Then, we demonstrate the potential of language models beyond NLP by presenting a novel unpaired cross-lingual method for generating image captions. This method enables captioning tasks to be performed for languages without any caption annotations, effectively bridging the gap between vision and language understanding across different languages. Overall, this thesis contributes to realizing the full potential of language models and provides new insights for future research in this rapidly evolving field.
|
Degree | Doctor of Philosophy |
Subject | Natural language processing (Computer science) |
Dept/Program | Statistics and Actuarial Science |
Persistent Identifier | http://hdl.handle.net/10722/332194 |
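The abstract above describes OP-NAS only at a high level. For readers unfamiliar with evolutionary Neural Architecture Search, the following minimal Python sketch illustrates the generic search loop that such methods build on; the operator set, fitness proxy, and hyperparameters are invented placeholders for illustration, not the thesis's actual search space or algorithm.

```python
import random

# Hypothetical toy search space: each "architecture" is a sequence of
# layer operators. The real OP-NAS space (attention primitives, convs,
# feed-forward blocks, etc.) is far richer; these names are placeholders.
OPERATORS = ["self_attention", "conv3", "conv5", "ffn", "identity"]


def random_architecture(depth=8):
    """Sample a candidate backbone as a list of operator names."""
    return [random.choice(OPERATORS) for _ in range(depth)]


def mutate(arch, rate=0.2):
    """Mutate a candidate by randomly replacing some of its operators."""
    return [random.choice(OPERATORS) if random.random() < rate else op
            for op in arch]


def proxy_fitness(arch):
    """Stand-in for a cheap proxy evaluation (e.g. short pre-training
    plus a downstream probe). Here: a meaningless toy score."""
    return sum(len(op) for op in arch) + random.random()


def evolutionary_search(generations=10, population_size=16, top_k=4):
    """Plain evolutionary loop: keep the best candidates, mutate them."""
    population = [random_architecture() for _ in range(population_size)]
    for _ in range(generations):
        ranked = sorted(population, key=proxy_fitness, reverse=True)
        parents = ranked[:top_k]          # survivors of this generation
        children = [mutate(random.choice(parents))
                    for _ in range(population_size - top_k)]
        population = parents + children
    return max(population, key=proxy_fitness)


if __name__ == "__main__":
    print("best candidate:", evolutionary_search())
```

Note that the expensive step in practice is evaluating candidates; the abstract's point is that OP-NAS optimizes the search and the evaluation of prospective models together, which this toy loop does not attempt to reproduce.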
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Lee, SMS | - |
dc.contributor.author | Gao, Jiahui | - |
dc.contributor.author | 高佳慧 | - |
dc.date.accessioned | 2023-10-04T04:54:38Z | - |
dc.date.available | 2023-10-04T04:54:38Z | - |
dc.date.issued | 2023 | - |
dc.identifier.citation | Gao, J. [高佳慧]. (2023). Language models in NLP : from architecture design to downstream application. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/332194 | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Natural language processing (Computer science) | - |
dc.title | Language models in NLP : from architecture design to downstream application | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Statistics and Actuarial Science | - |
dc.description.nature | published_or_final_version | - |
dc.date.hkucongregation | 2023 | - |
dc.identifier.mmsid | 991044723911203414 | - |
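The SunGen paragraph in the abstract describes a generate-then-distil pattern: a large PLM synthesizes labelled training text, noisy samples are down-weighted, and a compact task-specific model is trained on the result. The sketch below illustrates that pattern under explicit assumptions (GPT-2 as the generator, a length-based heuristic standing in for SunGen's learned sample reweighting, and a TF-IDF plus logistic-regression classifier as the compact model); none of these choices come from the thesis itself.

```python
from transformers import pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# 1. Generate synthetic labelled data with a pre-trained LM.
#    The prompts per label are illustrative, not the thesis's prompts.
generator = pipeline("text-generation", model="gpt2")
prompts = {
    "positive": "Write a positive movie review:",
    "negative": "Write a negative movie review:",
}

texts, labels = [], []
for label, prompt in prompts.items():
    outputs = generator(prompt, max_new_tokens=40, do_sample=True,
                        num_return_sequences=20, pad_token_id=50256)
    for out in outputs:
        texts.append(out["generated_text"][len(prompt):].strip())
        labels.append(label)

# 2. Heuristic sample weighting as a crude stand-in for SunGen's learned
#    noise-robust reweighting: down-weight very short, likely degenerate
#    generations.
weights = [min(len(t.split()) / 20.0, 1.0) for t in texts]

# 3. Train a compact task-specific model on the weighted synthetic set;
#    it is far smaller than the generating PLM.
vec = TfidfVectorizer(max_features=5000)
X = vec.fit_transform(texts)
clf = LogisticRegression(max_iter=1000)
clf.fit(X, labels, sample_weight=weights)

print(clf.predict(vec.transform(["An absolutely wonderful film."])))
```

The design point this illustrates is the one the abstract claims: once the synthetic data is cleaned up, the small downstream model can be trained and served far more cheaply than the PLM that generated the data.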