Appears in Collections: postgraduate thesis: Language models in NLP : from architecture design to downstream application
Title | Language models in NLP : from architecture design to downstream application |
---|---|
Authors | Gao, Jiahui (高佳慧) |
Advisors | Lee, SMS |
Issue Date | 2023 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Gao, J. [高佳慧]. (2023). Language models in NLP : from architecture design to downstream application. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | Natural language processing (NLP) is a rapidly evolving field, and language models (LMs) play a critical role in advancing research in various NLP tasks, such as language generation, machine translation, sentiment analysis, and question-answering. This thesis presents our contributions toward advancing the research in language models from two perspectives: the design of language model architecture and downstream application.
In the first part of the thesis, we aim to enhance the ability of pre-trained language models by discovering an efficient and powerful architecture. Instead of resorting to manual design, we pioneer an approach to automatically discover a novel pre-trained language model (PLM) backbone within a flexible search space. To this end, we introduce an efficient Neural Architecture Search (NAS) method, termed OP-NAS, which concurrently optimizes the search algorithm and the evaluation of prospective models. The architecture discovered through this process, referred to as AutoBERT-Zero, significantly surpasses the performance of BERT and its variants across various downstream tasks, while also exhibiting exceptional transfer and scaling abilities.
In the second part of this thesis, we explore the practical applications of language models, drawing upon their recent success in the field. Specifically, we examine two primary directions: effective downstream adaptation and the extension of language models to broader domains beyond natural language processing (NLP). In particular, we first introduce SunGen, a novel framework that enables the efficient adaptation of pre-trained language models (PLMs) to downstream tasks. SunGen enhances the quality of PLM-generated data, allowing for the training of a compact task-specific model with substantially fewer parameters. This approach not only achieves superior performance to that of the original PLM but also offers greater efficiency during training and inference. Then, we demonstrate the potential of language models beyond NLP by presenting a novel unpaired cross-lingual method for generating image captions. This method enables captioning tasks to be performed for languages without any caption annotations, effectively bridging the gap between vision and language understanding across different languages. Overall, this thesis contributes to realizing the full potential of language models and provides new insights for future research in this rapidly evolving field.
|
Degree | Doctor of Philosophy |
Subject | Natural language processing (Computer science) |
Dept/Program | Statistics and Actuarial Science |
Persistent Identifier | http://hdl.handle.net/10722/332194 |
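The abstract above describes OP-NAS only at a high level. For readers unfamiliar with evolutionary Neural Architecture Search, the following minimal Python sketch illustrates the generic search loop that such methods build on; the operator set, fitness proxy, and hyperparameters are invented placeholders for illustration, not the thesis's actual search space or algorithm.

```python
import random

# Hypothetical toy search space: each "architecture" is a sequence of
# layer operators. The real OP-NAS space (attention primitives, convs,
# feed-forward blocks, etc.) is far richer; these names are placeholders.
OPERATORS = ["self_attention", "conv3", "conv5", "ffn", "identity"]


def random_architecture(depth=8):
    """Sample a candidate backbone as a list of operator names."""
    return [random.choice(OPERATORS) for _ in range(depth)]


def mutate(arch, rate=0.2):
    """Mutate a candidate by randomly replacing some of its operators."""
    return [random.choice(OPERATORS) if random.random() < rate else op
            for op in arch]


def proxy_fitness(arch):
    """Stand-in for a cheap proxy evaluation (e.g. short pre-training
    plus a downstream probe). Here: a meaningless toy score."""
    return sum(len(op) for op in arch) + random.random()


def evolutionary_search(generations=10, population_size=16, top_k=4):
    """Plain evolutionary loop: keep the best candidates, mutate them."""
    population = [random_architecture() for _ in range(population_size)]
    for _ in range(generations):
        ranked = sorted(population, key=proxy_fitness, reverse=True)
        parents = ranked[:top_k]          # survivors of this generation
        children = [mutate(random.choice(parents))
                    for _ in range(population_size - top_k)]
        population = parents + children
    return max(population, key=proxy_fitness)


if __name__ == "__main__":
    print("best candidate:", evolutionary_search())
```

Note that the expensive step in practice is evaluating candidates; the abstract's point is that OP-NAS optimizes the search and the evaluation of prospective models together, which this toy loop does not attempt to reproduce.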
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Lee, SMS | - |
dc.contributor.author | Gao, Jiahui | - |
dc.contributor.author | 高佳慧 | - |
dc.date.accessioned | 2023-10-04T04:54:38Z | - |
dc.date.available | 2023-10-04T04:54:38Z | - |
dc.date.issued | 2023 | - |
dc.identifier.citation | Gao, J. [高佳慧]. (2023). Language models in NLP : from architecture design to downstream application. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/332194 | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Natural language processing (Computer science) | - |
dc.title | Language models in NLP : from architecture design to downstream application | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Statistics and Actuarial Science | - |
dc.description.nature | published_or_final_version | - |
dc.date.hkucongregation | 2023 | - |
dc.identifier.mmsid | 991044723911203414 | - |
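The SunGen paragraph in the abstract describes a generate-then-distil pattern: a large PLM synthesizes labelled training text, noisy samples are down-weighted, and a compact task-specific model is trained on the result. The sketch below illustrates that pattern under explicit assumptions (GPT-2 as the generator, a length-based heuristic standing in for SunGen's learned sample reweighting, and a TF-IDF plus logistic-regression classifier as the compact model); none of these choices come from the thesis itself.

```python
from transformers import pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# 1. Generate synthetic labelled data with a pre-trained LM.
#    The prompts per label are illustrative, not the thesis's prompts.
generator = pipeline("text-generation", model="gpt2")
prompts = {
    "positive": "Write a positive movie review:",
    "negative": "Write a negative movie review:",
}

texts, labels = [], []
for label, prompt in prompts.items():
    outputs = generator(prompt, max_new_tokens=40, do_sample=True,
                        num_return_sequences=20, pad_token_id=50256)
    for out in outputs:
        texts.append(out["generated_text"][len(prompt):].strip())
        labels.append(label)

# 2. Heuristic sample weighting as a crude stand-in for SunGen's learned
#    noise-robust reweighting: down-weight very short, likely degenerate
#    generations.
weights = [min(len(t.split()) / 20.0, 1.0) for t in texts]

# 3. Train a compact task-specific model on the weighted synthetic set;
#    it is far smaller than the generating PLM.
vec = TfidfVectorizer(max_features=5000)
X = vec.fit_transform(texts)
clf = LogisticRegression(max_iter=1000)
clf.fit(X, labels, sample_weight=weights)

print(clf.predict(vec.transform(["An absolutely wonderful film."])))
```

The design point this illustrates is the one the abstract claims: once the synthetic data is cleaned up, the small downstream model can be trained and served far more cheaply than the PLM that generated the data.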