
Postgraduate thesis: Language models in NLP : from architecture design to downstream application

Title: Language models in NLP : from architecture design to downstream application
Authors: Gao, Jiahui [高佳慧]
Advisor(s): Lee, SMS
Issue Date: 2023
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Gao, J. [高佳慧]. (2023). Language models in NLP : from architecture design to downstream application. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: Natural language processing (NLP) is a rapidly evolving field, and language models (LMs) play a critical role in advancing research on various NLP tasks, such as language generation, machine translation, sentiment analysis, and question answering. This thesis presents our contributions to language model research from two perspectives: the design of language model architectures and their downstream application. In the first part of the thesis, we aim to enhance the ability of pre-trained language models by discovering an efficient and powerful architecture. Instead of resorting to manual design, we pioneer an approach that automatically discovers a novel pre-trained language model (PLM) backbone within a flexible search space. To this end, we introduce an efficient Neural Architecture Search (NAS) method, termed OP-NAS, which jointly optimizes the search algorithm and the evaluation of candidate models. The architecture discovered through this process, referred to as AutoBERT-Zero, significantly surpasses BERT and its variants across various downstream tasks while also exhibiting exceptional transfer and scaling abilities. In the second part of the thesis, we explore practical applications of language models, drawing on their recent success in the field. Specifically, we examine two primary directions: effective downstream adaptation and the extension of language models to broader domains beyond NLP. We first introduce SunGen, a novel framework that enables the efficient adaptation of PLMs to downstream tasks. SunGen enhances the quality of PLM-generated data, allowing a compact task-specific model with substantially fewer parameters to be trained. This approach not only achieves performance superior to that of the original PLM but also offers greater efficiency during training and inference. We then demonstrate the potential of language models beyond NLP by presenting a novel unpaired cross-lingual method for generating image captions, which enables captioning for languages without any caption annotations and effectively bridges vision and language understanding across languages. Overall, this thesis contributes to realizing the full potential of language models and provides new insights for future research in this rapidly evolving field. (Two illustrative code sketches, one of an architecture-search loop and one of a generate-then-train recipe, follow this record.)
Degree: Doctor of Philosophy
Subject: Natural language processing (Computer science)
Dept/Program: Statistics and Actuarial Science
Persistent Identifier: http://hdl.handle.net/10722/332194
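The abstract describes OP-NAS only at a high level, so the following is a minimal, hypothetical sketch of what an evolutionary search over backbone operators can look like. The operator set (SEARCH_SPACE) and the mutate and fitness functions are illustrative assumptions, not the thesis's actual search space, search algorithm, or proxy evaluation.

```python
# Toy evolutionary architecture search: a candidate backbone is encoded
# as a list of layer operator names, and the loop keeps the best half of
# each generation and mutates it. All names here are hypothetical.
import random

SEARCH_SPACE = ["self_attention", "conv3", "conv5", "ffn", "identity"]

def random_architecture(depth=6):
    return [random.choice(SEARCH_SPACE) for _ in range(depth)]

def mutate(arch):
    # Replace one randomly chosen layer with a different operator.
    child = list(arch)
    i = random.randrange(len(child))
    child[i] = random.choice([op for op in SEARCH_SPACE if op != child[i]])
    return child

def fitness(arch):
    # Stand-in for a cheap proxy evaluation (e.g. a short pre-training
    # run). This toy score just prefers operator diversity.
    return len(set(arch)) + random.random()

def evolve(generations=20, population_size=8):
    population = [random_architecture() for _ in range(population_size)]
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        parents = scored[: population_size // 2]            # keep the best half
        children = [mutate(random.choice(parents)) for _ in parents]
        population = parents + children                     # next generation
    return max(population, key=fitness)

if __name__ == "__main__":
    print("best toy architecture:", evolve())
```

In a real NAS system the fitness call is the expensive part; the record credits OP-NAS with jointly optimizing the search and the evaluation of candidates, which this toy loop does not attempt.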
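Likewise, the generate-then-train idea behind SunGen can be sketched in a few lines. This is a simplification under stated assumptions: scikit-learn models stand in for the compact task model, the generate_with_plm stub stands in for prompting a real PLM, and the confidence-based sample weighting is a crude substitute for the thesis's actual noise-robust weighting, which this record does not specify.

```python
# Hedged sketch of a generate-then-train recipe: synthesize labeled data
# with a (stubbed) PLM, down-weight likely-noisy samples, and train a
# small task-specific model on the weighted data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def generate_with_plm(label, n):
    """Stand-in for prompting a generative PLM, e.g. with
    'Write a {label} movie review:'. Canned strings keep this runnable."""
    stock = {
        "positive": ["a wonderful, heartfelt film", "sharp writing and great acting"],
        "negative": ["a dull, lifeless movie", "the plot makes no sense at all"],
    }
    return [stock[label][i % len(stock[label])] for i in range(n)]

# 1. Synthesize a labeled dataset with the (stubbed) PLM.
texts, labels = [], []
for label in ("positive", "negative"):
    for text in generate_with_plm(label, 50):
        texts.append(text)
        labels.append(label)

# 2. First pass: fit a small model, then use its per-sample confidence
#    as a crude quality weight (noisy generations get low weight).
vec = TfidfVectorizer()
X = vec.fit_transform(texts)
first = LogisticRegression().fit(X, labels)
weights = first.predict_proba(X).max(axis=1)  # confidence in predicted class

# 3. Second pass: retrain the compact task model with those weights.
final = LogisticRegression().fit(X, labels, sample_weight=weights)
print(final.predict(vec.transform(["what a fantastic movie"])))
```

The design point the abstract makes is that the deployed model is the small classifier, not the PLM: the PLM is used once, offline, to produce data, so training and inference on the task stay cheap.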


DC Field | Value | Language
dc.contributor.advisor | Lee, SMS | -
dc.contributor.author | Gao, Jiahui | -
dc.contributor.author | 高佳慧 | -
dc.date.accessioned | 2023-10-04T04:54:38Z | -
dc.date.available | 2023-10-04T04:54:38Z | -
dc.date.issued | 2023 | -
dc.identifier.citation | Gao, J. [高佳慧]. (2023). Language models in NLP : from architecture design to downstream application. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | -
dc.identifier.uri | http://hdl.handle.net/10722/332194 | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject.lcsh | Natural language processing (Computer science) | -
dc.title | Language models in NLP : from architecture design to downstream application | -
dc.type | PG_Thesis | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Statistics and Actuarial Science | -
dc.description.nature | published_or_final_version | -
dc.date.hkucongregation | 2023 | -
dc.identifier.mmsid | 991044723911203414 | -
