Appears in Collections: postgraduate thesis: Modeling sequential dependence in statistics and machine learning
Title | Modeling sequential dependence in statistics and machine learning |
---|---|
Authors | Huang, Feiqing |
Advisors | Li, G |
Issue Date | 2023 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Huang, F. (2023). Modeling sequential dependence in statistics and machine learning. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | In the era of big data, sequential data modeling has a wide range of applications, from weather forecasting and energy and stock market prediction to music generation and machine translation. The key to modeling sequential data lies in learning the sequential dependence, also known as serial dependence, therein. This problem has attracted attention from both statisticians and machine learning practitioners, and many models have been proposed for large sequential datasets. In statistics, models such as VAR and VARMA are commonly adopted to analyze high-dimensional time series, whereas in machine learning, deep neural networks, including recurrent networks and Transformer-based models, are applied to learn from audio, video, or language data. This thesis proposes methods from both statistical and machine learning viewpoints to capture sequential dependence, with improved forecasting performance and sample efficiency compared with existing methods.
In the first part of this thesis, a general framework is proposed for modeling high-dimensional low-rank linear time series. Specifically, we develop an estimation method and algorithm for high-dimensional general linear processes (GLP), with detailed statistical and convergence analysis. Although the GLP is the most general linear time series model, it has not yet been studied systematically in the high-dimensional literature, and this thesis contributes to filling this gap. Simulations are conducted to verify the theoretical results, and empirical studies demonstrate the usefulness of the proposed method.
Secondly, this thesis introduces a novel VARMA variant that preserves the parsimony and rich temporal dependence structure of the VARMA model while avoiding its two notorious drawbacks, namely non-identifiability and computational intractability, even for moderate-dimensional data. Moreover, its parameter estimation is scalable with respect to the complexity of the temporal dependence, namely the number of decay patterns constituting the autoregressive structure; hence it is called the scalable ARMA (SARMA) model. In the high-dimensional setting, we further impose a low-Tucker-rank assumption on the coefficient tensor of the proposed model, which leads to desirable dynamic factor interpretations and makes the model especially well suited for financial and economic data (see the first illustrative sketch after this table). We derive non-asymptotic error bounds for the proposed estimator and develop a tractable alternating least squares algorithm. Theoretical and computational properties of the proposed method are verified by simulation studies, and its advantages over existing methods are illustrated in real applications.
Lastly, inspired by the derivation of the SARMA model, this thesis applies the same technique to rewrite an RNN layer as a lightweight positional encoding matrix for self-attention, named the Recurrence Encoding Matrix (REM). This motivates a way to seamlessly incorporate the recurrent dynamics of an RNN into a Transformer, leading to the newly proposed Self-Attention with Recurrence (RSA) module. The proposed module leverages the recurrent inductive bias of REMs to model the recurrent signals, while self-attention models the remaining non-recurrent signals. The relative proportions of these two components are controlled by a data-driven gating mechanism (see the second sketch after this table), which is the key to better sample efficiency than the corresponding baseline Transformer. The effectiveness of RSA modules is demonstrated on four sequential learning tasks in machine learning. |
Degree | Doctor of Philosophy |
Subject | Sequential analysis; Machine learning |
Dept/Program | Statistics and Actuarial Science |
Persistent Identifier | http://hdl.handle.net/10722/328600 |
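The low-Tucker-rank coefficient-tensor idea described in the abstract can be made concrete with a small sketch. The following is a minimal numpy illustration, not the thesis's actual parametrization or estimator: a VAR(P) coefficient tensor of shape (N, N, P) is built from a small Tucker core and three loading matrices, and one step of the implied recursion is computed. All dimensions, ranks, and variable names here are assumptions chosen for illustration.

```python
import numpy as np

# Minimal sketch (not the thesis's code): a VAR(P) coefficient tensor
# A of shape (N, N, P), constrained to have low Tucker ranks (r1, r2, r3).
# The Tucker structure A = G x1 U1 x2 U2 x3 U3 is what yields the
# dynamic-factor interpretation mentioned in the abstract.

rng = np.random.default_rng(0)
N, P = 10, 5             # series dimension and AR order (illustrative)
r1, r2, r3 = 2, 2, 3     # Tucker ranks (illustrative)

U1 = rng.standard_normal((N, r1))      # response loading matrix
U2 = rng.standard_normal((N, r2))      # predictor loading matrix
U3 = rng.standard_normal((P, r3))      # lag (temporal) loading matrix
G = rng.standard_normal((r1, r2, r3))  # core tensor

# Multilinear (Tucker) product:
# A[i, j, k] = sum_{a,b,c} G[a,b,c] * U1[i,a] * U2[j,b] * U3[k,c]
A = np.einsum('abc,ia,jb,kc->ijk', G, U1, U2, U3)

# One step of the implied VAR(P) recursion: y_t = sum_k A[:, :, k] @ y_{t-k} + noise
y_past = rng.standard_normal((P, N))          # y_{t-1}, ..., y_{t-P}
y_t = sum(A[:, :, k] @ y_past[k] for k in range(P))
print(A.shape, y_t.shape)                     # (10, 10, 5) (10,)
```

Under this structure the parameter count drops from N²P for the unconstrained tensor to r1·r2·r3 + N(r1 + r2) + P·r3, and the loading matrices U1 and U2 play the role of the dynamic factor loadings that the abstract highlights for financial and economic data.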
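Similarly, the RSA module's gated mixture of recurrent and content-based components can be sketched in a few lines. The snippet below is only an illustration under assumed forms: a single causal, exponentially decaying matrix stands in for one regular-decay REM, and a scalar sigmoid gate mixes it with an ordinary softmax self-attention map; the thesis's actual REM derivation, decay patterns, and gate parametrization may differ.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
T, d = 6, 4                        # sequence length and head dimension (illustrative)
X = rng.standard_normal((T, d))    # token representations

# Standard single-head self-attention (content-based, non-recurrent component).
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv
attn = softmax(Q @ K.T / np.sqrt(d), axis=-1)

# One illustrative "recurrence encoding matrix": a causal, exponentially
# decaying positional matrix P_rem[i, j] = lam**(i - j) for j < i.
# (The thesis derives its REMs from an RNN layer and allows several decay
# patterns; this regular-decay form is just one plausible example.)
lam = 0.8
idx = np.arange(T)
P_rem = np.where(idx[:, None] > idx[None, :],
                 lam ** (idx[:, None] - idx[None, :]), 0.0)

# Data-driven gate mixing the recurrent and non-recurrent components.
mu = 0.3                                    # gate logit; learned in practice
g = sigmoid(mu)
out = ((1.0 - g) * attn + g * P_rem) @ V    # RSA-style output, shape (T, d)
print(out.shape)
```

Because the gate g is learned from data, the module can lean on the recurrent prior when training data are scarce and shift weight back to content-based attention otherwise, which is what the abstract credits for the improved sample efficiency over the baseline Transformer.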
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Li, G | - |
dc.contributor.author | Huang, Feiqing | - |
dc.date.accessioned | 2023-06-29T05:44:35Z | - |
dc.date.available | 2023-06-29T05:44:35Z | - |
dc.date.issued | 2023 | - |
dc.identifier.citation | Huang, F. (2023). Modeling sequential dependence in statistics and machine learning. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/328600 | - |
dc.description.abstract | (Same as the Abstract field above.) | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Sequential analysis | - |
dc.subject.lcsh | Machine learning | - |
dc.title | Modeling sequential dependence in statistics and machine learning | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Statistics and Actuarial Science | - |
dc.description.nature | published_or_final_version | - |
dc.date.hkucongregation | 2023 | - |
dc.identifier.mmsid | 991044695780603414 | - |