Conference Paper: Encoding Recurrence into Transformers
Title | Encoding Recurrence into Transformers |
---|---|
Authors | Huang, Feiqing; Lu, Kexin; CAI, Yuxi; Qin, Zhen; Fang, Yanwen; Tian, Guangjian; Li, Guodong |
Issue Date | 1-May-2023 |
Abstract | This paper breaks down an RNN layer, with negligible loss, into a sequence of simple RNNs, each of which can be further rewritten as a lightweight positional encoding matrix of a self-attention, named the Recurrence Encoding Matrix (REM). The recurrent dynamics introduced by the RNN layer can thus be encapsulated into the positional encodings of a multihead self-attention, which makes it possible to seamlessly incorporate these recurrent dynamics into a Transformer, leading to a new module, Self-Attention with Recurrence (RSA). The proposed module can leverage the recurrent inductive bias of REMs to achieve better sample efficiency than its corresponding baseline Transformer, while the self-attention is used to model the remaining non-recurrent signals. The relative proportions of these two components are controlled by a data-driven gated mechanism, and the effectiveness of RSA modules is demonstrated on four sequential learning tasks. |
Persistent Identifier | http://hdl.handle.net/10722/338289 |
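The abstract above is compact, so here is a minimal, hypothetical sketch of the idea behind a single RSA head, written in PyTorch. It assumes a "regular" REM of Toeplitz form λ^(i−j) on the strictly lower triangle and a scalar gate σ(μ) that mixes this recurrent term with ordinary softmax attention; the exact REM parameterization, gate placement, and all names (`regular_rem`, `RSAHead`, `lam`, `mu`) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def regular_rem(seq_len: int, lam: float) -> torch.Tensor:
    """Toy regular REM: a lower-triangular Toeplitz matrix whose (i, j)
    entry is lam ** (i - j) for i > j and 0 elsewhere (assumed form)."""
    idx = torch.arange(seq_len)
    diff = (idx.unsqueeze(1) - idx.unsqueeze(0)).float()      # i - j
    return torch.where(diff > 0, lam ** diff, torch.zeros_like(diff))


class RSAHead(nn.Module):
    """Illustrative single-head Self-Attention with Recurrence (RSA).

    A learnable gate sigma(mu) mixes a fixed REM (the recurrent inductive
    bias) with the usual softmax attention (the non-recurrent signals).
    """

    def __init__(self, d_model: int, lam: float = 0.9):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.mu = nn.Parameter(torch.zeros(1))   # data-driven gate logit
        self.lam = lam                           # decay of the simple RNN

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        _, n, d = x.shape
        q, k, v = self.q(x), self.k(x), self.v(x)
        attn = F.softmax(q @ k.transpose(-2, -1) / d ** 0.5, dim=-1)  # (b, n, n)
        rem = regular_rem(n, self.lam).to(x)                          # (n, n)
        gate = torch.sigmoid(self.mu)
        # Gated combination of the recurrent and non-recurrent components.
        return (gate * rem + (1.0 - gate) * attn) @ v


# Usage: a random batch of 2 sequences of length 8 with model width 16.
out = RSAHead(d_model=16)(torch.randn(2, 8, 16))
print(out.shape)   # torch.Size([2, 8, 16])
```

In this sketch the REM stands in for the positional encoding of a rewritten simple RNN, and the sigmoid gate is the data-driven mechanism that controls the relative proportions of the two components.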
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Huang, Feiqing | - |
dc.contributor.author | Lu, Kexin | - |
dc.contributor.author | CAI, Yuxi | - |
dc.contributor.author | Qin, Zhen | - |
dc.contributor.author | Fang, Yanwen | - |
dc.contributor.author | Tian, Guangjian | - |
dc.contributor.author | Li, Guodong | - |
dc.date.accessioned | 2024-03-11T10:27:45Z | - |
dc.date.available | 2024-03-11T10:27:45Z | - |
dc.date.issued | 2023-05-01 | - |
dc.identifier.uri | http://hdl.handle.net/10722/338289 | - |
dc.description.abstract | This paper breaks down an RNN layer, with negligible loss, into a sequence of simple RNNs, each of which can be further rewritten as a lightweight positional encoding matrix of a self-attention, named the Recurrence Encoding Matrix (REM). The recurrent dynamics introduced by the RNN layer can thus be encapsulated into the positional encodings of a multihead self-attention, which makes it possible to seamlessly incorporate these recurrent dynamics into a Transformer, leading to a new module, Self-Attention with Recurrence (RSA). The proposed module can leverage the recurrent inductive bias of REMs to achieve better sample efficiency than its corresponding baseline Transformer, while the self-attention is used to model the remaining non-recurrent signals. The relative proportions of these two components are controlled by a data-driven gated mechanism, and the effectiveness of RSA modules is demonstrated on four sequential learning tasks. | -
dc.language | eng | - |
dc.relation.ispartof | The 11th International Conference on Learning Representations (ICLR 2023) (01/05/2023-05/05/2023, Kigali, Rwanda) | - |
dc.title | Encoding Recurrence into Transformers | - |
dc.type | Conference_Paper | - |