Conference Paper: Encoding Recurrence into Transformers

Title: Encoding Recurrence into Transformers
Authors: Huang, Feiqing; Lu, Kexin; Cai, Yuxi; Qin, Zhen; Fang, Yanwen; Tian, Guangjian; Li, Guodong
Issue Date: 1-May-2023
Abstract:

This paper breaks an RNN layer down, with negligible loss, into a sequence of simple RNNs, each of which can be rewritten as a lightweight positional encoding matrix of a self-attention, named the Recurrence Encoding Matrix (REM). The recurrent dynamics introduced by the RNN layer can thus be encapsulated into the positional encodings of a multihead self-attention, which makes it possible to seamlessly incorporate these recurrent dynamics into a Transformer, leading to a new module, Self-Attention with Recurrence (RSA). The proposed module leverages the recurrent inductive bias of REMs to achieve better sample efficiency than its corresponding baseline Transformer, while the self-attention models the remaining non-recurrent signals. The relative proportions of the two components are controlled by a data-driven gating mechanism, and the effectiveness of RSA modules is demonstrated on four sequential learning tasks.
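The gated combination described in the abstract can be illustrated with a short sketch. Below is a minimal, single-head example in PyTorch, not the authors' released implementation: it assumes a "regular" REM built from a single scalar decay lambda, a scalar sigmoid gate mu shared across positions, and the hypothetical names RSAHead and regular_rem. The head's score matrix is the gated mix (1 - mu) * softmax(QK^T / sqrt(d)) + mu * REM, applied to the values.

```python
# Minimal single-head sketch of Self-Attention with Recurrence (RSA).
# Illustrative only: one scalar decay and one scalar gate per head.
import torch
import torch.nn as nn


def regular_rem(seq_len: int, lam: torch.Tensor) -> torch.Tensor:
    """Lower-triangular Recurrence Encoding Matrix with entries lam**(t - s) for t > s."""
    idx = torch.arange(seq_len)
    power = idx.unsqueeze(1) - idx.unsqueeze(0)            # power[t, s] = t - s
    mask = power > 0                                        # attend only to strictly past steps
    decay = lam ** power.clamp(min=0).to(lam.dtype)         # lam^(t-s), broadcast to (L, L)
    return torch.where(mask, decay, torch.zeros_like(decay))


class RSAHead(nn.Module):
    """One attention head whose score matrix is a gated mix of softmax attention and a REM."""

    def __init__(self, d_model: int, d_head: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_head)
        self.k = nn.Linear(d_model, d_head)
        self.v = nn.Linear(d_model, d_head)
        self.gate_logit = nn.Parameter(torch.zeros(1))      # data-driven gate, mu = sigmoid(.)
        self.lam_logit = nn.Parameter(torch.zeros(1))       # decay of the recurrent component

    def forward(self, x: torch.Tensor) -> torch.Tensor:     # x: (batch, seq_len, d_model)
        q, k, v = self.q(x), self.k(x), self.v(x)
        attn = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        lam = torch.sigmoid(self.lam_logit)                 # keep the decay in (0, 1)
        rem = regular_rem(x.shape[1], lam)                  # (seq_len, seq_len), shared across the batch
        mu = torch.sigmoid(self.gate_logit)                 # proportion assigned to recurrence
        scores = (1 - mu) * attn + mu * rem                 # gated combination of the two components
        return scores @ v


x = torch.randn(2, 16, 32)                                  # (batch, seq_len, d_model)
print(RSAHead(d_model=32, d_head=8)(x).shape)                # torch.Size([2, 16, 8])
```

This sketch collapses the recurrent component to a single decay and a single gate per head; it is meant only to show how a REM can sit alongside the softmax attention scores as an additional, gated positional term.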


Persistent Identifier: http://hdl.handle.net/10722/338289

 

DC Field | Value | Language
dc.contributor.author | Huang, Feiqing | -
dc.contributor.author | Lu, Kexin | -
dc.contributor.author | Cai, Yuxi | -
dc.contributor.author | Qin, Zhen | -
dc.contributor.author | Fang, Yanwen | -
dc.contributor.author | Tian, Guangjian | -
dc.contributor.author | Li, Guodong | -
dc.date.accessioned | 2024-03-11T10:27:45Z | -
dc.date.available | 2024-03-11T10:27:45Z | -
dc.date.issued | 2023-05-01 | -
dc.identifier.uri | http://hdl.handle.net/10722/338289 | -
dc.description.abstract | This paper breaks an RNN layer down, with negligible loss, into a sequence of simple RNNs, each of which can be rewritten as a lightweight positional encoding matrix of a self-attention, named the Recurrence Encoding Matrix (REM). The recurrent dynamics introduced by the RNN layer can thus be encapsulated into the positional encodings of a multihead self-attention, which makes it possible to seamlessly incorporate these recurrent dynamics into a Transformer, leading to a new module, Self-Attention with Recurrence (RSA). The proposed module leverages the recurrent inductive bias of REMs to achieve better sample efficiency than its corresponding baseline Transformer, while the self-attention models the remaining non-recurrent signals. The relative proportions of the two components are controlled by a data-driven gating mechanism, and the effectiveness of RSA modules is demonstrated on four sequential learning tasks. | -
dc.language | eng | -
dc.relation.ispartof | The 11th International Conference on Learning Representations (ICLR 2023), 1-5 May 2023, Kigali, Rwanda | -
dc.title | Encoding Recurrence into Transformers | -
dc.type | Conference_Paper | -
