Conference Paper: CAB: Comprehensive Attention Benchmarking on Long Sequence Modeling

Title: CAB: Comprehensive Attention Benchmarking on Long Sequence Modeling
Authors: Zhang, Jun; Jiang, Shuyang; Feng, Jiangtao; Zheng, Lin; Kong, Lingpeng
Issue Date: 18-Jul-2023
Abstract

Transformer has achieved remarkable success in language, image, and speech processing. Recently, various efficient attention architectures have been proposed to improve transformer's efficiency while largely preserving its efficacy, especially in modeling long sequences. A widely-used benchmark to test these efficient methods' capability on long-range modeling is Long Range Arena (LRA). However, LRA only focuses on the standard bidirectional (or noncausal) self attention, and completely ignores cross attentions and unidirectional (or causal) attentions, which are equally important to downstream applications. In this paper, we propose Comprehensive Attention Benchmark (CAB) under a fine-grained attention taxonomy with four distinguishable attention patterns, namely, noncausal self, causal self, noncausal cross, and causal cross attentions. CAB collects seven real-world tasks from different research areas to evaluate efficient attentions under the four attention patterns. Among these tasks, CAB validates efficient attentions in eight backbone networks to show their generalization across neural architectures. We conduct exhaustive experiments to benchmark the performances of nine widely-used efficient attention architectures designed with different philosophies on CAB. Extensive experimental results also shed light on the fundamental problems of efficient attentions, such as efficiency length against vanilla attention, performance consistency across attention patterns, the benefit of attention mechanisms, and interpolation/extrapolation on long-context language modeling.
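
For orientation, the sketch below is not taken from the paper or the CAB codebase; it is a minimal NumPy illustration of the four attention patterns named in the abstract, expressed as vanilla scaled dot-product attention. The function `attention` and the toy arrays `x` and `ctx` are illustrative assumptions. Self vs. cross attention differs only in where the keys and values come from, while causal vs. noncausal differs only in whether a lower-triangular mask blocks attention to future positions.

```python
# Minimal illustration (assumed, not from the paper): vanilla scaled
# dot-product attention under CAB's four attention patterns.
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, kv, causal=False):
    """Queries come from `q`; keys/values come from `kv` (kv == q for self attention)."""
    d = q.shape[-1]
    scores = q @ kv.T / np.sqrt(d)            # (len_q, len_kv) similarity matrix
    if causal:
        # Causal: query position i may only attend to positions j <= i.
        mask = np.tril(np.ones(scores.shape, dtype=bool))
        scores = np.where(mask, scores, -np.inf)
    return softmax(scores) @ kv               # values == keys here, for brevity

rng = np.random.default_rng(0)
x   = rng.standard_normal((8, 16))    # target sequence: 8 tokens, dim 16
ctx = rng.standard_normal((12, 16))   # source sequence for cross attention

noncausal_self  = attention(x, x)                  # e.g. encoders (the LRA setting)
causal_self     = attention(x, x, causal=True)     # e.g. decoder-only language models
noncausal_cross = attention(x, ctx)                # e.g. encoder-decoder cross attention
causal_cross    = attention(x, ctx, causal=True)   # e.g. streaming/online generation
```

The efficient attention architectures benchmarked in CAB aim to replace or approximate this quadratic score computation while behaving consistently across all four patterns.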


Persistent Identifier: http://hdl.handle.net/10722/333816

 

DC Field                 Value
dc.contributor.author    Zhang, Jun
dc.contributor.author    Jiang, Shuyang
dc.contributor.author    Feng, Jiangtao
dc.contributor.author    Zheng, Lin
dc.contributor.author    Kong, Lingpeng
dc.date.accessioned      2023-10-06T08:39:18Z
dc.date.available        2023-10-06T08:39:18Z
dc.date.issued           2023-07-18
dc.identifier.uri        http://hdl.handle.net/10722/333816
dc.language              eng
dc.relation.ispartof     International Conference on Machine Learning (23/07/2023-29/07/2023, Honolulu, Hawaii)
dc.title                 CAB: Comprehensive Attention Benchmarking on Long Sequence Modeling
dc.type                  Conference_Paper
dc.identifier.doi        10.48550/arXiv.2210.07661
