Cascaded Head-colliding Attention

Zheng, L; Wu, Z; Kong, L

Publisher Website: 10.18653/v1/2021.acl-long.45

Appears in Collections:
- Computer Science: Conference papers

Conference Paper: Cascaded Head-colliding Attention

Title	Cascaded Head-colliding Attention
Authors	Zheng, L; Wu, Z; Kong, L
Issue Date	2021
Publisher	Association for Computational Linguistics.
Citation	Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021), Virtual Meeting, Bangkok, Thailand, 1-6 August 2021, v. 1: Long Papers, p. 536-549. DOI: http://dx.doi.org/10.18653/v1/2021.acl-long.45
Abstract	Transformers have advanced the field of natural language processing (NLP) on a variety of important tasks. At the cornerstone of the Transformer architecture is the multi-head attention (MHA) mechanism which models pairwise interactions between the elements of the sequence. Despite its massive success, the current framework ignores interactions among different heads, leading to the problem that many of the heads are redundant in practice, which greatly wastes the capacity of the model. To improve parameter efficiency, we re-formulate the MHA as a latent variable model from a probabilistic perspective. We present cascaded head-colliding attention (CODA) which explicitly models the interactions between attention heads through a hierarchical variational distribution. We conduct extensive experiments and demonstrate that CODA outperforms the transformer baseline, by 0.6 perplexity on Wikitext-103 in language modeling, and by 0.6 BLEU on WMT14 EN-DE in machine translation, due to its improvements on the parameter efficiency.
Description	Session 2E: Machine Learning for NLP 1 - Anthology ID: 2021.acl-long.45
Persistent Identifier	http://hdl.handle.net/10722/304334
ISBN	9781954085527

DC Field	Value	Language
dc.contributor.author	Zheng, L	-
dc.contributor.author	Wu, Z	-
dc.contributor.author	Kong, L	-
dc.date.accessioned	2021-09-23T08:58:35Z	-
dc.date.available	2021-09-23T08:58:35Z	-
dc.date.issued	2021	-
dc.identifier.citation	Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021), Virtual Meeting, Bangkok, Thailand, 1-6 August 2021, v. 1: Long Papers, p. 536-549	-
dc.identifier.isbn	9781954085527	-
dc.identifier.uri	http://hdl.handle.net/10722/304334	-
dc.description	Session 2E: Machine Learning for NLP 1 - Anthology ID: 2021.acl-long.45	-
dc.description.abstract	Transformers have advanced the field of natural language processing (NLP) on a variety of important tasks. At the cornerstone of the Transformer architecture is the multi-head attention (MHA) mechanism which models pairwise interactions between the elements of the sequence. Despite its massive success, the current framework ignores interactions among different heads, leading to the problem that many of the heads are redundant in practice, which greatly wastes the capacity of the model. To improve parameter efficiency, we re-formulate the MHA as a latent variable model from a probabilistic perspective. We present cascaded head-colliding attention (CODA) which explicitly models the interactions between attention heads through a hierarchical variational distribution. We conduct extensive experiments and demonstrate that CODA outperforms the transformer baseline, by 0.6 perplexity on Wikitext-103 in language modeling, and by 0.6 BLEU on WMT14 EN-DE in machine translation, due to its improvements on the parameter efficiency.	-
dc.language	eng	-
dc.publisher	Association for Computational Linguistics.	-
dc.relation.ispartof	Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)	-
dc.rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	-
dc.title	Cascaded Head-colliding Attention	-
dc.type	Conference_Paper	-
dc.identifier.email	Kong, L: lpk@cs.hku.hk	-
dc.identifier.authority	Kong, L=rp02775	-
dc.description.nature	published_or_final_version	-
dc.identifier.doi	10.18653/v1/2021.acl-long.45	-
dc.identifier.hkuros	324950	-
dc.identifier.volume	1	-
dc.identifier.spage	536	-
dc.identifier.epage	549	-
dc.publisher.place	Stroudsburg, PA, USA	-
