Conference Paper: Cascaded Head-colliding Attention

Title: Cascaded Head-colliding Attention
Authors: Zheng, L; Wu, Z; Kong, L
Issue Date: 2021
Publisher: Association for Computational Linguistics
Citation: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021), Virtual Meeting, Bangkok, Thailand, 1-6 August 2021, v. 1: Long Papers, p. 536-549
Abstract: Transformers have advanced the field of natural language processing (NLP) on a variety of important tasks. A cornerstone of the Transformer architecture is the multi-head attention (MHA) mechanism, which models pairwise interactions between the elements of the sequence. Despite its massive success, the current framework ignores interactions among different heads, so that many of the heads are redundant in practice, which greatly wastes the capacity of the model. To improve parameter efficiency, we re-formulate MHA as a latent variable model from a probabilistic perspective. We present cascaded head-colliding attention (CODA), which explicitly models the interactions between attention heads through a hierarchical variational distribution. We conduct extensive experiments and demonstrate that CODA outperforms the Transformer baseline by 0.6 perplexity on WikiText-103 in language modeling and by 0.6 BLEU on WMT14 EN-DE in machine translation, owing to its improved parameter efficiency.
Description: Session 2E: Machine Learning for NLP 1 - Anthology ID: 2021.acl-long.45
Persistent Identifier: http://hdl.handle.net/10722/304334
ISBN: 9781954085527
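
As background for the abstract above, the following is a minimal NumPy sketch of standard scaled dot-product multi-head attention, the baseline mechanism the abstract refers to. This is not the CODA variant proposed in the paper (the hierarchical variational modelling of head interactions is omitted), and all function and variable names here are illustrative.

```python
# Minimal sketch of vanilla multi-head attention (NOT the paper's CODA method).
# It illustrates the "pairwise interactions between the elements of the sequence"
# computed independently by each head; in this standard form the heads never
# interact, which is the redundancy the paper sets out to address.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, W_q, W_k, W_v, W_o, n_heads):
    """X: (seq_len, d_model); W_q, W_k, W_v, W_o: (d_model, d_model)."""
    seq_len, d_model = X.shape
    d_head = d_model // n_heads

    # Project the inputs to queries, keys and values, then split into heads.
    def split(W):  # (seq_len, d_model) -> (n_heads, seq_len, d_head)
        return (X @ W).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

    Q, K, V = split(W_q), split(W_k), split(W_v)

    # Each head computes its own attention distribution over sequence positions.
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)  # (n_heads, seq_len, seq_len)
    attn = softmax(scores, axis=-1)
    heads = attn @ V                                      # (n_heads, seq_len, d_head)

    # Concatenate the heads and apply the output projection.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ W_o

# Tiny usage example with random weights.
rng = np.random.default_rng(0)
d_model, seq_len, n_heads = 16, 5, 4
X = rng.normal(size=(seq_len, d_model))
W = [rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(4)]
out = multi_head_attention(X, *W, n_heads=n_heads)
print(out.shape)  # (5, 16)
```
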

 

DC Field: Value
dc.contributor.author: Zheng, L
dc.contributor.author: Wu, Z
dc.contributor.author: Kong, L
dc.date.accessioned: 2021-09-23T08:58:35Z
dc.date.available: 2021-09-23T08:58:35Z
dc.date.issued: 2021
dc.identifier.citation: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021), Virtual Meeting, Bangkok, Thailand, 1-6 August 2021, v. 1: Long Papers, p. 536-549
dc.identifier.isbn: 9781954085527
dc.identifier.uri: http://hdl.handle.net/10722/304334
dc.description: Session 2E: Machine Learning for NLP 1 - Anthology ID: 2021.acl-long.45
dc.description.abstract: Transformers have advanced the field of natural language processing (NLP) on a variety of important tasks. A cornerstone of the Transformer architecture is the multi-head attention (MHA) mechanism, which models pairwise interactions between the elements of the sequence. Despite its massive success, the current framework ignores interactions among different heads, so that many of the heads are redundant in practice, which greatly wastes the capacity of the model. To improve parameter efficiency, we re-formulate MHA as a latent variable model from a probabilistic perspective. We present cascaded head-colliding attention (CODA), which explicitly models the interactions between attention heads through a hierarchical variational distribution. We conduct extensive experiments and demonstrate that CODA outperforms the Transformer baseline by 0.6 perplexity on WikiText-103 in language modeling and by 0.6 BLEU on WMT14 EN-DE in machine translation, owing to its improved parameter efficiency.
dc.language: eng
dc.publisher: Association for Computational Linguistics
dc.relation.ispartof: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.title: Cascaded Head-colliding Attention
dc.type: Conference_Paper
dc.identifier.email: Kong, L: lpk@cs.hku.hk
dc.identifier.authority: Kong, L=rp02775
dc.description.nature: published_or_final_version
dc.identifier.doi: 10.18653/v1/2021.acl-long.45
dc.identifier.hkuros: 324950
dc.identifier.volume: 1
dc.identifier.spage: 536
dc.identifier.epage: 549
dc.publisher.place: Stroudsburg, PA, USA
