Conference Paper: Counterfactual Data Augmentation for Neural Machine Translation
Title | Counterfactual Data Augmentation for Neural Machine Translation |
---|---|
Authors | Liu, Qi; Kusner, Matt J.; Blunsom, Phil |
Issue Date | 2021 |
Citation | NAACL-HLT 2021 - 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference, 2021, p. 187-197 |
Abstract | We propose a data augmentation method for neural machine translation. It works by interpreting language models and phrasal alignment causally. Specifically, it creates augmented parallel translation corpora by generating (path-specific) counterfactual aligned phrases. We generate these by sampling new source phrases from a masked language model, then sampling an aligned counterfactual target phrase by noting that a translation language model can be interpreted as a Gumbel-Max Structural Causal Model (Oberst and Sontag, 2019). Compared to previous work, our method takes both context and alignment into account to maintain the symmetry between source and target sequences. Experiments on IWSLT’15 English → Vietnamese, WMT’17 English → German, WMT’18 English → Turkish, and WMT’19 robust English → French show that the method can improve the performance of translation, backtranslation and translation robustness. |
Persistent Identifier | http://hdl.handle.net/10722/321947 |
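
The abstract's Gumbel-Max step can be illustrated with a small sketch. This is not the authors' implementation: it only shows the general counterfactual sampling idea for a categorical distribution in the style of Oberst and Sontag (2019), using toy logits in place of a real translation language model. All function names (`posterior_gumbel_noise`, `counterfactual_sample`) and the example logits are hypothetical and chosen for illustration.

```python
import math
import random


def logsumexp(xs):
    """Numerically stable log(sum(exp(x))) for a list of floats."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def sample_gumbel(loc, rng):
    """Draw one sample from a Gumbel distribution with location `loc`."""
    # Clamp the uniform draw away from 0 and 1 so both logs stay finite.
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    return loc - math.log(-math.log(u))


def posterior_gumbel_noise(logits, observed, rng):
    """Sample Gumbel noise consistent with argmax(logits + noise) == observed.

    Top-down scheme: draw the maximum first, assign it to the observed
    category, then draw every other category truncated below that maximum.
    """
    top = sample_gumbel(logsumexp(logits), rng)
    noise = []
    for i, logit in enumerate(logits):
        if i == observed:
            value = top
        else:
            g = sample_gumbel(logit, rng)
            # Truncate the Gumbel(logit) draw so it stays below `top`.
            value = -logsumexp([-top, -g])
        noise.append(value - logit)
    return noise


def counterfactual_sample(logits, observed, cf_logits, rng):
    """Reuse the inferred noise under new logits and return the new argmax."""
    noise = posterior_gumbel_noise(logits, observed, rng)
    scores = [l + n for l, n in zip(cf_logits, noise)]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    rng = random.Random(0)
    factual = [2.0, 0.5, -1.0]          # toy logits, an assumption
    intervened = [0.0, 1.5, 1.0]        # toy logits after an intervention
    print(counterfactual_sample(factual, 0, intervened, rng))
```

The noise is inferred from the observed outcome and then reused, so leaving the logits unchanged reproduces the observed category. This is the property that ties the counterfactual sample to the original one.
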
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Liu, Qi | - |
dc.contributor.author | Kusner, Matt J. | - |
dc.contributor.author | Blunsom, Phil | - |
dc.date.accessioned | 2022-11-03T02:22:33Z | - |
dc.date.available | 2022-11-03T02:22:33Z | - |
dc.date.issued | 2021 | - |
dc.identifier.citation | NAACL-HLT 2021 - 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference, 2021, p. 187-197 | - |
dc.identifier.uri | http://hdl.handle.net/10722/321947 | - |
dc.language | eng | - |
dc.relation.ispartof | NAACL-HLT 2021 - 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference | - |
dc.title | Counterfactual Data Augmentation for Neural Machine Translation | - |
dc.type | Conference_Paper | - |
dc.description.nature | link_to_subscribed_fulltext | - |
dc.identifier.scopus | eid_2-s2.0-85109906533 | - |
dc.identifier.spage | 187 | - |
dc.identifier.epage | 197 | - |