Paper Notes: Using Monolingual Data in Machine Translation

Monolingual Data in NMT

Why enhance NMT with monolingual data?

  • Large-scale source-side data:
    enhances the encoder network, yielding higher-quality context-vector representations of the source sentence.

  • Large-scale target-side data:
    boosts the fluency of the decoder's output during translation.

Methods for using monolingual data

Multi-task learning

Target-side language model: Integrating Language Model into the Decoder

shallow fusion

Both an NMT model (trained on parallel corpora) and a recurrent neural network language model (RNNLM, trained on larger monolingual corpora) are pre-trained separately before being integrated.

Shallow fusion: at each decoding step, rescore the candidate words by adding the LM's log-probability, scaled by a weight, to the NMT score.
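The rescoring step can be sketched as follows; the weight `beta` and the toy log-probabilities are illustrative assumptions, not values from the papers.

```python
def shallow_fusion_rescore(nmt_logprobs, lm_logprobs, beta=0.3):
    """Combine NMT and LM scores for each candidate word.

    nmt_logprobs / lm_logprobs: dicts mapping candidate words to
    log-probabilities from the two separately pre-trained models.
    beta: interpolation weight for the LM (a tuning hyperparameter).
    """
    return {w: nmt_logprobs[w] + beta * lm_logprobs[w] for w in nmt_logprobs}

# Toy example: the LM score breaks a tie between two candidates.
nmt = {"cat": -1.0, "car": -1.0}
lm = {"cat": -0.5, "car": -2.0}
fused = shallow_fusion_rescore(nmt, lm)
best = max(fused, key=fused.get)
```

Because the two models stay frozen, shallow fusion only changes how hypotheses are scored at decoding time.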

deep fusion

Deep fusion: the hidden state of the pre-trained RNNLM is concatenated with the decoder's hidden state through a learned gate, and the combined model is fine-tuned (Gulcehre et al., 2015).

multi-task learning

Using Target-side Monolingual Data for Neural Machine Translation through Multi-task Learning, EMNLP, 2017

Using target-side monolingual data adds an extra language-model training task. In fact, (b) is exactly the method from the previous slide; this paper adds a language-model loss on top of it.

The \(\sigma\) parameters are updated when training both tasks, while the \(\theta\) parameters are updated only when training the translation model.
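A minimal sketch of this update rule, with hypothetical parameter names: gradients from the LM task touch only the shared \(\sigma\) parameters, while the translation task updates both \(\sigma\) and \(\theta\).

```python
def apply_multitask_updates(params, grads_mt, grads_lm, lr=0.1):
    """params: dict of scalar parameters. Names starting with 'sigma'
    are shared between the two tasks; names starting with 'theta'
    belong to the translation model only (the naming is an assumption
    for illustration)."""
    updated = dict(params)
    for name in updated:
        # Translation-task gradients update every parameter.
        updated[name] -= lr * grads_mt.get(name, 0.0)
        # LM-task gradients update only the shared (sigma) parameters.
        if name.startswith("sigma"):
            updated[name] -= lr * grads_lm.get(name, 0.0)
    return updated

params = {"sigma_dec": 1.0, "theta_attn": 1.0}
new_params = apply_multitask_updates(
    params,
    grads_mt={"sigma_dec": 1.0, "theta_attn": 1.0},
    grads_lm={"sigma_dec": 1.0, "theta_attn": 1.0},
)
```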

auto-encoder

As an auxiliary task, an autoencoder reconstructs the corresponding monolingual data, sharing the encoder parameters with the NMT model.

Semi-Supervised Learning for Neural Machine Translation, ACL, 2016
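A sketch of the shared-encoder setup (all component functions here are toy stand-ins, not the paper's actual architecture): the same encoder feeds both the translation loss on parallel data and the reconstruction loss on monolingual data, and the two losses are summed.

```python
def joint_loss(encode, translate_loss, reconstruct_loss,
               src, tgt, mono_src, ae_weight=1.0):
    """Shared-encoder multi-task objective (illustrative sketch)."""
    h_par = encode(src)           # shared encoder on the parallel source
    h_mono = encode(mono_src)     # same encoder on monolingual data
    mt = translate_loss(h_par, tgt)
    ae = reconstruct_loss(h_mono, mono_src)  # reconstruct the input
    return mt + ae_weight * ae

# Toy stand-ins just to show the wiring.
loss = joint_loss(
    encode=lambda s: len(s),
    translate_loss=lambda h, t: abs(h - len(t)),
    reconstruct_loss=lambda h, s: abs(h - len(s)),
    src="abc", tgt="abcd", mono_src="xyz",
)
```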

Back-translation

What is back-translation?

Back-translation builds synthetic pseudo-parallel data from target-side monolingual data using a reverse (target-to-source) translation model.
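Concretely, with a stub in place of a real reverse model (a real system would be an NMT model trained on the parallel data in the reverse direction):

```python
def back_translate(target_mono, reverse_translate):
    """Build pseudo-parallel pairs: a synthetic source sentence from a
    target->source model, with the real target sentence kept as-is."""
    return [(reverse_translate(t), t) for t in target_mono]

# Stub reverse model mapping (German) target sentences to (English) sources.
reverse = {"guten tag": "good day", "danke": "thanks"}.get
pairs = back_translate(["guten tag", "danke"], reverse)
```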

Why back-translation? Motivation

It mitigates overfitting and improves fluency by exploiting additional data in the target language.

The target side must always consist of real sentences for the translation model to produce fluent, accurate output; the source side can tolerate a few poor word choices, wrong word orders, or grammatical errors, as long as comprehension is not affected. Human translation works the same way: translation quality depends on the translator's command of the target language, not of the source language (source-language proficiency only needs to be good enough to understand the sentence).

Several aspects of back-translation influence translation performance:

  • Size of the Synthetic Data
  • Direction of Back-Translation
  • Quality of the Synthetic Data


copy mechanism

The authors' experimental setup: pseudo-parallel corpora are built from target-side monolingual data, partly by directly copying it (each target sentence serves as its own source) and partly via back-translation. In other words, the monolingual data appears twice.
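That setup can be sketched like this (stub reverse model; the point is that each monolingual target sentence contributes two training pairs):

```python
def copied_plus_backtranslated(parallel, target_mono, reverse_translate):
    """Combine real parallel data with two kinds of pseudo pairs."""
    copied = [(t, t) for t in target_mono]                 # copy target as source
    synthetic = [(reverse_translate(t), t) for t in target_mono]  # back-translate
    return parallel + copied + synthetic

corpus = copied_plus_backtranslated(
    parallel=[("hello", "hallo")],
    target_mono=["danke"],
    reverse_translate={"danke": "thanks"}.get,
)
```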

Something about this still feels off to me…

Dummy source sentence

Pseudo-parallel data: pair each target-side monolingual sentence with a dummy (empty) source sentence.

The downside:

the network ‘unlearns’ its conditioning on the source context if the ratio of monolingual training instances is too high.
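A sketch of the data construction, with `<null>` as an assumed placeholder for the dummy source:

```python
def with_dummy_source(parallel, target_mono, dummy="<null>"):
    """Pair each monolingual target sentence with a dummy source token,
    then mix with the real parallel data."""
    return parallel + [(dummy, t) for t in target_mono]

corpus = with_dummy_source([("hello", "hallo")], ["danke"])
```

The ratio of dummy-source pairs to real pairs is exactly what must stay low to avoid the unlearning problem described above.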

Improving Neural Machine Translation Models with Monolingual Data, Sennrich et al, ACL 2016

Self-learning

Generate synthetic target sentences from source-side monolingual data:

  • Build a baseline machine translation (MT) system on parallel data
  • Translate source-side mono-data into target sentences
  • Train the final model on real parallel data + pseudo-parallel data
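The three steps above can be sketched as follows, with a stub forward model standing in for the baseline MT system:

```python
def self_learning_corpus(parallel, source_mono, forward_translate):
    """Forward-translate source-side monolingual data with the baseline
    system and mix the pseudo pairs with the real parallel data."""
    pseudo = [(s, forward_translate(s)) for s in source_mono]
    return parallel + pseudo

corpus = self_learning_corpus(
    parallel=[("hello", "hallo")],
    source_mono=["thanks"],
    forward_translate={"thanks": "danke"}.get,
)
```

Note the contrast with back-translation: here the synthetic side is the target, so decoding fluency depends on the quality of the forward model.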

References

  1. Improving Neural Machine Translation Models with Monolingual Data, Sennrich et al., ACL 2016
  2. Using Monolingual Data in Neural Machine Translation: a Systematic Study, Burlot et al., ACL 2018
  3. Copied Monolingual Data Improves Low-Resource Neural Machine Translation, Currey et al., WMT 2017
  4. Semi-Supervised Learning for Neural Machine Translation, Cheng et al., ACL 2016
  5. Exploiting Source-side Monolingual Data in Neural Machine Translation, Zhang et al., EMNLP 2016
  6. Using Target-side Monolingual Data for Neural Machine Translation through Multi-task Learning, Domhan et al., EMNLP 2017
  7. On Using Monolingual Corpora in Neural Machine Translation, Gulcehre et al., 2015
  8. Back-Translation Sampling by Targeting Difficult Words in Neural Machine Translation, Fadaee et al., EMNLP 2018
  9. Understanding Back-Translation at Scale, Edunov et al., EMNLP 2018