Transfer Learning Series 1: Neural Transfer Learning for NLP

How transfer learning differs from supervised learning

Knowledge transfer is needed when the training domain and the target domain do not match.

Some questions that come to mind:
- 1. How do we delimit the scope of a domain, especially in NLP? Can knowledge learned from medical texts transfer to science fiction? Intuitively, it seems unlikely.
- 2. Does transfer from a big domain to a small domain also count as transfer learning? For example, BERT is trained on a very large corpus and then fine-tuned on small downstream datasets, and it performs well, right?

Why transfer learning

Current supervised models are still very brittle. Jia and Liang (EMNLP 2017) show that today's state-of-the-art reading comprehension models are highly sensitive to adversarial examples.

Can transfer learning solve this problem? Not sure yet.

Abstract:
Standard accuracy metrics indicate that reading comprehension systems are making rapid progress, but the extent to which these systems truly understand language remains unclear. To reward systems with real language understanding abilities, we propose an adversarial evaluation scheme for the Stanford Question Answering Dataset (SQuAD). Our method tests whether systems can answer questions about paragraphs that contain adversarially inserted sentences, which are automatically generated to distract computer systems without changing the correct answer or misleading humans. In this adversarial setting, the accuracy of sixteen published models drops from an average of 75% F1 score to 36%; when the adversary is allowed to add ungrammatical sequences of words, average accuracy on four models decreases further to 7%. We hope our insights will motivate the development of new models that understand language more precisely.

Synthetic and Natural Noise Both Break Neural Machine Translation, Belinkov and Bisk (ICLR 2018): character-level translation models effectively handle OOV and related issues, but this leaves them very sensitive and brittle in the face of noise. Phonetic spelling errors, omissions, and key swaps (swapping adjacent characters) all cause BLEU to drop sharply.

Abstract
Character-based neural machine translation (NMT) models alleviate out-of-vocabulary issues, learn morphology, and move us closer to completely end-to-end translation systems. Unfortunately, they are also very brittle and easily falter when presented with noisy data. In this paper, we confront NMT models with synthetic and natural sources of noise. We find that state-of-the-art models fail to translate even moderately noisy texts that humans have no trouble comprehending. We explore two approaches to increase model robustness: structure-invariant word representations and robust training on noisy texts. We find that a model based on a character convolutional neural network is able to simultaneously learn representations robust to multiple kinds of noise.
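
To make "omission" and "key swap" concrete, here is a minimal Python sketch of this kind of synthetic character noise. It only illustrates the idea and is not the paper's actual noise generator; the function names and the perturbation probability are made up.

```python
import random

def swap_noise(word: str) -> str:
    """Swap two adjacent inner characters (a 'key swap' style perturbation)."""
    if len(word) < 4:
        return word
    i = random.randrange(1, len(word) - 2)
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def omission_noise(word: str) -> str:
    """Drop one inner character (an 'omission' style perturbation)."""
    if len(word) < 4:
        return word
    i = random.randrange(1, len(word) - 1)
    return word[:i] + word[i + 1:]

def corrupt(sentence: str, p: float = 0.3) -> str:
    """Perturb each word with probability p; a brittle NMT model's BLEU drops on such input."""
    noisy = []
    for w in sentence.split():
        if random.random() < p:
            w = random.choice([swap_noise, omission_noise])(w)
        noisy.append(w)
    return " ".join(noisy)

print(corrupt("character based translation models are brittle"))
```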

Iyyer et al. (NAACL 2018) propose syntactically controlled paraphrase networks (SCPNs), a paraphrase generation model guided by a target syntactic form, and find that the resulting adversarial examples easily fool trained supervised models.

Abstract:
We propose syntactically controlled paraphrase networks (SCPNs) and use them to generate adversarial examples. Given a sentence and a target syntactic form (e.g., a constituency parse), SCPNs are trained to produce a paraphrase of the sentence with the desired syntax. We show it is possible to create training data for this task by first doing backtranslation at a very large scale, and then using a parser to label the syntactic transformations that naturally occur during this process. Such data allows us to train a neural encoder-decoder model with extra inputs to specify the target syntax. A combination of automated and human evaluations show that SCPNs generate paraphrases that follow their target specifications without decreasing paraphrase quality when compared to baseline (uncontrolled) paraphrase systems. Furthermore, they are more capable of generating syntactically adversarial examples that both (1) “fool” pretrained models and (2) improve the robustness of these models to syntactic variation when used to augment their training data.

Manually annotating data for every domain or every language is not feasible, so we need to transfer knowledge from a related setting to the target setting.

Many of NLP's major foundational contributions can be viewed as a form of transfer learning (see the embedding sketch after this list):
- LSA
- Brown clusters
- word embeddings
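
As an illustration of why pretrained word embeddings are a form of transfer, here is a minimal sketch that reads GloVe-style vectors trained on a large corpus (the "source" knowledge) and uses them to initialize the embedding layer of a small target task. The file path, vocabulary, and helper names are hypothetical.

```python
import numpy as np

def load_vectors(path: str) -> dict:
    """Read a GloVe/word2vec-style text file: one word followed by its vector per line."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

def build_embedding_matrix(vocab, vectors, dim=300):
    """Initialize a target task's embedding layer from pretrained vectors;
    words missing from the pretrained vocabulary get small random vectors."""
    matrix = np.random.normal(scale=0.1, size=(len(vocab), dim)).astype(np.float32)
    for i, word in enumerate(vocab):
        if word in vectors:
            matrix[i] = vectors[word]
    return matrix

# Toy demo: in practice `toy_vectors` would come from load_vectors("glove.6B.300d.txt").
toy_vectors = {"movie": np.ones(300, dtype=np.float32)}
emb = build_embedding_matrix(["the", "movie", "was", "great"], toy_vectors)
print(emb.shape)  # (4, 300)
```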

Limitations of existing work:
- Too restrictive: pre-defined similarity metrics, hard parameter sharing (a sketch of what hard sharing looks like follows this list)
- Settings too specific: a single task
- Weak baselines: no comparison with traditional methods
- Brittle models: do not work out-of-domain, rely on similar languages/tasks
- Inefficient: require many parameters, much time, and many samples
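
For reference, "hard parameter sharing" in multi-task learning means all tasks literally share the same lower layers and only the output heads differ, so the sharing scheme is fixed in advance. A minimal PyTorch-style sketch with made-up layer sizes and two toy tasks:

```python
import torch
import torch.nn as nn

class HardSharingModel(nn.Module):
    """Hard parameter sharing: one shared encoder, one small head per task."""
    def __init__(self, vocab_size=10000, emb_dim=100, hidden=128,
                 n_classes_task_a=5, n_classes_task_b=2):
        super().__init__()
        # shared parameters: every task backpropagates through these
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True)
        # task-specific parameters: only the matching task updates its head
        self.head_a = nn.Linear(hidden, n_classes_task_a)
        self.head_b = nn.Linear(hidden, n_classes_task_b)

    def forward(self, token_ids, task):
        _, (h, _) = self.encoder(self.embed(token_ids))
        h = h[-1]  # last hidden state as the sentence representation
        return self.head_a(h) if task == "a" else self.head_b(h)

model = HardSharingModel()
logits = model(torch.randint(0, 10000, (4, 12)), task="a")  # batch of 4 toy sequences
print(logits.shape)  # torch.Size([4, 5])
```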

Research goals

  • Transfer learning
    • Transductive transfer learning (same task; only the source domain has labels)
      • Domain adaptation (different domains)
      • Cross-lingual learning (different languages)
    • Inductive transfer learning (different tasks; the target domain also has labels)
      • Multi-task learning
      • Sequential transfer learning

This guy is just too good... absurdly good.

domain adaptation

Propose two novel methods that bridge the domain discrepancy by selecting relevant and informative data for unsupervised domain adaptation.

Based on Bayesian Optimisation

Learning to select data for transfer learning with Bayesian Optimization, EMNLP 2017

I still don't really understand Bayesian optimization; some references:
- https://zhuanlan.zhihu.com/p/29779000
- https://www.jiqizhixin.com/articles/2017-08-18-5
- A Tutorial on Bayesian Optimization
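
To make the idea a bit more concrete, below is a rough sketch of Bayesian optimization for data selection: a handful of per-example "domain similarity" features are combined with learned weights, the top-scoring source examples are used for training, and dev accuracy drives the search (here via scikit-optimize's gp_minimize). The features, toy data, and model are stand-ins, not the paper's actual setup.

```python
import numpy as np
from skopt import gp_minimize
from skopt.space import Real
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_source = rng.normal(size=(500, 20))               # labeled source-domain data
y_source = (X_source[:, 0] > 0).astype(int)
X_dev = rng.normal(loc=0.5, size=(100, 20))         # small target-domain dev set
y_dev = (X_dev[:, 0] > 0).astype(int)

# Toy per-example features standing in for similarity/diversity measures.
features = np.column_stack([
    -np.abs(X_source.mean(axis=1) - X_dev.mean()),  # "similarity to the target domain"
    X_source.std(axis=1),                           # "diversity"
])

def objective(weights, k=200):
    """Score source examples with the weighted features, train on the top k,
    and return negative dev accuracy (gp_minimize minimizes)."""
    scores = features @ np.asarray(weights)
    top_k = np.argsort(-scores)[:k]
    clf = LogisticRegression(max_iter=200).fit(X_source[top_k], y_source[top_k])
    return -clf.score(X_dev, y_dev)

result = gp_minimize(objective, [Real(-1.0, 1.0), Real(-1.0, 1.0)],
                     n_calls=20, random_state=0)
print("learned feature weights:", result.x, "| dev accuracy:", -result.fun)
```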

Using semi-supervised learning and multi-task learning

Strong Baselines for Neural Semi-supervised Learning under Domain Shift, Ruder & Plank, ACL 2018

Novel neural models have been proposed in recent years for learning under domain shift. Most models, however, only evaluate on a single task, on proprietary datasets, or compare to weak baselines, which makes comparison of models difficult. In this paper, we re-evaluate classic general-purpose bootstrapping approaches in the context of neural networks under domain shifts vs. recent neural approaches and propose a novel multi-task tri-training method that reduces the time and space complexity of classic tri-training. Extensive experiments on two benchmarks for part-of-speech tagging and sentiment analysis are negative: while our novel method establishes a new state-of-the-art for sentiment analysis, it does not fare consistently the best. More importantly, we arrive at the somewhat surprising conclusion that classic tri-training, with some additions, outperforms the state-of-the-art for NLP. Hence classic approaches constitute an important and strong baseline.

This paper is genuinely hard... way too hardcore.
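
To anchor the terminology, here is a minimal sketch of classic tri-training, the baseline the paper re-evaluates: three classifiers are trained on bootstrap samples of the labeled data, and an unlabeled example is pseudo-labeled for one classifier whenever the other two agree on it. Toy sklearn classifiers and random data are used here; the paper itself works with neural models and adds a multi-task variant to cut time and space costs.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample

rng = np.random.default_rng(0)
X_lab = rng.normal(size=(100, 10))                   # labeled (source-domain) data
y_lab = (X_lab[:, 0] + X_lab[:, 1] > 0).astype(int)
X_unlab = rng.normal(size=(1000, 10))                # unlabeled (target-domain) data

# 1) Train three models on bootstrap samples of the labeled data.
models = []
for seed in range(3):
    Xb, yb = resample(X_lab, y_lab, random_state=seed)
    models.append(LogisticRegression(max_iter=200).fit(Xb, yb))

# 2) Repeatedly add unlabeled points to model i's training set when the other two agree.
for _ in range(5):
    preds = [m.predict(X_unlab) for m in models]
    for i in range(3):
        j, k = [x for x in range(3) if x != i]
        agree = preds[j] == preds[k]
        X_new = np.vstack([X_lab, X_unlab[agree]])
        y_new = np.concatenate([y_lab, preds[j][agree]])
        models[i] = LogisticRegression(max_iter=200).fit(X_new, y_new)

# 3) Final prediction: majority vote of the three models.
votes = np.stack([m.predict(X_unlab) for m in models])
majority = (votes.sum(axis=0) >= 2).astype(int)
print("first 10 majority-vote pseudo-labels:", majority[:10])
```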

cross-lingual learning

On the Limitations of Unsupervised Bilingual Dictionary Induction
A Discriminative Latent-Variable Model for Bilingual Lexicon Induction
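
Both papers study bilingual lexicon induction: aligning monolingual word embedding spaces so that translations become nearest neighbours. As background only (this is not these papers' methods), here is a minimal sketch of the standard orthogonal Procrustes baseline learned from a seed dictionary, on toy random embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_seed = 50, 200
X = rng.normal(size=(n_seed, dim))                   # source-language embeddings (seed pairs)
true_map = np.linalg.qr(rng.normal(size=(dim, dim)))[0]
Y = X @ true_map + 0.01 * rng.normal(size=(n_seed, dim))   # target-language embeddings

# Orthogonal Procrustes: minimize ||XW - Y|| over orthogonal W, so W = U V^T
# where U S V^T is the SVD of X^T Y.
U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt

# Induce a translation for a source word by nearest neighbour in the target space.
query = X[0] @ W
sims = (Y @ query) / (np.linalg.norm(Y, axis=1) * np.linalg.norm(query))
print("nearest target index:", int(sims.argmax()))   # should recover index 0
```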

multi-task learning

sequential transfer learning