cs224d-lecture14 Tree-RNN and Constituency Parsing

• Semantic interpretation of language

• How to map phrase structure into a vector space: exploiting the compositionality of meaning

• Comparing RNNs and CNNs

• Recursive neural networks

• Parsing a sentence with an RNN

• Classification with a Tree-RNN: assignment 3, sentiment classification

Semantic interpretation of language: more than just word vectors

the country of my birth, the place where I was born

Question: how can we represent the meaning of longer phrases?

Answer: By mapping them into the same vector space.

How should we map phrases into a vector space?

Use the principle of compositionality: the meaning of a phrase or sentence is determined by

• the meanings of its words

• the rules that combine them

The question is: do we really need to learn this tree structure?

1. Recursive vs. recurrent neural networks

Richard mentioned that recurrent models really capture representations of whole prefixes; you do not get representations of any unit smaller than that.

2. Is language recursive in nature?

[The man from [the company that you spoke with about [the project] yesterday]]

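(1) Describing a sentence recursively (as a syntax tree) is an effective way to disambiguate it.

(2) It helps with tasks such as coreference resolution.

(3) It makes it easy to exploit grammatical tree structure (for example, in phrase-based machine translation).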

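From RNNs to CNNs

A recursive neural network computes vectors only for phrases that are grammatical constituents, whereas a CNN computes a vector for every possible phrase. From the standpoint of linguistics and cognitive science, the CNN is not plausible. Arguably, even a recurrent neural network is more plausible than the tree model or the CNN.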

• The picture for the CNN is that you make a representation of every pair of words, every triple of words, and every four words.
• The tree recursive neural network, by contrast, says that some of those representations don't correspond to a phrase, so we drop them. For the convolutional neural network you have a representation for every bigram, such as "there speak", and every trigram, such as "there speak slowly". For the recursive neural network you only have representations for the semantically meaningful phrases, such as "people there" and "speak slowly", which then combine to give a representation for the whole sentence.
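To make the contrast concrete, here is a small sketch that lists which word spans would receive a vector under each view. The four-word sentence and its bracketing are assumptions chosen to match the example phrases above, and the helper names `ngram_spans` and `tree_spans` are hypothetical.

```python
# Toy comparison: which word spans get a vector under each model.
# The sentence and its bracketing are illustrative assumptions.

def ngram_spans(words, max_n):
    """Every contiguous span of 2..max_n words (CNN-style coverage)."""
    spans = []
    for n in range(2, max_n + 1):
        for start in range(len(words) - n + 1):
            spans.append(" ".join(words[start:start + n]))
    return spans

def tree_spans(tree):
    """Spans of the internal nodes of a binary tree given as nested tuples."""
    spans = []

    def visit(node):
        if isinstance(node, str):        # leaf: a single word
            return [node]
        left, right = node
        covered = visit(left) + visit(right)
        spans.append(" ".join(covered))  # one vector per constituent
        return covered

    visit(tree)
    return spans

words = ["people", "there", "speak", "slowly"]
tree = (("people", "there"), ("speak", "slowly"))

cnn_like = ngram_spans(words, max_n=4)
tree_like = tree_spans(tree)
print(cnn_like)
print(tree_like)
```

The n-gram list contains spans such as "there speak" that cross a constituent boundary, while the tree list contains only the two phrases and the full sentence.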

Recursive Neural Networks for Structure Prediction

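• Input: the vector representations of two child nodes

• Output: the semantic representation of the new node formed by merging the two children, plus a score for how plausible that new node is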

Recursive Neural Network Definition

$$h^{(1)}=\mathrm{ReLU}\left([h^{(1)}_{left},h^{(1)}_{right}]W^{(1)}+b^{(1)}\right)$$

$$\hat y = \mathrm{softmax}\left(h^{(1)}U+b^{(s)}\right)$$

$L\in \mathbb{R}^{|V|\times d},\ W^{(1)}\in \mathbb{R}^{2d\times d},\ b^{(1)}\in \mathbb{R}^{1\times d},\ U\in \mathbb{R}^{d\times 5},\ b^{(s)}\in \mathbb{R}^{1\times 5}$
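As a rough illustration of these shapes, the following sketch runs one composition step and one softmax classification in plain Python. The random weights, the hidden size `d = 4`, and the helper names (`affine`, `compose`, `classify`, `rand_matrix`) are assumptions for illustration only. The embedding lookup through $L$ is skipped, and the child vectors are random stand-ins.

```python
import math
import random

def affine(x, W, b):
    """Row vector x (length n) times matrix W (n x m), plus bias b (length m)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
            for j in range(len(b))]

def relu(v):
    return [max(0.0, a) for a in v]

def softmax(v):
    m = max(v)                        # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]

def compose(h_left, h_right, W1, b1):
    """h = ReLU([h_left, h_right] W1 + b1); list + list concatenates the children."""
    return relu(affine(h_left + h_right, W1, b1))

def classify(h, U, bs):
    """y_hat = softmax(h U + bs)."""
    return softmax(affine(h, U, bs))

d, num_classes = 4, 5                 # d is an arbitrary choice; 5 classes as in U
rng = random.Random(0)

def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

W1, b1 = rand_matrix(2 * d, d), [0.0] * d           # placeholder weights, untrained
U, bs = rand_matrix(d, num_classes), [0.0] * num_classes

h_left = [rng.uniform(-1.0, 1.0) for _ in range(d)]   # stand-ins for child vectors
h_right = [rng.uniform(-1.0, 1.0) for _ in range(d)]

h_parent = compose(h_left, h_right, W1, b1)
y_hat = classify(h_parent, U, bs)
print(len(h_parent), len(y_hat))
```

Because the parent vector has the same dimension `d` as each child, `compose` can be applied again to the parent and a sibling, which is what lets the same weights be reused at every node of the tree.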

How are HMMs and RNNs related? Do the two overlap or conflict in what they are used for?

Parsing a sentence with an RNN

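The parse structure is built up greedily and incrementally: at each step, every pair of adjacent nodes is scored, and the highest-scoring pair is merged into a new node.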
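A minimal sketch of this greedy procedure is shown below. The function `toy_merge` is a hypothetical stand-in for the trained network: it returns a parent vector and a score for two children, but it simply averages the children and prefers similar vectors. The words and their one-dimensional vectors are made up for the example.

```python
def toy_merge(left_vec, right_vec):
    """Hypothetical stand-in for the trained network: returns (parent_vec, score)."""
    parent = [(a + b) / 2.0 for a, b in zip(left_vec, right_vec)]
    score = -sum((a - b) ** 2 for a, b in zip(left_vec, right_vec))
    return parent, score

def greedy_parse(words, vectors, merge):
    """Repeatedly merge the best-scoring adjacent pair until one node remains."""
    nodes = list(zip(words, vectors))     # each node is (subtree, vector)
    total_score = 0.0
    while len(nodes) > 1:
        candidates = []
        for i in range(len(nodes) - 1):   # score every adjacent pair
            parent_vec, score = merge(nodes[i][1], nodes[i + 1][1])
            candidates.append((score, i, parent_vec))
        score, i, parent_vec = max(candidates, key=lambda c: c[0])
        subtree = (nodes[i][0], nodes[i + 1][0])
        nodes[i:i + 2] = [(subtree, parent_vec)]  # replace the pair by its parent
        total_score += score              # tree score = sum of node scores
    return nodes[0][0], total_score

words = ["the", "cat", "sat"]
vectors = [[1.0], [1.2], [3.0]]           # made-up 1-d "embeddings"
tree, tree_score = greedy_parse(words, vectors, toy_merge)
print(tree)
```

Greedy merging commits to one decision per step, so it can miss the globally best tree.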

Max-Margin Framework: Details

$$L_i=\sum_{j\ne y_i} \max\left(0,\ s_j-(s_{y_i}-\Delta)\right)$$

$$J=\sum_i \max\left(0,\ \max_{y\in A(x_i)}\left(s(x_i,y)+\Delta(y,y_i)\right)-s(x_i,y_i)\right)$$

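• $\Delta$ is the penalty applied to every incorrect decision

• The max term is the score of the syntactic tree found by greedy search

• Beam search can sometimes be used instead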
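The two margin losses can be sketched in a few lines of Python. This is an illustration with made-up scores, not the assignment's implementation. The structured version is written as a loss to minimize: the best margin-augmented candidate score minus the gold tree's score, clipped at zero. The candidate list stands in for whatever trees the greedy or beam search returns, and each `margin` value is assumed to be a precomputed $\Delta(y, y_i)$.

```python
def multiclass_hinge(scores, correct, delta=1.0):
    """L_i = sum over j != y_i of max(0, s_j - (s_{y_i} - delta))."""
    return sum(max(0.0, s - (scores[correct] - delta))
               for j, s in enumerate(scores) if j != correct)

def structured_margin_loss(gold_score, candidates):
    """max(0, max_y (s(x, y) + Delta(y, y_gold)) - s(x, y_gold)).

    candidates: (score, margin) pairs for the trees in A(x); margin is
    Delta(y, y_gold) and is assumed to be 0 for the gold tree itself.
    """
    augmented_best = max(score + margin for score, margin in candidates)
    return max(0.0, augmented_best - gold_score)

# Made-up class scores: class 0 is correct, class 2 violates the margin by 0.5.
hinge = multiclass_hinge([3.0, 1.0, 2.5], correct=0, delta=1.0)

# Made-up tree scores: the gold tree scores 4.0, and a wrong tree with
# score 3.5 and margin 1.0 beats it after margin augmentation.
violated = structured_margin_loss(4.0, [(4.0, 0.0), (3.5, 1.0), (2.0, 2.0)])

# Here the gold tree wins by more than the margin, so the loss is zero.
satisfied = structured_margin_loss(5.0, [(5.0, 0.0), (3.5, 1.0)])
print(hinge, violated, satisfied)
```

In both cases the loss is zero only when the correct output beats every alternative by at least its margin.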

Using a Tree-RNN for classification

Richard Socher is impressively fluent at writing code for things like softmax.

OK! Everything should be fully clear by now, right?

Presentation

[Deep reinforcement learning for dialogue generation]

Xie Pan

2018-05-16

2021-06-29