Chapter 13: Lexicalized PCFGs
Lexicalized PCFGs
- Lexicalization of a treebank: adding heads to the treebank
- Lexicalized probabilistic context-free grammars: the parameters of a lexicalized PCFG, i.e. the rule probabilities
- Parsing with lexicalized PCFGs: using dynamic programming (CKY) to find the best parse tree for each test sentence
- Parameter estimation in lexicalized probabilistic context-free grammars: estimating the parameters of a lexicalized PCFG from a training corpus
- Accuracy of lexicalized PCFGs: measuring parsing accuracy on a test set
- Related work
Lexicalized PCFGs
Lexicalization of a treebank
Add a head word to each rule. This came up earlier too (I didn't quite get it then), and here it is covered again.
How do we pick the head? It is the most central part of a rule, a core idea in syntax:
the central sub-constituent of each rule
the semantic predicate in each rule
Heads are not annotated in the corpus, so a set of hand-written rules is used to choose them.
For NP rules, for example: take the right-most noun.
For VP rules: take the left-most verb.
What does adding heads buy us? It makes the PCFG more sensitive to lexical information.
Lexical items are propagated bottom-up through the trees using head annotations: each non-terminal receives its head word from its head child. (See the sketch below.)
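A minimal sketch of how this head annotation might be implemented; the `Tree` class and the two head rules below are simplified assumptions of mine, not the full Penn Treebank head tables:

```python
# A minimal head-annotation sketch. The Tree class and the two head rules
# below are simplified assumptions, not the full Penn Treebank head tables.

class Tree:
    def __init__(self, label, children=None, word=None):
        self.label = label               # non-terminal or POS tag
        self.children = children or []   # internal node: list of subtrees
        self.word = word                 # leaf node: the terminal word
        self.head = None                 # filled in by annotate_heads

def head_child_index(label, child_labels):
    """Toy head rules: NP -> right-most noun; VP -> left-most verb."""
    if label == "NP":
        for i in reversed(range(len(child_labels))):
            if child_labels[i].startswith("NN"):
                return i
    if label == "VP":
        for i, c in enumerate(child_labels):
            if c.startswith("V"):
                return i
    return 0                             # fallback: first child

def annotate_heads(t):
    """Propagate head words bottom-up: each node takes its head child's head."""
    if not t.children:                   # POS node dominating a word
        t.head = t.word
        return t.head
    for c in t.children:
        annotate_heads(c)
    i = head_child_index(t.label, [c.label for c in t.children])
    t.head = t.children[i].head
    return t.head

# "the man": the right-most NN supplies the head of the NP
np = Tree("NP", [Tree("DT", word="the"), Tree("NN", word="man")])
annotate_heads(np)
print(np.head)                           # man
```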
Lexicalized probabilistic context-free grammars
- Chomsky Normal Form
Suddenly I realized I can follow Michael Collins' English without subtitles now. His pronunciation is really pleasant and clear!! Also, every screenshot of mine has a little preview box in the bottom-right corner, because when taking them with the mouse I couldn't help touching the YouTube preview progress bar...
- Lexicalized context-free grammars in Chomsky normal form
Very similar to a regular CFG, but with head words added.
- Parameters in a Lexicalized PCFG
In a PCFG, a rule looks like $S\rightarrow NP\ VP$.
In a lexicalized PCFG, it becomes $S(saw)\rightarrow_2 NP(man)\ VP(saw)$, where the subscript on $\rightarrow_2$ marks which child contributes the head.
The number of parameters therefore explodes: for example, $S(saw)\rightarrow_2 NP(man)\ VP(saw)$ and $S(saw)\rightarrow_2 NP(woman)\ VP(saw)$ each have their own probability. We have a parameter for every possible lexicalized rule, so smoothing techniques become essential here.
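To make the blow-up concrete, here is a hypothetical encoding of lexicalized rules as dictionary keys (the tuple layout and the probability values are my own illustration):

```python
# A hypothetical encoding of lexicalized rules as dictionary keys:
# X(h) ->_1 Y(h) Z(m) becomes (X, 1, Y, Z, h, m), where the second field
# records which child is the head. Probability values are made up.
q = {
    ("S", 2, "NP", "VP", "saw", "man"):   0.02,  # S(saw) ->_2 NP(man) VP(saw)
    ("S", 2, "NP", "VP", "saw", "woman"): 0.01,  # S(saw) ->_2 NP(woman) VP(saw)
}
# With |N| non-terminals and vocabulary Sigma there are up to
# |N|**3 * |Sigma|**2 such keys, hence the need for smoothing.
```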
Parsing with lexicalized PCFGs
For a lexicalized grammar in Chomsky normal form, there are up to $|\Sigma|^2\times |N|^3$ possible binary rules: each rule involves three non-terminals and two lexical items.
For a sentence of length $n$, the dynamic-programming (CKY) algorithm would then cost $O(n^3|\Sigma|^2|N|^3)$, and the vocabulary $|\Sigma|$ can be huge!!!
But there is a simplification: we never need the whole vocabulary $|\Sigma|$, only the words that actually appear in the sentence, i.e. $n$ of them.
The total complexity then drops to $O(n^5|N|^3)$.
Neat!
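Below is a minimal Python sketch of that $O(n^5|N|^3)$ dynamic program, reusing the tuple encoding of rules from the earlier snippet; `q_unary`, `q_binary`, and the table layout are my own illustration, not the lecture's exact pseudocode:

```python
# A sketch of CKY for a lexicalized PCFG in Chomsky normal form.
# pi[(i, j, h, X)] is the max probability that non-terminal X, headed by the
# word at position h, spans words i..j (inclusive). q_unary and q_binary are
# assumed lookup tables built from the estimated grammar; backpointers for
# recovering the tree are omitted for brevity.

def cky_lexicalized(words, nonterminals, q_unary, q_binary):
    n = len(words)
    pi = {}
    # Base case: length-1 spans, X(words[i]) -> words[i]
    for i in range(n):
        for X in nonterminals:
            p = q_unary.get((X, words[i]), 0.0)
            if p > 0.0:
                pi[(i, i, i, X)] = p
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for s in range(i, j):              # split point: i..s | s+1..j
                for h in range(i, j + 1):      # parent's head position
                    for m in range(i, j + 1):  # the other child's head position
                        if h <= s and m > s:   # head in left child: X ->_1 Y Z
                            pos, lh, rh = 1, h, m
                        elif h > s and m <= s: # head in right child: X ->_2 Y Z
                            pos, lh, rh = 2, m, h
                        else:
                            continue
                        for X in nonterminals:
                            for Y in nonterminals:
                                for Z in nonterminals:
                                    p = q_binary.get((X, pos, Y, Z, words[h], words[m]), 0.0)
                                    if p == 0.0:
                                        continue
                                    score = (p * pi.get((i, s, lh, Y), 0.0)
                                               * pi.get((s + 1, j, rh, Z), 0.0))
                                    if score > pi.get((i, j, h, X), 0.0):
                                        pi[(i, j, h, X)] = score
    # Highest-probability S spanning the whole sentence, over all head positions
    return max((pi.get((0, n - 1, h, "S"), 0.0) for h in range(n)), default=0.0)
```

The five nested position loops (spans, split point, two head positions) give the $n^5$ factor, and the three non-terminal loops give $|N|^3$.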
Parameter estimation in lexicalized probabilistic context-free grammars
- Why are lexicalized PCFGs better than plain PCFGs?
An illustration using prepositional-phrase attachment ambiguity (the sentence here is "the man saw the dog with the telescope"):
Rules for the first parse, where the PP attaches to the VP:
7 internal (non-terminal) rules:
$$S(saw)\rightarrow_2 NP(man)\ VP(saw)$$
$$NP(man)\rightarrow_2 DT(the)\ NN(man)$$
$$VP(saw)\rightarrow_1 VP(saw)\ PP(with)\tag{~}$$
$$VP(saw)\rightarrow_1 Vt(saw)\ NP(dog)\tag{~}$$
$$NP(dog)\rightarrow_2 DT(the)\ NN(dog)$$
$$PP(with)\rightarrow_1 IN(with)\ NP(telescope)$$
$$NP(telescope)\rightarrow_2 DT(the)\ NN(telescope)$$
9 terminal (POS-to-word) rules:
$$\cdots$$
Rules for the second parse, where the PP attaches to the NP. Again 7 internal rules:
$$S(saw)\rightarrow_2 NP(man)\ VP(saw)$$
$$NP(man)\rightarrow_2 DT(the)\ NN(man)$$
$$VP(saw)\rightarrow_1 Vt(saw)\ NP(dog)\tag{~}$$
$$NP(dog)\rightarrow_1 NP(dog)\ PP(with)\tag{~}$$
$$NP(dog)\rightarrow_2 DT(the)\ NN(dog)$$
$$PP(with)\rightarrow_1 IN(with)\ NP(telescope)$$
$$NP(telescope)\rightarrow_2 DT(the)\ NN(telescope)$$
9 terminal (POS-to-word) rules:
$$\cdots$$
The two parses differ only in the probabilities of the rules marked with (~). Without lexicalization, the choice of parse tree would depend solely on the phrase structure, completely ignoring the lexical information.
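Writing $t_1$ and $t_2$ for the two parse trees: since every other rule is shared between them (note that $VP(saw)\rightarrow_1 Vt(saw)\ NP(dog)$ appears in both), the choice reduces to comparing the two attachment rules:

$$\frac{p(t_1)}{p(t_2)}=\frac{q(VP(saw)\rightarrow_1 VP(saw)\ PP(with))}{q(NP(dog)\rightarrow_1 NP(dog)\ PP(with))}$$

That is, the parser is effectively asking whether "with" is more likely to modify "saw" or "dog".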
- How are the parameters estimated?
$q(S(saw)\rightarrow_2 NP(man)\ VP(saw))$ is a conditional probability. It can be read as: given $S$ and its head word saw, first choose the rule $S\rightarrow_2 NP\ VP$ from the CFG, then choose man as the head word of the $NP$:
$$q(S(saw)\rightarrow_2 NP(man)\ VP(saw))=q(S\rightarrow_2 NP\ VP|S,saw) \times q(man|S\rightarrow_2NP\ VP,saw)$$
Maximum-likelihood estimation gives:
- 第一项: $q(S\rightarrow_2 NP\ VP|S,saw)=\dfrac{count(S(saw)\rightarrow_2 NP\ VP)}{count(S(saw))}$
- Second term: $q(man|S\rightarrow_2NP\ VP,saw)=\dfrac{count(S(saw)\rightarrow_2 NP(man)\ VP(saw))}{count(S(saw)\rightarrow_2 NP\ VP)}$
The first term is sparse, so it is estimated with linear interpolation, where $\lambda_1+\lambda_2=1$:
$$q(S\rightarrow_2 NP\ VP|S,saw)=\lambda_1\times q_{ML}(S\rightarrow_2 NP\ VP|S,saw)+\lambda_2\times q_{ML}(S\rightarrow_2 NP\ VP|S)$$
$$q_{ML}(S\rightarrow_2NP\ VP|S,saw)=\dfrac{count(S(saw)\rightarrow_2 NP\ VP)}{count(S(saw))}$$
$$q_{ML}(S\rightarrow_2NP\ VP|S) = \dfrac{count(S\rightarrow_2 NP\ VP)}{count(S)}$$
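A sketch of that interpolated estimate in code; the count dictionaries are assumed to be collected from a head-annotated treebank, and the lambda values are illustrative (in practice they are tuned, e.g. on held-out data):

```python
# Interpolated estimate of the first term, following the formulas above.
# The count_* dictionaries are assumed inputs; lam1 + lam2 = 1.

def q_first_term(rule, parent, head,
                 count_rule_head, count_parent_head,  # counts with head word
                 count_rule, count_parent,            # counts without head word
                 lam1=0.7, lam2=0.3):
    # q_ML(rule | parent, head): sparse but lexically informed
    denom = count_parent_head.get((parent, head), 0)
    q1 = count_rule_head.get((rule, head), 0) / denom if denom else 0.0
    # q_ML(rule | parent): dense back-off that ignores the head word
    denom = count_parent.get(parent, 0)
    q2 = count_rule.get(rule, 0) / denom if denom else 0.0
    return lam1 * q1 + lam2 * q2

# Example with toy counts for S(saw) ->_2 NP VP:
rule = ("S", 2, "NP", "VP")
q = q_first_term(rule, "S", "saw",
                 count_rule_head={(rule, "saw"): 30},
                 count_parent_head={("S", "saw"): 50},
                 count_rule={rule: 700},
                 count_parent={"S": 1000})
print(q)   # 0.7*0.6 + 0.3*0.7 ≈ 0.63
```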
There are still several issues to deal with:
- handling rules with more than two children
- incorporating parts of speech (useful in smoothing)
- encoding preferences for close attachment
Further reading:
Michael Collins. 2003. Head-Driven Statistical Models for Natural Language Parsing. Computational Linguistics 29(4).
Accuracy of lexicalized PCFGs
Gold standard vs. parsing output: each tree is reduced to its set of labeled constituents (label, start point, end point), and precision and recall are computed over these sets.
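A short sketch of this constituent-level (PARSEVAL-style) evaluation; the span representation and the example trees are my own illustration:

```python
# Constituent-level evaluation: trees reduced to sets of (label, start, end).

def parseval(gold_spans, predicted_spans):
    correct = len(gold_spans & predicted_spans)
    precision = correct / len(predicted_spans) if predicted_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Example: gold has 4 constituents; parser got 3 of them plus 1 wrong one.
gold = {("S", 0, 7), ("NP", 0, 1), ("VP", 2, 7), ("NP", 3, 4)}
pred = {("S", 0, 7), ("NP", 0, 1), ("VP", 2, 7), ("PP", 5, 7)}
print(parseval(gold, pred))   # (0.75, 0.75, 0.75)
```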
Results: lexicalized PCFGs substantially outperform plain PCFGs on treebank parsing.
Some related research is also introduced.