Chapter 13: Lexicalized PCFGs

  • Lexicalized PCFGs
  • Lexicalization of a treebank: adding heads to a treebank
  • Lexicalized probabilistic context-free grammars: the parameters of a lexicalized PCFG, i.e. the probabilities of its rules
  • Parsing with lexicalized PCFGs: using dynamic programming (CKY) to find the optimal parse tree for each test sentence
  • Parameter estimation in lexicalized probabilistic context-free grammars: estimating the parameters of a lexicalized PCFG from a training corpus
  • Accuracy of lexicalized PCFGs: measuring accuracy on the test set
  • Related research

Lexicalized PCFGs

Lexicalization of a treebank

Each rule is annotated with a head. This was introduced earlier (I didn't quite get it then), and it comes up again here.

How do we find the head? It is the most central part of a rule, a core idea in syntax:

  • the central sub-constituent of each rule
  • the semantic predicate in each rule

Heads are not annotated in the corpus, so some hand-written rules are needed to pick them.

For NP rules, for example: take the right-most noun-like child.

For VP rules, for example: take the left-most verb-like child.

What does adding heads buy us? It makes the PCFG more sensitive to lexical information.

Lexical items are propagated bottom-up through the trees using head annotations, where each non-terminal receives its head from its head child.
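A minimal sketch of this propagation, assuming a toy `Tree` structure of my own; the head rules here are deliberately simplified stand-ins for the full Penn Treebank head tables:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Tree:
    label: str                       # non-terminal or POS tag, e.g. "NP", "DT"
    children: list = field(default_factory=list)
    word: Optional[str] = None       # set on pre-terminal leaves
    head: Optional[str] = None       # filled in by lexicalize()

def find_head_child(tree: Tree) -> Tree:
    """Toy head rules: NP takes the right-most noun-like child,
    VP the left-most verb-like child; default is the left-most child."""
    if tree.label == "NP":
        for child in reversed(tree.children):
            if child.label.startswith("NN"):
                return child
    if tree.label == "VP":
        for child in tree.children:
            if child.label.startswith("V"):    # Vi, Vt, VP, ...
                return child
    return tree.children[0]

def lexicalize(tree: Tree) -> Tree:
    """Propagate head words bottom-up: each non-terminal
    receives its head from its head child."""
    if tree.word is not None:        # pre-terminal: the word is its own head
        tree.head = tree.word
        return tree
    for child in tree.children:
        lexicalize(child)
    tree.head = find_head_child(tree).head
    return tree

# e.g. lexicalize(Tree("NP", [Tree("DT", word="the"), Tree("NN", word="dog")]))
# sets the NP's head to "dog" (the right-most NN).
```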

Lexicalized probabilistic context-free grammars

  1. Chomsky Normal Form

I suddenly realized I can follow Michael Collins's English without subtitles now; his pronunciation is really pleasant and clear!! Also, every screenshot of mine has a little preview box in the bottom-right corner, because while taking them the mouse can't avoid touching the YouTube preview progress bar...

  1. Lexicalized context-free grammars in Chomsky normal form

Very similar to a regular CFG, but with head words added.

  1. Parameters in a Lexicalized PCFG

In a PCFG, a parameter is attached to each rule of the form \(S\rightarrow\cdots\)

In a lexicalized PCFG, it is attached to each lexicalized rule of the form \(S(saw)\rightarrow\cdots\)

So there are many, many more parameters: for example, \(S(saw)\rightarrow_2 NP(man)\ VP(saw)\) and \(S(saw)\rightarrow_2 NP(woman)\ VP(saw)\) each have their own probability. We have every possible lexicalized rule, which is exactly where smoothing techniques come in.
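Concretely, the parameter table can be pictured as a map from fully lexicalized rules to probabilities. A hypothetical sketch with made-up numbers (the tuple encoding is my own, not from the lecture):

```python
# Key: (parent, head_position, left_child, right_child, head_word, modifier_word);
# head_position records which child contributes the head (1 = left, 2 = right).
q = {
    ("S", 2, "NP", "VP", "saw", "man"):   0.003,   # S(saw) ->_2 NP(man) VP(saw)
    ("S", 2, "NP", "VP", "saw", "woman"): 0.002,   # S(saw) ->_2 NP(woman) VP(saw)
    # ... one entry per possible lexicalized rule, |N|^3 * |Sigma|^2 of them,
    # which is why smoothing is unavoidable.
}
```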

Parsing with lexicalized PCFGs

For such a grammar, the number of possible non-terminal expansions is \(|\Sigma|^2\times |N|^3\): each binary rule involves three non-terminals and two head words.

For a sentence of length n, the dynamic programming algorithm then needs \(O(n^3|\Sigma|^2|N|^3)\) time, but the vocabulary \(|\Sigma|\) can be huge!!!

But there is a simplification: instead of ranging over the whole vocabulary \(|\Sigma|\), we only need to consider the words that actually occur in the sentence, i.e. n of them.

The total complexity then becomes \(O(n^5|N|^3)\).

Nice!~
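The nested loops of the dynamic program make that count visible: each span contributes a split point plus head and modifier positions (five sentence indices in total) and three non-terminal choices. A schematic sketch, assuming parameter tables like the `q` dict above; it shows only the \(\rightarrow_1\) case and omits backpointers:

```python
import itertools

def cky_lexicalized(words, nonterminals, q_unary, q_binary):
    """pi[(i, j, h, X)] = max probability of a parse of words[i..j]
    rooted in X with head word words[h]. Indices are 0-based, j inclusive."""
    n = len(words)
    pi = {}
    # Base case: one-word spans, using unary rules X(w) -> w.
    for i in range(n):
        for X in nonterminals:
            pi[(i, i, i, X)] = q_unary.get((X, words[i]), 0.0)
    # Wider spans: O(n^2) spans x O(n) splits x O(n^2) (head, modifier)
    # positions x O(|N|^3) rule shapes = O(n^5 |N|^3) overall.
    for length in range(1, n):
        for i in range(n - length):
            j = i + length
            for s in range(i, j):                      # split point
                for X, Y, Z in itertools.product(nonterminals, repeat=3):
                    for h in range(i, s + 1):          # head word in left child
                        for m in range(s + 1, j + 1):  # modifier head on right
                            # rule X(h) ->_1 Y(h) Z(m)
                            p = (q_binary.get((X, 1, Y, Z, words[h], words[m]), 0.0)
                                 * pi.get((i, s, h, Y), 0.0)
                                 * pi.get((s + 1, j, m, Z), 0.0))
                            if p > pi.get((i, j, h, X), 0.0):
                                pi[(i, j, h, X)] = p
                            # the ->_2 case (head in the right child) is symmetric
    return pi
```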

Parameter estimation in lexicalized probabilistic context-free grammars

  1. Why are lexicalized PCFGs better than plain PCFGs?

This is illustrated with prepositional-phrase attachment ambiguity:

Rules for the two candidate parses:

First parse (the PP attaches to the VP), 7 non-terminal rules: \[S(saw)\rightarrow_2 NP(man)\ VP(saw)\] \[NP(man)\rightarrow_2 DT(the)\ NN(man)\] \[VP(saw)\rightarrow_1 VP(saw)\ PP(with)\tag{~}\] \[VP(saw)\rightarrow_1 Vt(saw)\ NP(dog)\tag{~}\] \[NP(dog)\rightarrow_2 DT(the)\ NN(dog)\] \[PP(with)\rightarrow_1 IN(with)\ NP(telescope)\] \[NP(telescope)\rightarrow_2 DT(the)\ NN(telescope)\] 9 terminal rules: \[\cdots\]

Second parse (the PP attaches to the NP), 7 non-terminal rules: \[S(saw)\rightarrow_2 NP(man)\ VP(saw)\] \[NP(man)\rightarrow_2 DT(the)\ NN(man)\] \[VP(saw)\rightarrow_1 Vt(saw)\ NP(dog)\tag{~}\] \[NP(dog)\rightarrow_1 NP(dog)\ PP(with)\tag{~}\] \[NP(dog)\rightarrow_2 DT(the)\ NN(dog)\] \[PP(with)\rightarrow_1 IN(with)\ NP(telescope)\] \[NP(telescope)\rightarrow_2 DT(the)\ NN(telescope)\] 9 terminal rules: \[\cdots\]

The two parses differ only in the probabilities of the rules marked (~); all other rule probabilities are shared, so the parser picks the first tree exactly when \(q(VP(saw)\rightarrow_1 VP(saw)\ PP(with)) > q(NP(dog)\rightarrow_1 NP(dog)\ PP(with))\). Without lexicalization, the choice of parse tree would depend only on the phrase structure, i.e. \(q(VP\rightarrow VP\ PP)\) versus \(q(NP\rightarrow NP\ PP)\), completely ignoring the lexical information.

  1. How are the parameters estimated?

\(q(S(saw)\rightarrow_2 NP(man)\ VP(saw))\) is a conditional probability. It can be read as: given the non-terminal S with head word saw, first choose the rule \(S\rightarrow_2 NP\ VP\) from the underlying CFG, then choose man as the head word of the NP: \[q(S(saw)\rightarrow_2 NP(man)\ VP(saw))=q(S\rightarrow_2 NP\ VP|S,saw) \times q(man|S\rightarrow_2NP\ VP,saw)\] Maximum-likelihood estimation gives:

  • First term: \(q(S\rightarrow_2 NP\ VP|S,saw)=\dfrac{count(S(saw)\rightarrow_2 NP\ VP)}{count(S(saw))}\)

  • Second term: \(q(man|S\rightarrow_2NP\ VP,saw)=\dfrac{count(S(saw)\rightarrow_2 NP(man)\ VP(saw))}{count(S(saw)\rightarrow_2 NP\ VP)}\)

The first term is smoothed by linear interpolation, with \(\lambda_1+\lambda_2=1\): \[q(S\rightarrow_2NP\ VP|S,saw)=\lambda_1\,q_{ML}(S\rightarrow_2NP\ VP|S,saw)+\lambda_2\,q_{ML}(S\rightarrow_2NP\ VP|S)\] where \[q_{ML}(S\rightarrow_2NP\ VP|S,saw)=\dfrac{count(S(saw)\rightarrow_2 NP\ VP)}{count(S(saw))}\] \[q_{ML}(S\rightarrow_2NP\ VP|S) = \dfrac{count(S\rightarrow_2 NP\ VP)}{count(S)}\]
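A small sketch of the smoothed estimate from counts; the count-table names and the λ values are illustrative assumptions (in practice the λs are tuned, e.g. on held-out data):

```python
def q_interpolated(c_rule_lex, c_parent_lex, c_rule, c_parent,
                   lam1=0.7, lam2=0.3):
    """q(S ->_2 NP VP | S, saw) = lam1 * q_ML(... | S, saw) + lam2 * q_ML(... | S)

    c_rule_lex   = count(S(saw) ->_2 NP VP)
    c_parent_lex = count(S(saw))
    c_rule       = count(S ->_2 NP VP)
    c_parent     = count(S)
    """
    assert abs(lam1 + lam2 - 1.0) < 1e-9
    q_lex = c_rule_lex / c_parent_lex if c_parent_lex else 0.0
    q_unlex = c_rule / c_parent if c_parent else 0.0
    return lam1 * q_lex + lam2 * q_unlex
```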

There are still quite a few problems left to solve:

  • dealing with rules that have more than two children
  • incorporating parts of speech (useful in smoothing)
  • encoding preferences for close attachment

Further reading: Michael Collins. 2003. Head-Driven Statistical Models for Natural Language Parsing. Computational Linguistics 29(4).

Accuracy of lexicalized PCFGs

Evaluation compares the constituents in the parser's output against the gold standard, measuring labeled precision and recall:
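A minimal sketch of this evaluation, representing each tree as a set of labeled constituents (label, start, end); the example spans are made up:

```python
def labeled_prf(gold, predicted):
    """Labeled precision / recall / F1 over constituent sets."""
    correct = len(gold & predicted)
    p = correct / len(predicted) if predicted else 0.0
    r = correct / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

gold = {("S", 0, 7), ("NP", 0, 1), ("VP", 2, 7), ("NP", 3, 7)}
pred = {("S", 0, 7), ("NP", 0, 1), ("VP", 2, 7), ("NP", 3, 4), ("PP", 5, 7)}
print(labeled_prf(gold, pred))   # (0.6, 0.75, 0.666...)
```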

Result: on the Penn WSJ treebank, lexicalized PCFGs reach roughly 88% labeled precision/recall, compared with roughly 72% for a plain PCFG.

Some related research is also introduced.~
