School of Data Science, Fudan University
DATA130006 Text Management and Analysis
Statistical Natural Language Parsing
魏忠钰
November 29th, 2017
Adapted from Stanford CS124U
NLP Startups
NLP Startups
Constituency (phrase structure)
§ Phrase structure organizes words into nested constituents.
Parse Tree Example
Attachment ambiguities
§ A key parsing decision is how we 'attach' various constituents
  § PPs, adverbial or participial phrases, infinitives, coordinations, etc.
§ Catalan numbers: $C_n = \frac{(2n)!}{(n+1)!\,n!}$
§ An exponentially growing series, which arises in many tree-like contexts
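To see how fast this series grows, the short sketch below evaluates the closed form directly. It assumes nothing beyond the formula above; it uses Python's `math.comb` because $(2n)!/[(n+1)!\,n!] = \binom{2n}{n}/(n+1)$, and checks that against the factorial form.

```python
from math import comb, factorial


def catalan(n: int) -> int:
    """C_n = (2n)! / [(n+1)! n!], computed as binom(2n, n) // (n + 1)."""
    return comb(2 * n, n) // (n + 1)


def catalan_factorial(n: int) -> int:
    """The same number, straight from the factorial formula."""
    return factorial(2 * n) // (factorial(n + 1) * factorial(n))


print([catalan(n) for n in range(10)])
# [1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862]
```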
Two problems to solve: 1. Repeated work…
Two problems to solve: 2. Choosing the correct parse
§ Words are good predictors of attachment
  § Moscow sent more than 100,000 soldiers into Afghanistan…
  § Sydney Water breached an agreement with NSW Health…
§ Our statistical parsers will try to exploit such statistics.
Context-free grammars (CFGs)
§ G = (T, N, S, R)
  § T is a set of terminal symbols
  § N is a set of nonterminal symbols
  § S is the start symbol (S ∈ N)
  § R is a set of rules/productions of the form X → γ
    § X ∈ N and γ ∈ (N ∪ T)*
§ A grammar G generates a language L.
A phrase structure grammar
S → NP VP
VP → V NP
VP → V NP PP
NP → NP NP
NP → NP PP
NP → N
NP → e
PP → P NP
Example sentences: "people fish tanks", "people fish with rods"
N → people
N → fish
N → tanks
N → rods
V → people
V → fish
V → tanks
P → with
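One way to make G = (T, N, S, R) concrete is to write this toy grammar down as data. This is only an illustrative sketch: the encoding (rules as `(lhs, rhs)` tuples, NP → e as an empty right-hand side), the assumption that T and N are disjoint, and the helper names `is_valid_cfg` and `categories` are all assumptions, not part of the slides.

```python
# The toy grammar G = (T, N, S, R) as plain Python data.
# Each rule X -> gamma is a pair (X, gamma); NP -> e has an empty right-hand side.
T = {"people", "fish", "tanks", "rods", "with"}
N = {"S", "NP", "VP", "PP", "N", "V", "P"}
S = "S"
R = [
    ("S", ("NP", "VP")),
    ("VP", ("V", "NP")),
    ("VP", ("V", "NP", "PP")),
    ("NP", ("NP", "NP")),
    ("NP", ("NP", "PP")),
    ("NP", ("N",)),
    ("NP", ()),  # NP -> e
    ("PP", ("P", "NP")),
    ("N", ("people",)), ("N", ("fish",)), ("N", ("tanks",)), ("N", ("rods",)),
    ("V", ("people",)), ("V", ("fish",)), ("V", ("tanks",)),
    ("P", ("with",)),
]


def is_valid_cfg(T, N, S, R):
    """S in N, T and N disjoint (assumed), every X -> gamma has X in N and gamma in (N u T)*."""
    if S not in N or T & N:
        return False
    return all(lhs in N and all(sym in N | T for sym in rhs) for lhs, rhs in R)


def categories(word, R):
    """Nonterminals X with a rule X -> word, i.e. the word's possible preterminals."""
    return sorted(lhs for lhs, rhs in R if rhs == (word,))


print(is_valid_cfg(T, N, S, R))  # True
print(categories("fish", R))     # ['N', 'V']
```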
Many parses
Probabilistic – or stochastic – context-free grammars (PCFGs)
• G = (T, N, S, R, P)
  • T is a set of terminal symbols
  • N is a set of nonterminal symbols
  • S is the start symbol (S ∈ N)
  • R is a set of rules/productions of the form X → γ
  • P is a probability function
    • P: R → [0, 1]
    • $\forall X \in N,\ \sum_{X \to \gamma \in R} P(X \to \gamma) = 1$
• A grammar G generates a language model L: $\sum_{\gamma \in T^*} P(\gamma) = 1$
A PCFG
S → NP VP 1.0
VP → V NP 0.6
VP → V NP PP 0.4
NP → NP NP 0.1
NP → NP PP 0.2
NP → N 0.7
PP → P NP 1.0
N → people 0.5
N → fish 0.2
N → tanks 0.2
N → rods 0.1
V → people 0.1
V → fish 0.6
V → tanks 0.3
P → with 1.0
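The normalization condition $\forall X \in N,\ \sum_{X \to \gamma \in R} P(X \to \gamma) = 1$ can be checked mechanically by grouping rules by left-hand side. A sketch with a hypothetical dictionary encoding of the PCFG above; note that it checks only this per-nonterminal condition, not that the probabilities of all strings in $T^*$ sum to 1.

```python
from collections import defaultdict
from math import isclose

# The PCFG as {(lhs, rhs): probability}.
PCFG = {
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("V", "NP")): 0.6,
    ("VP", ("V", "NP", "PP")): 0.4,
    ("NP", ("NP", "NP")): 0.1,
    ("NP", ("NP", "PP")): 0.2,
    ("NP", ("N",)): 0.7,
    ("PP", ("P", "NP")): 1.0,
    ("N", ("people",)): 0.5,
    ("N", ("fish",)): 0.2,
    ("N", ("tanks",)): 0.2,
    ("N", ("rods",)): 0.1,
    ("V", ("people",)): 0.1,
    ("V", ("fish",)): 0.6,
    ("V", ("tanks",)): 0.3,
    ("P", ("with",)): 1.0,
}


def lhs_totals(pcfg):
    """Sum P(X -> gamma) over all rules sharing the left-hand side X."""
    totals = defaultdict(float)
    for (lhs, _rhs), p in pcfg.items():
        totals[lhs] += p
    return dict(totals)


def is_proper(pcfg):
    """True if every nonterminal's rule probabilities sum to 1 (up to float error)."""
    return all(isclose(total, 1.0) for total in lhs_totals(pcfg).values())


print(is_proper(PCFG))  # True
```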
The rise of annotated data: The Penn Treebank
( (S
    (NP-SBJ (DT The) (NN move))
    (VP (VBD followed)
      (NP
        (NP (DT a) (NN round))
        (PP (IN of)
          (NP
            (NP (JJ similar) (NNS increases))
            (PP (IN by)
              (NP (JJ other) (NNS lenders)))
            (PP (IN against)
              (NP (NNP Arizona) (JJ real) (NN estate) (NNS loans))))))
      (, ,)
      (S-ADV
        (NP-SBJ (-NONE- *))
        (VP (VBG reflecting)
          (NP
            (NP (DT a) (VBG continuing) (NN decline))
            (PP-LOC (IN in)
              (NP (DT that) (NN market)))))))
    (. .)))
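Treebank trees like this one are stored as bracketed strings. A minimal reader could look like the sketch below; it assumes well-formed input, and the nested-list format `[label, child, ...]` (with the outer unlabeled bracket given the label "") and the names `read_tree` and `leaves` are illustrative choices, not a standard API.

```python
import re


def read_tree(s):
    """Parse a bracketed tree into nested lists [label, child, ...]; leaves are strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def parse():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = ""
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        node = [label]
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.append(parse())
            else:
                node.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume ")"
        return node

    return parse()


def leaves(tree):
    """The tree's yield: its words from left to right."""
    out = []
    for child in tree[1:]:
        out.extend(leaves(child) if isinstance(child, list) else [child])
    return out


tree = read_tree("((S (NP-SBJ (DT The) (NN move)) (VP (VBD followed)) (. .)))")
print(leaves(tree))  # ['The', 'move', 'followed', '.']
```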
The probability of trees and strings
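§ P(t): the probability of a tree t is the product of the probabilities of the rules used to generate it.
§ P(s): the probability of the string s is the sum of the probabilities of the trees which have that string as their yield:
$P(s) = \sum_j P(s, t_j) = \sum_j P(t_j)$, where each $t_j$ is a parse of s (the second step holds because a tree determines its yield, so $P(s, t_j) = P(t_j)$).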
P(t1) = 1.0 × 0.7 × 0.4 × 0.5 × 0.6 × 0.7 × 1.0 × 0.2 × 1.0 × 0.7 × 0.1 = 0.0008232
P(t2) = 1.0 × 0.7 × 0.6 × 0.5 × 0.6 × 0.2 × 0.7 × 1.0 × 0.2 × 1.0 × 0.7 × 0.1 = 0.00024696
Tree and String Probabilities
• s = people fish tanks with rods
• P(t1) = 1.0 × 0.7 × 0.4 × 0.5 × 0.6 × 0.7 × 1.0 × 0.2 × 1.0 × 0.7 × 0.1 = 0.0008232 (verb attachment)
• P(t2) = 1.0 × 0.7 × 0.6 × 0.5 × 0.6 × 0.2 × 0.7 × 1.0 × 0.2 × 1.0 × 0.7 × 0.1 = 0.00024696 (noun attachment)
• P(s) = P(t1) + P(t2) = 0.0008232 + 0.00024696 = 0.00107016
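The arithmetic above can be reproduced by encoding t1 and t2 as nested tuples and multiplying rule probabilities recursively. This is a sketch: the tuple encoding `(label, child, ...)` and the helpers `tree_prob` and `noun_np` are illustrative assumptions; the probabilities are those of the PCFG slide.

```python
from math import prod

# Rule probabilities from the PCFG slide, keyed by (lhs, rhs).
P = {
    ("S", ("NP", "VP")): 1.0, ("VP", ("V", "NP")): 0.6, ("VP", ("V", "NP", "PP")): 0.4,
    ("NP", ("NP", "NP")): 0.1, ("NP", ("NP", "PP")): 0.2, ("NP", ("N",)): 0.7,
    ("PP", ("P", "NP")): 1.0,
    ("N", ("people",)): 0.5, ("N", ("fish",)): 0.2, ("N", ("tanks",)): 0.2,
    ("N", ("rods",)): 0.1, ("V", ("people",)): 0.1, ("V", ("fish",)): 0.6,
    ("V", ("tanks",)): 0.3, ("P", ("with",)): 1.0,
}


def tree_prob(tree):
    """P(t): product of P(lhs -> rhs) over every rule used in the tree.

    A tree is a tuple (label, child, ...); a leaf word is a plain string.
    """
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    subtrees = [c for c in children if not isinstance(c, str)]
    return P[(label, rhs)] * prod(tree_prob(c) for c in subtrees)


def noun_np(word):
    """NP -> N -> word."""
    return ("NP", ("N", word))


pp = ("PP", ("P", "with"), noun_np("rods"))
t1 = ("S", noun_np("people"), ("VP", ("V", "fish"), noun_np("tanks"), pp))          # verb attachment
t2 = ("S", noun_np("people"), ("VP", ("V", "fish"), ("NP", noun_np("tanks"), pp)))  # noun attachment

p1, p2 = tree_prob(t1), tree_prob(t2)
print(p1, p2, p1 + p2)  # approximately 0.0008232, 0.00024696, 0.00107016
```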
• Under this PCFG, the PP is more likely to attach to the VP than to the NP: P(t1) > P(t2).
Outline
§ Restricting the grammar form for efficient parsing
Chomsky Normal Form
§ All rules are of the form X → Y Z or X → w
  § X, Y, Z ∈ N and w ∈ T
§ A transformation to this form doesn't change the generative capacity of a CFG
  § That is, it recognizes the same language
  § But maybe with different trees
§ Empties and unaries are removed recursively
§ n-ary rules are divided by introducing new nonterminals (n > 2)
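A small checker makes the CNF condition concrete. The sketch assumes rules are `(lhs, rhs)` tuples and treats any symbol outside the given nonterminal set as a terminal; `is_cnf` and `NONTERMS` are hypothetical names used only here.

```python
def is_cnf(rules, nonterminals):
    """True if every rule is X -> Y Z (Y, Z nonterminals) or X -> w (w a terminal).

    Empty rules (X -> e) and unary rules between nonterminals fail the test.
    """
    for lhs, rhs in rules:
        if lhs not in nonterminals:
            return False
        binary = len(rhs) == 2 and all(sym in nonterminals for sym in rhs)
        lexical = len(rhs) == 1 and rhs[0] not in nonterminals
        if not (binary or lexical):
            return False
    return True


NONTERMS = {"S", "NP", "VP", "PP", "N", "V", "P"}
print(is_cnf([("S", ("NP", "VP")), ("N", ("fish",))], NONTERMS))  # True
print(is_cnf([("NP", ("N",))], NONTERMS))                          # False: unary
print(is_cnf([("VP", ("V", "NP", "PP"))], NONTERMS))               # False: ternary
```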
A phrase structure grammar
S → NP VP
VP → V NP
VP → V NP PP
NP → NP NP
NP → NP PP
NP → N
NP → e
PP → P NP
S → VP
VP → V
N → people
N → fish
N → tanks
N → rods
V → people
V → fish
V → tanks
P → with
Chomsky Normal Form steps
S → NP VP
S → VP
VP → V NP
VP → V
VP → V NP PP
VP → V PP
NP → NP NP
NP → NP
NP → NP PP
NP → PP
NP → N
PP → P NP
PP → P
N → people
N → fish
N → tanks
N → rods
V → people
V → fish
V → tanks
P → with
S → V NP
S → V
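The empty-removal step can be sketched as follows: find the nullable nonterminals, then for every rule add each variant with some nullable occurrences omitted, and drop the X → e rules. This is one standard construction written as an illustration (probabilities are ignored, and the function names are hypothetical); on the nonterminal rules of the toy grammar it yields the thirteen nonterminal rules of the first list on this slide.

```python
from itertools import product


def nullable_symbols(rules):
    """Nonterminals that can derive the empty string (fixed-point iteration)."""
    nullable, changed = set(), True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs not in nullable and all(sym in nullable for sym in rhs):
                nullable.add(lhs)
                changed = True
    return nullable


def remove_empties(rules):
    """Drop X -> e and add every variant of each rule with nullable symbols omitted."""
    nullable = nullable_symbols(rules)
    result = []
    for lhs, rhs in rules:
        # Each nullable symbol may be kept or dropped; other symbols are always kept.
        options = [((sym,), ()) if sym in nullable else ((sym,),) for sym in rhs]
        for choice in product(*options):
            new_rhs = tuple(sym for part in choice for sym in part)
            if new_rhs and (lhs, new_rhs) not in result:
                result.append((lhs, new_rhs))
    return result


# Nonterminal rules of the toy grammar (the lexical rules are unaffected).
GRAMMAR = [
    ("S", ("NP", "VP")), ("VP", ("V", "NP")), ("VP", ("V", "NP", "PP")),
    ("NP", ("NP", "NP")), ("NP", ("NP", "PP")), ("NP", ("N",)), ("NP", ()),
    ("PP", ("P", "NP")),
]
for lhs, rhs in remove_empties(GRAMMAR):
    print(lhs, "->", " ".join(rhs))
```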
Chomsky Normal Form steps
S → NP VP
VP → V NP
S → V NP
VP → V
S → V
VP → V NP PP
S → V NP PP
VP → V PP
S → V PP
NP → NP NP
NP → NP
NP → NP PP
NP → PP
NP → N
PP → P NP
PP → P
N → people
N → fish
N → tanks
N → rods
V → people
V → fish
V → tanks
P → with
Chomsky Normal Form steps
S → NP VP
VP → V NP
S → V NP
VP → V
VP → V NP PP
S → V NP PP
VP → V PP
S → V PP
NP → NP NP
NP → NP
NP → NP PP
NP → PP
NP → N
PP → P NP
PP → P
N → people
N → fish
N → tanks
N → rods
V → people
S → people
V → fish
S → fish
V → tanks
S → tanks
P → with
Chomsky Normal Form steps
S → NP VP
VP → V NP
S → V NP
VP → V NP PP
S → V NP PP
VP → V PP
S → V PP
NP → NP NP
NP → NP
NP → NP PP
NP → PP
NP → N
PP → P NP
PP → P
N → people
N → fish
N → tanks
N → rods
V → people
S → people
VP → people
V → fish
S → fish
VP → fish
V → tanks
S → tanks
VP → tanks
P → with
Chomsky Normal Form steps
S → NP VP
VP → V NP
S → V NP
VP → V NP PP
S → V NP PP
VP → V PP
S → V PP
NP → NP NP
NP → NP PP
NP → P NP
PP → P NP
VP → V X
X → NP PP
X = @VP_V
NP → people
NP → fish
NP → tanks
NP → rods
V → people
S → people
VP → people
V → fish
S → fish
VP → fish
V → tanks
S → tanks
VP → tanks
P → with
PP → with
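The unary-removal steps can be sketched with a unary closure: if A ⇒* B through unary rules only, every non-unary rule B → γ is copied up to A → γ, and rules whose left-hand side is no longer reachable from S are pruned. This is an illustrative sketch (probabilities and the bookkeeping needed to undo the transform are ignored; function names are hypothetical). Run on the grammar after empty removal, it yields the unary-free rules listed above and also NP → with, which is derivable through NP → PP → P → with but does not appear in the hand-derived list.

```python
def unary_closure(rules, nonterminals):
    """reach[A] = every B with A =>* B using unary rules only (A itself included)."""
    reach = {a: {a} for a in nonterminals}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if len(rhs) == 1 and rhs[0] in nonterminals:
                for a in nonterminals:
                    if lhs in reach[a] and rhs[0] not in reach[a]:
                        reach[a].add(rhs[0])
                        changed = True
    return reach


def remove_unaries(rules, nonterminals, start="S"):
    """Copy each non-unary B -> gamma up to every A with A =>* B, then prune unreachable rules."""
    reach = unary_closure(rules, nonterminals)
    result = []
    for a in sorted(nonterminals):
        for lhs, rhs in rules:
            unary = len(rhs) == 1 and rhs[0] in nonterminals
            if lhs in reach[a] and not unary and (a, rhs) not in result:
                result.append((a, rhs))
    # Keep only rules reachable from the start symbol (e.g. N -> fish goes once NP -> N is gone).
    reachable, frontier = {start}, [start]
    while frontier:
        sym = frontier.pop()
        for lhs, rhs in result:
            if lhs == sym:
                for s in rhs:
                    if s in nonterminals and s not in reachable:
                        reachable.add(s)
                        frontier.append(s)
    return [r for r in result if r[0] in reachable]


NONTERMS = {"S", "VP", "NP", "PP", "N", "V", "P"}
AFTER_EMPTIES = [  # nonterminal rules after empty removal, plus the lexicon
    ("S", ("NP", "VP")), ("S", ("VP",)), ("VP", ("V", "NP")), ("VP", ("V",)),
    ("VP", ("V", "NP", "PP")), ("VP", ("V", "PP")), ("NP", ("NP", "NP")),
    ("NP", ("NP",)), ("NP", ("NP", "PP")), ("NP", ("PP",)), ("NP", ("N",)),
    ("PP", ("P", "NP")), ("PP", ("P",)),
    ("N", ("people",)), ("N", ("fish",)), ("N", ("tanks",)), ("N", ("rods",)),
    ("V", ("people",)), ("V", ("fish",)), ("V", ("tanks",)), ("P", ("with",)),
]
for lhs, rhs in remove_unaries(AFTER_EMPTIES, NONTERMS):
    print(lhs, "->", " ".join(rhs))
```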
Chomsky Normal Form steps
S → NP VP
VP → V NP
S → V NP
VP → V @VP_V
@VP_V → NP PP
S → V @S_V
@S_V → NP PP
VP → V PP
S → V PP
NP → NP NP
NP → NP PP
NP → P NP
PP → P NP
NP → people
NP → fish
NP → tanks
NP → rods
V → people
S → people
VP → people
V → fish
S → fish
VP → fish
V → tanks
S → tanks
VP → tanks
P → with
PP → with
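The binarization step can be sketched as below. It follows the naming on this slide (VP → V NP PP becomes VP → V @VP_V and @VP_V → NP PP); how rules longer than three symbols are named (@X_A, @X_A_B, ...) is an assumption of this sketch, and probabilities are not tracked.

```python
def binarize(rules):
    """Split X -> Y1 ... Yn (n > 2) into a chain of binary rules with @-symbols.

    Rules with at most two right-hand-side symbols are kept unchanged.
    """
    result = []
    for lhs, rhs in rules:
        rhs = tuple(rhs)
        parent, name = lhs, "@" + lhs
        while len(rhs) > 2:
            name = name + "_" + rhs[0]  # e.g. @VP_V: "a VP after its V"
            result.append((parent, (rhs[0], name)))
            parent, rhs = name, rhs[1:]
        result.append((parent, rhs))
    return result


rules = [("VP", ("V", "NP", "PP")), ("S", ("V", "NP", "PP")), ("NP", ("NP", "PP"))]
for lhs, rhs in binarize(rules):
    print(lhs, "->", " ".join(rhs))
# VP -> V @VP_V
# @VP_V -> NP PP
# S -> V @S_V
# @S_V -> NP PP
# NP -> NP PP
```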
Chomsky Normal Form
§ With some extra book-keeping in symbol names, you can even reconstruct the same trees with a detransform
§ You should think of this as a transformation for efficient parsing
§ In practice, full Chomsky Normal Form is a pain
  § Reconstructing n-aries is easy
  § Reconstructing unaries/empties is trickier
§ Binarization is crucial for cubic-time CFG parsing
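Reconstructing n-ary nodes is indeed easy when the intermediate symbols are marked: splice the children of every @-labeled node into its parent. A sketch, assuming trees are `(label, child, ...)` tuples with string leaves and that only binarization introduces labels beginning with @; `debinarize` is a hypothetical name.

```python
def debinarize(tree):
    """Undo @-binarization by splicing each @-node's children into its parent."""
    if isinstance(tree, str):
        return tree
    label, *children = tree
    spliced = []
    for child in map(debinarize, children):
        if isinstance(child, tuple) and child[0].startswith("@"):
            spliced.extend(child[1:])  # child is already debinarized
        else:
            spliced.append(child)
    return (label, *spliced)


binarized = ("VP", ("V", "fish"),
             ("@VP_V", ("NP", ("N", "tanks")),
                       ("PP", ("P", "with"), ("NP", ("N", "rods")))))
print(debinarize(binarized))
# ('VP', ('V', 'fish'), ('NP', ('N', 'tanks')), ('PP', ('P', 'with'), ('NP', ('N', 'rods'))))
```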
An example: before binarization…
(ROOT (S (NP (N people)) (VP (V fish) (NP (N tanks)) (PP (P with) (NP (N rods))))))
After binarization…
(ROOT (S (NP (N people)) (VP (V fish) (@VP_V (NP (N tanks)) (PP (P with) (NP (N rods)))))))
Treebank: empties and unaries
PTB Tree: (ROOT (S-HLN (NP-SUBJ (-NONE- e)) (VP (VB Atone))))
NoFuncTags: (ROOT (S (NP (-NONE- e)) (VP (VB Atone))))
NoEmpties: (ROOT (S (VP (VB Atone))))
NoUnaries (high): (ROOT (S Atone))
NoUnaries (low): (ROOT (VB Atone))
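These transforms can be sketched on `(label, child, ...)` tuples. The function names, the rule that labels beginning with "-" (such as -NONE-) are never split, and the choice never to collapse ROOT are assumptions made for this illustration; real treebank preprocessing handles more cases.

```python
def strip_function_tags(label):
    """'S-HLN' -> 'S', 'NP-SUBJ' -> 'NP'; labels starting with '-' are kept as-is."""
    return label if label.startswith("-") else label.split("-")[0]


def no_func_tags_no_empties(tree):
    """Strip function tags and delete -NONE- subtrees; None if nothing is left."""
    if isinstance(tree, str):
        return tree
    label, *children = tree
    if label == "-NONE-":
        return None
    kept = [c for c in map(no_func_tags_no_empties, children) if c is not None]
    return (strip_function_tags(label), *kept) if kept else None


def no_unaries(tree, keep="high"):
    """Collapse unary chains below ROOT, keeping the top ('high') or bottom ('low') label."""
    if isinstance(tree, str):
        return tree
    label, *children = tree
    children = [no_unaries(c, keep) for c in children]
    if label != "ROOT" and len(children) == 1 and isinstance(children[0], tuple):
        child = children[0]
        return (label, *child[1:]) if keep == "high" else child
    return (label, *children)


ptb = ("ROOT", ("S-HLN", ("NP-SUBJ", ("-NONE-", "e")), ("VP", ("VB", "Atone"))))
clean = no_func_tags_no_empties(ptb)
print(clean)                      # ('ROOT', ('S', ('VP', ('VB', 'Atone'))))
print(no_unaries(clean, "high"))  # ('ROOT', ('S', 'Atone'))
print(no_unaries(clean, "low"))   # ('ROOT', ('VB', 'Atone'))
```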
Outline
§ Restricting the grammar form for efficient parsing
§ Exact polynomial time parsing of (P)CFGs
Constituency Parsing
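fish people fish tanks
PCFG (rule: probability θi):
S → NP VP: θ0
NP → NP NP: θ1
…
N → fish: θ42
N → people: θ43
V → fish: θ44
…
(Figure: the resulting parse tree, with preterminals N N V N and NP, VP and S constituents.)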
Cocke-Kasami-Younger (CKY) Constituency Parsing
(Figure: the CKY chart over "fish people fish tanks".)
Viterbi (Max) Scores
Cells for "people fish":
people: NP 0.35, V 0.1, N 0.5
fish: VP 0.06, NP 0.14, V 0.6, N 0.2
S → NP VP 0.9
S → VP 0.1
VP → V NP 0.5
VP → V 0.1
VP → V @VP_V 0.3
VP → V PP 0.1
@VP_V → NP PP 1.0
NP → NP NP 0.1
NP → NP PP 0.2
NP → N 0.7
PP → P NP 1.0
S → NP VP = 0.35 × 0.06 × 0.9 = 0.0189
VP → V NP = 0.1 × 0.5 × 0.14 = 0.007
NP → NP NP = 0.1 × 0.35 × 0.14 = 0.0049
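The three products on this slide can be recomputed from the two one-word cells. A sketch: the dictionaries copy the slide's numbers, and `combine` is a hypothetical helper that keeps the maximum score per parent symbol, which is the Viterbi choice.

```python
# Chart entries from the slide for the one-word spans "people" and "fish".
people = {"NP": 0.35, "V": 0.1, "N": 0.5}
fish = {"VP": 0.06, "NP": 0.14, "V": 0.6, "N": 0.2}

# Binary rules used on the slide: (A, B, C) -> P(A -> B C).
BINARY = {("S", "NP", "VP"): 0.9, ("VP", "V", "NP"): 0.5, ("NP", "NP", "NP"): 0.1}


def combine(left, right, rules):
    """Best (max) score for each parent A over a span split into left + right."""
    cell = {}
    for (a, b, c), p in rules.items():
        if b in left and c in right:
            cell[a] = max(cell.get(a, 0.0), p * left[b] * right[c])
    return cell


cell = combine(people, fish, BINARY)
for a, score in cell.items():
    print(a, round(score, 6))
# S 0.0189
# VP 0.007
# NP 0.0049
```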
Extended CKY parsing
§ Unaries can be incorporated into the algorithm
  § Messy, but doesn't increase algorithmic complexity
§ Empties can be incorporated
  § Doesn't increase complexity; essentially like unaries
§ Binarization is vital
  § Without binarization, you don't get parsing cubic in the length of the sentence and in the number of nonterminals in the grammar
The CKY algorithm (1960/1965) extended to unaries
function CKY(words, grammar) returns [most_probable_parse, prob]
  score = new double[#(words)+1][#(words)+1][#(nonterms)]
  back = new Pair[#(words)+1][#(words)+1][#(nonterms)]
  for i = 0; i < #(words); i++
    for A in nonterms
      if A -> words[i] in grammar
        score[i][i+1][A] = P(A -> words[i])
    // handle unaries
    boolean added = true
    while added
      added = false
      for A, B in nonterms
        if score[i][i+1][B] > 0 && A -> B in grammar
          prob = P(A -> B) * score[i][i+1][B]
          if prob > score[i][i+1][A]
            score[i][i+1][A] = prob
            back[i][i+1][A] = B
            added = true
The CKY algorithm (1960/1965) extended to unaries
  for span = 2 to #(words)
    for begin = 0 to #(words) - span
      end = begin + span
      for split = begin+1 to end-1
        for A, B, C in nonterms
          prob = score[begin][split][B] * score[split][end][C] * P(A -> B C)
          if prob > score[begin][end][A]
            score[begin][end][A] = prob
            back[begin][end][A] = new Triple(split, B, C)
      // handle unaries
      boolean added = true
      while added
        added = false
        for A, B in nonterms
          prob = P(A -> B) * score[begin][end][B]
          if prob > score[begin][end][A]
            score[begin][end][A] = prob
            back[begin][end][A] = B
            added = true
  return buildTree(score, back)
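A runnable Python rendering of this pseudocode, using the toy grammar listed with the chart walkthrough, might look like the sketch below. Assumptions: the chart is a dictionary keyed by `(begin, end)` rather than a 3-D array, back-pointers are tagged tuples so lexical, unary and binary entries can be told apart, and `build_tree` is our own stand-in for the slide's buildTree.

```python
from collections import defaultdict

# Toy grammar: binary, unary and lexical rules with probabilities.
BINARY = {("S", "NP", "VP"): 0.9, ("VP", "V", "NP"): 0.5, ("VP", "V", "@VP_V"): 0.3,
          ("VP", "V", "PP"): 0.1, ("@VP_V", "NP", "PP"): 1.0, ("NP", "NP", "NP"): 0.1,
          ("NP", "NP", "PP"): 0.2, ("PP", "P", "NP"): 1.0}
UNARY = {("S", "VP"): 0.1, ("VP", "V"): 0.1, ("NP", "N"): 0.7}
LEXICAL = {("N", "people"): 0.5, ("N", "fish"): 0.2, ("N", "tanks"): 0.2,
           ("N", "rods"): 0.1, ("V", "people"): 0.1, ("V", "fish"): 0.6,
           ("V", "tanks"): 0.3, ("P", "with"): 1.0}


def handle_unaries(cell, back_cell):
    """Apply A -> B repeatedly until no score in this cell improves."""
    added = True
    while added:
        added = False
        for (a, b), p in UNARY.items():
            prob = p * cell.get(b, 0.0)
            if prob > cell.get(a, 0.0):
                cell[a] = prob
                back_cell[a] = ("unary", b)
                added = True


def cky(words):
    """Viterbi CKY with unaries; score[(begin, end)][A] is the best probability."""
    n = len(words)
    score, back = defaultdict(dict), defaultdict(dict)
    for i, word in enumerate(words):
        for (a, w), p in LEXICAL.items():
            if w == word:
                score[(i, i + 1)][a] = p
                back[(i, i + 1)][a] = ("lex", word)
        handle_unaries(score[(i, i + 1)], back[(i, i + 1)])
    for span in range(2, n + 1):
        for begin in range(0, n - span + 1):
            end = begin + span
            cell, back_cell = score[(begin, end)], back[(begin, end)]
            for split in range(begin + 1, end):
                left, right = score[(begin, split)], score[(split, end)]
                for (a, b, c), p in BINARY.items():
                    if b in left and c in right:
                        prob = left[b] * right[c] * p
                        if prob > cell.get(a, 0.0):
                            cell[a] = prob
                            back_cell[a] = ("binary", split, b, c)
            handle_unaries(cell, back_cell)
    return score, back


def build_tree(back, begin, end, label):
    """Follow back-pointers to rebuild the best tree as nested tuples."""
    kind, *rest = back[(begin, end)][label]
    if kind == "lex":
        return (label, rest[0])
    if kind == "unary":
        return (label, build_tree(back, begin, end, rest[0]))
    split, b, c = rest
    return (label, build_tree(back, begin, split, b), build_tree(back, split, end, c))


words = "fish people fish tanks".split()
score, back = cky(words)
print(score[(0, len(words))]["S"])
print(build_tree(back, 0, len(words), "S"))
```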
The grammar: binary, no epsilons
S → NP VP 0.9
S → VP 0.1
VP → V NP 0.5
VP → V 0.1
VP → V @VP_V 0.3
VP → V PP 0.1
@VP_V → NP PP 1.0
NP → NP NP 0.1
NP → NP PP 0.2
NP → N 0.7
PP → P NP 1.0
N → people 0.5
N → fish 0.2
N → tanks 0.2
N → rods 0.1
V → people 0.1
V → fish 0.6
V → tanks 0.3
P → with 1.0
Chart for "fish people fish tanks" (positions 0 to 4; cell score[begin][end] covers words begin through end-1):
score[0][1]  score[0][2]  score[0][3]  score[0][4]
score[1][2]  score[1][3]  score[1][4]
score[2][3]  score[2][4]
score[3][4]
(Chart over "fish people fish tanks"; grammar as above.)
Step: fill each one-word cell score[i][i+1] with the lexical rules A → words[i].
score[0][1] (fish): N → fish 0.2, V → fish 0.6
Step: handle unaries in the one-word cells.
score[0][1] (fish): N → fish 0.2, V → fish 0.6
score[1][2] (people): N → people 0.5, V → people 0.1
score[2][3] (fish): N → fish 0.2, V → fish 0.6
score[3][4] (tanks): N → tanks 0.2, V → tanks 0.3
Step: handle unaries in the one-word cells.
score[0][1] (fish): N → fish 0.2, V → fish 0.6, NP → N 0.14, VP → V 0.06, S → VP 0.006
score[1][2] (people): N → people 0.5, V → people 0.1
score[2][3] (fish): N → fish 0.2, V → fish 0.6
score[3][4] (tanks): N → tanks 0.2, V → tanks 0.3
Step: handle unaries in the one-word cells.
score[0][1] (fish): N → fish 0.2, V → fish 0.6, NP → N 0.14, VP → V 0.06, S → VP 0.006
score[1][2] (people): N → people 0.5, V → people 0.1, NP → N 0.35, VP → V 0.01, S → VP 0.001
score[2][3] (fish): N → fish 0.2, V → fish 0.6, NP → N 0.14, VP → V 0.06, S → VP 0.006
score[3][4] (tanks): N → tanks 0.2, V → tanks 0.3, NP → N 0.14, VP → V 0.03, S → VP 0.003
Step: binary rules, starting with spans of length 2.
score[0][1] (fish): N → fish 0.2, V → fish 0.6, NP → N 0.14, VP → V 0.06, S → VP 0.006
score[1][2] (people): N → people 0.5, V → people 0.1, NP → N 0.35, VP → V 0.01, S → VP 0.001
score[2][3] (fish): N → fish 0.2, V → fish 0.6, NP → N 0.14, VP → V 0.06, S → VP 0.006
score[3][4] (tanks): N → tanks 0.2, V → tanks 0.3, NP → N 0.14, VP → V 0.03, S → VP 0.003
score[0][2] (fish people): NP → NP NP 0.0049, VP → V NP 0.105, S → NP VP 0.00126
Step: binary rules for the remaining spans of length 2.
score[0][1] (fish): N → fish 0.2, V → fish 0.6, NP → N 0.14, VP → V 0.06, S → VP 0.006
score[1][2] (people): N → people 0.5, V → people 0.1, NP → N 0.35, VP → V 0.01, S → VP 0.001
score[2][3] (fish): N → fish 0.2, V → fish 0.6, NP → N 0.14, VP → V 0.06, S → VP 0.006
score[3][4] (tanks): N → tanks 0.2, V → tanks 0.3, NP → N 0.14, VP → V 0.03, S → VP 0.003
score[0][2] (fish people): NP → NP NP 0.0049, VP → V NP 0.105, S → NP VP 0.00126
score[1][3] (people fish): NP → NP NP 0.0049, VP → V NP 0.007, S → NP VP 0.0189
score[2][4] (fish tanks): NP → NP NP 0.00196, VP → V NP 0.042, S → NP VP 0.00378
Step: handle unaries in the span-2 cells.
score[0][1] (fish): N → fish 0.2, V → fish 0.6, NP → N 0.14, VP → V 0.06, S → VP 0.006
score[1][2] (people): N → people 0.5, V → people 0.1, NP → N 0.35, VP → V 0.01, S → VP 0.001
score[2][3] (fish): N → fish 0.2, V → fish 0.6, NP → N 0.14, VP → V 0.06, S → VP 0.006
score[3][4] (tanks): N → tanks 0.2, V → tanks 0.3, NP → N 0.14, VP → V 0.03, S → VP 0.003
score[0][2] (fish people): NP → NP NP 0.0049, VP → V NP 0.105, S → VP 0.0105
score[1][3] (people fish): NP → NP NP 0.0049, VP → V NP 0.007, S → NP VP 0.0189
score[2][4] (fish tanks): NP → NP NP 0.00196, VP → V NP 0.042, S → VP 0.0042
Step: binary rules for spans of length 3.
Cell [0,3] (fish people fish): NP → NP NP 0.0000686; VP → V NP 0.00147; S → NP VP 0.000882
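When a span has more than one split point, each rule is scored at every split and the cell keeps the maximum. For cell [0,3], split 1 pairs [0,1] with [1,3] and split 2 pairs [0,2] with [2,3]:

$$
\begin{aligned}
\mathrm{NP} \to \mathrm{NP\,NP}:&\quad \max(0.1 \times 0.14 \times 0.0049,\ 0.1 \times 0.0049 \times 0.14) = \max(0.0000686,\ 0.0000686) = 0.0000686 \\
\mathrm{S} \to \mathrm{NP\,VP}:&\quad \max(0.9 \times 0.14 \times 0.007,\ 0.9 \times 0.0049 \times 0.06) = \max(0.000882,\ 0.0002646) = 0.000882
\end{aligned}
$$

In exact arithmetic the two NP analyses tie; the strict `>` test in the pseudocode keeps whichever is found first.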
![Page 60: 复旦大学大数据学院 Statistical Natural Language Parsing · 2017. 11. 29. · 复旦大学大数据学院 SchoolofDataScience,FudanUniversity DATA130006Text Management and](https://reader036.vdocuments.pub/reader036/viewer/2022071218/604e4a7d1b122459ba3c43af/html5/thumbnails/60.jpg)
Cell [1,4] (people fish tanks): NP → NP NP 0.0000686; VP → V NP 0.000098; S → NP VP 0.01323
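For cell [1,4], split 2 pairs [1,2] with [2,4] and split 3 pairs [1,3] with [3,4]. The VP entry can only use split 2, since V appears only in [1,2]:

$$
\begin{aligned}
\mathrm{VP} \to \mathrm{V\,NP}:&\quad 0.5 \times 0.1 \times 0.00196 = 0.000098 \\
\mathrm{S} \to \mathrm{NP\,VP}:&\quad \max(\underbrace{0.9 \times 0.35 \times 0.042}_{\text{split }2},\ \underbrace{0.9 \times 0.0049 \times 0.03}_{\text{split }3}) = \max(0.01323,\ 0.0001323) = 0.01323
\end{aligned}
$$

Taking "people" as the subject NP and "fish tanks" as the VP wins by a factor of 100.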
![Page 61: 复旦大学大数据学院 Statistical Natural Language Parsing · 2017. 11. 29. · 复旦大学大数据学院 SchoolofDataScience,FudanUniversity DATA130006Text Management and](https://reader036.vdocuments.pub/reader036/viewer/2022071218/604e4a7d1b122459ba3c43af/html5/thumbnails/61.jpg)
Cell [0,4] (fish people fish tanks): NP → NP NP 0.0000009604; VP → V NP 0.00002058; S → NP VP 0.00018522
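The root cell [0,4] compares three splits for S → NP VP (split k pairs [0,k] with [k,4]):

$$
\mathrm{score}[0,4][\mathrm{S}] = \max\big(\underbrace{0.9 \times 0.14 \times 0.000098}_{\text{split }1},\ \underbrace{0.9 \times 0.0049 \times 0.042}_{\text{split }2},\ \underbrace{0.9 \times 0.0000686 \times 0.03}_{\text{split }3}\big) = \max(0.000012348,\ 0.00018522,\ 0.0000018522) = 0.00018522
$$

Split 2 wins, so the best parse takes "fish people" as the NP and "fish tanks" as the VP. The alternative S → VP scores only 0.1 × 0.00002058 = 0.000002058.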
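Call buildTree(score, back) to get the best parse: it follows the back pointers down from the S entry in the cell spanning the whole sentence.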
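buildTree itself is not spelled out on the slide. A minimal sketch follows, using a hypothetical `build_tree` that takes the back-pointer table, a span, and a label instead of `(score, back)`. The entry format is an assumption: a lexical entry stores the word, a unary entry stores `(child,)`, and a binary entry stores `(split, B, C)`, like `new Triple(split,B,C)`. The `BACK` table is hand-derived from the winning entries in the "fish people fish tanks" chart.

```python
# Hypothetical back-pointer table for the best parse, keyed by (begin, end, label).
#   word string    -> lexical rule  label -> word
#   (child,)       -> unary rule    label -> child over the same span
#   (split, B, C)  -> binary rule   label -> B C
BACK = {
    (0, 4, "S"): (2, "NP", "VP"),
    (0, 2, "NP"): (1, "NP", "NP"),
    (0, 1, "NP"): ("N",),
    (0, 1, "N"): "fish",
    (1, 2, "NP"): ("N",),
    (1, 2, "N"): "people",
    (2, 4, "VP"): (3, "V", "NP"),
    (2, 3, "V"): "fish",
    (3, 4, "NP"): ("N",),
    (3, 4, "N"): "tanks",
}


def build_tree(back, begin, end, label):
    """Follow back pointers from (begin, end, label) and return a bracketed tree."""
    entry = back[(begin, end, label)]
    if isinstance(entry, str):  # lexical rule: stop at the word
        return f"({label} {entry})"
    if len(entry) == 1:  # unary rule: same span, child label
        return f"({label} {build_tree(back, begin, end, entry[0])})"
    split, left, right = entry  # binary rule: recurse into both halves
    return (f"({label} {build_tree(back, begin, split, left)} "
            f"{build_tree(back, split, end, right)})")


print(build_tree(BACK, 0, 4, "S"))
```

This prints `(S (NP (NP (N fish)) (NP (N people))) (VP (V fish) (NP (N tanks))))`.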
![Page 62: 复旦大学大数据学院 Statistical Natural Language Parsing · 2017. 11. 29. · 复旦大学大数据学院 SchoolofDataScience,FudanUniversity DATA130006Text Management and](https://reader036.vdocuments.pub/reader036/viewer/2022071218/604e4a7d1b122459ba3c43af/html5/thumbnails/62.jpg)
Quiz Question!
Chart cells for the phrase "runs down":
§ runs: NNS 0.0023, VB 0.001
§ down: PP 0.2, IN 0.0014, NNS 0.0001
§ runs down: ?
Rules:
§ PP → IN 0.002
§ NP → NNS NNS 0.01
§ NP → NNS NP 0.005
§ NP → NNS PP 0.01
§ VP → VB PP 0.045
§ VP → VB NP 0.015
What constituents (with what probability) can you make?
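One way to check an answer is to run the CKY combination step on the two cells. A minimal Python sketch: the dictionaries hold the quiz's cell contents and rules, and `combine` and `close_unaries` are hypothetical helpers, not from the slides.

```python
# Cell contents and rules from the quiz.
RUNS = {"NNS": 0.0023, "VB": 0.001}
DOWN = {"PP": 0.2, "IN": 0.0014, "NNS": 0.0001}
BINARY = {  # A -> B C, keyed by (A, B, C)
    ("NP", "NNS", "NNS"): 0.01,
    ("NP", "NNS", "NP"): 0.005,
    ("NP", "NNS", "PP"): 0.01,
    ("VP", "VB", "PP"): 0.045,
    ("VP", "VB", "NP"): 0.015,
}
UNARY = {("PP", "IN"): 0.002}  # A -> B, keyed by (A, B)


def close_unaries(cell):
    """Apply each unary rule, keeping a new score only if it beats the old one.
    One pass suffices here because the quiz has a single unary rule."""
    cell = dict(cell)
    for (a, b), p in UNARY.items():
        if b in cell and p * cell[b] > cell.get(a, 0.0):
            cell[a] = p * cell[b]
    return cell


def combine(left, right):
    """Best score for each A -> B C with B from the left cell and C from the right cell."""
    cell = {}
    for (a, b, c), p in BINARY.items():
        if b in left and c in right:
            prob = p * left[b] * right[c]
            if prob > cell.get(a, 0.0):
                cell[a] = prob
    return close_unaries(cell)


runs_down = combine(close_unaries(RUNS), close_unaries(DOWN))
print(runs_down)
```

The result has two entries. NP is about 4.6 × 10⁻⁶, from NP → NNS PP, which beats NP → NNS NNS at 2.3 × 10⁻⁹. VP is about 9 × 10⁻⁶, from VP → VB PP. NP → NNS NP and VP → VB NP never fire because the "down" cell has no NP, and PP → IN (0.002 × 0.0014) does not beat the existing PP 0.2.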
![Page 63: 复旦大学大数据学院 Statistical Natural Language Parsing · 2017. 11. 29. · 复旦大学大数据学院 SchoolofDataScience,FudanUniversity DATA130006Text Management and](https://reader036.vdocuments.pub/reader036/viewer/2022071218/604e4a7d1b122459ba3c43af/html5/thumbnails/63.jpg)
Outline
§ Restricting the grammar form for efficient parsing
§ Exact polynomial-time parsing of (P)CFGs
§ Constituency Parser Evaluation
![Page 64: 复旦大学大数据学院 Statistical Natural Language Parsing · 2017. 11. 29. · 复旦大学大数据学院 SchoolofDataScience,FudanUniversity DATA130006Text Management and](https://reader036.vdocuments.pub/reader036/viewer/2022071218/604e4a7d1b122459ba3c43af/html5/thumbnails/64.jpg)
Evaluating constituency parsing
![Page 65: 复旦大学大数据学院 Statistical Natural Language Parsing · 2017. 11. 29. · 复旦大学大数据学院 SchoolofDataScience,FudanUniversity DATA130006Text Management and](https://reader036.vdocuments.pub/reader036/viewer/2022071218/604e4a7d1b122459ba3c43af/html5/thumbnails/65.jpg)
Evaluating constituency parsing
Gold standard brackets: S-(0:11), NP-(0:2), VP-(2:9), VP-(3:9), NP-(4:6), PP-(6:9), NP-(7:9), NP-(9:10)
Candidate brackets: S-(0:11), NP-(0:2), VP-(2:10), VP-(3:10), NP-(4:6), PP-(6:10), NP-(7:10)
Labeled Precision 3/7 = 42.9%; Labeled Recall 3/8 = 37.5%; LP/LR F1 40.0%; Tagging Accuracy 11/11 = 100.0%
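The labeled precision and recall above can be reproduced by treating each bracket as a (label, start, end) triple. This is a minimal sketch that assumes set semantics, so each bracket is counted once. Standard scorers such as evalb add further conventions, for example skipping punctuation, that this sketch ignores. The name `labeled_prf` is hypothetical.

```python
# Brackets as (label, start, end), copied from the gold and candidate lists.
GOLD = {("S", 0, 11), ("NP", 0, 2), ("VP", 2, 9), ("VP", 3, 9),
        ("NP", 4, 6), ("PP", 6, 9), ("NP", 7, 9), ("NP", 9, 10)}
CANDIDATE = {("S", 0, 11), ("NP", 0, 2), ("VP", 2, 10), ("VP", 3, 10),
             ("NP", 4, 6), ("PP", 6, 10), ("NP", 7, 10)}


def labeled_prf(gold, candidate):
    """Labeled precision, recall, and F1. A candidate bracket counts as correct
    only if its label, start, and end all match a gold bracket."""
    matched = len(gold & candidate)
    precision = matched / len(candidate)
    recall = matched / len(gold)
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1


lp, lr, f1 = labeled_prf(GOLD, CANDIDATE)
print(f"LP {lp:.1%}  LR {lr:.1%}  F1 {f1:.1%}")
```

This prints `LP 42.9%  LR 37.5%  F1 40.0%`. Only S-(0:11), NP-(0:2), and NP-(4:6) match, because the candidate attaches NP-(9:10) one level too low and so shifts the right edge of four other brackets.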
![Page 66: 复旦大学大数据学院 Statistical Natural Language Parsing · 2017. 11. 29. · 复旦大学大数据学院 SchoolofDataScience,FudanUniversity DATA130006Text Management and](https://reader036.vdocuments.pub/reader036/viewer/2022071218/604e4a7d1b122459ba3c43af/html5/thumbnails/66.jpg)
How good are PCFGs?
§ Penn WSJ parsing accuracy: about 73% LP/LR F1
§ Robust
  § Usually admit everything, but with low probability
§ Partial solution for grammar ambiguity
  § A PCFG gives some idea of the plausibility of a parse
  § But not so good, because the independence assumptions are too strong
§ Give a probabilistic language model
  § But in the simple case it performs worse than a trigram model
§ The problem seems to be that PCFGs lack the lexicalization of a trigram model