101035 Chinese Information Processing (中文信息处理)
Chinese NLP
Lecture 9
2
Sentences (句): Grammatical Analysis (2)
• Syntactic parsing (句法分析)
• Parsing as search (搜索式句法分析)
• Structural ambiguities (结构歧义)
• Dynamic programming parsing (动态规划句法分析)
3
Syntactic Parsing (句法分析)
• Basics
• The goal of syntactic parsing is to construct a parse tree for a given sentence, based on a grammar or rule system.
• Parsing is essentially a search through the space of possible rule combinations.
• The search succeeds when it finds a combination of rules that generates a parse tree representing the structure of the sentence.
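To make the idea of "a parse tree based on a grammar" concrete, here is a minimal sketch. The toy grammar, the lexicon, the nested-tuple tree encoding, and the helper names `leaves` and `is_licensed` are all assumptions made for illustration; they are not taken from the lecture.

```python
# Toy grammar (an assumption for illustration): each left-hand side maps to
# a list of possible right-hand sides.
GRAMMAR = {
    "S": [["VP"]],
    "VP": [["Verb", "NP"]],
    "NP": [["Det", "Nominal"]],
    "Nominal": [["Noun"]],
}
LEXICON = {"book": "Verb", "that": "Det", "flight": "Noun"}

# A parse tree as nested tuples: (label, child, child, ...); a leaf is a word.
TREE = ("S",
        ("VP",
         ("Verb", "book"),
         ("NP", ("Det", "that"), ("Nominal", ("Noun", "flight")))))


def leaves(tree):
    """Return the words at the leaves of the tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words


def is_licensed(tree):
    """Check that every node matches a grammar rule or a lexicon entry."""
    label, children = tree[0], tree[1:]
    if len(children) == 1 and isinstance(children[0], str):
        # Pre-terminal node: the word must carry this POS tag.
        return LEXICON.get(children[0]) == label
    rhs = [child[0] for child in children]
    return rhs in GRAMMAR.get(label, []) and all(is_licensed(c) for c in children)


print(leaves(TREE))       # ['book', 'that', 'flight']
print(is_licensed(TREE))  # True
```

A parser has to find such a tree; this sketch only checks one that is already given.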
4
• Parsing Methods
• Basic Searching Methods
• Top-down
• Bottom-up
• Dynamic Programming Methods
• CKY
• Earley
• Chart Parsing
• Statistical Methods
• Probabilistic parsing
5
Parsing as Search (搜索式句法分析)
• Top-Down Parsing
• A top-down parser builds a tree from the root node S down to the leaves.
• Bottom-Up Parsing
• A bottom-up parser starts with the words of the input and builds trees upward until it reaches a tree rooted in the start symbol S.
6
• Example
Book that flight.
7
• Example
• Top-down parsing
Book that flight.
…
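The top-down strategy can be sketched as a depth-first search with backtracking. The grammar and lexicon below are a toy fragment assumed for illustration (not necessarily the grammar on the slide), and `expand` and `recognize` are hypothetical helper names. The sketch only recognizes sentences; it does not build the tree, and it would loop forever on a left-recursive grammar.

```python
GRAMMAR = {
    "S": [["NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"]],
    "Nominal": [["Noun"]],
    "VP": [["Verb", "NP"], ["Verb"]],
}
LEXICON = {"book": {"Noun", "Verb"}, "that": {"Det"}, "flight": {"Noun"}}


def expand(symbols, words, pos):
    """Yield every end position reachable by deriving `symbols` from words[pos:]."""
    if not symbols:
        yield pos
        return
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:
        # Top-down prediction: try each rule for the leftmost non-terminal.
        for rhs in GRAMMAR[first]:
            yield from expand(rhs + rest, words, pos)
    elif pos < len(words) and first in LEXICON.get(words[pos], set()):
        # POS category: match it against the next input word.
        yield from expand(rest, words, pos + 1)


def recognize(sentence):
    """Return True if the sentence can be derived from S."""
    words = sentence.lower().rstrip(".").split()
    return any(end == len(words) for end in expand(["S"], words, 0))


print(recognize("Book that flight."))  # True
print(recognize("That flight."))       # False
```

Note that the search first tries S → NP VP, which cannot match an input starting with "Book", before it falls back to S → VP.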
8
• Example
• Bottom-up parsing
Book that flight.
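The bottom-up strategy can be sketched as a shift-reduce search with backtracking. As before, the rules and lexicon are a toy fragment assumed for illustration, and `bottom_up` and `recognize` are hypothetical names. This is one simple way to realize bottom-up search, not the only one.

```python
RULES = [
    ("S", ["NP", "VP"]), ("S", ["VP"]),
    ("NP", ["Det", "Nominal"]),
    ("Nominal", ["Noun"]),
    ("VP", ["Verb", "NP"]), ("VP", ["Verb"]),
]
LEXICON = {"book": {"Noun", "Verb"}, "that": {"Det"}, "flight": {"Noun"}}


def bottom_up(stack, words):
    """Return True if some sequence of shifts and reductions reaches [S]."""
    if stack == ["S"] and not words:
        return True
    # Reduce: replace a stack suffix that matches a rule's right-hand side.
    for lhs, rhs in RULES:
        n = len(rhs)
        if stack[-n:] == rhs and bottom_up(stack[:-n] + [lhs], words):
            return True
    # Shift: push the next word's POS tag (trying each tag it can have).
    if words:
        for tag in sorted(LEXICON.get(words[0], ())):
            if bottom_up(stack + [tag], words[1:]):
                return True
    return False


def recognize(sentence):
    """Return True if the words can be reduced to the start symbol S."""
    words = sentence.lower().rstrip(".").split()
    return bottom_up([], words)


print(recognize("Book that flight."))  # True
print(recognize("That flight."))       # False
```

Every partial structure built here is consistent with the input words, but many of them (for example, treating "Book" as a Noun) can never lead to an S.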
9
In-Class Exercise
• Provide the omitted steps of the top-down parse that lead to the final correct parse tree. (To save space, all terminal nodes may be expanded in a single step.)
10
• Top-Down vs Bottom-Up
• Top-down parsing does not waste time exploring trees that cannot result in an S, but bottom-up parsing generates many trees unable to lead to an S.
• Bottom-up parsing always generates trees that are consistent with the input words, but top-down parsing spends considerable effort on S trees that are not consistent with the input.
11
Structural Ambiguities (结构歧义)
• A Major Challenge
• A sentence often has more than one parse tree, and each tree gives a different meaning. Such structural ambiguity is a major challenge for syntactic parsing.
12
• Ambiguities in English
• Attachment ambiguity
I shot an elephant in my pajamas.
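As a rough illustration of attachment ambiguity, the sketch below counts the parse trees that a small grammar assigns to this sentence. The binary grammar, the lexicon, and the function name `count_parses` are assumptions made for this example. The prepositional phrase can attach either to the verb phrase (VP → VP PP) or to the noun phrase (NP → NP PP), which gives two trees.

```python
from functools import lru_cache

# Toy binary grammar (assumed for illustration).
BINARY = {
    "S": [("NP", "VP")],
    "VP": [("Verb", "NP"), ("VP", "PP")],
    "NP": [("Det", "Noun"), ("NP", "PP")],
    "PP": [("Prep", "NP")],
}
LEXICON = {
    "i": {"NP"}, "shot": {"Verb"}, "an": {"Det"}, "elephant": {"Noun"},
    "in": {"Prep"}, "my": {"Det"}, "pajamas": {"Noun"},
}


def count_parses(sentence, root="S"):
    """Count the distinct parse trees rooted in `root` that cover the sentence."""
    words = tuple(sentence.lower().rstrip(".").split())

    @lru_cache(maxsize=None)
    def count(symbol, i, j):
        # Number of trees with this label spanning words[i:j].
        if j - i == 1:
            return int(symbol in LEXICON.get(words[i], ()))
        total = 0
        for left, right in BINARY.get(symbol, []):
            for k in range(i + 1, j):
                total += count(left, i, k) * count(right, k, j)
        return total

    return count(root, 0, len(words))


print(count_parses("I shot an elephant in my pajamas."))  # 2
print(count_parses("I shot an elephant."))                # 1
```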
13
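• Ambiguities in English
• Attachment ambiguity
I can see old men and women in the park.
• Ambiguities in Chinese
• “VP + 的 + 是 + NP” type: 反对的是少数人 (those who object are a minority / what is opposed is the minority)
• “N1 + N2 + N3” type: 北欧语言研究会 (society for the study of Nordic languages / Nordic society for language studies)
• “ADJ + N1 + N2” type: 小学生词典 (dictionary for primary school pupils / small dictionary for students)
• “VP + N1 的 + N2” type: 咬死了猎人的狗 (the dog that bit the hunter to death / bit the hunter's dog to death)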
14
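• Ambiguities in Chinese
• “N1 + 的 + N2 + 和 + N3” type: 衣服的袖子和口袋 (the sleeves and pockets of the clothes / the sleeves of the clothes, and pockets)
• “V + N1 + N2” type: 赠意大利图书 (donate books to Italy / donate Italian books)
• “MQ + NP1 + 的 + NP2” type: 三个学校的实验员 (lab technicians from three schools / three lab technicians from the school)
• “VP + MQ + NP” type: 发了三天工资 (paid out three days' wages / spent three days paying out wages)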
15
Dynamic Programming Parsing (动态规划句法分析)
• Features
• Dynamic programming parsing methods are efficient because subtrees are discovered once, stored, and then used in all parses calling for that constituent.
• They partially address the ambiguity problem by storing all possible parses compactly.
16
• CKY Parsing
• A dynamic programming bottom-up parsing method
Book the flight through Houston.
Note: every non-terminal rule must be converted to Chomsky normal form (CNF).
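One step of the conversion to CNF is splitting rules whose right-hand side has more than two symbols. The sketch below shows only that binarization step, under the assumption that rules are (lhs, rhs) pairs; the function name `binarize` and the fresh symbol names X1, X2, ... are illustrative choices. A full conversion also has to remove unit productions and rules that mix terminals with non-terminals.

```python
def binarize(rules):
    """Split rules whose right-hand side has more than two symbols."""
    out = []
    fresh = 0  # counter for newly introduced non-terminals
    for lhs, rhs in rules:
        rhs = list(rhs)
        while len(rhs) > 2:
            fresh += 1
            new_symbol = f"X{fresh}"
            # Group the two leftmost symbols under a new non-terminal.
            out.append((new_symbol, (rhs[0], rhs[1])))
            rhs = [new_symbol] + rhs[2:]
        out.append((lhs, tuple(rhs)))
    return out


print(binarize([("S", ("Aux", "NP", "VP"))]))
# [('X1', ('Aux', 'NP')), ('S', ('X1', 'VP'))]
```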
17
• CKY Parsing
• For a sentence of length n, CKY deals with the upper-triangular portion of an (n+1)×(n+1) matrix. Each cell [i, j] in this matrix contains a set of non-terminals that represent all the constituents that span positions i through j of the input.
0 Book 1 that 2 flight 3 (cell [0, 3] spans the whole sentence)
• CKY parsing amounts to filling in the parse table.
18
• CKY Parsing
• Algorithm
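The table-filling idea can be sketched as a small CKY recognizer. The CNF grammar and lexicon below are a toy fragment assumed for illustration, and the function name `cky` is hypothetical; this is a minimal sketch and not necessarily identical to the algorithm shown on the slide. Cell [i, j] is stored under the key `(i, j)` and holds the set of non-terminals that span positions i through j.

```python
from collections import defaultdict

# Toy grammar in CNF (assumed): (lhs, left child, right child).
BINARY_RULES = [
    ("S", "Verb", "NP"), ("S", "VP", "PP"),
    ("VP", "Verb", "NP"), ("VP", "VP", "PP"),
    ("NP", "Det", "Nominal"),
    ("Nominal", "Nominal", "PP"),
    ("PP", "Prep", "NP"),
]
LEXICON = {
    "book": {"Verb", "Nominal"},
    "the": {"Det"},
    "flight": {"Nominal"},
    "through": {"Prep"},
    "houston": {"NP"},
}


def cky(words):
    """Fill the upper-triangular table column by column, left to right."""
    n = len(words)
    table = defaultdict(set)
    for j in range(1, n + 1):
        # Diagonal cell: categories of the word between positions j-1 and j.
        table[j - 1, j] |= LEXICON.get(words[j - 1], set())
        for i in range(j - 2, -1, -1):
            for k in range(i + 1, j):  # every split point of the span [i, j]
                for lhs, left, right in BINARY_RULES:
                    if left in table[i, k] and right in table[k, j]:
                        table[i, j].add(lhs)
    return table


words = "book the flight through houston".split()
table = cky(words)
print("S" in table[0, len(words)])  # True
```

Each cell is computed once from smaller cells that are already filled, which is where the dynamic programming saving comes from.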
19
• CKY Parsing
• Example
Book the flight through Houston.
20
• CKY Parsing
Book the flight through Houston.
21
In-Class Exercise
• When CKY ends (on the previous slide), it has generated 3 possible parses at once (S1, S2, S3). Please draw their corresponding parse trees.
22
• Earley
• A dynamic programming top-down parsing method
• The Earley algorithm makes a single left-to-right pass that fills an array called a chart, which has N+1 entries.
• Earley’s word indexing method is the same as CKY’s.
• Dotted rule
• A state in the chart is a grammar rule with a dot (•) marking how much of its right-hand side has been recognized.
• A state's position with respect to the input is represented by two numbers indicating where the state begins and where its dot lies.
NP → Det • Nominal, [1,2]
Parsed: Det (left of the dot). Expected: Nominal (right of the dot).
NP begins at position 1. The dot lies at position 2.
S → α •, [0,N]: successful parse
23
• Earley
• Predictor
• It creates new states representing top-down expectations generated during the parsing process.
• It is applied to any state that has a non-terminal (other than a POS category) immediately to the right of the dot.
• Scanner
• When a state has a POS category to the right of the dot, Scanner is called to examine the next input word. If the word can have that POS, a state recording the match is added to the next chart entry.
• Completer
• It is applied to a state when its dot has reached the right end of the rule.
• The purpose of Completer is to find, and advance, all previously created states that were looking for a particular grammatical category that has just been discovered.
24
• Earley
• Algorithm
3 core operations
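The three operations can be sketched as a compact Earley recognizer. The grammar, the lexicon, the `State` tuple, the dummy start symbol `GAMMA`, and the function names are all assumptions made for illustration; this is a minimal sketch and not necessarily the algorithm shown on the slide. A state stores only its start position, because the position of the dot in the input is the index of the chart entry that holds the state.

```python
from collections import namedtuple

# A dotted rule plus the input position where the constituent begins.
State = namedtuple("State", "lhs rhs dot start")

GRAMMAR = {
    "S": [("NP", "VP"), ("VP",)],
    "NP": [("Det", "Nominal")],
    "Nominal": [("Noun",), ("Nominal", "Noun")],
    "VP": [("Verb",), ("Verb", "NP")],
}
LEXICON = {"book": {"Noun", "Verb"}, "that": {"Det"}, "flight": {"Noun"}}
POS = {"Noun", "Verb", "Det"}


def earley(words):
    """Fill a chart with len(words) + 1 entries in one left-to-right pass."""
    chart = [[] for _ in range(len(words) + 1)]

    def enqueue(state, k):
        if state not in chart[k]:
            chart[k].append(state)

    enqueue(State("GAMMA", ("S",), 0, 0), 0)  # dummy start state
    for k in range(len(words) + 1):
        for state in chart[k]:  # chart[k] may grow while we iterate
            if state.dot < len(state.rhs):
                nxt = state.rhs[state.dot]
                if nxt in POS:
                    # Scanner: match the expected POS against the next word.
                    if k < len(words) and nxt in LEXICON.get(words[k], ()):
                        enqueue(State(nxt, (words[k],), 1, k), k + 1)
                else:
                    # Predictor: add top-down expectations for nxt.
                    for rhs in GRAMMAR[nxt]:
                        enqueue(State(nxt, rhs, 0, k), k)
            else:
                # Completer: advance states that were waiting for state.lhs.
                for waiting in chart[state.start]:
                    if (waiting.dot < len(waiting.rhs)
                            and waiting.rhs[waiting.dot] == state.lhs):
                        enqueue(waiting._replace(dot=waiting.dot + 1), k)
    return chart


def recognize(sentence):
    """Succeed if the last chart entry holds a complete S that starts at 0."""
    words = sentence.lower().rstrip(".").split()
    chart = earley(words)
    return any(s.lhs == "S" and s.dot == len(s.rhs) and s.start == 0
               for s in chart[-1])


print(recognize("Book that flight."))  # True
print(recognize("That flight."))       # False
```

Unlike a plain top-down search, this handles the left-recursive rule Nominal → Nominal Noun, because a state that is already in a chart entry is never added again.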
25
• Earley
• Example
Book that flight.
26
• Earley
• Example
Book that flight.
27
• Earley
• States that lead to the correct parse.
Book that flight.
28
Wrap-Up
• Syntactic Parsing (句法分析)
• Parsing Methods
• Parsing as Search (搜索式句法分析)
• Top-Down
• Bottom-Up
• Structural Ambiguities (结构歧义)
• Ambiguities in English
• Ambiguities in Chinese
• Dynamic Programming Parsing (动态规划句法分析)
• Features
• CKY Parsing
• Earley
• Examples