101035 中文信息处理 Chinese NLP, Lecture 9. 句 (Sentence): 语法分析 (2) Grammatical Analysis (2)

Upload: ross-elliott

Post on 29-Dec-2015

Page 1

101035 中文信息处理

Chinese NLP

Lecture 9

Page 2

句 (Sentence): 语法分析 (2) Grammatical Analysis (2)

• 句法分析 (Syntactic parsing)

• 搜索式句法分析 (Parsing as search)

• 结构歧义 (Structural ambiguities)

• 动态规划句法分析 (Dynamic programming parsing)

Page 3

句法分析 Syntactic Parsing

• Basics

• The goal of syntactic parsing is to construct a parse tree for a given sentence, based on a grammar or rule system.

• Parsing is essentially a search through the space of possible rule combinations.

• The search succeeds when it finds a combination of rules that derives the sentence; that combination defines a parse tree representing the sentence structure.

Page 4

• Parsing Methods

• Basic Searching Methods

• Top-down

• Bottom-up

• Dynamic Programming Methods

• CKY

• Earley

• Chart Parsing

• Statistical Methods

• Probabilistic parsing

Page 5

搜索式句法分析 Parsing as Search

• Top-Down Parsing

• A top-down parser builds a tree from the root node S down to the leaves.

• Bottom-Up Parsing

• A bottom-up parser starts with the words of the input and builds upward, trying to form a tree rooted in the start symbol S.

Page 6

• Example

Book that flight.

Page 7

• Example

• Top-down parsing

Book that flight.
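Here is a minimal top-down sketch. It uses a hypothetical toy grammar and lexicon for "Book that flight", because the slide's figure is not reproduced here. The parser is a recursive-descent parser: it starts from S, tries each rule in order, and backtracks when a predicted part of speech does not match the next word. The grammar deliberately avoids left-recursive rules such as Nominal → Nominal Noun, because a naive recursive-descent parser would recurse forever on them.

```python
# Hypothetical toy grammar (no left recursion) and lexicon.
GRAMMAR = {
    "S": [["NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"]],
    "Nominal": [["Noun"], ["Noun", "Nominal"]],
    "VP": [["Verb"], ["Verb", "NP"]],
}
LEXICON = {"book": {"Noun", "Verb"}, "that": {"Det"}, "flight": {"Noun"}}


def expand(symbol, words, i):
    """Yield (tree, j) for each way `symbol` can derive words[i:j]."""
    if symbol not in GRAMMAR:  # a POS category: it must match the next word
        if i < len(words) and symbol in LEXICON.get(words[i], set()):
            yield (symbol, words[i]), i + 1
        return
    for rhs in GRAMMAR[symbol]:  # try each rule in order; failures backtrack
        for children, j in expand_all(rhs, words, i):
            yield (symbol, *children), j


def expand_all(symbols, words, i):
    """Yield (trees, j) for each way a symbol sequence can derive words[i:j]."""
    if not symbols:
        yield [], i
        return
    for first, j in expand(symbols[0], words, i):
        for rest, k in expand_all(symbols[1:], words, j):
            yield [first] + rest, k


def top_down_parse(sentence):
    words = sentence.lower().split()
    # Start at the root S; keep only expansions that consume every word.
    return [tree for tree, j in expand("S", words, 0) if j == len(words)]


print(top_down_parse("Book that flight"))
```

The parser first tries S → NP VP and fails because "book" is not a Det. It then backtracks to S → VP. Every tree it explores is an S tree, but some of them do not match the input.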

Page 8

• Example

• Bottom-up parsing

Book that flight.
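A matching bottom-up sketch uses the same hypothetical toy grammar and is written as an exhaustive shift-reduce search. It pushes each word onto a stack under every part of speech the word can have. It then reduces any rule's right-hand side that appears on top of the stack to that rule's left-hand side. This is one possible bottom-up strategy chosen for illustration, not necessarily the procedure drawn on the slide.

```python
# Hypothetical toy grammar and lexicon (same shape as the top-down sketch).
GRAMMAR = {
    "S": [["NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"]],
    "Nominal": [["Noun"], ["Noun", "Nominal"]],
    "VP": [["Verb"], ["Verb", "NP"]],
}
LEXICON = {"book": {"Noun", "Verb"}, "that": {"Det"}, "flight": {"Noun"}}


def bottom_up_parse(sentence):
    """Return every parse tree (nested tuples) found by exhaustive shift-reduce search."""
    words = sentence.lower().split()
    results = []

    def search(stack, i):
        labels = [tree[0] for tree in stack]
        if i == len(words) and labels == ["S"]:
            results.append(stack[0])  # all words used and reduced to a single S
        # Reduce: replace a right-hand side on top of the stack with its left-hand side.
        for lhs, rhss in GRAMMAR.items():
            for rhs in rhss:
                k = len(rhs)
                if len(stack) >= k and labels[-k:] == rhs:
                    search(stack[:-k] + [(lhs, *stack[-k:])], i)
        # Shift: push the next word under each part of speech it can have.
        if i < len(words):
            for pos in sorted(LEXICON.get(words[i], set())):
                search(stack + [(pos, words[i])], i + 1)

    search([], 0)
    return results


print(bottom_up_parse("Book that flight"))
```

Every partial tree here is built from actual input words. The search still builds dead ends, such as reading "book" as a Noun, that can never become part of an S. This is the trade-off discussed under Top-Down vs Bottom-Up.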

Page 9

In-Class Exercise

• Supply the omitted steps of the top-down parse so that it derives the final, correct parse tree. (To save space, terminal nodes may be added in a single step.)

Page 10

• Top-Down vs Bottom-Up

• Top-down parsing never wastes time exploring trees that cannot result in an S, whereas bottom-up parsing generates many trees that can never lead to an S.

• Bottom-up parsing always generates trees that are consistent with the input words, whereas top-down parsing spends considerable effort on S trees that are inconsistent with the input.

Page 11

结构歧义 Structural Ambiguities

• A Major Challenge

• A sentence often corresponds to more than one parse tree, each conveying a different meaning. Such structural ambiguities are a major challenge for syntactic parsing.

Page 12

• Ambiguities in English

• Attachment ambiguity

I shot an elephant in my pajamas.
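To see the attachment ambiguity concretely, the sketch below runs an exhaustive top-down parser over a small grammar. In this grammar a PP can attach either inside the object NP or to the VP. The grammar is an assumption for illustration, not taken from the slide. It is written without left recursion (VP → Verb NP PP instead of VP → VP PP) so that plain recursive descent terminates.

```python
# Hypothetical toy grammar: the PP may attach to the NP or to the VP.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Pronoun"], ["Det", "Noun"], ["Det", "Noun", "PP"]],
    "VP": [["Verb", "NP"], ["Verb", "NP", "PP"]],
    "PP": [["Prep", "NP"]],
}
LEXICON = {"i": "Pronoun", "shot": "Verb", "an": "Det", "elephant": "Noun",
           "in": "Prep", "my": "Det", "pajamas": "Noun"}


def derive(symbol, words, i):
    """Yield (bracketed string, j) for each way `symbol` derives words[i:j]."""
    if symbol not in GRAMMAR:  # POS category: consume one matching word
        if i < len(words) and LEXICON.get(words[i]) == symbol:
            yield words[i], i + 1
        return
    for rhs in GRAMMAR[symbol]:
        for parts, j in derive_seq(rhs, words, i):
            yield "[" + symbol + " " + " ".join(parts) + "]", j


def derive_seq(symbols, words, i):
    if not symbols:
        yield [], i
        return
    for first, j in derive(symbols[0], words, i):
        for rest, k in derive_seq(symbols[1:], words, j):
            yield [first] + rest, k


def all_parses(sentence):
    words = sentence.lower().split()
    return [tree for tree, j in derive("S", words, 0) if j == len(words)]


for tree in all_parses("I shot an elephant in my pajamas"):
    print(tree)
# [S [NP i] [VP shot [NP an elephant [PP in [NP my pajamas]]]]]
# [S [NP i] [VP shot [NP an elephant] [PP in [NP my pajamas]]]]
```

In the first tree the elephant is wearing the pajamas. In the second, the shooting happened in the pajamas.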

Page 13

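• Ambiguities in English

• Attachment ambiguity: I can see old men and women in the park.

• Ambiguities in Chinese

• “VP+的+是+NP” pattern: 反对的是少数人 (“those who object are a minority” vs. “those being objected to are a few people”)

• “N1+N2+N3” pattern: 北欧语言研究会 ([北欧语言] 研究会 “a society for the study of Nordic languages” vs. 北欧 [语言研究会] “a Nordic linguistics society”)

• “ADJ+N1+N2” pattern: 小学生词典 ([小学生] 词典 “a dictionary for primary-school pupils” vs. 小 [学生词典] “a small student dictionary”)

• “VP+N1的+N2” pattern: 咬死了猎人的狗 (“bit the hunter’s dog to death” vs. “the dog that bit the hunter to death”)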
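The “N1+N2+N3” pattern can be made concrete by listing every binary bracketing of the noun sequence. The sketch assumes the compound has already been segmented as 北欧 / 语言 / 研究会. It only enumerates candidate structures and does not decide which one is intended. The number of binary bracketings of n items is the Catalan number C(n-1), so longer noun sequences become ambiguous very quickly.

```python
def bracketings(items):
    """Return every binary bracketing of `items` as a string."""
    if len(items) == 1:
        return [items[0]]
    results = []
    for k in range(1, len(items)):  # where the top-level split falls
        for left in bracketings(items[:k]):
            for right in bracketings(items[k:]):
                results.append(f"[{left} {right}]")
    return results


print(bracketings(["北欧", "语言", "研究会"]))
# ['[北欧 [语言 研究会]]', '[[北欧 语言] 研究会]']
```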

Page 14

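• Ambiguities in Chinese

• “N1+的+N2+和+N3” pattern: 衣服的袖子和口袋 ([衣服的袖子] 和 [口袋] “the sleeves of the clothes, and the pockets” vs. 衣服的 [袖子和口袋] “the sleeves and pockets of the clothes”)

• “V+N1+N2” pattern: 赠意大利图书 (赠 [意大利图书] “donate Italian books” vs. 赠 [意大利] [图书] “donate books to Italy”)

• “MQ+NP1+的+NP2” pattern (MQ = numeral + measure word): 三个学校的实验员 ([三个学校] 的实验员 “lab technicians of three schools” vs. 三个 [学校的实验员] “three school lab technicians”)

• “VP+MQ+NP” pattern: 发了三天工资 (“paid three days’ wages” vs. “spent three days paying wages”)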

Page 15

动态规划句法分析 Dynamic Programming Parsing

• Features

• Dynamic programming parsing methods are efficient because subtrees are discovered once, stored, and then used in all parses calling for that constituent.

• They partially address the ambiguity problem by storing all possible parses, although the parser itself does not choose among them.

Page 16

• CKY Parsing

• A dynamic programming bottom-up parsing method

Book the flight through Houston.

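The grammar must first be converted to Chomsky Normal Form (CNF): every rule must have the form A → B C (two non-terminals) or A → w (a single word).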
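One part of the CNF conversion can be sketched in a few lines. Rules with more than two right-hand-side symbols are split by introducing fresh non-terminals. This is a partial sketch only. It assumes rules are given as (lhs, rhs) pairs and uses a hypothetical naming scheme X1, X2, and so on. It does not remove unit productions (A → B) or separate words mixed into longer rules, which a full conversion also requires.

```python
def binarize(rules):
    """Split rules whose right-hand side is longer than 2 (one step of CNF conversion).

    rules: list of (lhs, rhs) pairs, where rhs is a tuple of symbols.
    Fresh non-terminals are named X1, X2, ... (a hypothetical naming scheme).
    """
    out, counter = [], 0
    for lhs, rhs in rules:
        rhs = tuple(rhs)
        while len(rhs) > 2:
            counter += 1
            fresh = f"X{counter}"
            out.append((lhs, (rhs[0], fresh)))  # A -> B X1
            lhs, rhs = fresh, rhs[1:]           # X1 -> rest of the right-hand side
        out.append((lhs, rhs))
    return out


print(binarize([("VP", ("Verb", "NP", "PP")), ("NP", ("Det", "Nominal"))]))
# [('VP', ('Verb', 'X1')), ('X1', ('NP', 'PP')), ('NP', ('Det', 'Nominal'))]
```

This sketch groups symbols from the right. Grouping from the left (e.g., X1 → Verb NP, VP → X1 PP) works equally well, and both versions generate the same sentences.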

Page 17

• CKY Parsing

• For a sentence of length n, CKY works with the upper-triangular portion of an (n+1)×(n+1) matrix. Each cell [i, j] contains the set of non-terminals representing all constituents that span positions i through j of the input.

0 Book 1 that 2 flight 3

Cell [0, 3] spans the whole input; the sentence is parsed successfully if this cell contains S.

• CKY parsing amounts to filling in this parse table.
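One common filling order is assumed here: the column-by-column order of the standard textbook presentation. It visits columns j from left to right and, within each column, cells from the bottom (i = j - 1) up to the top (i = 0). Cell [i, j] depends only on cells [i, k] (earlier columns) and [k, j] with k > i (lower in the same column), so both are already filled when it is reached. The sketch lists that order.

```python
def cky_fill_order(n):
    """Return the (i, j) cells of the upper triangle in a valid CKY filling order."""
    order = []
    for j in range(1, n + 1):            # columns left to right
        for i in range(j - 1, -1, -1):   # bottom to top within column j
            order.append((i, j))
    return order


print(cky_fill_order(3))
# [(0, 1), (1, 2), (0, 2), (2, 3), (1, 3), (0, 3)]
```

For "Book that flight" (n = 3), the cell covering the whole input, [0, 3], is filled last.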

Page 18

• CKY Parsing

• Algorithm
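The algorithm slide itself is not reproduced in this transcript, so a minimal CKY sketch is given as a stand-in. The CNF grammar is an assumed miniature modeled on the common textbook example, not the slide's grammar. Unit chains for single words (e.g., S → VP → Verb → book) are assumed to be already collapsed into the lexical table. Each cell maps a non-terminal to a list of backpointers, so every analysis is stored rather than one being chosen.

```python
from collections import defaultdict

# Assumed miniature grammar in CNF: (lhs, left child, right child).
BINARY = [
    ("S", "NP", "VP"), ("S", "Verb", "NP"), ("S", "X2", "PP"),
    ("S", "Verb", "PP"), ("S", "VP", "PP"), ("NP", "Det", "Nominal"),
    ("Nominal", "Nominal", "Noun"), ("Nominal", "Nominal", "PP"),
    ("VP", "Verb", "NP"), ("VP", "X2", "PP"), ("X2", "Verb", "NP"),
    ("VP", "Verb", "PP"), ("VP", "VP", "PP"), ("PP", "Preposition", "NP"),
]
# Word -> non-terminals that can cover it (unit chains already collapsed).
LEXICAL = {
    "book": {"S", "VP", "Verb", "Nominal", "Noun"},
    "the": {"Det"},
    "flight": {"Nominal", "Noun"},
    "through": {"Preposition"},
    "houston": {"NP", "Proper-Noun"},
}


def cky(words):
    """Fill table[i][j]: non-terminal -> list of backpointers for span i..j."""
    n = len(words)
    table = [[defaultdict(list) for _ in range(n + 1)] for _ in range(n + 1)]
    for j in range(1, n + 1):
        word = words[j - 1].lower()
        for a in LEXICAL.get(word, ()):
            table[j - 1][j][a].append(word)    # lexical entry
        for i in range(j - 2, -1, -1):         # longer spans ending at j
            for k in range(i + 1, j):          # every split point
                for lhs, b, c in BINARY:
                    if b in table[i][k] and c in table[k][j]:
                        table[i][j][lhs].append((k, b, c))
    return table


def count_trees(table, i, j, a):
    """Number of distinct trees rooted in `a` over span i..j."""
    total = 0
    for bp in table[i][j].get(a, []):
        if isinstance(bp, str):
            total += 1
        else:
            k, b, c = bp
            total += count_trees(table, i, k, b) * count_trees(table, k, j, c)
    return total


table = cky("Book the flight through Houston".split())
print(sorted(table[0][5]))            # ['S', 'VP', 'X2']
print(count_trees(table, 0, 5, "S"))  # 3
```

With this assumed grammar, the top cell [0, 5] holds three S backpointers. Recovering the trees means following the backpointers down from each S entry.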

Page 19

• CKY Parsing

• Example

Book the flight through Houston.

Page 20

• CKY Parsing

Book the flight through Houston.

Page 21

In-Class Exercise

• When CKY finishes (see the previous page), it has found three possible parses at once (S1, S2, S3). Draw the corresponding parse trees.

Page 22

• Earley

• A dynamic programming top-down parsing method

• The Earley algorithm makes a single left-to-right pass over the input, filling an array called a chart that has N + 1 entries (one for each position between words, from 0 to N).

• Earley indexes word positions the same way CKY does.

• Dotted rule

• Each state in the chart contains a dotted rule: a grammar rule with a dot (•) marking how much of its right-hand side has been recognized so far.

• A state's position with respect to the input is represented by two numbers: where the state begins and where its dot lies.

NP → Det • Nominal, [1,2]: the NP begins at position 1 and the dot lies at position 2; Det has been parsed and Nominal is expected next.

S → α •, [0,N]: a successful parse.
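The dotted-rule notation maps naturally onto a small record type. The sketch below is a hypothetical representation; the field and method names are assumptions. It stores the rule, the dot position, and the two input positions, and prints a state in the notation used above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    lhs: str    # left-hand side of the rule
    rhs: tuple  # right-hand-side symbols
    dot: int    # how many rhs symbols have been recognized
    start: int  # input position where the constituent begins
    end: int    # input position where the dot lies

    def next_symbol(self):
        return self.rhs[self.dot] if self.dot < len(self.rhs) else None

    def is_complete(self):
        return self.dot == len(self.rhs)

    def __str__(self):
        symbols = list(self.rhs)
        symbols.insert(self.dot, "•")
        return f"{self.lhs} → {' '.join(symbols)}, [{self.start},{self.end}]"


print(State("NP", ("Det", "Nominal"), 1, 1, 2))  # NP → Det • Nominal, [1,2]
```

A complete state such as `State("S", ("VP",), 1, 0, 3)` prints as `S → VP •, [0,3]`. For a three-word input this matches the successful-parse pattern S → α •, [0,N].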

Page 23

• Earley

• Predictor

• It creates new states representing top-down expectations generated during the parsing process.

• It is applied to any state that has a non-terminal (other than a POS category) immediately to the right of the dot.

• Scanner

• When a state has a POS category to the right of the dot, Scanner examines the next input word. If the word can have that POS, Scanner adds a completed state for it (e.g., Verb → book •) to the next chart entry, so the dot can advance past the predicted category.

• Completer

• It is applied to a state when its dot has reached the right end of the rule.

• The purpose of Completer is to find, and advance, all previously created states that were looking for a particular grammatical category that has just been discovered.

Page 24

• Earley

• Algorithm

Three core operations: Predictor, Scanner, and Completer
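The algorithm figure is not reproduced here, so a minimal Earley recognizer sketch is given instead. It is built from the three operations described above. The toy grammar, the POS set, and the dummy start symbol γ are assumptions for illustration. Empty (ε) rules are not handled, and the sketch only recognizes the sentence; it does not build trees.

```python
from dataclasses import dataclass

# Hypothetical toy grammar; POS categories are handled by the Scanner, not the Predictor.
GRAMMAR = {
    "S": [("NP", "VP"), ("VP",)],
    "NP": [("Det", "Nominal")],
    "Nominal": [("Noun",)],
    "VP": [("Verb",), ("Verb", "NP")],
}
POS = {"Det", "Noun", "Verb"}
LEXICON = {"book": {"Noun", "Verb"}, "that": {"Det"}, "flight": {"Noun"}}


@dataclass(frozen=True)
class State:
    lhs: str
    rhs: tuple
    dot: int
    start: int  # where the constituent begins; the chart index gives the dot's position

    def next_symbol(self):
        return self.rhs[self.dot] if self.dot < len(self.rhs) else None


def earley_recognize(words):
    n = len(words)
    chart = [[] for _ in range(n + 1)]

    def enqueue(state, k):
        if state not in chart[k]:
            chart[k].append(state)

    enqueue(State("γ", ("S",), 0, 0), 0)  # dummy start state γ → • S, [0,0]
    for k in range(n + 1):
        i = 0
        while i < len(chart[k]):          # chart[k] can grow while it is processed
            st = chart[k][i]
            sym = st.next_symbol()
            if sym is None:               # Completer: advance states waiting for st.lhs
                for prev in list(chart[st.start]):
                    if prev.next_symbol() == st.lhs:
                        enqueue(State(prev.lhs, prev.rhs, prev.dot + 1, prev.start), k)
            elif sym in POS:              # Scanner: check the next word's POS
                if k < n and sym in LEXICON.get(words[k].lower(), set()):
                    enqueue(State(sym, (words[k].lower(),), 1, k), k + 1)
            else:                         # Predictor: expand the expected non-terminal
                for rhs in GRAMMAR[sym]:
                    enqueue(State(sym, rhs, 0, k), k)
            i += 1
    accepted = State("γ", ("S",), 1, 0) in chart[n]
    return accepted, chart


ok, chart = earley_recognize("Book that flight".split())
print(ok)  # True
```

The input is accepted when the final chart entry contains the completed dummy state γ → S •, [0,N]. This corresponds to the successful-parse condition S → α •, [0,N].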

Page 25

• Earley

• Example

Book that flight.

Page 26

• Earley

• Example

Book that flight.

Page 27

• Earley

• States that lead to the correct parse.

Book that flight.

Page 28

Wrap-Up

• 句法分析 (Syntactic parsing)
  • Parsing Methods
• 搜索式句法分析 (Parsing as search)
  • Top-Down
  • Bottom-Up
• 结构歧义 (Structural ambiguities)
  • Ambiguities in English
  • Ambiguities in Chinese
• 动态规划句法分析 (Dynamic programming parsing)
  • Features
  • CKY Parsing
  • Earley
  • Examples