101035 Chinese Information Processing, Chinese NLP, Lecture 8. Sentences: Grammatical Analysis (1)
101035 Chinese Information Processing
Chinese NLP
Lecture 8
Sentences: Grammatical Analysis (1)
• Basics of grammatical analysis
• Formal grammars
• Context-free grammars
• Dependency grammar
Basics of Grammatical Analysis
• Constituency
• Grammar, or strictly speaking syntax, is about how words are put together to make sentences.
• A constituent is a group of words, assuming a certain syntactic role.
• A constituent stands in certain grammatical relations to other constituents.
• Examples of Constituents
• English noun phrases
• Whole English noun phrases appear in similar syntactic environments, for example before a verb.
• An individual word inside a noun phrase cannot appear there on its own.
Harry the Horse
a high-class spot such as Mindy’s
the Broadway coppers
the reason he comes into the Hot Box
a high-class spot such as Mindy’s attracts . . .
the Broadway coppers love . . .
* a high-class attracts . . .
* the love . . .
• Examples of Constituents
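• Chinese phrases
• The “把 ...” (bǎ) and “被 ...” (bèi, passive) constructions
• Structural account
老师被迟到的学生逗乐了。 (The teacher was amused by the students who arrived late.)
= 迟到的学生把老师逗乐了。 (The students who arrived late amused the teacher.)
≠ * 老师被迟到的学生被逗乐了。
老师被冤枉的事情传开了。 (The news that the teacher had been wronged spread.)
≠ * 冤枉的事情把老师传开了。
= 老师被冤枉的事情被传开了。 (The news that the teacher had been wronged was spread.)
电话被监听的老师找到了。 (structurally ambiguous)
= 监听的老师把电话找到了。 (The teacher who was eavesdropping found the phone.)
= 电话被监听的老师被找到了。 (The teacher whose phone was tapped was found.)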
Formal Grammars
• Enumeration
• The grammar of a language can be the set of all enumerated sentences.
• We cannot exhaust all possible sentences or deal with new sentences.
• Instead, we should use recursive rules that describe sentences in terms of their internal structure.
• Regular expressions
• Symbols of a language (POS)
ART (article), PRON (pronoun), N (noun), V (verb), ADJ (adjective), ADV (adverb)
• Combination patterns of the symbols
ART+N ; ART+N+V ; ART+ADJ+N+V
• Regular expression symbols
• *: occurs zero or more times
ART+ADJ*+N
• +: occurs 1 or more times
ART+ADJ++N
• ( ): occurs zero or 1 time
ART+(ADJ)+N
• |: disjunctions
N | PRON + V
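The operators above map onto ordinary regular expressions once a tagged sentence is written as a string of POS symbols. The following is a minimal Python sketch, not part of the lecture: it assumes the tags are joined by single spaces, uses the standard `re` module, and reads `N | PRON + V` as (N | PRON) followed by V, which is one possible interpretation. The names `PATTERNS` and `matches` are illustrative.

```python
import re

# Each slide pattern rewritten as a regex over space-separated POS tags.
# The "+" that joins symbols on the slide becomes a literal space here.
PATTERNS = {
    "ART+ADJ*+N": r"ART( ADJ)* N",    # zero or more adjectives
    "ART+ADJ++N": r"ART( ADJ)+ N",    # one or more adjectives
    "ART+(ADJ)+N": r"ART( ADJ)? N",   # at most one adjective
    "N|PRON+V": r"(N|PRON) V",        # assumed grouping: (N | PRON) then V
}


def matches(pattern_name, tags):
    """Return True if the whole tag sequence fits the named pattern."""
    return re.fullmatch(PATTERNS[pattern_name], " ".join(tags)) is not None


print(matches("ART+ADJ*+N", ["ART", "ADJ", "ADJ", "N"]))   # True
print(matches("ART+(ADJ)+N", ["ART", "ADJ", "ADJ", "N"]))  # False
```

Using `re.fullmatch` rather than `re.search` means the pattern has to cover the entire tag sequence, which is usually what is wanted when describing a whole phrase.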
In-Class Exercise
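• Write a regular expression that can describe all of the following sentences.
老张是一个环卫工。 (Lao Zhang is a sanitation worker.)
老张是一个聪明的环卫工。 (Lao Zhang is a clever sanitation worker.)
老张是一个聪明勤劳的环卫工。 (Lao Zhang is a clever and hardworking sanitation worker.)
他是一个聪明的人。 (He is a clever person.)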
• Rules in a Formal Grammar
• A set of rules, or productions, expresses the ways in which the symbols of the language can be grouped and ordered together.
• S (sentence), NP (noun phrase), VP (verb phrase), PP (prepositional phrase)
• Formal Definition of a Formal Grammar
• N: a set of non-terminal symbols (or variables)
• Σ: a set of terminal symbols (disjoint from N)
• R: a set of rules or productions, each of the form A → β, where A is a non-terminal and β is a string of symbols from the infinite set of strings (Σ ∪ N)*
• S: a designated start symbol
S → NP VP, NP → Det N, VP → V NP, PP → Prep NP
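The four-part definition can be written down directly as data. The sketch below is an illustration under stated assumptions, not the lecture's own code: it treats the POS symbols Det, N, V and Prep as the terminal alphabet (the slide gives no lexical rules), and the helper names `is_well_formed` and `leftmost_derivation` are hypothetical.

```python
# G = (N, Sigma, R, S), using the four rules from the slide.
NONTERMINALS = {"S", "NP", "VP", "PP"}
TERMINALS = {"Det", "N", "V", "Prep"}   # assumption: POS tags act as terminals
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("Det", "N")),
    ("VP", ("V", "NP")),
    ("PP", ("Prep", "NP")),
]
START = "S"


def is_well_formed(nonterminals, terminals, rules, start):
    """Check the conditions stated in the formal definition."""
    if nonterminals & terminals:          # Sigma must be disjoint from N
        return False
    if start not in nonterminals:         # S must be a non-terminal
        return False
    symbols = nonterminals | terminals
    # Every rule is A -> beta with A in N and beta drawn from (Sigma ∪ N)*.
    return all(
        lhs in nonterminals and all(s in symbols for s in rhs)
        for lhs, rhs in rules
    )


def leftmost_derivation(rules, nonterminals, start):
    """Rewrite the leftmost non-terminal until only terminals remain.

    Always picks the first rule for a symbol, so it terminates only
    because this toy grammar has no recursive rules.
    """
    form = [start]
    steps = [form]
    while any(s in nonterminals for s in form):
        i = next(k for k, s in enumerate(form) if s in nonterminals)
        rhs = next(r for lhs, r in rules if lhs == form[i])
        form = form[:i] + list(rhs) + form[i + 1:]
        steps.append(form)
    return steps


print(is_well_formed(NONTERMINALS, TERMINALS, RULES, START))
for step in leftmost_derivation(RULES, NONTERMINALS, START):
    print(" ".join(step))
```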
Context-Free Grammars
• Definition
• As a kind of formal grammar, Context-Free Grammars (CFGs) are the most commonly used mathematical system for modeling the constituent structure of a language. They are also called Phrase-Structure Grammars.
• Parse tree
• A parse tree is a tree structure that shows how the rules in a CFG are used in a sequence to expand a non-terminal node into terminal nodes.
NP → Det Nominal
Det → a
Nominal → Noun
Noun → flight
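To make the expansion concrete, the four rules above can be applied mechanically to grow a tree from NP down to the words. This is a small Python sketch for illustration only; the nested-tuple tree format and the function names `expand` and `bracket` are assumptions, and the approach works here only because each non-terminal has exactly one rule.

```python
# The four rules from the slide; each non-terminal has a single expansion.
RULES = {
    "NP": ["Det", "Nominal"],
    "Det": ["a"],
    "Nominal": ["Noun"],
    "Noun": ["flight"],
}


def expand(symbol):
    """Expand a symbol into a (label, children) tree; words stay as strings."""
    if symbol not in RULES:          # terminal: an actual word
        return symbol
    return (symbol, [expand(child) for child in RULES[symbol]])


def bracket(tree):
    """Render a tree in labelled-bracket notation."""
    if isinstance(tree, str):
        return tree
    label, children = tree
    return "[" + label + " " + " ".join(bracket(c) for c in children) + "]"


print(bracket(expand("NP")))   # [NP [Det a] [Nominal [Noun flight]]]
```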
• An English Example: I prefer a morning flight.
• Lexicon (figure)
• Grammar (figure)
• Parse Tree (figure)
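The lexicon, grammar and tree for this sentence were shown as figures that are not reproduced here. As a stand-in, the sketch below parses the same sentence with a small hand-written lexicon and grammar. Both are assumptions made for illustration and may differ from the ones on the slides; the parser is a plain top-down backtracking parser, and the names `LEXICON`, `GRAMMAR`, `parse`, `expand_rule` and `parse_sentence` are hypothetical.

```python
# Assumed toy lexicon and grammar (not taken from the slides).
LEXICON = {"I": "Pronoun", "prefer": "Verb", "a": "Det",
           "morning": "Noun", "flight": "Noun"}
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Pronoun"], ["Det", "Nominal"]],
    "Nominal": [["Noun", "Nominal"], ["Noun"]],   # right-recursive on purpose
    "VP": [["Verb", "NP"]],
}


def parse(symbol, words, start):
    """Yield (tree, end) for every way `symbol` can cover words[start:end]."""
    if symbol not in GRAMMAR:                      # POS tag: consume one word
        if start < len(words) and LEXICON.get(words[start]) == symbol:
            yield (symbol, words[start]), start + 1
        return
    for rhs in GRAMMAR[symbol]:
        yield from expand_rule(symbol, rhs, words, start, [])


def expand_rule(symbol, rhs, words, pos, children):
    """Match the remaining right-hand-side symbols one after another."""
    if not rhs:
        yield (symbol, children), pos
        return
    for child, nxt in parse(rhs[0], words, pos):
        yield from expand_rule(symbol, rhs[1:], words, nxt, children + [child])


def bracket(tree):
    label, body = tree
    if isinstance(body, str):                      # (tag, word) leaf
        return f"[{label} {body}]"
    return f"[{label} " + " ".join(bracket(c) for c in body) + "]"


def parse_sentence(sentence):
    """Return bracketed trees for all parses that cover the whole sentence."""
    words = sentence.rstrip(".").split()
    return [bracket(t) for t, end in parse("S", words, 0) if end == len(words)]


for tree in parse_sentence("I prefer a morning flight."):
    print(tree)
```

The Nominal rule is written right-recursively because a top-down parser of this kind would loop forever on a left-recursive rule such as Nominal → Nominal Noun.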
• Chinese Examples
• Treebanks
• A Treebank is a corpus in which every sentence is syntactically annotated with a parse tree.
• Treebanks are invaluable resources for NLP, especially parsing.
• The Penn Treebank is a representative example.
• Samples from Penn Treebank.
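Treebank annotations are commonly stored as bracketed strings. The sketch below reads one such string into a nested structure and lists the tagged words. The sample string is hand-written in the Penn bracketing style for illustration and is not copied from the corpus; `read_tree` and `tagged_words` are hypothetical helper names, and real treebank files contain details (extra outer brackets, traces, function tags) that this sketch does not handle.

```python
import re

# Hand-written sample in Penn-style bracketing (illustrative, not corpus data).
SAMPLE = "(S (NP (PRP I)) (VP (VBP prefer) (NP (DT a) (NN morning) (NN flight))))"


def read_tree(text):
    """Parse a bracketed string into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "(", "a tree must start with '('"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())       # nested constituent
            else:
                children.append(tokens[pos])  # a word
                pos += 1
        pos += 1                              # skip the closing ")"
        return (label, children)

    return read()


def tagged_words(tree):
    """Collect (word, POS tag) pairs from the pre-terminal nodes, left to right."""
    label, children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]
    pairs = []
    for child in children:
        pairs.extend(tagged_words(child))
    return pairs


print(tagged_words(read_tree(SAMPLE)))
```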
• Chomsky Normal Form
• A CFG is in Chomsky Normal Form (CNF) if each production is either of the form A → B C or A → a. That is, the right-hand side of each rule either has two non-terminal symbols or one terminal symbol.
• Conversion to CNF
VP → VBD NP PP
VP → VP PP
VP → VBD NP PP*
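One step of a CNF conversion is splitting a right-hand side with more than two symbols into a chain of binary rules. The sketch below shows only that step, applied to VP → VBD NP PP; a full conversion also has to deal with unit productions and with terminals mixed into longer rules. The naming scheme for the fresh non-terminals (`VP|NP_PP`) and the function names are assumptions made for illustration.

```python
def binarize(lhs, rhs):
    """Split A -> B C D ... into binary rules using fresh non-terminals."""
    rules = []
    current = lhs
    symbols = list(rhs)
    while len(symbols) > 2:
        # Hypothetical naming scheme for the new intermediate symbol.
        fresh = f"{lhs}|{'_'.join(symbols[1:])}"
        rules.append((current, (symbols[0], fresh)))
        current = fresh
        symbols = symbols[1:]
    rules.append((current, tuple(symbols)))
    return rules


def is_cnf(rule, nonterminals):
    """True if the rule is A -> B C (two non-terminals) or A -> a (one terminal)."""
    _, rhs = rule
    if len(rhs) == 2:
        return all(symbol in nonterminals for symbol in rhs)
    return len(rhs) == 1 and rhs[0] not in nonterminals


BINARY_RULES = binarize("VP", ["VBD", "NP", "PP"])
for lhs, rhs in BINARY_RULES:
    print(lhs, "→", " ".join(rhs))
```

Here the POS tag VBD is treated as a non-terminal, on the assumption that the words themselves are the terminals.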
Dependency Grammar
• Definition
• Dependency grammar describes the syntactic structure of a sentence purely in terms of words and binary semantic or syntactic relations between those words.
• Dependency relations are directional.
• There are no structural levels or non-terminal nodes as in CFG.
• A Chinese Example
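那个小孩喜欢通俗歌曲 (That child likes popular songs.)
• Dependency Tree
喜欢 (likes)
    小孩 (child)
        那个 (that)
    歌曲 (songs)
        通俗 (popular)
• Dependency Graph
HED(Root, 喜欢)
SBV(喜欢, 小孩)
VOB(喜欢, 歌曲)
ATT(小孩, 那个)
ATT(歌曲, 通俗)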
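The same analysis can be stored as a list of labelled arcs and printed back as a tree. This is a minimal sketch for illustration: the relation labels HED, SBV, VOB and ATT are those on the slide, while the (relation, head, dependent) tuple layout and the `render` function are assumptions. It identifies words by their form, which only works when no word occurs twice in the sentence.

```python
# Labelled arcs for 那个小孩喜欢通俗歌曲, written as (relation, head, dependent).
ARCS = [
    ("HED", "Root", "喜欢"),
    ("SBV", "喜欢", "小孩"),
    ("VOB", "喜欢", "歌曲"),
    ("ATT", "小孩", "那个"),
    ("ATT", "歌曲", "通俗"),
]


def render(head, arcs, depth=0):
    """Return indented lines for everything that depends on `head`."""
    lines = []
    for relation, arc_head, dependent in arcs:
        if arc_head == head:
            lines.append("  " * depth + f"{dependent} ({relation})")
            lines.extend(render(dependent, arcs, depth + 1))
    return lines


print("\n".join(render("Root", ARCS)))
```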
• Axioms of Dependency
• Only one constituent in a sentence is independent.
• All the other constituents in the sentence are dependent on some constituent.
• No constituent is dependent on two or more other constituents.
• If A is dependent on B and C lies between A and B in the sentence, then C is dependent on A, on B, or on another constituent between A and B.
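The axioms can be checked mechanically on a set of dependency arcs. The sketch below is one possible reading of them, not a reference implementation: words are identified by their 1-based position, arcs are (head, dependent) pairs, and the function name `check_axioms` and the result keys are hypothetical. Axioms 1 and 2 are checked together, since "exactly one word has no head" already implies that every other word has one.

```python
def check_axioms(n_words, arcs):
    """Check the dependency axioms for arcs given as (head, dependent) positions."""
    heads = {word: [] for word in range(1, n_words + 1)}
    for head, dependent in arcs:
        heads[dependent].append(head)

    # Axioms 1 and 2: exactly one word is independent (has no head).
    one_independent = sum(1 for hs in heads.values() if not hs) == 1
    # Axiom 3: no word depends on two or more other words.
    single_head = all(len(hs) <= 1 for hs in heads.values())
    # Axiom 4: a word C between A and B must depend on A, on B,
    # or on another word that also lies between them.
    between_ok = all(
        bool(heads[c]) and all(lo <= h <= hi for h in heads[c])
        for head, dependent in arcs
        for lo, hi in [(min(head, dependent), max(head, dependent))]
        for c in range(lo + 1, hi)
    )
    return {
        "one_independent": one_independent,
        "single_head": single_head,
        "between_ok": between_ok,
    }


# 那个(1) 小孩(2) 喜欢(3) 通俗(4) 歌曲(5)
GOOD = [(3, 2), (3, 5), (2, 1), (5, 4)]
# Same sentence, but 通俗(4) is made to depend on 那个(1): violates axiom 4.
BAD = [(3, 2), (3, 5), (2, 1), (1, 4)]

print(check_axioms(5, GOOD))
print(check_axioms(5, BAD))
```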
• Conditions of Dependency Tree
• Single Type Node: A dependency tree has only terminal nodes and no non-terminal nodes.
• Single Parent Node: The root node has no parent node. Every other node has exactly one parent node.
• Unique Root Node: A dependency tree has only one root node, which governs all the other nodes.
• Non-overlapping: A dependency tree’s branches cannot overlap with each other.
• Mutual exclusiveness: The relations of governing and preceding are exclusive. If two nodes have a “governing” relation between them, they cannot have a “preceding” relation.
• Dependency Relations
• There are more than 50 dependency relations in English (Stanford Parser)
Dependency relation | Meaning | Example
amod | adjectival modifier | Sam eats red meat: amod(meat, red)
dobj | direct object | She gave me a raise: dobj(gave, raise)
nsubj | nominal subject | Clinton defeated Dole: nsubj(defeated, Clinton)
pcomp | prepositional complement | They heard about you missing classes: pcomp(about, missing)
tmod | temporal modifier | Last night, I swam in the pool: tmod(swam, night)
In-Class Exercise
• Given the sentence "The sausage was eaten by his dog", complete the following dependency relations by choosing from the list {nsubj, amod, dobj, pcomp, tmod}.
_____(eat, sausage)
_____(eat, dog)
• Heads and Dependency
• Each syntactic constituent can be associated with a lexical head.
• N is the head of an NP, V is the head of a VP …
Workers dumped sacks into a bin.
• Heads and Dependency
• A dependency graph can be automatically derived from a context-free parse by using the head rules.
Vinken will join the board as a nonexecutive director Nov 29.
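The derivation can be sketched in a few lines: pick a head child for each constituent, pass its head word upward, and attach the head words of the remaining children to it. The example below uses the earlier sentence "Workers dumped sacks into a bin." with a hand-built tree and a heavily simplified head table. Both are assumptions made for illustration; published head rule tables are much more detailed, and other conventions choose different heads, for instance for prepositional phrases.

```python
# Simplified, assumed head table: constituent label -> labels that may head it.
HEAD_RULES = {"S": ["VP"], "VP": ["VBD"], "NP": ["NN", "NNS"], "PP": ["IN"]}

# Hand-built constituency tree; leaves are (POS tag, word) pairs.
TREE = ("S", [
    ("NP", [("NNS", "Workers")]),
    ("VP", [
        ("VBD", "dumped"),
        ("NP", [("NNS", "sacks")]),
        ("PP", [("IN", "into"),
                ("NP", [("DT", "a"), ("NN", "bin")])]),
    ]),
])


def find_head(tree, arcs):
    """Return the head word of `tree`, appending (head, dependent) arcs."""
    label, children = tree
    if isinstance(children, str):              # pre-terminal: (tag, word)
        return children
    # The first child whose label is listed in the head table is the head child.
    head_index = next(
        i for i, (child_label, _) in enumerate(children)
        if child_label in HEAD_RULES[label]
    )
    head_words = [find_head(child, arcs) for child in children]
    for i, word in enumerate(head_words):
        if i != head_index:
            arcs.append((head_words[head_index], word))
    return head_words[head_index]


arcs = []
root = find_head(TREE, arcs)
print("root:", root)
for head, dependent in arcs:
    print(f"{head} -> {dependent}")
```

The arcs produced this way are unlabelled; assigning relation names such as nsubj or dobj would need a further set of rules.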
Wrap-Up
• Basics of grammatical analysis
    • Constituents
• Formal grammars
    • Regular Expressions
    • Symbols and Rules
    • Formal Definition
• Context-free grammars
    • Parse Tree
    • Examples
    • Treebanks
• Dependency grammar
    • Axioms
    • Dependency Tree and Graph
    • Dependency Relations
    • Heads and Dependency