cmsc 723 / ling 645: intro to computational linguistics november 10, 2004 lecture 10 (dorr): cfg’s...

CMSC 723 / LING 645: Intro to Computational Linguistics

November 10, 2004

Lecture 10 (Dorr):CFG’s (Finish Chapter 9)

Parsing (Chapter 10)

Prof. Bonnie J. DorrDr. Christof Monz

TA: Adam Lee

What Issues Arise in Creating Grammars?

AgreementSubcategorizationMovement

Agreement

Simple Examples:– This dog– Those dogs– *Those dog– *This dogs

More complicated examples:– Do any flights stop in Chicago?– Does Delta fly from Atlanta to Boston?– Do I get dinner on this flight?– What flights leave in the morning?– * What flight leave in the morning?

Agreement Features Propagate Downard …

NP3sg → (Det) (Card) (Ord) (Quant) (AP) NominalSg

NPnon3sg → (Det) (Card) (Ord) (Quant) (AP) NominalPl

What’s wrong with this picture?

Combinatorial explosionOther feature-based expansions?Loss of Generalization

Subcategorization

Verbs have preferences for the kinds of constituents they co-occur with

For example:– VP → Verb (disappear)– VP → Verb NP (prefer a morning flight)– VP → Verb NP PP (leave Boston in the morning)– VP → Verb PP (leaving on Thursday)

But not: *I disappeared the cat.

Sentential complements

Example: – You [VP [V said [S there were two flights that

were the cheapest]]]– You [VP [ V said [S you had a two hundred sixty

six dollar fare]]]– [VP [ V Tell][NP me ] [S how to get from the

airport in Philadelphia to downtown]]Rule:

– VP → Verb S

Other VP Constituents

A VP can be the constituent of another VP:VP → Verb VP– I want [VP to fly from Milwaukee to Orlando]

– I’m trying [VP to find a flight that goes from Pittsburgh to Denver next Friday]

Verbs can also be followed by particles to form a phrasal verb: VP → Verb Particle– take off

Why do we need Subcategorization?

Important observation: while a verb phrase can have many possible kinds of constituents, not every verb is compatible with every verb phrase

Example: verb want can be used – With a NP complement: I want a flight

– With an infinitive VP complement: I want to fly to Dallas

But verb find cannot take such a VP complement– *I find to fly to Dallas

Subcategorization

Frame Verb ExampleØ eat, sleep, … I want to eatNP prefer, find, leave, ... Find [NP the flight from Pittsburgh to Boston]NP NP show, give, … Show [NP me] [NP airlines with flights from Pittsburgh]PPfrom PPto fly, travel, … I would like to fly [pp from Boston] [pp to Philadelphia]

NP PPwith help, load, … Can you help [NP me] [pp with a flight]

VPto prefer, want, need, … I would prefer [VPto to go by United airlines]

VPbrst can, would, might, … I can [VPbrst go from Boston]

S mean Does this mean [S AA has a hub in Boston]?

Auxiliaries

Examples:– Modals: can, could, may, might– Perfect: have– Progressive: be– Passive: be

What are their subcategories?Ordering: modal < perfect < progressive <

passivee.g, might have been prevented

Movement

I looked up his grade.I looked his grade up.

John put the book on the table.What did John put on the table?

Long distance dependencies.

Grammar Equivalence and Normal Form

Weak equivalence

Strong equivalence

Chomsky Normal Form (CNF)A → B C or A → a

CNF Example

Convert to Chomsky-Normal Form (CNF):

S → a Y X X → a X | b Y → Y a | b

Why do care about grammar?

We need grammars for parsing!

Note: Parsing is the topic we will cover next.

What Linguistic Representations are necessary for parsing?Words

– Classes: groups of words which behave similarly

– Function/Content

– P.O.S.: Nouns, Verbs, Adjectives, Prepositions

Constituents: – Groupings of words into larger units which behavior

similarly and have a particular p.o.s. as their head

– Phrases: NP headed by Noun... VP, PP, ...

Analyzing Language in Terms of these RepresentationsMorphological parsing:

– analyze words into morphemes and affixes– rule-based, FSAs, FSTs

POS TaggingSyntactic parsing:

– identify component parts and how related– to see if a sentence is grammatical– to assign a semantic structure

Syntactic Parsing

Declarative formalisms like CFGs define the legal strings of a language but don’t specify how to recognize or assign structure to them

Parsing algorithms specify how to recognize the strings of a language and assign each string one or more syntactic structures

Parse trees useful for grammar checking, semantic analysis, MT, QA, information extraction, speech recognition…and almost every task in NLP

CFG for Fragment of English: G0

Parse Tree for ‘Book that flight’ using G0

Parsing as a Search Problem

Searching FSAs– Finding the right path through the automaton

– Search space defined by structure of FSA

Searching CFGs– Finding the right parse tree among all possible parse

trees

– Search space defined by the grammar

Constraints provided by the input sentence and the automaton or grammar

Two Search Strategies

Top-Down– Search for tree starting from S until input words

covered.

Bottom-Up– Start with words and build upwards toward S

Top-Down Parser

Builds from the root S node to the leavesAssuming we build all trees in parallel:

– Find all trees with root S– Next expand all constituents in these trees/rules– Continue until leaves are pos– Candidate trees failing to match pos of input string are

rejectedTop-Down: Rationalist Tradition

– Expectation driven– Goal: Build tree for input starting with S

Top-Down Search Space for G0

Bottom-Up Parsing

Parser begins with words of input and builds up trees, applying G0 rules whose right-hand sides match

Book that flightN Det N V Det N

Book that flight Book that flight– ‘Book’ ambiguous– Parse continues until an S root node reached or no further node

expansion possible Bottom-Up: Empiricist Tradition

– Data driven– Primary consideration: Lowest sub-trees of final tree must hook up

with words in input.

Expanding Bottom-Up Search Space for ‘Book that flight’

Comparing Top-Down and Bottom-Up

Top-Down parsers: never explore illegal parses (e.g. can’t form an S) -- but waste time on trees that can never match the input

Bottom-Up parsers: never explore trees inconsistent with input -- but waste time exploring illegal parses (no S root)

For both: how to explore the search space?– Pursuing all parses in parallel or …?– Which node to expand next?– Which rule to apply next?

Search Strategy and Search Control

Parallel:– Explore all possible trees in parallel

Depth-first search: – Agenda of search states: expand search space incrementally,

exploring most recently generated state (tree) each time– When you reach a state (tree) inconsistent with input, backtrack

to most recent unexplored state (tree) Which node to expand?

– Leftmost Which grammar rule to use?

– Order in the grammar

Basic Algorithm for Top-Down, Depth-First, Left-Right Strategy

Initialize agenda with ‘S’ tree and point to first word and make this current search state (cur)

Loop until successful parse or empty agenda– Apply all applicable grammar rules to leftmost

unexpanded node of cur • If this node is a POS category and matches that of

the current input, push this onto agenda

• Else, push new trees onto agenda

– Pop new cur from agenda

Basic Top-Down Parser

Top-Down Depth-First Derivation Using G0

Example: Does this flight include a meal?

Example continued …

Augmenting Top-Down Parsing with Bottom-Up FilteringWe saw: Top-Down, depth-first, L-to-R parsing

– Expands non-terminals along the tree’s left edge down to leftmost leaf of tree

– Moves on to expand down to next leftmost leaf… In a successful parse, current input word will be the

first word in derivation of the unexpanded node that the parser is currently processing

So….lookahead to left-corner of the tree in – B is a left-corner of A if A =*=> B– Build table with left-corners of all non-terminals in

grammar and consult before applying rule

Left Corners

Left-Corner Table for G0

Previous Example:

Category Left Corners

S Det, PropN, Aux, V

NP Det, PropN

Nom N

VP V

Summing Up Parsing Strategies

Parsing is a search problem which may be implemented with many search strategies

Top-Down vs. Bottom-Up Parsers– Both generate too many useless trees– Combine the two to avoid over-generation: Top-

Down Parsing with Bottom-Up look-ahead Left-corner table provides more efficient look-

ahead– Pre-compute all POS that can serve as the leftmost

POS in the derivations of each non-terminal category

Three Critical Problems in Parsing

Left Recursion

Ambiguity

Repeated Parsing of Sub-trees

Left Recursion

Depth-first search will never terminate if grammar is left recursive: A→A B βExamples: NP → NP PP, VP → VP PP, S → S and S, NP → NP and NP

Indirect Left Recursion: A→A B βExample: NP → Det Nominal, Det → NP ’s

NP NP NP

NominalDet

’s

*

Solutions to Left Recursion

Rule orderingDon't use recursive rulesLimit depth of recursion in parsing to some

analytically or empirically set limitDon't use top-down parsing

Rule Ordering

Bad:– NP → NP PP– NP → Det Nominal

Better alternative: – First: NP → Det Nominal– Then: NP → NP PP

Grammar Rewriting

Rewrite left-recursive grammar as weakly equivalent non-recursive one.

Can be done:– By Hand (ick) or …

– Automatically

Example Rewrite: NP → NP PP, NP → Det Nominal

As: NP → Det Nominal Stuff, Stuff → PP Stuff, Stuff → ε

Problems with Grammar Rewriting

• Original:[NP [NP the book] [PP on [NP [NP the table] [PP in [NP [NP the yard] [PP of [NP the house]]]]]]]

• Becomes:[NP the book [Stuff [PP on [NP the table [Stuff [PP in [NP the yard [Stuff [PP of [NP the house [Stuff]]] [Stuff]]]] [Stuff]]]] [Stuff]]]

Depth Bound

Set an arbitrary bound Set an analytically derived

bound Run tests and derive reasonable

bound empiricallyUse iterative deepening

Ambiguity

Local Ambiguity– Leads to hypotheses that are locally reasonable

but eventually lead nowhere – “Book that flight”

Global Ambiguity– Leads to multiple parses for the same input– “I shot an elephant in my pajamas”

More Ambiguity Examples

Multiple legal structures– Attachment (e.g. I saw a man on a hill with

telescope)– Coordination (e.g. younger cats and dogs)– NP bracketing (e.g. Spanish language

teachers)

Two Parse Trees for Ambiguous Sentence

More Ambiguity: ‘Can you book TWA flights?’

A Correct Parse for ‘Show me the meal on Flight UA 386 from San Francisco to Denver’

Inefficient Re-Parsing of Subtrees

Invariants

Despite ambiguity, there are invariants

Sub-components of final parse tree are re-analyzed unnecessarily

Except for top-most component, every part of final tree is derived more than once.

What’s the solution?

Key to efficient processing is reuseFill table with solutions to sub-problems for later

use.We want an algorithm that:

– Does not do repeated work

– Does top-down search with bottom-up filtering

– Solves left-recursion problem

– Solves an exponential problem

Dynamic Programming

Create table of solutions to sub-problems (e.g. subtrees) as parse proceeds

Look up subtrees for each constituent rather than re-parsing

Since all parses implicitly stored, all available for later disambiguation

Examples: Cocke-Younger-Kasami (CYK) (1960), Graham-Harrison-Ruzzo (GHR) (1980) and Earley (1970) algorithms

Earley Algorithm

Uses dynamic programming to do parallel top-down search in (worst case) O(N3) time

First, left-to-right pass fills out a chart with N+1 states– Think of chart entries as sitting between words in

the input string keeping track of states of the parse at these positions

– For each word position, chart contains set of states representing all partial parse trees generated to date. E.g. chart[0] contains all partial parse trees generated at the beginning of the sentence

Chart Entries

predicted constituents

in-progress constituents

completed constituents

Represent three types of constituents:

Progress in parse represented by Dotted Rules

Position of • indicates type of constituent

0 Book 1 that 2 flight 3

• S → • VP, [0,0] (predicted)

• NP → Det • Nom, [1,2] (in progress)

• VP →V NP •, [0,3] (completed)

[x,y] tells us what portion of the input is spanned so far by this rule

Each State si:<dotted rule>, [<back pointer>,<current position>]

S → • VP, [0,0] – First 0 means S constituent begins at the start of input

– Second 0 means the dot here too

– So, this is a top-down prediction

NP → Det • Nom, [1,2]– the NP begins at position 1

– the dot is at position 2

– so, Det has been successfully parsed

– Nom predicted next

0 Book 1 that 2 flight 3

0 Book 1 that 2 flight 3 (continued)

VP → V NP •, [0,3]– Successful VP parse of entire input

Successful Parse

Final answer found by looking at last entry in chart

If entry resembles S → • [nil,N] then input parsed successfully

Chart will also contain record of all possible parses of input string, given the grammar

Parsing Procedure for the Earley Algorithm Move through each set of states in order,

applying one of three operators to each state:– predictor: add predictions to the chart– scanner: read input and add corresponding state to

chart– completer: move dot to right when new constituent

found Results (new states) added to current or next set

of states in chart No backtracking and no states removed: keep

complete history of parse

States and State Sets

Dotted Rule si represented as <dotted rule>, [<back pointer>, <current position>]

State Set Sj to be a collection of states si with the

same <current position>.

Earley Algorithm from Book

Earley Algorithm (simpler!)

1. Add Start → · S, [nil,0] to state set 0

2. Predict all states you can, adding new predictions to state set 0. Let i = 1.

3. Scan input word i—add all matched states to state set Si.

Add all new states produced by Complete to state set Si Add all new states produced by Predict to state set Si Unless i=n, (a) Let i = i + 1; (b) repeat step 3.

4. At the end, see if state set n contains Start → S ·, [nil,n]

3 Main Sub-Routines of Earley Algorithm

Predictor: Adds predictions into the chart.Completer: Moves the dot to the right

when new constituents are found.Scanner: Reads the input words and enters

states representing those words into the chart.

Predictor

Intuition: create new state for top-down prediction of new phrase.

Applied when non part-of-speech non-terminals are to the right of a dot: S → • VP [0,0]

Adds new states to current chart– One new state for each expansion of the non-terminal

in the grammarVP → • V [0,0]VP → • V NP [0,0]

Formally: Sj: A → α · B β, [i,j] Sj: B → · γ, [j,j]

Scanner

Intuition: Create new states for rules matching part of speech of next word.

Applicable when part of speech is to the right of a dot: VP → • V NP [0,0] ‘Book…’

Looks at current word in input If match, adds state(s) to next chart

VP → V • NP [0,1]Formally:

Sj: A → α · B β, [i,j] Sj+1: A → α B ·β, [i,j+1]

Completer

Intuition: parser has finished a new phrase, so must find and advance states all that were waiting for this

Applied when dot has reached right end of ruleNP → Det Nom • [1,3]

Find all states w/dot at 1 and expecting an NPVP → V • NP [0,1]

Adds new (completed) state(s) to current chartVP → V NP • [0,3]

Formally: Sk: B → δ ·, [j,k] Sk: A → α B · β, [i,k], where: Sj: A → α · B β, [i,j].

Example: State Set S0 for Parsing “Book that flight” using Grammar G0

nil

Example: State Set S1 for Parsing “Book that flight”

VP→ V and VP → V NP are both passed to Scanner, which adds them to Chart[1], moving dots to right

Scanner

Scanner

Prediction of Next Rule

When VP → V is itself processed by the Completer, S → VP is added to Chart[1] since VP is a left corner of S

Last 2 rules in Chart[1] are added by Predictor when VP → V NP is processed

And so on….

Last Two States

Scanner

ScannerScanner

γ → S . [nil,3] Completer

How do we retrieve the parses at the end?

Augment the Completer to add pointers to prior states it advances as a field in the current state– i.e. what state did we advance here?– Read the pointers back from the final state

Error Handling

What happens when we look at the contents of the last table column and don't find a S → rule?– Is it a total loss? No...– Chart contains every constituent and

combination of constituents possible for the input given the grammar

Also useful for partial parsing or shallow parsing used in information extraction

Earley’s Keys to Efficiency

Left-recursion, Ambiguity and repeated re-parsing of subtrees– Solution: dynamic programming

Combine top-down predictions with bottom-up look-ahead to avoid unnecessary expansions

Earley is still one of the most efficient parsersAll efficient parsers avoid re-computation in

a similar way.But Earley doesn’t require grammar

conversion!

Next Time

Read Chapter 14.

cmsc 723 / ling 645: intro to computational linguistics november 10, 2004 lecture 10 (dorr): cfg’s...

Documents