CMSC 723 / LING 645: Intro to Computational Linguistics November 10, 2004 Lecture 10 (Dorr): CFG’s (Finish Chapter 9) Parsing (Chapter 10) Prof. Bonnie J. Dorr Dr. Christof Monz TA: Adam Lee


Page 1:

CMSC 723 / LING 645: Intro to Computational Linguistics

November 10, 2004

Lecture 10 (Dorr): CFG’s (Finish Chapter 9)

Parsing (Chapter 10)

Prof. Bonnie J. Dorr, Dr. Christof Monz

TA: Adam Lee

Page 2:

What Issues Arise in Creating Grammars?

– Agreement
– Subcategorization
– Movement

Page 3:

Agreement

Simple Examples:
– This dog
– Those dogs
– *Those dog
– *This dogs

More complicated examples:
– Do any flights stop in Chicago?
– Does Delta fly from Atlanta to Boston?
– Do I get dinner on this flight?
– What flights leave in the morning?
– *What flight leave in the morning?

Page 4:

Agreement

Nouns: Nominal → Noun | Noun Noun
– NominalSg → NounSg | NounSg NounSg
– NominalPl → NounPl | NounPl NounPl

Subj-Verb Agreement: S → Aux NP VP
– S → Aux3sg NP3sg VP
– S → Auxnon3sg NPnon3sg VP

Lexical items:
– NounSg → flight
– NounPl → flights
– Aux3sg → does | has | can …
– Auxnon3sg → do | have | can …
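The split categories above can be approximated with a small lexicon keyed by a number feature. This is an illustrative sketch, not from the lecture; the lexicon entries and the `agrees` helper are invented names:

```python
# Sketch: each lexical item carries a POS and a number feature,
# mirroring the Aux3sg/Auxnon3sg and NounSg/NounPl split above.
LEXICON = {
    "does":    ("Aux",  "3sg"),
    "do":      ("Aux",  "non3sg"),
    "flight":  ("Noun", "3sg"),
    "flights": ("Noun", "non3sg"),
}

def agrees(aux: str, noun: str) -> bool:
    """True iff the auxiliary and the noun share a number feature,
    as required by S -> Aux3sg NP3sg VP vs. S -> Auxnon3sg NPnon3sg VP."""
    return LEXICON[aux][1] == LEXICON[noun][1]

print(agrees("does", "flight"))   # True:  "Does ... flight ..."
print(agrees("do", "flight"))     # False: *"Do ... flight ..."
```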

Page 5:

Agreement Features Propagate Downward …

NP3sg → (Det) (Card) (Ord) (Quant) (AP) NominalSg

NPnon3sg → (Det) (Card) (Ord) (Quant) (AP) NominalPl

Page 6:

What’s wrong with this picture?

– Combinatorial explosion
– Other feature-based expansions?
– Loss of generalization

Page 7:

Subcategorization

Verbs have preferences for the kinds of constituents they co-occur with

For example:
– VP → Verb (disappear)
– VP → Verb NP (prefer a morning flight)
– VP → Verb NP PP (leave Boston in the morning)
– VP → Verb PP (leaving on Thursday)

But not: *I disappeared the cat.

Page 8:

Sentential complements

Examples:
– You [VP [V said [S there were two flights that were the cheapest]]]
– You [VP [V said [S you had a two hundred sixty six dollar fare]]]
– [VP [V Tell] [NP me] [S how to get from the airport in Philadelphia to downtown]]

Rule:
– VP → Verb S

Page 9:

Other VP Constituents

A VP can be the constituent of another VP: VP → Verb VP
– I want [VP to fly from Milwaukee to Orlando]
– I’m trying [VP to find a flight that goes from Pittsburgh to Denver next Friday]

Verbs can also be followed by particles to form a phrasal verb: VP → Verb Particle
– take off

Page 10:

Why do we need Subcategorization?

Important observation: while a verb phrase can have many possible kinds of constituents, not every verb is compatible with every kind of verb phrase

Example: the verb want can be used
– with an NP complement: I want a flight
– with an infinitive VP complement: I want to fly to Dallas

But the verb find cannot take such a VP complement:
– *I find to fly to Dallas

Page 11:

Subcategorization

Frame       | Verb                    | Example
Ø           | eat, sleep, …           | I want to eat
NP          | prefer, find, leave, …  | Find [NP the flight from Pittsburgh to Boston]
NP NP       | show, give, …           | Show [NP me] [NP airlines with flights from Pittsburgh]
PPfrom PPto | fly, travel, …          | I would like to fly [PP from Boston] [PP to Philadelphia]
NP PPwith   | help, load, …           | Can you help [NP me] [PP with a flight]
VPto        | prefer, want, need, …   | I would prefer [VPto to go by United airlines]
VPbrst      | can, would, might, …    | I can [VPbrst go from Boston]
S           | mean                    | Does this mean [S AA has a hub in Boston]?
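The frame table can be encoded directly as data. A hedged sketch, not from the lecture: the frame labels and the `licensed` helper are my own, and the verb lists are abbreviated from the table above:

```python
# Sketch: subcategorization frames as a verb -> set-of-frames map.
SUBCAT = {
    "want":      {"NP", "VPto"},
    "find":      {"NP"},
    "show":      {"NP NP"},
    "disappear": {"Ø"},        # intransitive: no complement
}

def licensed(verb: str, frame: str) -> bool:
    """True iff the verb's lexical entry licenses this complement frame."""
    return frame in SUBCAT.get(verb, set())

print(licensed("want", "VPto"))  # True:  "I want to fly to Dallas"
print(licensed("find", "VPto"))  # False: *"I find to fly to Dallas"
```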

Page 12:

Auxiliaries

Examples:
– Modals: can, could, may, might
– Perfect: have
– Progressive: be
– Passive: be

What are their subcategories?

Ordering: modal < perfect < progressive < passive
e.g., might have been prevented

Page 13:

Movement

I looked up his grade.
I looked his grade up.

John put the book on the table.What did John put on the table?

Long distance dependencies.

Page 14:

Grammar Equivalence and Normal Form

Weak equivalence

Strong equivalence

Chomsky Normal Form (CNF): A → B C or A → a

Page 15:

CNF Example

Convert to Chomsky-Normal Form (CNF):

S → a Y X
X → a X | b
Y → Y a | b
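One possible conversion (my own worked answer, not the slide's): introduce A → a so that long rules contain only non-terminals, and a fresh non-terminal T1 to binarize S → a Y X. The check below verifies every rule has one of the two CNF shapes:

```python
# Worked CNF conversion of  S -> a Y X,  X -> a X | b,  Y -> Y a | b.
# A and T1 are freshly introduced non-terminals (my naming).
CNF_RULES = [
    ("S",  ("A", "T1")),   # S  -> A T1   (was S -> a Y X)
    ("T1", ("Y", "X")),    # T1 -> Y X
    ("A",  ("a",)),        # A  -> a
    ("X",  ("A", "X")),    # X  -> A X    (was X -> a X)
    ("X",  ("b",)),
    ("Y",  ("Y", "A")),    # Y  -> Y A    (was Y -> Y a)
    ("Y",  ("b",)),
]

def is_cnf(rules, nonterminals):
    """Every rule must be A -> B C (two non-terminals) or A -> a (one terminal)."""
    for lhs, rhs in rules:
        if len(rhs) == 2 and all(s in nonterminals for s in rhs):
            continue
        if len(rhs) == 1 and rhs[0] not in nonterminals:
            continue
        return False
    return True

print(is_cnf(CNF_RULES, {"S", "T1", "A", "X", "Y"}))  # True
```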

Page 16:

Why do we care about grammar?

We need grammars for parsing!

Note: Parsing is the topic we will cover next.

Page 17:

What Linguistic Representations are Necessary for Parsing?

Words:
– Classes: groups of words which behave similarly
– Function/Content
– P.O.S.: Nouns, Verbs, Adjectives, Prepositions

Constituents:
– Groupings of words into larger units which behave similarly and have a particular p.o.s. as their head
– Phrases: NP headed by Noun … VP, PP, …

Page 18:

Analyzing Language in Terms of these Representations

Morphological parsing:
– analyze words into morphemes and affixes
– rule-based, FSAs, FSTs

POS Tagging

Syntactic parsing:
– identify component parts and how they are related
– to see if a sentence is grammatical
– to assign a semantic structure

Page 19:

Syntactic Parsing

Declarative formalisms like CFGs define the legal strings of a language but don’t specify how to recognize or assign structure to them

Parsing algorithms specify how to recognize the strings of a language and assign each string one or more syntactic structures

Parse trees useful for grammar checking, semantic analysis, MT, QA, information extraction, speech recognition…and almost every task in NLP

Page 20:

CFG for Fragment of English: G0

Page 21:

Parse Tree for ‘Book that flight’ using G0

Page 22:

Parsing as a Search Problem

Searching FSAs:
– Finding the right path through the automaton
– Search space defined by structure of FSA

Searching CFGs:
– Finding the right parse tree among all possible parse trees
– Search space defined by the grammar

Constraints provided by the input sentence and the automaton or grammar

Page 23:

Two Search Strategies

Top-Down:
– Search for tree starting from S until input words covered.

Bottom-Up:
– Start with words and build upwards toward S

Page 24:

Top-Down Parser

Builds from the root S node to the leaves

Assuming we build all trees in parallel:
– Find all trees with root S
– Next expand all constituents in these trees/rules
– Continue until leaves are POS
– Candidate trees failing to match POS of input string are rejected

Top-Down: Rationalist Tradition
– Expectation driven
– Goal: Build tree for input starting with S
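The expand-from-S procedure can be approximated by a tiny top-down recognizer that returns every input position reachable after deriving a category. This is my own sketch, not the lecture's parser: the toy grammar and lexicon are invented (and contain no left recursion, which this style of parser cannot handle):

```python
# Sketch: top-down recognition for a toy fragment of G0.
GRAMMAR = {
    "S":   [["VP"]],
    "VP":  [["V", "NP"], ["V"]],
    "NP":  [["Det", "Nom"]],
    "Nom": [["N"]],
}
LEX = {"book": "V", "that": "Det", "flight": "N"}

def parse(cat, words, i):
    """Return the set of positions reachable after deriving `cat` from words[i:]."""
    if cat in GRAMMAR:                      # non-terminal: try each expansion
        ends = set()
        for rhs in GRAMMAR[cat]:
            positions = {i}
            for sym in rhs:                 # thread positions left-to-right
                positions = {k for j in positions for k in parse(sym, words, j)}
            ends |= positions
        return ends
    if i < len(words) and LEX.get(words[i]) == cat:   # POS: match next word
        return {i + 1}
    return set()

words = ["book", "that", "flight"]
print(len(words) in parse("S", words, 0))   # True: sentence recognized
```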

Page 25:

Top-Down Search Space for G0

Page 26:

Bottom-Up Parsing

Parser begins with words of input and builds up trees, applying G0 rules whose right-hand sides match

Book/N that/Det flight/N    vs.    Book/V that/Det flight/N
– ‘Book’ is ambiguous (noun or verb)
– Parse continues until an S root node is reached or no further node expansion is possible

Bottom-Up: Empiricist Tradition
– Data driven
– Primary consideration: lowest sub-trees of final tree must hook up with words in input.
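A brute-force bottom-up recognizer makes the data-driven direction concrete: start from the words' POS tags and repeatedly replace any span matching a rule's right-hand side, until the whole input reduces to S. A sketch with an invented toy grammar, under the simplifying assumption of one POS per word (so the Book/N vs. Book/V ambiguity is set aside):

```python
from collections import deque

def bottom_up(words, grammar, lex):
    """Breadth-first bottom-up search over sequences of symbols."""
    start = tuple(lex[w] for w in words)       # words -> POS, e.g. (V, Det, N)
    seen, queue = {start}, deque([start])
    while queue:
        seq = queue.popleft()
        if seq == ("S",):
            return True                        # everything reduced to S
        for lhs, rhs in grammar:               # try every reduction at every span
            n = len(rhs)
            for i in range(len(seq) - n + 1):
                if seq[i:i + n] == rhs:
                    new = seq[:i] + (lhs,) + seq[i + n:]
                    if new not in seen:
                        seen.add(new)
                        queue.append(new)
    return False

G = [("S", ("VP",)), ("VP", ("V", "NP")), ("NP", ("Det", "Nom")), ("Nom", ("N",))]
LEX = {"book": "V", "that": "Det", "flight": "N"}
print(bottom_up(["book", "that", "flight"], G, LEX))  # True
```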

Page 27:

Expanding Bottom-Up Search Space for ‘Book that flight’

Page 28:

Comparing Top-Down and Bottom-Up

Top-Down parsers: never explore illegal parses (e.g. can’t form an S) -- but waste time on trees that can never match the input

Bottom-Up parsers: never explore trees inconsistent with input -- but waste time exploring illegal parses (no S root)

For both: how to explore the search space?
– Pursuing all parses in parallel or …?
– Which node to expand next?
– Which rule to apply next?

Page 29:

Search Strategy and Search Control

Parallel:
– Explore all possible trees in parallel

Depth-first search:
– Agenda of search states: expand search space incrementally, exploring most recently generated state (tree) each time
– When you reach a state (tree) inconsistent with input, backtrack to most recent unexplored state (tree)

Which node to expand?
– Leftmost

Which grammar rule to use?
– Order in the grammar

Page 30:

Basic Algorithm for Top-Down, Depth-First, Left-Right Strategy

Initialize agenda with ‘S’ tree, point to first word, and make this the current search state (cur)

Loop until successful parse or empty agenda:
– Apply all applicable grammar rules to leftmost unexpanded node of cur
  • If this node is a POS category and matches that of the current input, push this onto agenda
  • Else, push new trees onto agenda
– Pop new cur from agenda

Page 31:

Basic Top-Down Parser

Page 32:

Top-Down Depth-First Derivation Using G0

Page 33:

Example: Does this flight include a meal?

Page 34:

Example continued …

Page 35:

Augmenting Top-Down Parsing with Bottom-Up Filtering

We saw: Top-Down, depth-first, L-to-R parsing
– Expands non-terminals along the tree’s left edge down to leftmost leaf of tree
– Moves on to expand down to next leftmost leaf…

In a successful parse, the current input word will be the first word in the derivation of the unexpanded node that the parser is currently processing

So … look ahead to the left-corner of the tree:
– B is a left-corner of A if A ⇒* B α
– Build a table with the left-corners of all non-terminals in the grammar and consult it before applying a rule
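Such a table can be pre-computed as a transitive closure over the first symbols of each rule. A sketch; the G0 fragment below is abbreviated from the lecture's grammar, and the function name is my own:

```python
def left_corners(grammar):
    """Fixpoint computation: B is a left-corner of A if A =*=> B ...,
    i.e. B can begin some derivation of A."""
    lc = {a: {rhs[0] for rhs in exps} for a, exps in grammar.items()}
    changed = True
    while changed:                      # propagate until nothing new is added
        changed = False
        for a in lc:
            for b in list(lc[a]):       # left-corners of my left-corners
                extra = lc.get(b, set()) - lc[a]
                if extra:
                    lc[a] |= extra
                    changed = True
    return lc

G0 = {
    "S":   [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP":  [["Det", "Nom"], ["PropN"]],
    "Nom": [["N"]],
    "VP":  [["V"], ["V", "NP"]],
}
table = left_corners(G0)
# POS left-corners of S match the slide's table: Det, PropN, Aux, V
print(sorted(table["S"] & {"Det", "PropN", "Aux", "V"}))
```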

Page 36:

Left Corners

Page 37:

Left-Corner Table for G0

Previous Example:

Category | Left Corners
S        | Det, PropN, Aux, V
NP       | Det, PropN
Nom      | N
VP       | V

Page 38:

Summing Up Parsing Strategies

Parsing is a search problem which may be implemented with many search strategies

Top-Down vs. Bottom-Up Parsers:
– Both generate too many useless trees
– Combine the two to avoid over-generation: Top-Down parsing with Bottom-Up look-ahead

Left-corner table provides more efficient look-ahead:
– Pre-compute all POS that can serve as the leftmost POS in the derivations of each non-terminal category

Page 39:

Three Critical Problems in Parsing

Left Recursion

Ambiguity

Repeated Parsing of Sub-trees

Page 40:

Left Recursion

Depth-first search will never terminate if grammar is left recursive: A → A β
Examples: NP → NP PP, VP → VP PP, S → S and S, NP → NP and NP

Indirect Left Recursion: A ⇒* A β via intermediate rules
Example: NP → Det Nominal, Det → NP ’s
(NP ⇒ Det Nominal ⇒ NP ’s Nominal ⇒ …)

[tree diagram omitted: NP repeatedly expanding through Det (= NP ’s) Nominal]

Page 41:

Solutions to Left Recursion

– Rule ordering
– Don’t use recursive rules
– Limit depth of recursion in parsing to some analytically or empirically set limit
– Don’t use top-down parsing

Page 42:

Rule Ordering

Bad:– NP → NP PP– NP → Det Nominal

Better alternative: – First: NP → Det Nominal– Then: NP → NP PP

Page 43:

Grammar Rewriting

Rewrite left-recursive grammar as weakly equivalent non-recursive one.

Can be done:
– By hand (ick) or …
– Automatically

Example Rewrite: NP → NP PP, NP → Det Nominal

As: NP → Det Nominal Stuff, Stuff → PP Stuff, Stuff → ε
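The NP → Det Nominal Stuff rewrite is an instance of the standard removal of immediate left recursion (A → A α | β becomes A → β A′, A′ → α A′ | ε). A sketch of the automatic version; here the fresh non-terminal is spelled NP′ rather than Stuff:

```python
def remove_immediate_left_recursion(nt, rules):
    """Rewrite  A -> A a | b   as   A -> b A',  A' -> a A' | epsilon."""
    rec  = [rhs[1:] for rhs in rules if rhs and rhs[0] == nt]   # the 'a' parts
    base = [rhs for rhs in rules if not rhs or rhs[0] != nt]    # the 'b' parts
    if not rec:
        return {nt: rules}          # nothing to do
    new = nt + "'"
    return {
        nt:  [rhs + [new] for rhs in base],
        new: [alpha + [new] for alpha in rec] + [[]],   # [] is epsilon
    }

# NP -> NP PP | Det Nominal   becomes
# NP -> Det Nominal NP',  NP' -> PP NP' | epsilon
print(remove_immediate_left_recursion("NP", [["NP", "PP"], ["Det", "Nominal"]]))
```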

Page 44:

Problems with Grammar Rewriting

• Original:[NP [NP the book] [PP on [NP [NP the table] [PP in [NP [NP the yard] [PP of [NP the house]]]]]]]

• Becomes:[NP the book [Stuff [PP on [NP the table [Stuff [PP in [NP the yard [Stuff [PP of [NP the house [Stuff]]] [Stuff]]]] [Stuff]]]] [Stuff]]]

Page 45:

Depth Bound

– Set an arbitrary bound
– Set an analytically derived bound
– Run tests and derive a reasonable bound empirically
– Use iterative deepening

Page 46:

Ambiguity

Local Ambiguity:
– Leads to hypotheses that are locally reasonable but eventually lead nowhere
– “Book that flight”

Global Ambiguity:
– Leads to multiple parses for the same input
– “I shot an elephant in my pajamas”

Page 47:

More Ambiguity Examples

Multiple legal structures:
– Attachment (e.g. I saw a man on a hill with a telescope)
– Coordination (e.g. younger cats and dogs)
– NP bracketing (e.g. Spanish language teachers)

Page 48:

Two Parse Trees for Ambiguous Sentence

Page 49:

More Ambiguity: ‘Can you book TWA flights?’

Page 50:

A Correct Parse for ‘Show me the meal on Flight UA 386 from San Francisco to Denver’

Page 51:

Inefficient Re-Parsing of Subtrees

Page 52:

Invariants

Despite ambiguity, there are invariants

Sub-components of final parse tree are re-analyzed unnecessarily

Except for top-most component, every part of final tree is derived more than once.

Page 53:

What’s the solution?

Key to efficient processing is reuse

Fill a table with solutions to sub-problems for later use.

We want an algorithm that:
– Does not do repeated work
– Does top-down search with bottom-up filtering
– Solves the left-recursion problem
– Solves an exponential problem

Page 54:

Dynamic Programming

Create table of solutions to sub-problems (e.g. subtrees) as parse proceeds

Look up subtrees for each constituent rather than re-parsing

Since all parses implicitly stored, all available for later disambiguation

Examples: Cocke-Younger-Kasami (CYK) (1960), Graham-Harrison-Ruzzo (GHR) (1980) and Earley (1970) algorithms

Page 55:

Earley Algorithm

Uses dynamic programming to do parallel top-down search in (worst case) O(N³) time

First, a left-to-right pass fills out a chart with N+1 state sets:
– Think of chart entries as sitting between words in the input string, keeping track of states of the parse at these positions
– For each word position, the chart contains the set of states representing all partial parse trees generated to date. E.g. chart[0] contains all partial parse trees generated at the beginning of the sentence

Page 56:

Chart Entries

Represent three types of constituents:
– predicted constituents
– in-progress constituents
– completed constituents

Page 57:

Progress in parse represented by Dotted Rules

Position of • indicates type of constituent

0 Book 1 that 2 flight 3
• S → • VP, [0,0] (predicted)
• NP → Det • Nom, [1,2] (in progress)
• VP → V NP •, [0,3] (completed)

[x,y] tells us what portion of the input is spanned so far by this rule

Each state si: <dotted rule>, [<back pointer>, <current position>]

Page 58:

S → • VP, [0,0]
– First 0 means the S constituent begins at the start of input
– Second 0 means the dot is here too
– So, this is a top-down prediction

NP → Det • Nom, [1,2]
– the NP begins at position 1
– the dot is at position 2
– so, Det has been successfully parsed
– Nom predicted next

0 Book 1 that 2 flight 3

Page 59:

0 Book 1 that 2 flight 3 (continued)

VP → V NP •, [0,3]
– Successful VP parse of entire input

Page 60:

Successful Parse

Final answer found by looking at last entry in chart

If an entry resembles S → α •, [nil,N] (dot at the far right), the input was parsed successfully

Chart will also contain record of all possible parses of input string, given the grammar

Page 61:

Parsing Procedure for the Earley Algorithm

Move through each set of states in order, applying one of three operators to each state:
– predictor: add predictions to the chart
– scanner: read input and add corresponding state to chart
– completer: move dot to right when new constituent found

Results (new states) added to current or next set of states in chart

No backtracking and no states removed: keep complete history of parse

Page 62:

States and State Sets

Dotted rule si represented as <dotted rule>, [<back pointer>, <current position>]

State set Sj is a collection of states si with the same <current position>.

Page 63:

Earley Algorithm from Book

Page 64:

Earley Algorithm (simpler!)

1. Add Start → · S, [nil,0] to state set 0

2. Predict all states you can, adding new predictions to state set 0. Let i = 1.

3. Scan input word i: add all matched states to state set Si.
   Add all new states produced by Complete to state set Si.
   Add all new states produced by Predict to state set Si.
   Unless i = n: (a) let i = i + 1; (b) repeat step 3.

4. At the end, see if state set n contains Start → S ·, [nil,n]

Page 65:

3 Main Sub-Routines of Earley Algorithm

Predictor: adds predictions into the chart.
Completer: moves the dot to the right when new constituents are found.
Scanner: reads the input words and enters states representing those words into the chart.

Page 66:

Predictor

Intuition: create new state for top-down prediction of new phrase.

Applied when a non-part-of-speech non-terminal is to the right of a dot: S → • VP, [0,0]

Adds new states to current chart, one for each expansion of the non-terminal in the grammar:
VP → • V, [0,0]
VP → • V NP, [0,0]

Formally: Sj: A → α · B β, [i,j]  adds  Sj: B → · γ, [j,j]

Page 67:

Scanner

Intuition: create new states for rules matching part of speech of next word.

Applicable when a part of speech is to the right of a dot: VP → • V NP, [0,0] ‘Book…’

Looks at current word in input. If it matches, adds state(s) to next chart:
VP → V • NP, [0,1]

Formally: Sj: A → α · B β, [i,j]  adds  Sj+1: A → α B · β, [i,j+1]

Page 68:

Completer

Intuition: parser has finished a new phrase, so must find and advance all states that were waiting for this

Applied when dot has reached right end of rule:
NP → Det Nom •, [1,3]

Find all states with dot at 1 expecting an NP:
VP → V • NP, [0,1]

Adds new (completed) state(s) to current chart:
VP → V NP •, [0,3]

Formally: Sk: B → δ ·, [j,k]  adds  Sk: A → α B · β, [i,k], where Sj: A → α · B β, [i,j]
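The three operators can be combined into a minimal Earley recognizer. This is my own sketch, not the book's pseudocode: states are tuples (lhs, rhs, dot, origin), the dummy start symbol is named GAMMA, and the grammar and lexicon are an abbreviated toy version of G0:

```python
def earley(words, grammar, lex, start="S"):
    """Minimal Earley recognizer; chart[i] is the state set at position i."""
    chart = [[] for _ in range(len(words) + 1)]

    def add(i, state):
        if state not in chart[i]:
            chart[i].append(state)

    add(0, ("GAMMA", (start,), 0, 0))          # dummy start state
    for i in range(len(words) + 1):
        for state in chart[i]:                 # chart[i] may grow as we iterate
            lhs, rhs, dot, origin = state
            if dot < len(rhs):
                nxt = rhs[dot]
                if nxt in grammar:             # PREDICTOR: expand non-terminal
                    for expansion in grammar[nxt]:
                        add(i, (nxt, tuple(expansion), 0, i))
                elif i < len(words) and nxt in lex.get(words[i], ()):
                    add(i + 1, (lhs, rhs, dot + 1, origin))   # SCANNER
            else:                              # COMPLETER: advance waiting states
                for (l2, r2, d2, o2) in chart[origin]:
                    if d2 < len(r2) and r2[d2] == lhs:
                        add(i, (l2, r2, d2 + 1, o2))
    return ("GAMMA", (start,), 1, 0) in chart[len(words)]

G0 = {
    "S":       [["VP"], ["NP", "VP"]],
    "VP":      [["Verb"], ["Verb", "NP"]],
    "NP":      [["Det", "Nominal"]],
    "Nominal": [["Noun"]],
}
LEX = {"book": {"Verb", "Noun"}, "that": {"Det"}, "flight": {"Noun"}}
print(earley(["book", "that", "flight"], G0, LEX))  # True
```

Note that, unlike the depth-first top-down parser, this handles the lexical ambiguity of “book” directly: the lexicon maps each word to a set of POS tags and the chart keeps every consistent state.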

Page 69:

Example: State Set S0 for Parsing “Book that flight” using Grammar G0


Page 70:

Example: State Set S1 for Parsing “Book that flight”

VP → V and VP → V NP are both passed to the Scanner, which adds them to Chart[1], moving the dots to the right


Page 71:

Prediction of Next Rule

When VP → V is itself processed by the Completer, S → VP is added to Chart[1] since VP is a left corner of S

Last 2 rules in Chart[1] are added by Predictor when VP → V NP is processed

And so on….

Page 72:

Last Two States

Final state, added by the Completer: γ → S •, [nil,3]

Page 73:

How do we retrieve the parses at the end?

Augment the Completer to add pointers to the prior states it advances, as a field in the current state:
– i.e. what state did we advance here?
– Read the pointers back from the final state

Page 74:

Error Handling

What happens when we look at the contents of the last table column and don’t find an S → α • rule?
– Is it a total loss? No…
– Chart contains every constituent and combination of constituents possible for the input, given the grammar

Also useful for partial parsing or shallow parsing used in information extraction

Page 75:

Earley’s Keys to Efficiency

Left-recursion, ambiguity and repeated re-parsing of subtrees:
– Solution: dynamic programming

Combine top-down predictions with bottom-up look-ahead to avoid unnecessary expansions

Earley is still one of the most efficient parsers

All efficient parsers avoid re-computation in a similar way, but Earley doesn’t require grammar conversion!

Page 76:

Next Time

Read Chapter 14.