cyk ) cocke-younger-kasami) parsing algorithm

32
CYK ) - - Cocke Younger Kasami) Parsing Algorithm ر عط م ن سی ح مد ح م د ی س ی ع ی ب ط ان ب ز" ش رداز پ ر ی ب ک ر می ی ا. عت ی ص گاه" ش ن دا ر. پ و ی9 مپ ی کا س د ی ه م کده" س ن دا

Upload: mervin

Post on 14-Jan-2016

108 views

Category:

Documents


0 download

DESCRIPTION

دانشگاه صنعتی امیر کبیر دانشکده مهندسی کامپیوتر. CYK ) Cocke-Younger-Kasami) Parsing Algorithm. سید محمد حسین معطر پردازش زبان طبیعی. Parsing Algorithms. CFGs are basis for describing (syntactic) structure of NL sentences Thus - Parsing Algorithms are core of NL analysis systems - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: CYK  ) Cocke-Younger-Kasami) Parsing Algorithm

CYK )Cocke-Younger-Kasami) Parsing Algorithm

معطر حسین محمد سیدطبیعی زبان پردازش

امیر صنعتی دانشگاهکبیر

مهندسی دانشکدهکامپیوتر

Page 2: CYK  ) Cocke-Younger-Kasami) Parsing Algorithm

Parsing Algorithms

CFGs are basis for describing (syntactic) structure of NL sentences

Thus - Parsing Algorithms are core of NL analysis systems Recognition vs. Parsing:

– Recognition - deciding the membership in the language:– Parsing – Recognition+ producing a parse tree for it

Parsing is more “difficult” than recognition? (time complexity)

Ambiguity - an input may have exponentially many parses

Page 3: CYK  ) Cocke-Younger-Kasami) Parsing Algorithm

Parsing Algorithms

Parsing General CFLs vs. Limited Forms Efficiency:

– Deterministic (LR) languages can be parsed in linear time– A number of parsing algorithms for general CFLs require O(n3)

time– Asymptotically best parsing algorithm for general CFLs requires

O(n2.37), but is not practical Utility - why parse general grammars and not just CNF?

– Grammar intended to reflect actual structure of language– Conversion to CNF completely destroys the parse structure

Page 4: CYK  ) Cocke-Younger-Kasami) Parsing Algorithm

CYK )Cocke-Younger-Kasami)

One of the earliest recognition and parsing algorithms The standard version of CYK can only recognize

languages defined by context-free grammars in Chomsky Normal Form (CNF).

It is also possible to extend the CYK algorithm to handle some grammars which are not in CNF– Harder to understand

Based on a “dynamic programming” approach:– Build solutions compositionally from sub-solutions– Store sub-solutions and re-use them whenever necessary

Uses the grammar directly (no PDA is used) Recognition version: decide whether S == > w ?

Page 5: CYK  ) Cocke-Younger-Kasami) Parsing Algorithm

CYK Algorithm The CYK algorithm for the membership problem is as follows:

– Let the input string be a sequence of n letters a1 ... an. – Let the grammar contain r terminal and nonterminal symbols R1 ... Rr,

and let R1 be the start symbol. – Let P[n,n,r] be an array of booleans. Initialize all elements of P to false. – For each i = 1 to n

• For each unit production Rj -> ai, set P[i,1,j] = true. – For each i = 2 to n -- Length of span

• For each j = 1 to n-i+1 -- Start of span – For each k = 1 to i-1 -- Partition of span

» For each production RA -> RB RC » If P[j,k,B] and P[j+k,i-k,C] then set P[j,i,A] = true

– If P[1,n,1] is true • Then string is member of language • Else string is not member of language

Page 6: CYK  ) Cocke-Younger-Kasami) Parsing Algorithm

CYK Pseudocode

On input x = x1x2 … xn :for (i = 1 to n) //create middle diagonal for (each var. A)

if(Axi) add A to table[i-1][i]

for (d = 2 to n) // d’th diagonal for (i = 0 to n-d)

for (k = i+1 to i+d-1) for (each var. A) for(each var. B in table[i][k])

for(each var. C in table[k][k+d]) if(ABC) add A to table[i][k+d]

return Stable[0][n] ? ACCEPT : REJECT

Page 7: CYK  ) Cocke-Younger-Kasami) Parsing Algorithm

CYK Algorithm

this algorithm considers every possible consecutive subsequence of the sequence of letters and sets P[i,j,k] to be true if the sequence of letters starting from i of length j can be generated from Rk.

Once it has considered sequences of length 1, it goes on to sequences of length 2, and so on.

For subsequences of length 2 and greater, it considers every possible partition of the subsequence into two halves, and checks to see if there is some production P -> Q R such that Q matches the first half and R matches the second half. If so, it records P as matching the whole subsequence.

Once this process is completed, the sentence is recognized by the grammar if the subsequence containing the entire string is matched by the start symbol

Page 8: CYK  ) Cocke-Younger-Kasami) Parsing Algorithm

CYK Algorithm for Deciding Context Free Languages

Q: Consider the grammar G given by

S | AB | XB

T AB | XB

X AT

A a

B b

1. Is x = aaabb in L(G )

2. Is x = aaabbb in L(G )

Page 9: CYK  ) Cocke-Younger-Kasami) Parsing Algorithm

CYK Algorithm for Deciding Context Free Languages

The algorithm is “bottom-up” in that we start with bottom of derivation tree.

S | AB | XB

T AB | XB

X AT

A a

B b

a a a b b

Page 10: CYK  ) Cocke-Younger-Kasami) Parsing Algorithm

CYK Algorithm for Deciding Context Free Languages

1) Write variables for all length 1 substrings

S | AB | XB

T AB | XB

X AT

A a

B b

a a a b b

A A A B B

Page 11: CYK  ) Cocke-Younger-Kasami) Parsing Algorithm

CYK Algorithm for Deciding Context Free Languages

2) Write variables for all length 2 substrings

S | AB | XB

T AB | XB

X AT

A a

B b

a a a b b

A A A B B

TS,T

Page 12: CYK  ) Cocke-Younger-Kasami) Parsing Algorithm

CYK Algorithm for Deciding Context Free Languages

3) Write variables for all length 3 substrings

S | AB | XB

T AB | XB

X ATA a

B b

a a a b b

A A A B B

T

X

S,T

Page 13: CYK  ) Cocke-Younger-Kasami) Parsing Algorithm

CYK Algorithm for Deciding Context Free Languages

4) Write variables for all length 4 substrings

S | AB | XB

T AB | XBX AT

A a

B b

a a a b b

A A A B B

T

X

S,T

S,T

Page 14: CYK  ) Cocke-Younger-Kasami) Parsing Algorithm

CYK Algorithm for Deciding Context Free Languages

5) Write variables for all length 5 substrings.

S | AB | XBT AB | XB

X ATA aB b

REJECT!

a a a b b

A A A B B

T

X

S,T

S,T

X

Page 15: CYK  ) Cocke-Younger-Kasami) Parsing Algorithm

CYK Algorithm for Deciding Context Free Languages

Now look at aaabbb :

S | AB | XB

T AB | XB

X ATA a

B b

a a a b b b

Page 16: CYK  ) Cocke-Younger-Kasami) Parsing Algorithm

CYK Algorithm for Deciding Context Free Languages

1) Write variables for all length 1 substrings.

S | AB | XB

T AB | XB

X AT

A a

B b

a a a b b

A A A B B

b

B

Page 17: CYK  ) Cocke-Younger-Kasami) Parsing Algorithm

CYK Algorithm for Deciding Context Free Languages

2) Write variables for all length 2 substrings.

S | AB | XB

T AB | XB

X AT

A a

B b

a a a b b

A A A B B

S,T

b

B

Page 18: CYK  ) Cocke-Younger-Kasami) Parsing Algorithm

CYK Algorithm for Deciding Context Free Languages

3) Write variables for all length 3 substrings.

S | AB | XB

T AB | XB

X ATA a

B b

a a a b b

A A A B B

TX

b

B

S,T

Page 19: CYK  ) Cocke-Younger-Kasami) Parsing Algorithm

CYK Algorithm for Deciding Context Free Languages

4) Write variables for all length 4 substrings.

S | AB | XB

T AB | XBX ATA a

B b

a a a b b

A A A B B

TX

S,T

b

B

S,T

Page 20: CYK  ) Cocke-Younger-Kasami) Parsing Algorithm

CYK Algorithm for Deciding Context Free Languages

5) Write variables for all length 5 substrings.

S | AB | XB

T AB | XB

X ATA a

B b

a a a b b

A A A B B

TX

S,T

b

B

X

S,T

Page 21: CYK  ) Cocke-Younger-Kasami) Parsing Algorithm

CYK Algorithm for Deciding Context Free Languages

6) Write variables for all length 6 substrings.

S | AB | XBT AB | XBX ATA aB b

S is included soaaabbb accepted!

a a a b b

A A A B B

TX

S,T

b

B

XS,T

S,T

Page 22: CYK  ) Cocke-Younger-Kasami) Parsing Algorithm

CYK Algorithm for Deciding Context Free Languages

Can also use a table for same purpose.

end at

start at

1: aaabbb

2: aaabbb

3: aaabbb

4: aaabbb

5: aaabbb

6: aaabbb

0:aaabbb

1:aaabbb

2:aaabbb

3:aaabbb

4:aaabbb

5:aaabbb

Page 23: CYK  ) Cocke-Younger-Kasami) Parsing Algorithm

CYK Algorithm for Deciding Context Free Languages

1. Variables for length 1 substrings.

end at

start at

1: aaabbb

2: aaabbb

3: aaabbb

4: aaabbb

5: aaabbb

6: aaabbb

0:aaabbb A

1:aaabbb A

2:aaabbb A

3:aaabbb B

4:aaabbb B

5:aaabbb B

Page 24: CYK  ) Cocke-Younger-Kasami) Parsing Algorithm

CYK Algorithm for Deciding Context Free Languages

2. Variables for length 2 substrings.

end at

start at

1: aaabbb

2: aaabbb

3: aaabbb

4: aaabbb

5: aaabbb

6: aaabbb

0:aaabbb A -

1:aaabbb A -

2:aaabbb A S,T

3:aaabbb B -

4:aaabbb B -

5:aaabbb B

Page 25: CYK  ) Cocke-Younger-Kasami) Parsing Algorithm

CYK Algorithm for Deciding Context Free Languages

3. Variables for length 3 substrings.

end at

start at

1: aaabbb

2: aaabbb

3: aaabbb

4: aaabbb

5: aaabbb

6: aaabbb

0:aaabbb A - -

1:aaabbb A - X

2:aaabbb A S,T -

3:aaabbb B - -

4:aaabbb B -

5:aaabbb B

Page 26: CYK  ) Cocke-Younger-Kasami) Parsing Algorithm

CYK Algorithm for Deciding Context Free Languages

4. Variables for length 4 substrings.

end at

start at

1: aaabbb

2: aaabbb

3: aaabbb

4: aaabbb

5: aaabbb

6: aaabbb

0:aaabbb A - - -

1:aaabbb A - X S,T

2:aaabbb A S,T - -

3:aaabbb B - -

4:aaabbb B -

5:aaabbb B

Page 27: CYK  ) Cocke-Younger-Kasami) Parsing Algorithm

CYK Algorithm for Deciding Context Free Languages

5. Variables for length 5 substrings.

end at

start at

1: aaabbb

2: aaabbb

3: aaabbb

4: aaabbb

5: aaabbb

6: aaabbb

0:aaabbb A - - - X

1:aaabbb A - X S,T -

2:aaabbb A S,T - -

3:aaabbb B - -

4:aaabbb B -

5:aaabbb B

Page 28: CYK  ) Cocke-Younger-Kasami) Parsing Algorithm

CYK Algorithm for Deciding Context Free Languages

6. Variables for aaabbb. ACCEPTED!

end at

start at

1: aaabbb

2: aaabbb

3: aaabbb

4: aaabbb

5: aaabbb

6: aaabbb

0:aaabbb A - - - X S,T

1:aaabbb A - X S,T -

2:aaabbb A S,T - -

3:aaabbb B - -

4:aaabbb B -

5:aaabbb B

Page 29: CYK  ) Cocke-Younger-Kasami) Parsing Algorithm

Parsing results

We keep the results for every wij in a table.

Note that we only need to fill in entries up to the diagonal – the longest substring starting at i is of length n-i+1

Page 30: CYK  ) Cocke-Younger-Kasami) Parsing Algorithm

Constructing parse tree

we need to construct parse trees for string w: Idea:

– Keep back-pointers to the table entries that we combine

– At the end - reconstruct a parse from the back-pointers

This allows us to find all parse trees

Page 31: CYK  ) Cocke-Younger-Kasami) Parsing Algorithm

Ambiguity

Efficient Representation of Ambiguities Local Ambiguity Packing :

– a Local Ambiguity - multiple ways to derive the same substring from a non-terminal

– All possible ways to derive each non-terminal are stored together– When creating back-pointers, create a single back-pointer to the

“packed” representation Allows to efficiently represent a very large number of

ambiguities (even exponentially many) Unpacking - producing one or more of the packed parse

trees by following the back-pointers.

Page 32: CYK  ) Cocke-Younger-Kasami) Parsing Algorithm

References

Hopcroft and Ullman,“Intro. to Automata Theory, Lang. and Comp.”Section 6.3, pp. 139-141

“CYK algorithm ” , Wikipedia, the free encyclopedia

A representation by Zeph Grunschlag