Bottom-Up Syntax Analysis Mooly Sagiv html://www.math.tau.ac.il/~msagiv/ courses/wcc02.html Textbook:Modern Compiler Implementation in C Chapter 3

Post on 21-Dec-2015


Page 1: Bottom-Up Syntax Analysis

Mooly Sagiv

html://www.math.tau.ac.il/~msagiv/courses/wcc02.html

Textbook: Modern Compiler Implementation in C, Chapter 3

Page 2: Efficient Parsers

• Pushdown automata
• Deterministic
• Report an error as soon as the input is not a prefix of a valid program
• Not usable for all context free grammars

[Figure: bison turns a context free grammar into a parser and reports "ambiguity errors"; the parser turns tokens into a parse tree.]

Page 3: Kinds of Parsers

• Top-Down (Predictive Parsing) LL
– Construct parse tree in a top-down manner
– Find the leftmost derivation
– For every non-terminal and token predict the next production
• Bottom-Up LR
– Construct parse tree in a bottom-up manner
– Find the rightmost derivation in reverse order
– For every potential right hand side and token decide when a production is found

Page 4: Bottom-Up Syntax Analysis

• Input
– A context free grammar
– A stream of tokens
• Output
– A syntax tree or error
• Method
– Construct parse tree in a bottom-up manner
– Find the rightmost derivation in reverse order
– For every potential right hand side and token decide when a production is found
– Report an error as soon as the input is not a prefix of a valid program

Page 5: Pushdown Automaton

[Figure: pushdown automaton. A control unit, driven by the parser-table, reads the input and operates on the stack. Symbols shown in the figure: $, $u t w, V.]

Page 6: Bottom-Up Parser Actions

• reduce A → β
– Pop |β| symbols and |β| states from the stack
– Apply the associated action
– Push A and the state goto[top, A] onto the stack
• shift X
– Push X onto the stack
– Advance the input
• accept
– Parsing is complete
• error
– Report an error

Page 7: A Parser Table for S → a S b | ε

state  description    a    b            $            S
0      initial        s1   -            r S → ε      4
1      parsed a       s1   r S → ε      -            2
2      parsed S       -    s3           -            -
3      parsed a S b   -    r S → a S b  r S → a S b  -
4      final          -    -            acc          -

(sN: shift and go to state N; r: reduce by the production; -: error.)

Page 8: Parsing aabb with the table for S → a S b | ε

Parser table: as on Page 7.

stack                      input   action
0($)                       aabb$   s1
0($) 1(a)                  abb$    s1
0($) 1(a) 1(a)             bb$     r S → ε
0($) 1(a) 1(a) 2(S)        bb$     s3
0($) 1(a) 1(a) 2(S) 3(b)   b$      r S → a S b
0($) 1(a) 2(S)             b$      s3
0($) 1(a) 2(S) 3(b)        $       r S → a S b
0($) 4(S)                  $       acc

Page 9: Parsing abb with the table for S → a S b | ε

Parser table: as on Page 7.

stack                 input   action
0($)                  abb$    s1
0($) 1(a)             bb$     r S → ε
0($) 1(a) 2(S)        bb$     s3
0($) 1(a) 2(S) 3(b)   b$      r S → a S b
0($) 4(S)             b$      err
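
The two traces can be reproduced with a small table-driven driver. The following is only a sketch under stated assumptions: the names `parse`, `action_table`, `goto_S` and `rule_len` are illustrative and not from the slides, the stack holds states only (the slides also show the symbol next to each state), and the single reduce entry of state 0 is assumed to sit in the $ column and that of state 1 in the b column.

```c
#include <stddef.h>

/* Action kinds of the table-driven parser. */
enum kind { ERR, SHIFT, REDUCE, ACCEPT };

/* arg is the target state for SHIFT and the rule number for REDUCE. */
struct action { enum kind kind; int arg; };

/* Rules: 0 is S -> epsilon (length 0), 1 is S -> a S b (length 3). */
static const int rule_len[2] = { 0, 3 };

/* Columns: 0 = 'a', 1 = 'b', 2 = '$' (end of input). Rows are states 0..4. */
static const struct action action_table[5][3] = {
    /* 0 */ { {SHIFT, 1}, {ERR, 0},    {REDUCE, 0} },
    /* 1 */ { {SHIFT, 1}, {REDUCE, 0}, {ERR, 0}    },
    /* 2 */ { {ERR, 0},   {SHIFT, 3},  {ERR, 0}    },
    /* 3 */ { {ERR, 0},   {REDUCE, 1}, {REDUCE, 1} },
    /* 4 */ { {ERR, 0},   {ERR, 0},    {ACCEPT, 0} },
};

/* goto entries for the non-terminal S, indexed by state; -1 means no entry. */
static const int goto_S[5] = { 4, 2, -1, -1, -1 };

/* Map an input character to its table column; the string terminator acts as '$'. */
static int column(char c)
{
    switch (c) {
    case 'a':  return 0;
    case 'b':  return 1;
    case '\0': return 2;
    default:   return -1;
    }
}

/* Returns 1 if the input is accepted and 0 on a syntax error. */
int parse(const char *input)
{
    int stack[256];          /* stack of states; state 0 is the bottom */
    int top = 0;
    size_t pos = 0;

    stack[0] = 0;
    for (;;) {
        int col = column(input[pos]);
        if (col < 0)
            return 0;        /* character outside the alphabet */
        struct action act = action_table[stack[top]][col];
        switch (act.kind) {
        case SHIFT:
            if (top + 1 >= 256)
                return 0;    /* stack full: give up */
            stack[++top] = act.arg;
            pos++;           /* advance the input */
            break;
        case REDUCE: {
            top -= rule_len[act.arg];          /* pop |rhs| states */
            int next = goto_S[stack[top]];     /* goto[top, S] */
            if (next < 0 || top + 1 >= 256)
                return 0;
            stack[++top] = next;
            break;
        }
        case ACCEPT:
            return 1;
        default:
            return 0;        /* empty table entry */
        }
    }
}
```

Under these assumptions `parse("aabb")` returns 1 and `parse("abb")` returns 0, following the same sequence of actions as the two traces.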

Page 10: A Parser Table for S → AB | A, A → a, B → a

state  description       a          $          A    B    S
0      initial           s1         -          2    -    5
1      parsed a from A   r A → a    r A → a    -    -    -
2      parsed A          s3         r S → A    -    4    -
3      parsed a from B   -          r B → a    -    -    -
4      parsed AB         -          r S → AB   -    -    -
5      final             -          acc        -    -    -

Page 11: Parsing aa with the table for S → AB | A

stack             input   action
0($)              aa$     s1
0($) 1(a)         a$      r A → a
0($) 2(A)         a$      s3
0($) 2(A) 3(a)    $       r B → a
0($) 2(A) 4(B)    $       r S → AB
0($) 5(S)         $       acc

Parser table: as on Page 10.

Page 12: A Parser Table for E → E + E | id

state  description   id   +                   $             E
0      initial       s1   -                   -             2
1      parsed id     -    r E → id            r E → id      -
2      parsed E      -    s3                  acc           -
3      parsed E+     s1   -                   -             4
4      parsed E+E    -    s3 / r E → E + E    r E → E + E   -

State 4 holds both a shift and a reduce entry on +. The grammar is ambiguous, so the table has a shift/reduce conflict.

Page 13: The Challenge

• How to construct a parser-table from a given grammar
• LR(1) grammars
– Left to right scanning
– Rightmost derivations (reverse)
– 1 token of look-ahead
• Different solutions
– Operator precedence
– SLR(1): Simple LR(1)
– CLR(1): Canonical LR(1)
– LALR(1): Look Ahead LR(1), used by Yacc, Bison, CUP

Page 14: Grammar Hierarchy

[Figure: nested grammar classes, from largest to smallest: Non-ambiguous CFG, CLR(1), LALR(1), SLR(1). LL(1) is drawn as a further class inside the hierarchy.]

Page 15: Constructing an SLR parsing table

• Add a production S' → S$
• Construct a finite automaton accepting "valid stack symbols"
– The states of the automaton become the states of the parsing-table
– Determine shift operations
– Determine goto operations
• Construct reduce entries by analyzing the grammar

Page 16: A Finite Automaton for S' → S$, S → a S b | ε

state  description    a    b    $    S
0      initial        s1   -    -    4
1      parsed a       s1   -    -    2
2      parsed S       -    s3   -    -
3      parsed a S b   -    -    -    -
4      final          -    -    -    -

[Figure: the automaton. Transitions: 0 --a--> 1, 1 --a--> 1, 1 --S--> 2, 2 --b--> 3, 0 --S--> 4.]

Page 17: Constructing a Finite Automaton

• NFA
– For X → X1 X2 … Xn, the states are the items [X → X1 X2 … Xi . Xi+1 … Xn]
– "prefixes of rhs (handles)"
– X1 X2 … Xi is at the top of the stack and we expect Xi+1 … Xn
• The initial state is [S' → .S$]
• δ([X → X1 … Xi . Xi+1 … Xn], Xi+1) = [X → X1 … Xi Xi+1 . Xi+2 … Xn]
• For every production Xi+1 → α: δ([X → X1 X2 … Xi . Xi+1 … Xn], ε) = [Xi+1 → .α]
• Convert into DFA
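
The NFA-to-DFA step can be sketched directly on sets of items: the ε-moves become a closure operation and the symbol moves become a goto operation. The code below is a minimal sketch for the grammar S' → S$, S → a S b | ε only; the names `item`, `closure`, `go` and `set_t`, the bitmap encoding, and writing S' as `Z` are assumptions made for illustration.

```c
#include <string.h>

/* Grammar: S' -> S$ (S' is written Z), S -> aSb, S -> epsilon.
   A symbol is a non-terminal iff some production has it as left-hand side. */
struct prod { char lhs; const char *rhs; };
static const struct prod prods[] = { {'Z', "S$"}, {'S', "aSb"}, {'S', ""} };
enum { NPROD = 3, MAXDOT = 4 };   /* dot positions 0..3 suffice for |rhs| <= 3 */

/* An item set is a bitmap; item (p, d) is production p with the dot at d. */
typedef unsigned set_t;

set_t item(int p, int d)
{
    return 1u << (p * MAXDOT + d);
}

/* Epsilon-closure: whenever the dot stands before a non-terminal N,
   add the item [N -> .alpha] for every production of N. */
set_t closure(set_t s)
{
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int p = 0; p < NPROD; p++) {
            int len = (int)strlen(prods[p].rhs);
            for (int d = 0; d < len; d++) {
                if (!(s & item(p, d)))
                    continue;
                char next = prods[p].rhs[d];      /* symbol after the dot */
                for (int q = 0; q < NPROD; q++) {
                    if (prods[q].lhs == next && !(s & item(q, 0))) {
                        s |= item(q, 0);
                        changed = 1;
                    }
                }
            }
        }
    }
    return s;
}

/* DFA transition: move the dot over symbol x in every item that allows it,
   then close the result. An empty result (0) means there is no transition. */
set_t go(set_t s, char x)
{
    set_t out = 0;
    for (int p = 0; p < NPROD; p++) {
        int len = (int)strlen(prods[p].rhs);
        for (int d = 0; d < len; d++)
            if ((s & item(p, d)) && prods[p].rhs[d] == x)
                out |= item(p, d + 1);
    }
    return closure(out);
}
```

Starting from `closure(item(0, 0))` and applying `go` repeatedly for the symbols a, b, S and $ enumerates the DFA states; for this grammar the initial set contains [S' → .S$], [S → .aSb] and [S → .].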

Page 18: NFA for S' → S$, S → a S b | ε

[Figure: the NFA over items.]

• [S' → .S$] --S--> [S' → S.$]
• [S' → .S$] --ε--> [S → .aSb] and [S → .]
• [S → .aSb] --a--> [S → a.Sb]
• [S → a.Sb] --S--> [S → aS.b]
• [S → a.Sb] --ε--> [S → .aSb] and [S → .]
• [S → aS.b] --b--> [S → aSb.]

Page 19: From the NFA to the DFA

NFA: as on Page 18.

DFA states (sets of NFA items) and transitions:
• {[S' → .S$], [S → .aSb], [S → .]} --a--> {[S → a.Sb], [S → .aSb], [S → .]}
• {[S' → .S$], [S → .aSb], [S → .]} --S--> {[S' → S.$]}
• {[S → a.Sb], [S → .aSb], [S → .]} --a--> itself
• {[S → a.Sb], [S → .aSb], [S → .]} --S--> {[S → aS.b]}
• {[S → aS.b]} --b--> {[S → aSb.]}

Page 20: Shift, goto and accept entries from the DFA

state  items                              a    b    $    S
0      [S' → .S$], [S → .aSb], [S → .]    s1   -    -    4
1      [S → a.Sb], [S → .aSb], [S → .]    s1   -    -    2
2      [S → aS.b]                         -    s3   -    -
3      [S → aSb.]                         -    -    -    -
4      [S' → S.$]                         -    -    acc  -

DFA: as on Page 19.

Page 21: Filling reduce entries

• For an item [A → α.] we need to know the tokens that can follow A in a derivation from S'
• Follow(A) = {t | S' ⇒* αAtβ}
• See the textbook for an algorithm for constructing Follow from a given grammar
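
As a rough illustration of such an algorithm, the sketch below computes nullable, First and Follow by fixed-point iteration for the grammar S' → S$, S → a S b | ε. It is one common formulation and not necessarily the textbook's exact presentation; the names `compute_follow`, `in_follow` and `merge`, the single-character symbols, and writing S' as `Z` are assumptions.

```c
#include <string.h>

/* Grammar: S' -> S$ (S' is written Z), S -> aSb, S -> epsilon.
   Upper-case letters are non-terminals, everything else is a token. */
struct prod { char lhs; const char *rhs; };
static const struct prod prods[] = { {'Z', "S$"}, {'S', "aSb"}, {'S', ""} };
enum { NPROD = 3 };

static int nullable[128];               /* nullable[N]: N derives epsilon */
static unsigned char first[128][128];   /* first[X][t]: t can start X */
static unsigned char follow[128][128];  /* follow[N][t]: t can follow N */

static int is_nt(char c) { return c >= 'A' && c <= 'Z'; }

/* Add every member of src to dst; return 1 if dst grew. */
static int merge(unsigned char *dst, const unsigned char *src)
{
    int grew = 0;
    for (int t = 0; t < 128; t++)
        if (src[t] && !dst[t]) { dst[t] = 1; grew = 1; }
    return grew;
}

/* Iterate the set equations until nothing changes. */
void compute_follow(void)
{
    int changed = 1;

    for (int c = 0; c < 128; c++)       /* First of a token is the token */
        if (!is_nt((char)c))
            first[c][c] = 1;

    while (changed) {
        changed = 0;
        for (int p = 0; p < NPROD; p++) {
            int x = prods[p].lhs;
            const char *y = prods[p].rhs;
            int k = (int)strlen(y);
            int prefix_nullable = 1;    /* are y[0..i-1] all nullable? */

            for (int i = 0; i < k; i++) {
                if (prefix_nullable)
                    changed |= merge(first[x], first[(int)y[i]]);

                int gap_nullable = 1;   /* are y[i+1..j-1] all nullable? */
                for (int j = i + 1; j < k && gap_nullable; j++) {
                    if (is_nt(y[i]))
                        changed |= merge(follow[(int)y[i]], first[(int)y[j]]);
                    gap_nullable = nullable[(int)y[j]];
                }
                /* Everything after y[i] can vanish: Follow(lhs) carries over. */
                if (gap_nullable && is_nt(y[i]))
                    changed |= merge(follow[(int)y[i]], follow[x]);

                prefix_nullable = prefix_nullable && nullable[(int)y[i]];
            }
            if (prefix_nullable && !nullable[x]) {
                nullable[x] = 1;
                changed = 1;
            }
        }
    }
}

/* Query helper; call compute_follow() first. */
int in_follow(char nt, char t)
{
    return follow[(int)nt][(int)t];
}
```

After `compute_follow()`, `in_follow('S', 'b')` and `in_follow('S', '$')` are non-zero and `in_follow('S', 'a')` is zero, that is, Follow(S) = {b, $} for this grammar.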

Page 22: The table with reduce entries filled in

Follow(S) = {b, $}

state  items                              a    b            $            S
0      [S' → .S$], [S → .aSb], [S → .]    s1   r S → ε      r S → ε      4
1      [S → a.Sb], [S → .aSb], [S → .]    s1   r S → ε      r S → ε      2
2      [S → aS.b]                         -    s3           -            -
3      [S → aSb.]                         -    r S → a S b  r S → a S b  -
4      [S' → S.$]                         -    -            acc          -

DFA: as on Page 19.

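Page 23: Example E → E + E | E * E | id

[Figure: DFA with seven states, numbered 1 to 7 on the slide.]

Item sets:
• {[S' → .E$], [E → .E + E], [E → .E * E], [E → .id]} (initial)
• {[E → id.]}
• {[S' → E.$], [E → E. + E], [E → E. * E]}
• {[E → E +. E], [E → .E + E], [E → .E * E], [E → .id]}
• {[E → E *. E], [E → .E + E], [E → .E * E], [E → .id]}
• {[E → E + E.], [E → E. + E], [E → E. * E]}
• {[E → E * E.], [E → E. + E], [E → E. * E]}

Transitions:
• id leads from the initial state and from the states containing [E → E +. E] or [E → E *. E] to {[E → id.]}
• E leads from the initial state to the state containing [S' → E.$], from the state containing [E → E +. E] to the one containing [E → E + E.], and from the state containing [E → E *. E] to the one containing [E → E * E.]
• + and * lead from every state containing [E → E. + E] and [E → E. * E] to the states containing [E → E +. E] and [E → E *. E] respectively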

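Page 24: Interesting Non SLR(1) Grammar

S' → S$
S → L = R | R
L → *R | id
R → L

Partial DFA:
• Initial state: [S' → .S$], [S → .L=R], [S → .R], [L → .*R], [L → .id], [R → .L]
• On L the initial state goes to: [S → L.=R], [R → L.]
• On = that state goes to: [S → L=.R], [R → .L], [L → .*R], [L → .id]

Follow(R) = {$, =}

In the state {[S → L.=R], [R → L.]} the token = allows a shift (from [S → L.=R]) and, because = is in Follow(R), also a reduce by R → L. The SLR(1) table therefore has a shift/reduce conflict, although the grammar is not ambiguous.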

Page 25: LR(1) Parser

• Item [A → α.β, t]
– α is at the top of the stack and we are expecting βt
• LR(1) State
– Sets of items
• LALR(1) State
– Merge LR(1) states whose items are the same apart from the look-ahead

Page 26: Interesting Non LR(1) Grammars

• Ambiguous
– Arithmetic expressions
– Dangling-else
• Common derived prefix
– A → B1 a b | B2 a c
– B1 → ε
– B2 → ε
• Optional non-terminals
– St → OptLab Ass
– OptLab → id : | ε
– Ass → id := Exp

Page 27: A motivating example

• Create a desk calculator
• Challenges
– Non trivial syntax
– Recursive expressions (semantics)
– Operator precedence

Page 28: Solution (lexical analysis)

/* desk.l */
%%
[0-9]+      { yylval = atoi(yytext); return NUM; }
"+"         { return PLUS; }
"-"         { return MINUS; }
"/"         { return DIV; }
"*"         { return MUL; }
"("         { return LPAR; }
")"         { return RPAR; }
"//".*[\n]  ;  /* comment */
[\t\n ]     ;  /* whitespace */
.           { error("illegal symbol", yytext[0]); }  /* error() must be supplied by the user */

Page 29: Solution (syntax analysis)

/* desk.y */
%{
#include <stdio.h>  /* printf */
%}
%token NUM
%left PLUS MINUS
%left MUL DIV
%token LPAR RPAR
%start P
%%
P : E             { printf("%d\n", $1); }
  ;
E : NUM           { $$ = $1; }
  | LPAR E RPAR   { $$ = $2; }
  | E PLUS E      { $$ = $1 + $3; }
  | E MINUS E     { $$ = $1 - $3; }
  | E MUL E       { $$ = $1 * $3; }
  | E DIV E       { $$ = $1 / $3; }
  ;
%%
#include "lex.yy.c"

Build:
flex desk.l
bison -y desk.y
cc y.tab.c -ll -ly
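
The %left declarations are what lets the generator settle the shift/reduce conflicts of the ambiguous E grammar. The sketch below shows the usual yacc/bison rule for one such conflict; the names `resolve`, `enum assoc` and `enum decision` and the integer encoding of precedence levels are illustrative assumptions, not part of any generated parser.

```c
/* Associativity declared for a token (%left, %right, %nonassoc). */
enum assoc { LEFT, RIGHT, NONASSOC };

/* Outcome chosen for a shift/reduce conflict. */
enum decision { DO_SHIFT, DO_REDUCE, DO_ERROR };

/* Decide between reducing by a rule and shifting the look-ahead token.
   Precedence levels are integers; a larger value binds tighter, which
   corresponds to a declaration that appears later in the grammar file.
   rule_prec is the level of the rule (by default that of its last token). */
enum decision resolve(int rule_prec, int token_prec, enum assoc token_assoc)
{
    if (token_prec > rule_prec)
        return DO_SHIFT;            /* e.g. E PLUS E with MUL as look-ahead */
    if (token_prec < rule_prec)
        return DO_REDUCE;           /* e.g. E MUL E with PLUS as look-ahead */

    switch (token_assoc) {          /* equal levels: associativity decides */
    case LEFT:  return DO_REDUCE;   /* a - b - c groups as (a - b) - c */
    case RIGHT: return DO_SHIFT;
    default:    return DO_ERROR;    /* non-associative: syntax error */
    }
}
```

With PLUS and MINUS at level 1 and MUL and DIV at level 2, `resolve(1, 2, LEFT)` shifts and `resolve(2, 1, LEFT)` reduces, which gives multiplication and division priority over addition and subtraction.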

Page 30: Summary

• LR is a powerful technique
• Generates efficient parsers
• Generation tools exist
– Bison, yacc, CUP
• But some grammars need to be tuned
– Shift/Reduce conflicts
– Reduce/Reduce conflicts
– Efficiency of the generated parser