8/13/2019 MELJUN CORTES Automata Theory 12
CSC 3130: Automata theory and formal languages
Parsers for programming languages
MELJUN P. CORTES MBA MPA BSCS ACS
-
CFG of the Java programming language

Identifier:
    IDENTIFIER

QualifiedIdentifier:
    Identifier { . Identifier }

Literal:
    IntegerLiteral
    FloatingPointLiteral
    CharacterLiteral
    StringLiteral
    BooleanLiteral
    NullLiteral

Expression:
    Expression1 [AssignmentOperator Expression1]

AssignmentOperator:
    =
    +=
    -=
    *=
    /=
    &=
    |=

from http://java.sun.com/docs/books/jls/second_edition/html/syntax.doc.html#52996
-
Parsing Java programs

class Point2d {
    /* The X and Y coordinates of the point--instance variables */
    private double x;
    private double y;
    private boolean debug; // A trick to help with debugging

    public Point2d (double px, double py) { // Constructor
        x = px;
        y = py;
        debug = false; // turn off debugging
    }

    public Point2d () { // Default constructor
        this (0.0, 0.0); // Invokes 2 parameter Point2D constructor
    }
    // Note that a this() invocation must be the BEGINNING of
    // statement body of constructor

    public Point2d (Point2d pt) { // Another constructor
        x = pt.getX();
        y = pt.getY();
    }
}

Simple Java program: about 1000 symbols
-
Parsing algorithms
How long would it take to parse this?
    exhaustive algorithm: about 10^80 years (longer than the life of the universe)
    CYK algorithm: about 1 week!
Can we parse faster?
    No! CYK is about as fast as any known practical general-purpose parsing algorithm
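
To make the CYK figure concrete, here is a minimal sketch of a CYK recognizer. It assumes the grammar is already in Chomsky normal form; the grammar used (for strings a^n b^n) and the names `cyk_accepts`, `UNIT` and `PAIR` are illustrative choices, not taken from the slides. The three nested loops over length, start position and split point are what give the cubic running time in the input length.

```python
def cyk_accepts(word, unit_rules, pair_rules, start="S"):
    """Return True if `word` derives from `start` (grammar in Chomsky normal form)."""
    n = len(word)
    if n == 0:
        return False  # this sketch does not handle the empty string
    # table[i][l] = set of variables that derive word[i:i+l]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][1] = {var for var, terminal in unit_rules if terminal == ch}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split]
                right = table[i + split][length - split]
                for var, (first, second) in pair_rules:
                    if first in left and second in right:
                        table[i][length].add(var)
    return start in table[0][n]

# Hypothetical CNF grammar for { a^n b^n : n >= 1 }:
#   S -> AB | AX,  X -> SB,  A -> a,  B -> b
UNIT = [("A", "a"), ("B", "b")]
PAIR = [("S", ("A", "B")), ("S", ("A", "X")), ("X", ("S", "B"))]

print(cyk_accepts("aabb", UNIT, PAIR))
print(cyk_accepts("aab", UNIT, PAIR))
```

The work per table entry also grows with the number of productions, which is one reason a full programming-language grammar makes CYK slow in practice.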
-
Another way of thinking
Scientist: Find an algorithm that can parse strings in any grammar
Engineer: Design your grammar so it has a very fast parsing algorithm
-
An example
S → Tc (1)
T → TA (2) | A (3)
A → aTb (4) | ab (5)
input: abaabbc

Stack   Input     Action
ε       abaabbc   shift
a       baabbc    shift
ab      aabbc     reduce (5)
A       aabbc     reduce (3)
T       aabbc     shift
Ta      abbc      shift
Taa     bbc       shift
Taab    bc        reduce (5)
TaA     bc        reduce (3)
TaT     bc        shift
TaTb    c         reduce (4)
TA      c         reduce (2)
T       c         shift
Tc      ε         reduce (1)
S       ε

Parse tree: S(T(T(A(a b)) A(a T(A(a b)) b)) c)
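
The trace above can be reproduced with a short sketch. It assumes a greedy strategy that is not stated on the slide: reduce whenever the top of the stack matches a right-hand side, trying longer right-hand sides first, and shift otherwise. This happens to work for this grammar; it is not the LR(0) algorithm in general. The names `RULES` and `shift_reduce` are illustrative.

```python
# Productions numbered as on the slide: (number, left-hand side, right-hand side)
RULES = [(1, "S", "Tc"), (2, "T", "TA"), (3, "T", "A"), (4, "A", "aTb"), (5, "A", "ab")]

def shift_reduce(text, rules=RULES):
    """Greedy shift-reduce trace; returns the final stack and the list of actions."""
    stack, pos, actions = "", 0, []
    # Assumption: prefer longer right-hand sides, so TA is reduced before A.
    by_length = sorted(rules, key=lambda rule: -len(rule[2]))
    while True:
        for number, lhs, rhs in by_length:
            if stack.endswith(rhs):
                # Replace the matched right-hand side on top of the stack by lhs.
                stack = stack[: len(stack) - len(rhs)] + lhs
                actions.append(f"reduce ({number})")
                break
        else:
            # No right-hand side matches the top of the stack: shift or stop.
            if pos == len(text):
                return stack, actions
            stack += text[pos]
            pos += 1
            actions.append("shift")

final_stack, actions = shift_reduce("abaabbc")
print(final_stack)
print(actions)
```

For the input abaabbc the action list matches the Action column of the table, and the final stack is the start variable S.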
-
Some terminology
S → Tc (1)
T → TA (2) | A (3)
A → aTb (4) | ab (5)
input: abaabbc

Stack   Input     Action
ε       abaabbc   shift
a       baabbc    shift
ab      aabbc     reduce (5)
A       aabbc     reduce (3)
T       aabbc     shift
Ta      abbc      shift
Taa     bbc       shift
Taab    bc        reduce (5)
TaA     bc        reduce (3)
TaT     bc        shift
TaTb    c         reduce (4)
TA      c         reduce (2)
T       c         shift
Tc      ε         reduce (1)
S       ε

handle: the right-hand side on top of the stack that is about to be reduced
valid items: a•Tb, a•b
valid items: T•a, T•c, aT•b
-
Outline of LR(0) parsing algorithm
As the string is being read, it is pushed on a stack
Algorithm keeps track of all valid items
Algorithm can perform two actions:
    shift: no complete item is valid
    reduce: there is one valid item, and it is complete
-
Running the algorithm
A → aAb | ab
A ⇒ aAb ⇒ aabb

Stack  Input  Action  Valid items
ε      aabb   S       A → •aAb, A → •ab
a      abb    S       A → a•Ab, A → a•b, A → •aAb, A → •ab
aa     bb     S       A → a•Ab, A → a•b, A → •aAb, A → •ab
aab    b      R       A → ab•
aA     b      S       A → aA•b
aAb    ε      R       A → aAb•
A      ε
-
How to update valid items
Updating valid items on reduce β to B
First, we backtrack to the items that were valid before β was pushed on the stack
Then, we apply the same rules as for shift B (as if B were a terminal)
    A → α•Bβ is updated to A → αB•β
    A → α•Xβ disappears if X ≠ B
    C → •δ is added for every valid item A → α•Cβ and production C → δ
-
Valid item updates by NFA
States of NFA will be items (plus a start state q0)
For every item S → •α we have a transition
    q0 --ε--> S → •α
For every item A → α•Xβ we have a transition
    A → α•Xβ --X--> A → αX•β
For every item A → α•Cβ and production C → δ we have a transition
    A → α•Cβ --ε--> C → •δ
-
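Example
A → aAb | ab
NFA transitions (from the figure):
    q0 --ε--> A → •aAb
    q0 --ε--> A → •ab
    A → •aAb --a--> A → a•Ab
    A → a•Ab --A--> A → aA•b
    A → aA•b --b--> A → aAb•
    A → •ab --a--> A → a•b
    A → a•b --b--> A → ab•
    A → a•Ab --ε--> A → •aAb
    A → a•Ab --ε--> A → •ab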
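
The three transition rules for the item NFA can be sketched in a few lines. The representation is an assumption made for illustration: an item is a tuple (left-hand side, right-hand side, dot position), uppercase letters are variables, and an empty label stands for an ε-move. The name `item_nfa` is hypothetical.

```python
def item_nfa(grammar, start):
    """Return the NFA transitions as (source, label, target); label "" is an epsilon move."""
    edges = []
    # Rule 1: the start state q0 moves by epsilon to every item S -> .alpha
    for rhs in grammar[start]:
        edges.append(("q0", "", (start, rhs, 0)))
    for lhs, bodies in grammar.items():
        for rhs in bodies:
            for dot in range(len(rhs)):
                item, symbol = (lhs, rhs, dot), rhs[dot]
                # Rule 2: reading X moves the dot over X
                edges.append((item, symbol, (lhs, rhs, dot + 1)))
                # Rule 3: a dot in front of a variable C gives epsilon moves to C -> .delta
                for delta in grammar.get(symbol, []):
                    edges.append((item, "", (symbol, delta, 0)))
    return edges

GRAMMAR = {"A": ["aAb", "ab"]}  # A -> aAb | ab
for source, label, target in item_nfa(GRAMMAR, "A"):
    print(source, repr(label), target)
```

For the grammar A → aAb | ab this produces nine transitions: two from q0, five that move the dot, and two ε-moves out of A → a•Ab.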
-
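Convert NFA to DFA
State  Valid items
1      A → •aAb, A → •ab
2      A → a•Ab, A → a•b, A → •aAb, A → •ab
3      A → ab•
4      A → aA•b
5      A → aAb•
Transitions: 1 --a--> 2, 2 --a--> 2, 2 --b--> 3, 2 --A--> 4, 4 --b--> 5; every other transition goes to the dead state ("die")
states correspond to sets of valid items
transitions are labeled by variables / terminals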
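
The five DFA states can be computed directly from the grammar. The sketch below skips the explicit NFA and uses the usual closure and goto operations, which amounts to running the subset construction on the item NFA. Items are assumed to be tuples (left-hand side, right-hand side, dot position), and all function names are illustrative.

```python
GRAMMAR = {"A": ["aAb", "ab"]}  # A -> aAb | ab; uppercase letters are variables

def closure(items):
    """Add C -> .delta for every item whose dot stands in front of a variable C."""
    items = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for delta in GRAMMAR[rhs[dot]]:
                new_item = (rhs[dot], delta, 0)
                if new_item not in items:
                    items.add(new_item)
                    work.append(new_item)
    return frozenset(items)

def goto(state, symbol):
    """Move the dot over `symbol` where possible; an empty result is the dead state."""
    moved = {(l, r, d + 1) for (l, r, d) in state if d < len(r) and r[d] == symbol}
    return closure(moved)

def build_dfa(start_var="A"):
    start = closure({(start_var, rhs, 0) for rhs in GRAMMAR[start_var]})
    states, trans, work = [start], {}, [start]
    while work:
        state = work.pop(0)
        symbols = {r[d] for (_, r, d) in state if d < len(r)}
        # Terminals first, then variables, so the numbering follows the slide.
        for symbol in sorted(symbols, key=lambda s: (s.isupper(), s)):
            target = goto(state, symbol)
            if target not in states:
                states.append(target)
                work.append(target)
            trans[(states.index(state) + 1, symbol)] = states.index(target) + 1
    return states, trans

states, trans = build_dfa()
print(len(states))
print(trans)
```

Transitions that are missing from `trans` correspond to the dead state.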
-
Attempt at parsing with DFA
A → aAb | ab
A ⇒ aAb ⇒ aabb

Stack  Input  Action  DFA state  Valid items
ε      aabb   S       1          A → •aAb, A → •ab
a      abb    S       2          A → a•Ab, A → a•b, A → •aAb, A → •ab
aa     bb     S       2          A → a•Ab, A → a•b, A → •aAb, A → •ab
aab    b      R       3          A → ab•
aA     b              ?          A → aA•b
-
Remember the state in stack!
A → aAb | ab
A ⇒ aAb ⇒ aabb

Stack     Input  Action  DFA state  Valid items
1         aabb   S       1          A → •aAb, A → •ab
1a2       abb    S       2          A → a•Ab, A → a•b, A → •aAb, A → •ab
1a2a2     bb     S       2          A → a•Ab, A → a•b, A → •aAb, A → •ab
1a2a2b3   b      R       3          A → ab•
1a2A4     b      S       4          A → aA•b
1a2A4b5   ε      R       5          A → aAb•
1A        ε
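
A minimal sketch of the parser with states kept on the stack, for the grammar A → aAb | ab. The transition table is the five-state DFA for this grammar (1 -a-> 2, 2 -a-> 2, 2 -b-> 3, 2 -A-> 4, 4 -b-> 5). The acceptance test (stack back to state 1 with all input consumed after a reduce) is an assumption, since the slides do not spell out how the parser stops; the names `TRANS`, `REDUCE` and `lr0_parse` are illustrative.

```python
TRANS = {(1, "a"): 2, (2, "a"): 2, (2, "b"): 3, (2, "A"): 4, (4, "b"): 5}
REDUCE = {3: ("A", "ab"), 5: ("A", "aAb")}  # states whose only item is complete

def lr0_parse(text, start="A"):
    """Return the list of stack snapshots, or None if the input is rejected."""
    stack = [1]  # alternates states and symbols: 1 a 2 a 2 ...
    pos, trace = 0, ["1"]
    while True:
        state = stack[-1]
        if state in REDUCE:
            lhs, rhs = REDUCE[state]
            # Pop the right-hand side together with the states stored above each symbol.
            del stack[len(stack) - 2 * len(rhs):]
            if stack == [1] and lhs == start and pos == len(text):
                trace.append("1" + lhs)  # assumed accepting configuration
                return trace
            target = TRANS.get((stack[-1], lhs))
            if target is None:
                return None
            stack += [lhs, target]
        else:
            if pos == len(text):
                return None
            target = TRANS.get((state, text[pos]))
            if target is None:
                return None  # the DFA died
            stack += [text[pos], target]
            pos += 1
        trace.append("".join(map(str, stack)))

print(lr0_parse("aabb"))
print(lr0_parse("aab"))
```

Because each state sits next to its symbol, popping the handle exposes the state the DFA was in before the handle was read, so no backtracking over the input is needed.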
-
LR(0) grammars and deterministic PDAs
The parsing procedure can be implemented by a deterministic pushdown automaton
A PDA is deterministic if in every state there is at most one possible transition for every input symbol and pop symbol, including ε
Example: the PDA for w#w^R is deterministic, but the PDA for ww^R is not
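
As a small illustration of why w#w^R is easy for a deterministic machine, here is a sketch that simulates such a PDA with an explicit stack. The two phases ("push" and "pop") and the function name are illustrative; the alphabet is assumed to be any characters other than '#'.

```python
def accepts_w_hash_wr(text):
    """Simulate a deterministic PDA for { w#w^R }: push until '#', then pop and compare."""
    stack = []
    state = "push"
    for ch in text:
        if state == "push":
            if ch == "#":
                state = "pop"      # the marker tells the machine where the middle is
            else:
                stack.append(ch)
        else:
            # In the pop phase every symbol must match the top of the stack.
            if ch == "#" or not stack or stack.pop() != ch:
                return False
    return state == "pop" and not stack

print(accepts_w_hash_wr("ab#ba"))
print(accepts_w_hash_wr("ab#ab"))
```

For ww^R there is no marker, so a PDA has to guess where the middle is, and that guess is what makes it nondeterministic.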
-
LR(0) grammars and deterministic PDAs
Not every PDA can be made deterministic
Since PDAs recognize exactly the CFLs, the LR(0) parsing algorithm must fail for some CFLs!
When does the LR(0) parsing algorithm fail?
-
Outline of LR(0) parsing algorithm
Algorithm can perform two actions:
    shift (S): no complete item is valid
    reduce (R): there is one valid item, and it is complete
What if:
    some valid items complete, some not: S / R conflict
    more than one valid complete item: R / R conflict
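
The four cases can be checked mechanically for any set of valid items. In this sketch an item is assumed to be a tuple (left-hand side, right-hand side, dot position), and an item is complete when the dot is at the end. The function name `classify` and the two conflicting item sets in the usage lines are hypothetical examples, not taken from the slides.

```python
def classify(state):
    """Classify a set of valid items as shift, reduce, or one of the two conflicts."""
    complete = [item for item in state if item[2] == len(item[1])]
    incomplete = [item for item in state if item[2] < len(item[1])]
    if len(complete) > 1:
        # Reported first even if incomplete items are present as well.
        return "R/R conflict"
    if complete and incomplete:
        return "S/R conflict"
    return "reduce" if complete else "shift"

# Item sets for A -> aAb | ab: no conflicts.
print(classify({("A", "aAb", 1), ("A", "ab", 1), ("A", "aAb", 0), ("A", "ab", 0)}))
print(classify({("A", "ab", 2)}))
# Hypothetical item sets that do conflict.
print(classify({("A", "a", 1), ("A", "ab", 1)}))   # A -> a.  and  A -> a.b
print(classify({("A", "a", 1), ("B", "a", 1)}))    # A -> a.  and  B -> a.
```

A grammar can be parsed by the LR(0) algorithm only if no reachable set of valid items falls into either conflict case.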
-
Hierarchy of context-free grammars
context-free grammars: parse using CYK algorithm (slow)
LR(∞) grammars
LR(1) grammars
LR(0) grammars: parse using LR(0) algorithm
Languages shown in the figure: Java, Perl, Python
to be continued