meljun cortes automata theory 12

Upload: meljun-cortes-mbampa

Post on 04-Jun-2018


  • 8/13/2019 MELJUN CORTES Automata Theory 12


    CSC 3130: Automata theory and formal languages

    Parsers for programming languages

    MELJUN P. CORTES MBA MPA BSCS ACS

    CFG of the Java programming language

    Identifier:
        IDENTIFIER

    QualifiedIdentifier:
        Identifier { . Identifier }

    Literal:
        IntegerLiteral
        FloatingPointLiteral
        CharacterLiteral
        StringLiteral
        BooleanLiteral
        NullLiteral

    Expression:
        Expression1 [AssignmentOperator Expression1]

    AssignmentOperator:
        =
        +=
        -=
        *=
        /=
        &=
        |=

    from http://java.sun.com/docs/books/jls/second_edition/html/syntax.doc.html#52996

    Parsing java programs

    class Point2d {
        /* The X and Y coordinates of the point--instance variables */
        private double x;
        private double y;
        private boolean debug; // A trick to help with debugging

        public Point2d (double px, double py) { // Constructor
            x = px;
            y = py;
            debug = false; // turn off debugging
        }

        public Point2d () { // Default constructor
            this (0.0, 0.0); // Invokes 2 parameter Point2d constructor
        }
        // Note that a this() invocation must be at the BEGINNING of
        // the statement body of the constructor

        public Point2d (Point2d pt) { // Another constructor
            x = pt.getX();
            y = pt.getY();
        }
    }

    Simple java program: about 1000 symbols

    Parsing algorithms

    How long would it take to parse this?

    exhaustive algorithm: about 10^80 years (longer than the life of the universe)

    CYK algorithm: about 1 week!

    Can we parse faster?

    No! CYK is (essentially) the fastest known general-purpose parsing algorithm
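
To make the cubic-time CYK idea concrete, here is a minimal recognizer sketch. The grammar in it is an assumption chosen for illustration (a Chomsky normal form grammar for the strings a^n b^n, not taken from the slides), and the names UNIT, BINARY, and cyk are hypothetical.

```python
from itertools import product

# Hypothetical CNF grammar for { a^n b^n : n >= 1 } (not from the slides):
#   S -> A B | A X,   X -> S B,   A -> a,   B -> b
UNIT = {"a": {"A"}, "b": {"B"}}  # terminal rules
BINARY = {("A", "B"): {"S"}, ("A", "X"): {"S"}, ("S", "B"): {"X"}}


def cyk(word, start="S"):
    """Return True if `word` can be derived from `start` (O(n^3) table fill)."""
    n = len(word)
    if n == 0:
        return False  # this sketch does not handle the empty string
    # table[i][k] = set of variables that derive word[i : i + k + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][0] = set(UNIT.get(ch, ()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for pair in product(left, right):
                    table[i][length - 1] |= BINARY.get(pair, set())
    return start in table[0][n - 1]
```

The three nested loops over length, start position, and split point are where the cubic running time comes from.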

    Another way of thinking

    Scientist: Find an algorithm that can parse strings in any grammar

    Engineer: Design your grammar so it has a very fast parsing algorithm

    An example

    S → Tc (1)
    T → TA (2) | A (3)
    A → aTb (4) | ab (5)

    input: abaabbc

    Stack    Input     Action
    (empty)  abaabbc   shift
    a        baabbc    shift
    ab       aabbc     reduce (5)
    A        aabbc     reduce (3)
    T        aabbc     shift
    Ta       abbc      shift
    Taa      bbc       shift
    Taab     bc        reduce (5)
    TaA      bc        reduce (3)
    TaT      bc        shift
    TaTb     c         reduce (4)
    TA       c         reduce (2)
    T        c         shift
    Tc       (empty)   reduce (1)
    S        (empty)

    [Figure: parse tree for abaabbc, built bottom-up by the reductions]
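
The trace can be checked mechanically. The sketch below replays a given list of actions on a string-valued stack; it does not decide which action to take, it only verifies that each reduce finds its right-hand side on top of the stack. The names PRODUCTIONS, ACTIONS, and replay are hypothetical.

```python
# Productions of the example grammar, numbered as on the slide.
PRODUCTIONS = {
    1: ("S", "Tc"),
    2: ("T", "TA"),
    3: ("T", "A"),
    4: ("A", "aTb"),
    5: ("A", "ab"),
}


def replay(word, actions):
    """Apply actions ("shift" or a production number) in order and
    return the list of (stack, remaining input) configurations."""
    stack, rest = "", word
    trace = [(stack, rest)]
    for action in actions:
        if action == "shift":
            if not rest:
                raise ValueError("nothing left to shift")
            stack, rest = stack + rest[0], rest[1:]
        else:
            head, body = PRODUCTIONS[action]
            if not stack.endswith(body):
                raise ValueError(f"{body!r} is not on top of the stack")
            stack = stack[: len(stack) - len(body)] + head
        trace.append((stack, rest))
    return trace


# Action column of the trace for the input abaabbc.
ACTIONS = ["shift", "shift", 5, 3, "shift", "shift", "shift", 5, 3,
           "shift", 4, 2, "shift", 1]
trace = replay("abaabbc", ACTIONS)
```

The final configuration has S alone on the stack and no input left, which is what acceptance means for this grammar.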

    Some terminology

    S → Tc (1)
    T → TA (2) | A (3)
    A → aTb (4) | ab (5)

    input: abaabbc

    (stack, input, and action columns as in the example above)

    handle: the right-hand side on top of the stack that a reduce step replaces, e.g. ab before reduce (5)

    valid items (the • marks how much of the right-hand side is already on the stack): A → a•Tb, A → a•b

    valid items: Ta, Tc, aTb

    Outline of LR(0) parsing algorithm

    As the string is being read, it is pushed on a stack

    Algorithm keeps track of all valid items

    Algorithm can perform two actions:

    shift: no complete item is valid

    reduce: there is one valid item, and it is complete

    Running the algorithm

    A → aAb | ab
    input: aabb

    Stack    Input    Action  Valid items
    (empty)  aabb     S       A → •aAb, A → •ab
    a        abb      S       A → a•Ab, A → a•b, A → •aAb, A → •ab
    aa       bb       S       A → a•Ab, A → a•b, A → •aAb, A → •ab
    aab      b        R       A → ab•
    aA       b        S       A → aA•b
    aAb      (empty)  R       A → aAb•
    A        (empty)

    How to update valid items

    Updating valid items on reduce β to B:

    First, we backtrack to the valid items from before β was pushed on the stack

    Then, we apply the same rules as for shift B (as if B were a terminal):

    A → α•Bβ is updated to A → αB•β

    A → α•Xβ disappears if X ≠ B

    C → •δ is added for every valid item A → α•Cβ and production C → δ
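
The update rules can be sketched as two small functions. The representation is an assumption made for this sketch: an item is a tuple (head, body, dot), uppercase letters are variables, and the dot is printed as "." by the helper show. The names closure, advance, and show are hypothetical.

```python
# Grammar of the running example: A -> aAb | ab (uppercase = variable).
GRAMMAR = {"A": ["aAb", "ab"]}


def closure(items):
    """Add C -> .d for every item whose dot sits in front of a variable C."""
    items = set(items)
    todo = list(items)
    while todo:
        head, body, dot = todo.pop()
        if dot < len(body) and body[dot] in GRAMMAR:
            for rhs in GRAMMAR[body[dot]]:
                new = (body[dot], rhs, 0)
                if new not in items:
                    items.add(new)
                    todo.append(new)
    return frozenset(items)


def advance(items, symbol):
    """Valid items after pushing `symbol` (a terminal or a variable):
    items with the dot before `symbol` move on, all others disappear."""
    moved = {(h, b, d + 1) for (h, b, d) in items
             if d < len(b) and b[d] == symbol}
    return closure(moved)


def show(items):
    return sorted(f"{h} -> {b[:d]}.{b[d:]}" for (h, b, d) in items)


start = closure({("A", rhs, 0) for rhs in GRAMMAR["A"]})
after_a = advance(start, "a")          # stack: a
# Reducing ab to A with stack aab: backtrack to the items for stack a,
# then treat A like a shifted symbol.
after_reduce = advance(after_a, "A")   # stack: aA
```

Here show(after_reduce) contains the single item A -> aA.b, matching the row for stack aA in the trace of the running example.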

    Valid item updates by NFA

    States of NFA will be items (plus a start state q0)

    For every item S → •α we have a transition
        q0 --ε--> S → •α

    For every item A → α•Xβ we have a transition
        A → α•Xβ --X--> A → αX•β

    For every item A → α•Cβ and production C → δ we have a transition
        A → α•Cβ --ε--> C → •δ
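
The three transition rules translate almost line for line into code. This is a sketch under assumptions: items are tuples (head, body, dot), the empty string stands for an ε label, uppercase letters are variables, and the function name item_nfa is hypothetical.

```python
GRAMMAR = {"A": ["aAb", "ab"]}  # running example; uppercase = variable
START = "A"
EPS = ""                        # label used here for epsilon moves


def item_nfa(grammar, start):
    """Return the NFA as a set of (state, label, state) triples.
    States are "q0" and items (head, body, dot)."""
    edges = set()
    # Rule 1: q0 --eps--> S -> .alpha for every production of the start variable
    for rhs in grammar[start]:
        edges.add(("q0", EPS, (start, rhs, 0)))
    for head, bodies in grammar.items():
        for body in bodies:
            for dot in range(len(body)):
                item = (head, body, dot)
                x = body[dot]
                # Rule 2: move the dot over the next symbol X
                edges.add((item, x, (head, body, dot + 1)))
                # Rule 3: if X is a variable C, eps-move to every C -> .delta
                for rhs in grammar.get(x, []):
                    edges.add((item, EPS, (x, rhs, 0)))
    return edges


edges = item_nfa(GRAMMAR, START)
```

For the grammar A → aAb | ab this yields nine transitions: two from q0, five labeled by a grammar symbol, and two ε-moves out of A → a•Ab.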

    Example

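    A → aAb | ab

    [Figure: NFA whose states are q0 and the seven items of the grammar]

    q0 --ε--> A → •aAb
    q0 --ε--> A → •ab
    A → •aAb --a--> A → a•Ab
    A → a•Ab --A--> A → aA•b
    A → aA•b --b--> A → aAb•
    A → •ab --a--> A → a•b
    A → a•b --b--> A → ab•
    A → a•Ab --ε--> A → •aAb
    A → a•Ab --ε--> A → •ab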

    Convert NFA to DFA

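    [Figure: DFA with states 1 to 5]

    1: A → •aAb, A → •ab
    2: A → a•Ab, A → a•b, A → •aAb, A → •ab
    3: A → ab•
    4: A → aA•b
    5: A → aAb•

    Transitions: 1 --a--> 2, 2 --a--> 2, 2 --b--> 3, 2 --A--> 4, 4 --b--> 5
    All other transitions die (lead to a dead state)

    states correspond to sets of valid items
    transitions are labeled by variables / terminals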
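
The conversion is the usual subset construction applied to the item NFA. The sketch below is self-contained and rests on a few assumptions: items are tuples (head, body, dot), the empty string is the ε label, missing entries in delta stand for the dead state, and symbols are tried in the order a, b, A so that the breadth-first numbering happens to agree with the numbering 1 to 5 used in the slides. All function names are hypothetical.

```python
from collections import deque

GRAMMAR = {"A": ["aAb", "ab"]}  # running example; uppercase = variable
START = "A"


def nfa_edges(grammar, start):
    """Item NFA as (state, label, state) triples; "" is the epsilon label."""
    edges = {("q0", "", (start, rhs, 0)) for rhs in grammar[start]}
    for head, bodies in grammar.items():
        for body in bodies:
            for dot, x in enumerate(body):
                edges.add(((head, body, dot), x, (head, body, dot + 1)))
                edges |= {((head, body, dot), "", (x, rhs, 0))
                          for rhs in grammar.get(x, [])}
    return edges


def eps_closure(states, edges):
    """All NFA states reachable from `states` by epsilon moves only."""
    seen, todo = set(states), list(states)
    while todo:
        s = todo.pop()
        for p, label, q in edges:
            if p == s and label == "" and q not in seen:
                seen.add(q)
                todo.append(q)
    return frozenset(seen)


def subset_construction(edges, symbols):
    """Return (number, delta): DFA state numbers and the transition table."""
    start = eps_closure({"q0"}, edges)
    number = {start: 1}
    delta = {}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for x in symbols:
            step = {q for p, label, q in edges if p in current and label == x}
            target = eps_closure(step, edges)
            if not target:
                continue  # no entry in delta = dead state
            if target not in number:
                number[target] = len(number) + 1
                queue.append(target)
            delta[(number[current], x)] = number[target]
    return number, delta


number, delta = subset_construction(nfa_edges(GRAMMAR, START), "abA")
```

One small difference from the figure: in this sketch state 1 also contains the NFA start state q0 next to the two initial items.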

    Attempt at parsing with DFA

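    A → aAb | ab
    input: aabb

    Stack    Input  Action  DFA state  Valid items
    (empty)  aabb   S       1          A → •aAb, A → •ab
    a        abb    S       2          A → a•Ab, A → a•b, A → •aAb, A → •ab
    aa       bb     S       2          A → a•Ab, A → a•b, A → •aAb, A → •ab
    aab      b      R       3          A → ab•
    aA       b              ?          A → aA•b

    After the reduce, the DFA state for stack aA is not known without reading the stack again from the bottom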

    Remember the state in stack!

    A → aAb | ab
    input: aabb

    Stack     Input    Action  DFA state  Valid items
    1         aabb     S       1          A → •aAb, A → •ab
    1a2       abb      S       2          A → a•Ab, A → a•b, A → •aAb, A → •ab
    1a2a2     bb       S       2          A → a•Ab, A → a•b, A → •aAb, A → •ab
    1a2a2b3   b        R       3          A → ab•
    1a2A4     b        S       4          A → aA•b
    1a2A4b5   (empty)  R       5          A → aAb•
    1A        (empty)
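
Keeping the states on the stack turns the procedure into a short loop. The sketch below hard-codes the DFA of the running example (transition table DELTA, reduce states REDUCE) rather than computing it, and treats a missing table entry as the dead state. The acceptance test (start variable A on top of state 1 with no input left) is an assumption of this sketch, chosen so that the final stack reads 1A as in the trace. All names are hypothetical.

```python
# DFA of the running example A -> aAb | ab, states numbered as in the slides.
DELTA = {(1, "a"): 2, (2, "a"): 2, (2, "b"): 3, (2, "A"): 4, (4, "b"): 5}
REDUCE = {3: ("A", "ab"), 5: ("A", "aAb")}  # states holding one complete item
START = "A"


def parse(word):
    """Return the stack snapshots (strings such as '1a2a2b3') of an
    accepting run, or raise ValueError if `word` is rejected."""
    stack = [1]  # alternates state, symbol, state, ...
    rest = word
    trace = ["1"]
    while True:
        state = stack[-1]
        if state in REDUCE:
            head, body = REDUCE[state]
            # Pop the handle together with the states stored above its symbols.
            del stack[len(stack) - 2 * len(body):]
            exposed = stack[-1]
            if head == START and exposed == 1 and not rest:
                trace.append(f"1{head}")
                return trace
            if (exposed, head) not in DELTA:
                raise ValueError("reject")
            stack += [head, DELTA[(exposed, head)]]
        else:
            if not rest or (state, rest[0]) not in DELTA:
                raise ValueError("reject")
            stack += [rest[0], DELTA[(state, rest[0])]]
            rest = rest[1:]
        trace.append("".join(map(str, stack)))
```

After a reduce, the state exposed on the stack tells the parser where to continue, so the stack never has to be reread.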

    LR(0) grammars and deterministic PDAs

    The parsing procedure can be implemented by a deterministic pushdown automaton

    A PDA is deterministic if in every state there is at most one possible transition for every input symbol and pop symbol, including ε

    Example: PDA for w#w^R is deterministic, but PDA for ww^R is not
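
Why the marker # makes the difference can be seen in a small stack-based recognizer. The sketch assumes the alphabet {a, b} for w (the slides do not fix one), and the function name is hypothetical. Every step is forced by the current symbol, which is exactly what determinism requires.

```python
def accepts_w_hash_wr(s):
    """Deterministic recognizer for { w # w^R : w in {a, b}* }."""
    stack = []
    pushing = True  # before '#': push symbols; after '#': pop and compare
    for ch in s:
        if pushing:
            if ch == "#":
                pushing = False      # the marker says where the middle is
            elif ch in ("a", "b"):
                stack.append(ch)
            else:
                return False
        else:
            if not stack or stack.pop() != ch:
                return False
    return not pushing and not stack
```

For ww^R there is no marker, so a PDA reading left to right has to guess where the middle is, and that guess is where nondeterminism enters.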

    Not every PDA can be made deterministic

    Since PDAs recognize exactly the CFLs, the LR(0) parsing algorithm must fail for some CFLs!

    When does the LR(0) parsing algorithm fail?

    Outline of LR(0) parsing algorithm

    Algorithm can perform two actions:

    shift (S): no complete item is valid

    reduce (R): there is one valid item, and it is complete

    What if:

    some valid items complete, some not: S / R conflict

    more than one valid complete item: R / R conflict
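
The four cases can be told apart by counting complete and incomplete items in a set of valid items. In this sketch an item is a tuple (head, body, dot), the function name classify is hypothetical, and a set with several complete items is reported as an R / R conflict even if it also contains incomplete items.

```python
def classify(items):
    """Return the LR(0) action for a non-empty set of valid items."""
    complete = [it for it in items if it[2] == len(it[1])]
    incomplete = [it for it in items if it[2] < len(it[1])]
    if len(complete) > 1:
        return "R/R conflict"
    if complete and incomplete:
        return "S/R conflict"
    if complete:
        return "reduce"
    return "shift"


# Item sets from the running example A -> aAb | ab
state_2 = {("A", "aAb", 1), ("A", "ab", 1), ("A", "aAb", 0), ("A", "ab", 0)}
state_3 = {("A", "ab", 2)}

# Hypothetical item sets that are not LR(0)
after_a_in_S_to_a_or_ab = {("S", "a", 1), ("S", "ab", 1)}  # S -> a | ab
two_complete = {("A", "a", 1), ("B", "a", 1)}              # A -> a, B -> a
```

A grammar is LR(0) when no reachable set of valid items is classified as a conflict.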

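    Hierarchy of context-free grammars

    context-free grammars: parse using CYK algorithm (slow)

    LR(∞) grammars

    LR(1) grammars

    LR(0) grammars: parse using LR(0) algorithm

    [Figure: the classes above are nested, each one containing the next; java, perl, and python are marked inside the diagram]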

    to be continued