Regular Expressions

CS 2800: Discrete Structures, Fall 2014

Sid Chaudhuri

Regular expressions (“regex”-es) are defined inductively

(start with simple base expressions, construct more complicated ones recursively)

Empty set

L(∅) = ∅

Empty string

L(ε) = {ε}

Literal character

L(x) = {x}

e.g. L(1) = {1}, L(2) = {2}, L(a) = {a}

Concatenation

L(AB) = {ab | a ∈ L(A), b ∈ L(B)}

e.g. L(12) = {12}, L(aabb) = {aabb}, L(aε) = {a}, L(a∅) = ∅
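The concatenation rule can be checked on small finite languages. The following is a minimal sketch, assuming languages are modeled as Python sets of strings, with the empty string "" standing in for ε and the empty set for ∅; the names concat, EMPTY_SET and EPSILON are illustrative, not from the slides.

```python
def concat(lang_a, lang_b):
    """Return {ab | a in lang_a, b in lang_b} for finite languages."""
    return {a + b for a in lang_a for b in lang_b}

EMPTY_SET = set()   # L(∅): contains no strings at all
EPSILON = {""}      # L(ε): contains only the empty string

print(concat({"1"}, {"2"}))      # {'12'}
print(concat({"a"}, EPSILON))    # {'a'}   (ε is the identity for concatenation)
print(concat({"a"}, EMPTY_SET))  # set()   (there is no b to pair with a)
```

The last line shows why L(a∅) = ∅: the set comprehension needs one string from each side, and ∅ offers none.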

Alternation

L(A|B) = L(A) ∪ L(B)

e.g. L(1|2) = {1, 2}, L(aa|bb) = {aa, bb}, L(a|ε) = {a, ε}, L(a|∅) = {a}
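Alternation is plain set union, so the examples above can be reproduced directly. A minimal sketch, again assuming languages are Python sets of strings with "" for ε; the name alternate is illustrative.

```python
def alternate(lang_a, lang_b):
    """Return L(A) ∪ L(B) for languages given as sets of strings."""
    return lang_a | lang_b

print(alternate({"aa"}, {"bb"}) == {"aa", "bb"})  # True
print(alternate({"a"}, {""}) == {"a", ""})        # True: ε adds the empty string
print(alternate({"a"}, set()) == {"a"})           # True: ∅ adds nothing
```

Note the contrast with concatenation: ∅ is the identity for alternation, whereas ε is the identity for concatenation.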

Kleene star

L(A*) = {ε} ∪ { x₁x₂...xₙ | n ∈ N, xᵢ ∈ L(A) }

e.g. L(a*) = {ε, a, aa, aaa, aaaa, ...}, L((ab)*) = {ε, ab, abab, ababab, ...},

L((a|b)*) = {ε, a, b, aa, ab, ba, bb, aaa, aab, aba, ...}, L(ε*) = {ε}, L(∅*) = {ε}
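Since L(A*) is usually infinite, it can only be enumerated up to a length bound. The sketch below is one way to do that, assuming languages are Python sets of strings with "" for ε; the function name kleene_star and the max_len cutoff are illustrative assumptions.

```python
def kleene_star(lang, max_len):
    """Return all strings of L(A*) whose length is at most max_len."""
    result = {""}        # ε is always in L(A*), even when lang is empty
    frontier = {""}      # strings discovered in the previous round
    while frontier:
        next_frontier = set()
        for prefix in frontier:
            for word in lang:
                candidate = prefix + word
                if len(candidate) <= max_len and candidate not in result:
                    next_frontier.add(candidate)
        result |= next_frontier
        frontier = next_frontier
    return result

print(sorted(kleene_star({"ab"}, 6), key=len))  # ['', 'ab', 'abab', 'ababab']
print(kleene_star({""}, 5))                     # {''}  i.e. L(ε*) = {ε}
print(kleene_star(set(), 5))                    # {''}  i.e. L(∅*) = {ε}
```

The loop stops once a round discovers no new strings, which is what keeps the ε* and ∅* cases from running forever.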

Playing with regexes

● http://regex101.com/

● http://rubular.com/

● http://www.google.com/search?q=online+regex+tester
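The same experiments can also be run locally. A minimal sketch using Python's standard re module, on the assumption that Python-style regex syntax is close enough to the notation above for these examples (it has no literal for ε or ∅); the helper name matches is illustrative.

```python
import re

def matches(pattern, text):
    """True if the whole of `text` belongs to the language of `pattern`."""
    return re.fullmatch(pattern, text) is not None

print(matches(r"(ab)*", "abab"))  # True
print(matches(r"(ab)*", "aba"))   # False
print(matches(r"(a|b)*", ""))     # True: the star always admits the empty string
print(matches(r"aa|bb", "bb"))    # True
```

re.fullmatch is used rather than re.search because membership in L(A) concerns the entire string, not a substring.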


Kleene's Theorem

● A language is recognized by a regular expression (that is, it is a “regular language”) if and only if it is recognized by a finite automaton

– Regex has FA

● Relatively simple construction

– FA has regex

● Tricky to prove

Regex → FA

● For every regular expression, there is a finite automaton that recognizes the same language

● We will construct an ε-NFA

– … which can be converted to an NFA

– … which can be converted to a DFA
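The slides do not spell out the construction at this point. The sketch below uses a Thompson-style construction, which is one standard way to build an ε-NFA by recursion on the structure of the regex. The tuple encoding of regexes and the names build, closure and accepts are all illustrative assumptions.

```python
from itertools import count

EPS = None  # label for ε-transitions; deliberately not an alphabet symbol

def build(regex, fresh, edges):
    """Add an ε-NFA fragment for `regex` to `edges`; return (start, accept)."""
    kind = regex[0]
    start, accept = next(fresh), next(fresh)
    if kind == "empty":            # ∅: no path from start to accept
        pass
    elif kind == "eps":            # ε
        edges.append((start, EPS, accept))
    elif kind == "lit":            # single character
        edges.append((start, regex[1], accept))
    elif kind == "cat":            # AB: run A, then B
        s1, a1 = build(regex[1], fresh, edges)
        s2, a2 = build(regex[2], fresh, edges)
        edges.extend([(start, EPS, s1), (a1, EPS, s2), (a2, EPS, accept)])
    elif kind == "alt":            # A|B: branch into A or B
        s1, a1 = build(regex[1], fresh, edges)
        s2, a2 = build(regex[2], fresh, edges)
        edges.extend([(start, EPS, s1), (start, EPS, s2),
                      (a1, EPS, accept), (a2, EPS, accept)])
    elif kind == "star":           # A*: skip A, or run it and loop back
        s1, a1 = build(regex[1], fresh, edges)
        edges.extend([(start, EPS, accept), (start, EPS, s1),
                      (a1, EPS, s1), (a1, EPS, accept)])
    else:
        raise ValueError(f"unknown regex node: {kind}")
    return start, accept

def closure(states, edges):
    """All states reachable from `states` using only ε-transitions."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for src, label, dst in edges:
            if src == q and label is EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def accepts(regex, word):
    """Build the ε-NFA for `regex` and simulate it on `word`."""
    edges = []
    start, accept = build(regex, count(), edges)
    current = closure({start}, edges)
    for ch in word:
        moved = {dst for src, label, dst in edges
                 if src in current and label == ch}
        current = closure(moved, edges)
    return accept in current

ab_star = ("star", ("cat", ("lit", "a"), ("lit", "b")))  # (ab)*
print(accepts(ab_star, "abab"))  # True
print(accepts(ab_star, "aba"))   # False
```

Each case of build mirrors one clause of the inductive definition of regular expressions, which is why this direction of the theorem is the simpler one. The simulation tracks a set of states instead of converting to an NFA or DFA first.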

Recap: NFAs with epsilon transitions

● Just like ordinary NFAs, but...

– Can “instantaneously” change state without reading an input symbol

– Valid transitions of this type are shown by arcs labeled 'ε'

– Note that ε does not suddenly become a member of the alphabet. Instead, we assume ε does not belong to any alphabet – it's a special symbol.

[Figure: an ε-NFA with states q0, q1, q2 and transitions labeled a, b, and ε]

Why ε-NFAs?

● Suitable for representing “or” relations

● E.g. L = { aⁿ | n ∈ N is divisible by 2 or 3 }

● … but they're equivalent to NFAs and DFAs

[Figure: an ε-NFA for L with states p, q0, q1, r0, r1, r2; two transitions labeled ε and five labeled a]
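The equivalence with DFAs can be made concrete with the subset construction. The sketch below assumes a particular transition structure for the "divisible by 2 or 3" automaton, inferred from the state and arc labels only: p has ε-transitions to q0 and r0, q0 and q1 form a 2-cycle on a, r0, r1 and r2 form a 3-cycle on a, and q0 and r0 are accepting. That layout and every name in the code are assumptions.

```python
EPS = ""  # marker for ε-transitions; assumed never to be an input symbol

# Assumed transitions, reconstructed from the state and arc labels.
DELTA = {
    ("p", EPS): {"q0", "r0"},
    ("q0", "a"): {"q1"}, ("q1", "a"): {"q0"},
    ("r0", "a"): {"r1"}, ("r1", "a"): {"r2"}, ("r2", "a"): {"r0"},
}
START, ACCEPTING = "p", {"q0", "r0"}

def eps_closure(states):
    """States reachable from `states` via ε-transitions only."""
    stack, seen = list(states), set(states)
    while stack:
        for nxt in DELTA.get((stack.pop(), EPS), set()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return frozenset(seen)

def subset_construction(alphabet=("a",)):
    """Return (start, table); each DFA state is a frozenset of NFA states."""
    start = eps_closure({START})
    table, todo = {}, [start]
    while todo:
        subset = todo.pop()
        if subset in table:
            continue
        table[subset] = {}
        for sym in alphabet:
            moved = set()
            for q in subset:
                moved |= DELTA.get((q, sym), set())
            target = eps_closure(moved)
            table[subset][sym] = target
            todo.append(target)
    return start, table

def dfa_accepts(word):
    """Run the constructed DFA on `word`."""
    state, table = subset_construction()
    for sym in word:
        state = table[state][sym]
    return bool(state & ACCEPTING)

print([n for n in range(13) if dfa_accepts("a" * n)])
# [0, 2, 3, 4, 6, 8, 9, 10, 12]
```

Under these assumptions the resulting DFA has 7 reachable states: the start set {p, q0, r0} plus the six combinations that the 2-cycle and 3-cycle pass through. The ε-NFA expresses the "or" with two independent cycles, while the DFA has to track both at once.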
