2.1 Language

Three key concepts to remember

1. The definition of a language

2. The definition and concept of a grammar

3. The meaning of a recognizer

Basic definitions

(1) alphabet: a finite set of symbols.

ex) T1 = { ㄱ, ㄴ, ㄷ, ..., ㅎ, ㅏ, ㅑ, ..., ㅡ, ㅣ }

T2 = { A, B, C, ..., Z }

T3 = { auto, break, case, ..., while }

(2) string (or sentence, word): a finite sequence of symbols from some alphabet T.

(3) length: the number of symbols in a string ω, denoted by |ω|.

(4) empty string: the string consisting of no symbols, denoted by ε or λ.

(5) T*: the set of all strings of symbols over the alphabet T, including the empty string. Read "T star".

T+ = T* - {ε}. Read "T dagger".

(6) language: any set of strings over an alphabet (Text p.40). Equivalently, a language L over the alphabet T is a subset of T*:

L ⊆ T*
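The definitions of T*, T+ and "a language is a subset of T*" can be tried out on a small alphabet. The following is a minimal sketch, not part of the text: the helper name `strings_up_to` and the sample language `L` are assumptions chosen for illustration. Since T* is infinite, only the strings up to a chosen length are listed.

```python
from itertools import product

def strings_up_to(alphabet, max_len):
    """Return all strings over `alphabet` of length 0..max_len (a finite slice of T*)."""
    result = []
    for n in range(max_len + 1):
        # product(..., repeat=n) yields every sequence of n symbols
        for symbols in product(sorted(alphabet), repeat=n):
            result.append("".join(symbols))
    return result

T = {"a", "b"}
t_star_2 = strings_up_to(T, 2)                # strings of T* with length <= 2
t_plus_2 = [w for w in t_star_2 if w != ""]   # T+ = T* - {ε}
L = {"", "ab", "ba"}                          # hypothetical language over T

print(t_star_2)            # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
print(L <= set(t_star_2))  # True: every string of L is a string over T
```

The empty Python string plays the role of ε, so `len("")` is 0, matching |ε| = 0.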

More definitions

(1) concatenation

If u = a_1 a_2 ... a_n and v = b_1 b_2 ... b_m, then u • v = a_1 a_2 ... a_n b_1 b_2 ... b_m.

u • v is usually written uv. For all u, v ∈ T*: uε = u = εu, uv ∈ T*, and |uv| = |u| + |v|.

(2) a^n represents n a's. a^0 = ε.

(3) reversal: the reversal of a string ω, denoted ω^R, is the string ω written in reverse order (Text p. 44);

i.e., if ω = a_1 a_2 ... a_n then ω^R = a_n a_(n-1) ... a_1.

(ω^R)^R = ω

(4) language product

LL' = { xy | x ∈ L and y ∈ L' }

(5) The powers of a language L are defined recursively by:

L^0 = {ε}

L^n = L L^(n-1) for n ≥ 1.

(6) L* : reflexive transitive closure

L* = L^0 ∪ L^1 ∪ L^2 ∪ ... ∪ L^n ∪ ...

(7) L+ : transitive closure

L+ = L^1 ∪ L^2 ∪ ... ∪ L^n ∪ ...

L+ = L* - L^0 (this holds when ε ∉ L; if ε ∈ L, then ε also belongs to L+)
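The language operations above translate directly into set operations. The sketch below is illustrative only: the function names are assumptions, and because L* is infinite in general, the closures are truncated at a chosen power k.

```python
def lang_product(L1, L2):
    """LL' = { xy | x in L, y in L' }."""
    return {x + y for x in L1 for y in L2}

def power(L, n):
    """L^0 = {ε}; L^n = L L^(n-1) for n >= 1."""
    result = {""}
    for _ in range(n):
        result = lang_product(L, result)
    return result

def closure_up_to(L, k):
    """L^0 ∪ L^1 ∪ ... ∪ L^k, a finite approximation of L*."""
    out = set()
    for n in range(k + 1):
        out |= power(L, n)
    return out

def positive_closure_up_to(L, k):
    """L^1 ∪ ... ∪ L^k, a finite approximation of L+."""
    out = set()
    for n in range(1, k + 1):
        out |= power(L, n)
    return out

L = {"a", "bc"}
print(sorted(power(L, 2)))   # ['aa', 'abc', 'bca', 'bcbc']
print("abc"[::-1])           # reversal of the string abc: 'cba'
```

For this L, which does not contain ε, `positive_closure_up_to(L, k)` equals `closure_up_to(L, k) - {""}`. For a language such as `{"", "a"}` the empty string stays in the positive closure.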

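Language: a set whose elements are sentences.

How do we describe a language? With a grammar.

Grammar

terminal: a symbol of the alphabet of the language being defined.

nonterminal: an intermediate symbol used in the process of generating strings; nonterminals are used to define the structure of the language.

grammar symbol (V): any terminal or nonterminal symbol.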

2.2 Grammar

G = (VN, VT, P, S)   (Text p. 46)

VN : a finite set of nonterminal symbols

VT : a finite set of terminal symbols

VN ∩ VT = ∅, VN ∪ VT = V

P : a finite set of production rules α → β, where α ∈ V+ is the left-hand side (lhs) and β ∈ V* is the right-hand side (rhs)

S : start symbol (sentence symbol)

[Example] G = ( {S, A}, {a, b}, P, S )   (Text p.47, Example 2.8)

P : S → aAS    S → a

A → SbA    A → ba    A → SS

Productions with the same left-hand side are abbreviated with | :

S → aAS | a

A → SbA | ba | SS

Derivation

1. ⇒ : "directly produce" or "directly derive"

If α → β ∈ P and γ, δ ∈ V*, then

γαδ ⇒ γβδ

2. ⇒* : zero or more derivations.

Suppose α_1, α_2, ..., α_n ∈ V* and α_1 ⇒ α_2 ⇒ ... ⇒ α_n; then α_1 ⇒* α_n.

3. ⇒+ : one or more derivations.

cf) → is used in production rules and reads "may be replaced by".

⇒ is used when carrying out a derivation.

Introduction to FL theory


L(G) : Language generated by grammar G

L(G) = { ω | S ⇒* ω, ω ∈ VT* }

☞ ω is a sentential form of G if S ⇒* ω and ω ∈ V*.

ω is a sentence of G if S ⇒* ω and ω ∈ VT*.

P : S → aA | bB | ε

A → bS

B → aS

Derivation showing S ⇒* abba:

S ⇒ aA (production S → aA)

⇒ abS (production A → bS)

⇒ abbB (production S → bB)

⇒ abbaS (production B → aS)

⇒ abba (production S → ε)
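The derivation of abba can be replayed mechanically: each step replaces one occurrence of a production's left-hand side by its right-hand side. This is a minimal sketch; the names `derive_step` and `derive` are assumptions, and the empty Python string stands for ε.

```python
# Productions of the grammar: S → aA | bB | ε, A → bS, B → aS
P = {("S", "aA"), ("S", "bB"), ("S", ""), ("A", "bS"), ("B", "aS")}

def derive_step(form, lhs, rhs):
    """One direct derivation: rewrite the leftmost occurrence of lhs in form by rhs."""
    if (lhs, rhs) not in P:
        raise ValueError(f"{lhs} -> {rhs} is not a production")
    i = form.find(lhs)
    if i == -1:
        raise ValueError(f"{lhs!r} does not occur in {form!r}")
    return form[:i] + rhs + form[i + len(lhs):]

def derive(start, steps):
    """Apply the productions in `steps` in order; return every sentential form."""
    trace = [start]
    for lhs, rhs in steps:
        trace.append(derive_step(trace[-1], lhs, rhs))
    return trace

STEPS = [("S", "aA"), ("A", "bS"), ("S", "bB"), ("B", "aS"), ("S", "")]
print(" ⇒ ".join(derive("S", STEPS)))
# S ⇒ aA ⇒ abS ⇒ abbB ⇒ abbaS ⇒ abba
```

Every element of the trace except the last is a sentential form that still contains a nonterminal; the last one, abba, is a sentence.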

Find L(G1) for G1 = ( {S}, {a}, P, S )

P : S → a | aS

L(G1) = { a^n | n ≥ 1 }

Language design

[Figure: Grammar → Language by generation; Language → Grammar by design]

Text p. 46

G = ( {A, B, C}, {a, b, c}, P, A)

P : A → abc A → aBbc

Bb → bB Bc → Cbcc

bC → Cb aC → aaB

aC → aa

L(G) = { a^n b^n c^n | n ≥ 1 }
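One way to gain confidence in a claimed L(G) is to enumerate the short sentences of the grammar by exhaustive derivation. The sketch below is an illustration under two assumptions that are not in the text: nonterminals are written as uppercase letters, and no production shortens a sentential form (true for the grammar above and for G1, but not for grammars with ε-productions), so forms longer than the length bound can be discarded safely.

```python
from collections import deque

def sentences(productions, start, max_len):
    """Return all sentences of length <= max_len derivable from `start`.

    Breadth-first search over sentential forms. Assumes uppercase letters are
    nonterminals and that no production makes a form shorter.
    """
    seen = {start}
    queue = deque([start])
    found = set()
    while queue:
        form = queue.popleft()
        if not any(c.isupper() for c in form):
            found.add(form)              # only terminals left: a sentence
            continue
        for lhs, rhs in productions:
            i = form.find(lhs)
            while i != -1:               # try every occurrence of lhs
                new = form[:i] + rhs + form[i + len(lhs):]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
                i = form.find(lhs, i + 1)
    return sorted(found, key=lambda w: (len(w), w))

P = [("A", "abc"), ("A", "aBbc"), ("Bb", "bB"), ("Bc", "Cbcc"),
     ("bC", "Cb"), ("aC", "aaB"), ("aC", "aa")]
print(sentences(P, "A", 9))   # ['abc', 'aabbcc', 'aaabbbccc']

P1 = [("S", "a"), ("S", "aS")]
print(sentences(P1, "S", 3))  # ['a', 'aa', 'aaa']
```

Agreement on short strings is evidence, not a proof; it only checks the claim up to the chosen length.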

(===>) ex1) S → 0S1 | 01

ex2) S → aSb | c

ex3) A → aB

B → bB | b

ex4) A → abc    A → aBbc

Bb → bB    Bc → Cbcc

bC → Cb aC → aaB

aC → aa

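Grammar Design

Grammar for L = { a^n | n ≥ 0 }:

A → aA | ε

Grammar for L = { a^n | n ≥ 1 }:

A → aA | a

Embedded production: A → aAb adds one a on the left and one b on the right in each step, so the two counts stay equal.

ex1) L1 = { a^n b^n | n ≥ 0 }

ex2) L2 = { 0^i 1^j | i ≠ j, i, j ≥ 1 }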
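The effect of an embedded production can be seen by applying it repeatedly. The sketch below assumes the grammar A → aAb | ε as one possible answer to ex1 (the slides leave the exercise open), and the function names are hypothetical.

```python
def gen_an(n):
    """Derive a^n with A → aA applied n times, then A → ε."""
    form = "A"
    for _ in range(n):
        form = form.replace("A", "aA")
    return form.replace("A", "")

def gen_anbn(n):
    """Derive a^n b^n with the embedded production A → aAb applied n times, then A → ε."""
    form = "A"
    for _ in range(n):
        form = form.replace("A", "aAb")   # grows on both sides of A at once
    return form.replace("A", "")

print(gen_an(3))     # aaa
print(gen_anbn(3))   # aaabbb
```

Because A sits in the middle of the right-hand side, every a produced on the left is paired with a b on the right, which a production of the form A → aA cannot express.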

ex3) Constructs of Conventional PL

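1) The constant definition part of Pascal:

The constant definition part begins with the reserved word CONST, and each constant definition has the form a=b, where a and b are terminal symbols standing for an identifier and a constant, respectively. The constant definition part is optional, and the individual constant definitions are separated by ; .

The following is an example of a constant definition part.

CONST ON = TRUE;

OFF = FALSE;

EPSILON = 1.0E-10;

2) The integer declaration part of C:

The integer declaration part consists of several integer declarations, and each declaration has a form such as int a,a,a; where a stands for an arbitrary identifier. The declarations are separated by ; . For example: int i,j; int sum;

※ When designing a grammar, it is desirable to give each nonterminal a name that reflects the syntactic structure it stands for.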
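As an illustration of the C integer declaration exercise, one possible grammar (an assumption, not the text's answer) is int_dcl_part → int_dcl int_dcl_part | int_dcl, int_dcl → int id_list ; and id_list → id | id , id_list. The sketch below checks a string against this grammar with one function per nonterminal. It assumes at least one declaration is required and uses a deliberately simple tokenizer.

```python
def tokenize(text):
    """Split the text into words, ',' and ';' (a simplistic tokenizer)."""
    return text.replace(",", " , ").replace(";", " ; ").split()

def is_id(tokens, pos):
    """True if tokens[pos] is an identifier other than the keyword int."""
    return pos < len(tokens) and tokens[pos].isidentifier() and tokens[pos] != "int"

def parse_id_list(tokens, pos):
    """id_list → id | id ',' id_list. Return the position after the list, or None."""
    if not is_id(tokens, pos):
        return None
    pos += 1
    while pos < len(tokens) and tokens[pos] == ",":
        if not is_id(tokens, pos + 1):
            return None
        pos += 2
    return pos

def parse_int_dcl(tokens, pos):
    """int_dcl → 'int' id_list ';'. Return the position after ';', or None."""
    if pos >= len(tokens) or tokens[pos] != "int":
        return None
    pos = parse_id_list(tokens, pos + 1)
    if pos is None or pos >= len(tokens) or tokens[pos] != ";":
        return None
    return pos + 1

def is_int_dcl_part(text):
    """int_dcl_part → int_dcl int_dcl_part | int_dcl (one or more declarations)."""
    tokens = tokenize(text)
    pos, count = 0, 0
    while pos < len(tokens):
        pos = parse_int_dcl(tokens, pos)
        if pos is None:
            return False
        count += 1
    return count >= 1

print(is_int_dcl_part("int i,j; int sum;"))   # True
print(is_int_dcl_part("int i j;"))            # False
```

The function names mirror the nonterminal names, following the advice that names should reflect the syntactic structure.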

To prove that a grammar generates a language L, show both of the following:

i) Every sentence generated by the grammar is in L.

ii) Every string in L can be generated by the grammar.

Text p. 55, Example 2.16

G = ( { S }, { ( , ) }, { S → (S)S | ε }, S )

L(G) = the set of all strings of balanced parentheses.

proof) (=>) Every sentence derivable from S is balanced. (<=) Every balanced string is derivable from S.

(=>) Every sentence derivable from S is balanced (i.e., if S ⇒* ω and ω is a sentence, then ω is balanced). By induction on the number of steps in a derivation.

i) n = 1: the only one-step derivation of a sentence is S ⇒ ε, and ε is surely balanced.

ii) Suppose that all derivations of fewer than n steps produce balanced sentences.

iii) Consider a leftmost derivation of exactly n steps, n > 1. It has the form S ⇒ (S)S ⇒* (x)S ⇒* (x)y, where x and y are each derived from S in fewer than n steps. By the hypothesis x and y are balanced. Thus (x)y is balanced.

(<=) Every balanced string is derivable from S.

By induction on the length of a string.

i) |ω| = 0: S ⇒ ε (the empty string is derivable from S).

ii) Assume that every balanced string of length less than 2n is derivable from S.

iii) Consider a balanced string ω of length 2n, n ≥ 1.

Let (x) be the shortest nonempty prefix of ω that is balanced.

Thus ω = (x)y, where x and y are balanced.

Since |x|, |y| < 2n, they are derivable from S by the inductive hypothesis.

Thus S ⇒ (S)S ⇒* (x)S ⇒* (x)y = ω.

Therefore, ω = (x)y is also derivable from S.
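Both directions of the proof can be checked empirically for short strings. The sketch below is only a sanity check, not a substitute for the induction: `derivable(n)` builds the strings of length n that S → (S)S | ε can produce, following the shape (x)y used in the proof, and `is_balanced` is an independent counter-based test. Both names are assumptions.

```python
from functools import lru_cache
from itertools import product

@lru_cache(maxsize=None)
def derivable(n):
    """Strings of length n derivable from S using S → (S)S | ε."""
    if n == 0:
        return frozenset({""})            # S ⇒ ε
    out = set()
    for k in range(0, n - 1):             # k = |x|, n - 2 - k = |y|
        for x in derivable(k):
            for y in derivable(n - 2 - k):
                out.add("(" + x + ")" + y)   # S ⇒ (S)S ⇒* (x)y
    return frozenset(out)

def is_balanced(w):
    """True if no prefix closes more than it opens and the totals match."""
    depth = 0
    for c in w:
        depth += 1 if c == "(" else -1
        if depth < 0:
            return False
    return depth == 0

def balanced_strings(n):
    """All balanced strings of length n, found by brute force."""
    return frozenset("".join(p) for p in product("()", repeat=n)
                     if is_balanced("".join(p)))

for n in range(9):
    assert derivable(n) == balanced_strings(n)
print(sorted(derivable(4)))   # ['(())', '()()']
```

Equality of the two sets covers both directions at once: no derivable string is unbalanced, and no balanced string is missed.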

2.3 Chomsky Hierarchy

Noam Chomsky classified grammars into four types according to the form of the productions α → β ∈ P.

Type 0 : No restrictions (unrestricted grammar).

Type 1 : Context-sensitive grammar (CSG).

α → β, |α| ≤ |β|

Type 2 : Context-free grammar (CFG).

A → α, where A : nonterminal, α ∈ V*.

Type 3 : Regular grammar (RG).

A → tB or A → t (right-linear), or

A → Bt or A → t (left-linear),

where A, B : nonterminal, t ∈ VT*.
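The four types can be told apart by inspecting the shape of each production. The sketch below applies the conditions exactly as stated on the slide and returns the most restrictive type that fits. It is an illustration under assumptions: productions are (lhs, rhs) pairs of strings, uppercase letters are nonterminals, every other character is a terminal, and the empty string stands for ε.

```python
def has_nonterminal(s):
    """True if the string contains an uppercase letter (a nonterminal)."""
    return any(c.isupper() for c in s)

def chomsky_type(productions):
    """Return 3, 2, 1 or 0: the most restrictive type whose conditions all rules meet."""
    if all(len(lhs) == 1 and lhs.isupper() for lhs, _ in productions):
        # Right-linear: a nonterminal may appear only as the last symbol of rhs.
        right = all(not has_nonterminal(rhs[:-1]) for _, rhs in productions)
        # Left-linear: a nonterminal may appear only as the first symbol of rhs.
        left = all(not has_nonterminal(rhs[1:]) for _, rhs in productions)
        return 3 if right or left else 2
    if all(len(lhs) <= len(rhs) for lhs, rhs in productions):
        return 1                          # |α| ≤ |β| for every production
    return 0

print(chomsky_type([("S", "a"), ("S", "aS")]))                      # 3
print(chomsky_type([("S", "aAS"), ("S", "a"), ("A", "SbA"),
                    ("A", "ba"), ("A", "SS")]))                     # 2
print(chomsky_type([("A", "abc"), ("A", "aBbc"), ("Bb", "bB"),
                    ("Bc", "Cbcc"), ("bC", "Cb"), ("aC", "aaB"),
                    ("aC", "aa")]))                                 # 1
```

The function classifies a grammar, not a language: a language is of a given type if some grammar of that type generates it. Note also that a context-free grammar with an ε-production is reported as type 2 even though that production fails the type 1 length condition.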

REL (Recursively Enumerable Language), CSL (Context Sensitive Language), CFL (Context Free Language), RL (Regular Language)

Examples of Formal Language

simple matching language : Lm = { a^n b^n | n ≥ 0 }   CFL

double matching language : Ldm = { a^n b^n c^n | n ≥ 1 }   CSL

mirror image language : Lmi = { ωω^R | ω ∈ VT* }   CFL

palindrome language : Lr = { ω | ω = ω^R }   CFL

parenthesis language : Lp = { ω | ω is a string of balanced parentheses }   CFL
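Membership in the first four example languages can be tested directly from their set definitions. This is a minimal sketch with hypothetical function names; it decides membership by direct comparison and does not use a grammar or an automaton.

```python
def in_Lm(w):
    """Simple matching language: a^n b^n, n >= 0."""
    n = len(w) // 2
    return w == "a" * n + "b" * n

def in_Ldm(w):
    """Double matching language: a^n b^n c^n, n >= 1."""
    n = len(w) // 3
    return n >= 1 and w == "a" * n + "b" * n + "c" * n

def in_Lr(w):
    """Palindrome language: the string equals its own reversal."""
    return w == w[::-1]

def in_Lmi(w):
    """Mirror image language: a string followed by its reversal,
    which is the same as a palindrome of even length."""
    return len(w) % 2 == 0 and w == w[::-1]

print(in_Lm("aabb"), in_Lm("aab"))       # True False
print(in_Ldm("aabbcc"), in_Ldm(""))      # True False
print(in_Lr("aba"), in_Lmi("aba"))       # True False
print(in_Lmi("abba"))                    # True
```

The string aba shows the difference between the last two languages: it is a palindrome, but it cannot be split into a string followed by its reversal.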

The Chomsky Hierarchy of Languages

[Figure: nested sets, from outermost to innermost: unrestricted language, context-sensitive language, context-free language, regular language]

Languages & Recognizers

Grammar                      Language                      Recognizer

type 0 (unrestricted)        recursively enumerable set    Turing Machine

type 1 (context-sensitive)   context-sensitive language    Linear Bounded Automata

type 2 (context-free)        context-free language         Pushdown Automata

type 3 (regular)             regular language              Finite Automata
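To make the idea of a recognizer concrete, the simplest kind, a finite automaton, can be simulated in a few lines. The sketch below builds a deterministic automaton for the regular language { a^n | n ≥ 1 }. The state names q0 and q1 and the dictionary encoding of the transition function are assumptions made for illustration.

```python
def accepts(transitions, start, accepting, word):
    """Run a deterministic finite automaton on `word`.

    transitions maps (state, symbol) to the next state; a missing entry
    means the automaton gets stuck and the word is rejected.
    """
    state = start
    for symbol in word:
        state = transitions.get((state, symbol))
        if state is None:
            return False
    return state in accepting

# q0: no a read yet (start state); q1: at least one a read (accepting state)
DELTA = {("q0", "a"): "q1", ("q1", "a"): "q1"}

print(accepts(DELTA, "q0", {"q1"}, "aaa"))   # True
print(accepts(DELTA, "q0", {"q1"}, ""))      # False
print(accepts(DELTA, "q0", {"q1"}, "ab"))    # False
```

A recognizer answers yes or no for a given string, whereas a grammar generates the strings of the language; the automaton here accepts exactly the strings that S → a | aS generates.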
