cd previous qa 2010

FOURTH SEMESTER EXAMINATION-2010

COMPILER DESIGN

1. Answer the following questions: 2*10

a) What is Syntax directed translation scheme? What are the different forms of

intermediate code used in compilation process?

Ans- A syntax directed translation scheme:

Describes the order and timing of attribute computation

Embeds semantic rules (actions) into the right sides of the grammar productions

Each semantic rule can only use information computed by already executed semantic rules

It is a convenient way of describing an L-attributed definition

The different forms of intermediate code used in the compilation process are:

Three address code

Quadruples

Triples

Indirect triples

(Quadruples, triples and indirect triples are three ways of storing three address statements.)

b) What is dead code elimination?

Ans- A variable is said to be dead at a point in a program if the value it holds at that point is not used anywhere later in the program; otherwise it is live. A statement that only computes a value for a dead variable is dead code. One advantage of copy propagation is that it often turns the copy statement into dead code. Removing such a dead assignment makes no difference to the result or meaning of the program, so eliminating it makes the code smaller and faster.
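The idea can be illustrated with a minimal sketch for one straight-line block. The representation is an assumption made only for this example: each statement is a (target, operands) pair, statements have no side effects, and live_out is the set of variables still needed after the block. The function name eliminate_dead_code is hypothetical.

```python
def eliminate_dead_code(block, live_out):
    """Drop assignments whose target is dead, scanning the block backwards."""
    live = set(live_out)              # variables whose values are still needed
    kept = []
    for target, operands in reversed(block):
        if target in live:
            kept.append((target, operands))
            live.discard(target)      # this statement defines the target
            # operands that are variables become live before the statement
            live.update(op for op in operands if op.isidentifier())
        # otherwise the target is dead here and the statement is dropped
    kept.reverse()
    return kept


# t = a op b ; x = t (a copy that is never used) ; y = t op c
block = [("t", ("a", "b")), ("x", ("t",)), ("y", ("t", "c"))]
optimized = eliminate_dead_code(block, live_out={"y"})
```

With only y live on exit, the copy x = t is removed and the other two statements are kept.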

c) What is reduce-reduce (R-R) conflict in LR parser?

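Ans: A reduce-reduce conflict occurs when, in some parser state and for the same lookahead symbol, two or more productions are candidates for reduction, that is, the parser sees two or more possible handles on top of the stack at the same time. The parsing table then has more than one reduce entry in a single cell, so the parser cannot decide deterministically which production to reduce by.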

d) Why LR parsing is preferred over other parsers?

Ans: A large class of grammars can be parsed using LR (k) parsers, where 'L' stands for left to right scanning of the input, 'R' stands for constructing a rightmost derivation in reverse and 'k' is the number of input symbols of lookahead used to make parsing decisions. LR parsing has several advantages for which it is preferred over other parsers:

It can recognize virtually all programming language constructs for which context free grammars can be written.

It is the most general non-backtracking shift-reduce parsing method.

It can be implemented as efficiently as other shift-reduce methods.

The class of grammars it can parse is a proper superset of the class that predictive (LL) parsers can parse.

It can detect a syntax error as early as possible on a left to right scan of the input.

e) What do you mean by Run time storage allocation?

Ans:

When we define a variable:

int count;

the size of the memory storage is already known before the program is actually run: count occupies one cell of type int.

It is also possible to defer the memory allocation until the program is run. This is called run time storage allocation or dynamic memory allocation.

f) Eliminate Left recursion from the following grammar:

E-> aa/abba/Eb/EE

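Ans: The productions are:

E → aa

E → abba

E → Eb

E → EE

Immediate left recursion of the form A → Aα / β is eliminated by introducing a new non-terminal A' such that

A → βA'

A' → αA' / ε

Here α is b or E, and β is aa or abba. Eliminating the left recursion gives:

E → aaE' / abbaE'

E' → bE' / EE' / ε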
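The transformation can be checked mechanically. The following is a minimal sketch that handles immediate left recursion only; the function name and the representation (each alternative is a list of symbols, with the empty list standing for ε) are assumptions made for this illustration.

```python
def eliminate_left_recursion(nonterminal, alternatives):
    """Rewrite A -> A alpha | beta as A -> beta A', A' -> alpha A' | epsilon."""
    new_nt = nonterminal + "'"
    # alpha parts: alternatives that start with the non-terminal itself
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == nonterminal]
    # beta parts: all other alternatives
    others = [alt for alt in alternatives if not alt or alt[0] != nonterminal]
    if not recursive:
        return {nonterminal: alternatives}      # nothing to eliminate
    return {
        nonterminal: [beta + [new_nt] for beta in others],
        new_nt: [alpha + [new_nt] for alpha in recursive] + [[]],  # [] is epsilon
    }


# E -> aa / abba / Eb / EE
result = eliminate_left_recursion(
    "E", [["a", "a"], ["a", "b", "b", "a"], ["E", "b"], ["E", "E"]]
)
```

For the grammar of this question, result["E"] holds the alternatives aaE' and abbaE', and result["E'"] holds bE', EE' and ε.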

g) Briefly describe one use of flow graphs in the compiler writing?

Ans:

A graph representation of three address statements, called a flow graph, is useful for understanding code generation algorithms, even if the graph is not explicitly constructed by a code generation algorithm.

A flow graph of a program can be used extensively as a vehicle to collect information about the intermediate program.

Some register assignment algorithms use flow graphs to find the inner loops, where a program is expected to spend most of its time.

The nodes of the flow graph are basic blocks, and its edges show the possible flow of control between them.

h) Explain the concept of Boot strapping in compiler design process?

Ans: Compilation of a high level language (HLL) is a long and complex process, so writing a compiler for an HLL from scratch is difficult and time consuming. Bootstrapping is a self-reinforcing technique used to reduce the design effort and to improve the quality of the compiler.

Bootstrapping gives the concept of creating the compiler.

For creating any kind of compiler we need three languages:

i. Source language 'S': the language for which we are creating the compiler.

ii. Target language 'T': the language in which the object code is generated.

iii. Implementation language 'I': the language in which we write the compiler.

i) What is back patching in the process of Intermediate code generation?

Ans: When we generate 'goto' statements in three address code, we face the problem that we may not know the labels that control must go to at the time the jump statements are generated. We can overcome this problem by generating a series of branching statements with the targets of the jumps temporarily left unspecified. Each such statement is put on a list of 'goto' statements whose labels will be filled in when the proper label can be determined. This subsequent filling in of the labels is called back patching.

To manipulate lists of labels we use three functions:

I. MAKE LIST (j): creates a new list containing only 'j', an index into the array of quadruples being generated. MAKE LIST returns a pointer to the list it has made.

II. MERGE LIST (q1, q2): takes the lists pointed to by q1 and q2, concatenates them into one list and returns a pointer to the concatenated list.

III. BACKPATCH (q, j): inserts 'j' as the target label for each of the statements on the list pointed to by q.
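The three list functions can be sketched in a few lines. This is only an illustration under stated assumptions: quadruples are stored as Python lists [op, arg1, arg2, result], the result field of a jump holds its target, and the helper emit and the example statement "if a < b then x = 1" are hypothetical.

```python
quads = []  # the array of quadruples being generated


def emit(op, arg1=None, arg2=None, result=None):
    """Append a quadruple and return its index."""
    quads.append([op, arg1, arg2, result])
    return len(quads) - 1


def makelist(j):
    """MAKE LIST (j): a new list containing only the quadruple index j."""
    return [j]


def merge_list(q1, q2):
    """MERGE LIST (q1, q2): concatenation of the two lists."""
    return q1 + q2


def backpatch(q, j):
    """BACKPATCH (q, j): fill in j as the jump target of every quadruple on q."""
    for index in q:
        quads[index][3] = j


# if a < b then x = 1
truelist = makelist(emit("if<", "a", "b"))   # target not yet known
falselist = makelist(emit("goto"))           # target not yet known
then_start = emit("=", "1", None, "x")       # first quadruple of the 'then' part
backpatch(truelist, then_start)              # true exit jumps to the 'then' part
backpatch(falselist, len(quads))             # false exit jumps past the statement
```

After the two BACKPATCH calls, quadruple 0 jumps to quadruple 2 and quadruple 1 jumps to index 3, the position of the next quadruple to be generated.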

j) Differentiate between Phase and a pass in compiler construction?

Ans: Conceptually a compiler operates in phases, each of which transforms the source program from one representation to another. There are six phases in compiler construction: lexical analysis, syntax analysis, semantic analysis, intermediate code generation, code optimization and target code generation.

A pass, in contrast, is a group of one or more phases implemented together as one traversal of the source program or of one of its intermediate representations. Compilers are often broken into several passes, and the passes communicate with each other via temporary files. For example, in one pass the compiler reads the source program and stores the values, variables, functions etc. in a temporary file; the later passes read the data produced by the previous passes and continue from there.

The number of passes is decided by the compiler designer.

2) a. What is the role of intermediate code generation in overall compiler design?

Ans) Intermediate code generation is the phase in which the source program is converted into a compact intermediate form using one of the following three methods:

I. Three address code

II. Postfix notation

III. Quadruples

The front end translates a source program into an intermediate representation, from which the back end generates target code. Although a source program can be translated directly into the target language, some benefits of using a machine independent intermediate form are:

Intermediate code is closer to the target machine than the source language, and hence it is easier to generate target code from it.

It allows a variety of optimizations to be performed in a machine independent way.

Typically intermediate code generation can be implemented via syntax directed translation and thus can be folded into parsing by augmenting the code for the parser.

b. Define operator precedence relation and operator precedence grammar. Construct

precedence function for the following precedence relation:

        −    *    ∕    ↑    id   $

−       >    <    <    <    <    >

*       >    >    <    <    <    >

∕       >    >    >    <    <    >

↑       >    >    >    <    <    >

id      >    >    >    >         >

Ans) Operator precedence relation:

Precedence relations are defined between pairs of terminals only.

They are defined as follows for two terminals 'a' and 'b':

a < b means that 'a' yields precedence to 'b', i.e. 'a' has lower precedence than 'b'.

a > b means that 'a' takes precedence over 'b', i.e. 'a' has higher precedence than 'b'.

a = b means that 'a' and 'b' have equal precedence.

Operator precedence grammar:

A grammar is said to be an operator precedence grammar if it satisfies the following conditions:

There is no ε on the right side of any production.

There are no two consecutive non-terminals on the right side of any production.

At most one of the relations <, = and > holds between any pair of terminals.
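The precedence functions f and g asked for in the question can be obtained with the usual graph method: create nodes f_a and g_a for every terminal a, add an edge f_a → g_b when a > b and an edge g_b → f_a when a < b, and take the length of the longest path leaving each node. The sketch below assumes the relation table given in the question, written with ASCII names ("-", "/", "^" for −, ∕, ↑), a blank entry for (id, id), no "=" relations and an acyclic graph; the names ROWS and precedence_functions are hypothetical.

```python
from functools import lru_cache

TERMINALS = ["-", "*", "/", "^", "id", "$"]
# One string per table row; columns follow TERMINALS, "." marks a blank entry.
ROWS = {
    "-":  "><<<<>",
    "*":  ">><<<>",
    "/":  ">>><<>",
    "^":  ">>><<>",
    "id": ">>>>.>",
}


def precedence_functions(rows, terminals):
    # edges[node] = set of successor nodes; a node is ("f", a) or ("g", a)
    edges = {(kind, t): set() for kind in "fg" for t in terminals}
    for a, relations in rows.items():
        for b, rel in zip(terminals, relations):
            if rel == ">":                       # a > b: f(a) must exceed g(b)
                edges[("f", a)].add(("g", b))
            elif rel == "<":                     # a < b: g(b) must exceed f(a)
                edges[("g", b)].add(("f", a))

    @lru_cache(maxsize=None)
    def longest(node):
        # length of the longest path starting at node (graph assumed acyclic)
        return max((1 + longest(n) for n in edges[node]), default=0)

    f = {t: longest(("f", t)) for t in terminals}
    g = {t: longest(("g", t)) for t in terminals}
    return f, g


f, g = precedence_functions(ROWS, TERMINALS)
```

Under these assumptions the result is f(−) = 1, f(*) = 3, f(∕) = 5, f(↑) = 5, f(id) = 7 and g(−) = 0, g(*) = 2, g(∕) = 4, g(↑) = 6, g(id) = 6, g($) = 0. Because the table has no row for $, f($) simply comes out as 0.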

3.

a) Discuss the construction of LR parser. What are the various data structures used in LR

parser design? Discuss the construction of ACTION [] and GOTO [] table?

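Ans) Construction of LR parser:

An LR parser is a general non-backtracking shift-reduce parser.

An LR parser is a parser for context free grammars that reads input from Left to right and produces a Rightmost derivation in reverse. The term LR (k) parser is also used; here k refers to the number of unconsumed "lookahead" input symbols that are used in making parsing decisions. Usually k is 1 and is often omitted. A context free grammar is called LR (k) if there exists an LR (k) parser for it.

An LR parser is said to perform bottom up parsing because it attempts to deduce the top level grammar productions by building up from the leaves.

The LR parser consists of an input tape (buffer), a stack (holding states), a parsing program (driver) and a parsing table, which has two parts, ACTION [] and GOTO [].

Construction of ACTION [] and GOTO [] table:

Both parts are built from the sets of LR items I0, I1, ..., In of the augmented grammar; state i of the parser corresponds to item set Ii.

GOTO []:

The columns of GOTO are the non-terminals only.

An entry has the form GOTO [state] [non-terminal]: if GOTO (Ii, A) = Ij for a non-terminal A, then GOTO [i] [A] = j.

ACTION []:

The columns of ACTION are the terminals of the grammar together with the end marker $.

An entry ACTION [state] [terminal] is one of shift, reduce, accept or error. If Ii contains an item with the dot before a terminal a and GOTO (Ii, a) = Ij, the entry is "shift j". If Ii contains a complete item A → α . , the entry is "reduce by A → α" for the lookaheads allowed by the method (for example FOLLOW (A) in SLR). If Ii contains S' → S . , the entry for $ is "accept". All remaining entries are errors.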

b) Write the role of an error detector in compilation process? Discuss different errors in

lexical-phase?

Ans) The role of the error detector is that when an error is encountered in any phase of the compiler, compilation does not simply halt at the first error; the error is reported and the process continues so that further errors can be detected. The role of the error detector in the compilation process is to:

Detect the errors

Handle and react to them

Notify the calling module

Notify the user

It should also be easy to program and maintain.

The different errors in the lexical phase are:

Character streams that do not match any token pattern.

Ill formed numeric literals and identifiers.

4.a) What is the necessity of optimization in compilation? Discuss the factors influencing

optimization?

Ans) The aim of code optimization is to rearrange the instructions given in a program so as to gain execution speed without changing the basic meaning or semantics of the source program. There are two types of code optimization.

Machine independent optimizations can be performed independently of the target machine for

which the compiler is generating code; that is, the optimizations are not tied to the target machine's

specific language or platform. Examples of machine independent optimizations are: elimination of

loop invariant computation, induction variable elimination and elimination of common sub

expression.

Machine dependent optimization requires knowledge of the target machine. An

attempt to generate object code that will utilize the target machine's registers more efficiently is an

example of machine dependent code optimization.

The factors influencing code optimization are:

The machine

Architecture of the CPU i.e RISC or CISC

Number of functional unit(s)

Cache size

CPU register(s)

Sometimes, the time taken to undertake optimization in itself may be an issue.

Optimizing existing code usually does not add new features, and worse, it might add new bugs in

previously working code (as any change might). Because manually optimized code might sometimes

have less 'readability' than unoptimized code, optimization might also reduce its maintainability.

Optimization comes at a price and it is important to be sure that the investment is worthwhile.

An automatic optimizer (or optimizing compiler, a program that performs code optimization) may

itself have to be optimized, either to further improve the efficiency of its target programs or else speed

up its own operation. A compilation performed with optimization 'turned on' usually takes longer,

although this is usually only a problem when programs are quite large.

In particular, for just-in-time compilers the performance of the run time compile component, executing

together with its target code, is the key to improving overall execution speed.

b) Explain the symbol table construct for the block structure programming language?

Ans) Scoping is one of the applications of the symbol table. There are two types of scope:

Local

Global

The symbol table determines whether a particular symbol is local or global.

For example:

    int x;              /* global: visible in every block */
    void add(void);

    int main(void)
    {
        int y;          /* local to main */
        add();
        return 0;
    }

    void add(void)
    {
        int z;          /* local to add */
    }

{ } delimits a block, i.e. the lifetime of a variable is limited to the block in which it is declared.

A separate symbol table is created for each block.

Global names are stored in the topmost table so that they can be accessed from all the blocks.

The tables are linked by addresses: the table of each block stores the address of the table of the next block, so that a lookup can move from one scope to another.

    #include <iostream>
    using namespace std;

    int main()                      // main block
    {
        int x;
        cout << "enter x";
        cin >> x;
        {                           // block 1
            int y;
            cout << "enter y";
            cin >> y;
            {                       // block 2
                int z;
                cout << "enter z";
                cin >> z;
            }
        }
    }

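[Figure: symbol tables for the nested blocks. The MAIN BLOCK table holds x; the BLOCK 1 table holds y and can also access x; the BLOCK 2 table holds z and can also access x and y.]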
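One common way to realize per-block tables is a chain of scopes, each with a link to the table of the enclosing block. The following is a minimal sketch under that assumption; the class name Scope and its methods are hypothetical, and only the name and type of each symbol are recorded.

```python
class Scope:
    """Symbol table of one block, linked to the table of the enclosing block."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # enclosing block, None for the outermost one
        self.symbols = {}         # identifier -> type name

    def declare(self, ident, type_name):
        self.symbols[ident] = type_name

    def lookup(self, ident):
        """Search this block first, then the enclosing blocks, outermost last."""
        scope = self
        while scope is not None:
            if ident in scope.symbols:
                return scope.name, scope.symbols[ident]
            scope = scope.parent
        return None               # undeclared in every visible block


# The three nested blocks of the example program
main_block = Scope("main")
main_block.declare("x", "int")
block1 = Scope("block1", parent=main_block)
block1.declare("y", "int")
block2 = Scope("block2", parent=block1)
block2.declare("z", "int")
```

A lookup of x from block 2 walks outward and finds it in the main block, while a lookup of z from the main block fails because z is only visible inside block 2.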

5. Consider the following grammar:

E → ( L ) / a

L → L , E / E

a) Construct DFA of LR ( 0) items for this grammar

b) Construct SLR (1) parsing table

c) Show the parsing stack and actions of an SLR (1) parser for the input string ( ( a ) , a , ( a,

a ))

d) Is this grammar a LR (0) grammar? If not describe the LR (0) conflict.

Ans)

a) Augmented grammar (productions numbered for use in the parsing table):

(0) E' → E

(1) E → ( L )

(2) E → a

(3) L → L , E

(4) L → E

Item sets:

I0: E' → . E
    E → . ( L )
    E → . a

(The closure of I0 contains no L items, because no item in I0 has the dot in front of L.)

I1 = GOTO (I0, E):
    E' → E .

I2 = GOTO (I0, ( ):
    E → ( . L )
    L → . L , E
    L → . E
    E → . ( L )
    E → . a

I3 = GOTO (I0, a):
    E → a .

I4 = GOTO (I2, L):
    E → ( L . )
    L → L . , E

I5 = GOTO (I2, E):
    L → E .

GOTO (I2, ( ) = I2 and GOTO (I2, a) = I3.

I6 = GOTO (I4, ) ):
    E → ( L ) .

I7 = GOTO (I4, , ):
    L → L , . E
    E → . ( L )
    E → . a

I8 = GOTO (I7, E):
    L → L , E .

GOTO (I7, ( ) = I2 and GOTO (I7, a) = I3.

DFA transitions:

I0 --E--> I1    I0 --(--> I2    I0 --a--> I3

I2 --L--> I4    I2 --E--> I5    I2 --(--> I2    I2 --a--> I3

I4 --)--> I6    I4 --,--> I7

I7 --E--> I8    I7 --(--> I2    I7 --a--> I3

b) FOLLOW (E) = { ")", ",", $ } and FOLLOW (L) = { ")", "," }. A reduction by a production A → α is entered only under the terminals in FOLLOW (A).

         ACTION                       GOTO
State |  (    )    ,    a    $    |  E    L
  0   |  S2             S3        |  1
  1   |                      acc  |
  2   |  S2             S3        |  5    4
  3   |       R2   R2        R2   |
  4   |       S6   S7             |
  5   |       R4   R4             |
  6   |       R1   R1        R1   |
  7   |  S2             S3        |  8
  8   |       R3   R3             |

Sn means shift and go to state n; Rn means reduce by production (n); acc means accept.

c)
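The stack and actions for the input ( ( a ) , a , ( a , a ) ) can be generated by simulating the table-driven parser. The sketch below is self-contained and rests on stated assumptions: the productions are numbered (1) E → ( L ), (2) E → a, (3) L → L , E, (4) L → E, and the states 0 to 8 are the LR (0) item sets of the augmented grammar, numbered as in the comments. The names ACTION, GOTO and parse are chosen for this example.

```python
# production number -> (left side, length of right side)
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1), 3: ("L", 3), 4: ("L", 1)}

# SLR(1) ACTION table: sN = shift to state N, rN = reduce by production N
ACTION = {
    0: {"(": "s2", "a": "s3"},                # E' -> .E
    1: {"$": "acc"},                          # E' -> E.
    2: {"(": "s2", "a": "s3"},                # E -> (.L)
    3: {")": "r2", ",": "r2", "$": "r2"},     # E -> a.
    4: {")": "s6", ",": "s7"},                # E -> (L.)   L -> L.,E
    5: {")": "r4", ",": "r4"},                # L -> E.
    6: {")": "r1", ",": "r1", "$": "r1"},     # E -> (L).
    7: {"(": "s2", "a": "s3"},                # L -> L,.E
    8: {")": "r3", ",": "r3"},                # L -> L,E.
}
GOTO = {(0, "E"): 1, (2, "E"): 5, (2, "L"): 4, (7, "E"): 8}


def parse(text):
    """Return the list of (stack, remaining input, action) steps, or None on error."""
    tokens = list(text) + ["$"]
    stack, pos, trace = [0], 0, []
    while True:
        action = ACTION[stack[-1]].get(tokens[pos])
        if action is None:
            return None                       # blank table entry: syntax error
        trace.append((list(stack), "".join(tokens[pos:]), action))
        if action == "acc":
            return trace
        number = int(action[1:])
        if action[0] == "s":                  # shift: push state, consume token
            stack.append(number)
            pos += 1
        else:                                 # reduce: pop |rhs| states, then goto
            head, length = PRODUCTIONS[number]
            del stack[-length:]
            stack.append(GOTO[(stack[-1], head)])


trace = parse("((a),a,(a,a))")
for stack, rest, action in trace:
    print(stack, rest, action)
```

Each printed row shows the state stack, the unread input and the action taken; the input is accepted after 13 shifts and 13 reductions.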

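d) Yes, the grammar is an LR (0) grammar. In the LR (0) automaton of this grammar every state that contains a complete item (dot at the right end) contains no other item, so there are no shift-reduce or reduce-reduce conflicts.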

6

a) What is an activation record? Explain clearly the components of an activation record?

Ans) The information needed by a single execution (activation) of a procedure is managed using a contiguous block of storage called an "Activation Record" or "Activation Frame", consisting of a collection of fields.

The components of an activation record are:

Temporary values, such as those used during expression evaluation

Local data of the procedure

Saved machine status information (program counter, registers, return address)

Access link, for access to non-local names

Actual parameters

Returned value, used by the called procedure to return a value to the calling procedure

Control link, which points to the activation record of the caller

b) Construct DAG for the following sequence of statements

X=Y/Z

W=P*Y

Y=Y*Z

P=W-X

Ans)

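[Figure: DAG for the basic block. Leaves: Y0, Z0 and P0 (the initial values of Y, Z and P). Interior nodes: a / node with children Y0 and Z0, labelled X; a * node with children P0 and Y0, labelled W; a * node with children Y0 and Z0, labelled Y; and a - node with children W and X (the * and / nodes just described), labelled P.]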
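The DAG for this block can also be built programmatically. The sketch below assumes every statement has the form target = left op right and uses hypothetical names (build_dag, by_signature, current); a node is reused whenever the same operator is applied to the same two child nodes.

```python
def build_dag(statements):
    nodes = []          # nodes[i] is ("leaf", name, None) or (op, left_id, right_id)
    by_signature = {}   # node tuple -> node id, so identical nodes are shared
    current = {}        # variable -> id of the node holding its current value

    def node_for(signature):
        if signature not in by_signature:
            by_signature[signature] = len(nodes)
            nodes.append(signature)
        return by_signature[signature]

    def value_of(name):
        # a variable not yet assigned in the block is a leaf (its initial value)
        if name not in current:
            current[name] = node_for(("leaf", name, None))
        return current[name]

    for target, left, op, right in statements:
        left_id, right_id = value_of(left), value_of(right)
        current[target] = node_for((op, left_id, right_id))   # attach the label
    return nodes, current


block = [
    ("X", "Y", "/", "Z"),
    ("W", "P", "*", "Y"),
    ("Y", "Y", "*", "Z"),
    ("P", "W", "-", "X"),
]
nodes, current = build_dag(block)
```

For the four statements of this question the DAG has seven nodes: three leaves (the initial Y, Z and P) and four operator nodes, with the final P attached to the - node whose children are the nodes of W and X.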

7.

a) Consider the following context free grammar where S is the start symbol and the terminals

are 'a', '(', ')'

S → ( )

S → a

S → ( A )

A → S

A → A , S

Show precisely why this grammar is not LL (1). Rewrite this grammar to make it

suitable for recursive descent parsing. 5

b) Discuss the importance of symbol table in compiler design. How is the symbol table

manipulated at various phase of compilation?

Ans) A Symbol table is a data structure used by a compiler to keep track of the scope, life and

binding information about names. These names are used to identify the various program elements

like variables, constants, procedures and the labels of statements. The symbol table is searched

every time a name is encountered in the source text. When a new name or new information about

an existing name is discovered the content of the symbol table changes.

Exactly what information is stored in the symbol table depends on many things. The programming

language will determine much of the information that is stored, but the target architecture will also

influence what data is stored. In fact some assumptions about how to produce code can affect what

values are stored in the table. Different information will need to be stored for constants, variables,

procedures, enumerations, type definitions and so on. What follows is a description of various

common declarative language constructs and typical classes of information symbol table would

record for those constructs.

CONSTANTS:

Constants are identifiers that represent a fixed value- one that can never be changed. Since

programmers will wish to access these values by name, the name must be stored. Finally, since the

values must be used properly in the type system, type information is also included. No run time

location needs to be stored for constants. These are typically stored right into the code stream by

the compiler at compilation time.

VARIABLES:

Variables are identifiers whose value may change between executions and during a single

execution of a program. They represent the contents of some memory location. The symbol table

needs to record the variable's name as well as its allocated storage space at runtime. Typically this

location is stored as an offset relative to some position.

TYPES (user defined):

A user defined type is typically a conglomeration of 1 or more existing types. Types are accessed

by name and reference a type definition structure. Each structure will record important information

about itself, like its size, the name of its members or its upper or lower bounds. What information

is stored will depend on what type is being defined.

SUBPROGRAMS:

Procedures, functions and methods are named segments of code. Naturally, the symbol table

should record a procedure's name. The type they return if any should be noted. When subprograms

are accessed at run time it is typically by their location in the code stream, thus the location of the

code generated for a given procedure should also be recorded. The formal parameters and local

variables of a function are separate identifiers in their own right, and should be stored in separate

records. Thus, they are treated much like the fields of a user defined record. They are stored as a

list of variable records separate from, but accessible from, the main procedure record.

CLASSES:

Classes are abstract data types which restrict access to their members and provide convenient

language level polymorphism. They are really a special case of user defined types, and are

structurally no different. But it may be convenient to store information about classes above and

beyond that required for other user defined types. This includes the location of the default

constructor and destructor, and the address of the virtual function table.

INHERITANCE:

There may be different ways to perform inheritance, and a symbol table record is needed to keep

track of which classes are being inherited and exactly how inheritance is performed. A compiler

might consider whether shared or non – shared inheritance is performed. A compiler might

consider whether keywords public, private and protected modify the visibility of inherited items,

and may be recorded with the inheritance information. A reference to the participating classes

could also be recorded in an inheritance structure.

ARRAYS:

Arrays represent a collection of uniformly typed elements that may be randomly accessed by

index. For each dimension of an array, the compiler will need to know about such things as the

lower boundary of the array i.e the lowest valid index, the upper boundary i.e the largest valid

index, the index size, index type, the total size and the type of the elements contained. When many

different types can be used to index an array, the index type and the size will also be recorded.

Finally, the total amount of space to be allocated for each dimension of an array should be stored.

RECORDS:

Records represent a collection of possibly heterogeneous members which can be accessed by

name. The symbol table probably needs to record each of the record's members. The compiler

also needs to know the size of the record, i.e. how much space to allocate for all the members. Each of

the fields of the record will probably be a reference to another symbol table record like a variable

or a type which may in turn reference another record or array.

CLASS:

Just like a record, the fields of a class can be conveniently stored in a separate record. Classes will

also store their methods, constructors, destructors and virtual function table in this complex

information structure.

MODULE:

Stores the module size, its name, parent, its members and a time stamp. The time stamp is used to guarantee at load time that the modules have been compiled in the correct order or are all up to date.

8.

a) Find the FIRST and FOLLOW sets for each of the non-terminals in the following

grammar (in the grammar below € denotes epsilon, the empty string).

A → aBa

B → bCb / bcD

C → cCc / €

D → Deb / €

b) Differentiate between syntax directed definition and syntax directed translation scheme?

Ans)

Syntax directed definition generalizes a context free grammar by associating a set of attributes with

each node in a parse tree. Each attribute gives some information about the node i.e. Syntax

directed definition is a generalization of a CFG in which each grammar symbol has an associated

set of attributes partitioned into two subsets called the synthesized & inherited attributes of that

grammar symbol.

The value of an attribute at a parse tree node is defined by the semantic rule associated

with the production used at that node. Semantic rules set up dependencies between attributes that

will be represented by a graph.

Syntax directed translation schemes indicate the order in which semantic rules

are to be evaluated; they allow some implementation details to be shown. Syntax directed

translation is a method of translating a string into a sequence of actions by attaching one such

action to each rule of a grammar. Thus, parsing a string of the grammar produces a sequence of

rule applications and syntax directed translation provides a simple way to attach semantics to any

such syntax.

Syntax directed translation refers to a method of compiler implementation where the

source language translation is completely driven by the parser. The parsing process and parse trees

are used to direct semantic analysis and translation of the source program. This can be a separate

phase of the compiler or we can augment our conventional grammar with information to control

the semantic analysis and translations. Such grammars are called attributed grammars.

c) Explain, why it is possible to design an independent lexical analyzer?

Ans)

FOURTH SEMESTER EXAMINATION-2011

COMPILER DESIGN

1. Answer the following: 2* 10

a. Explain why is it possible to design an independent lexical analyzer?

b. Define and differentiate between compile time error and runtime error?

c. Explain the machine dependent and machine independent code optimization?

Ans: Code optimization is of two types:

Machine dependent code optimization:

This kind of optimization requires knowledge of the target machine architecture, i.e. the registers, addressing modes, clock speed etc.

Machine independent code optimization:

This optimization can be performed independently of the target machine. It consists of program transformations that improve the target code without taking into consideration any properties of the target machine.

d. Explain the difference between Bottom-up and Top-down parsing?

Ans: Bottom-up parsing is a process of reducing an input string, say 'W', to the start symbol of the grammar by tracing out the rightmost derivation (RMD) of 'W' in reverse order. Bottom-up parsing involves the selection of a substring that matches the right side of a production, whose reduction to the non-terminal on the left side of the production represents one step along the reverse of a rightmost derivation.

Top-down parsing attempts to find the leftmost derivation for the input string 'W', starting from the start symbol. The string 'W' is scanned by the parser left to right, one symbol/token at a time, and the leftmost derivation generates the leaves of the parse tree in left to right order, which matches the input scan order.

e. What are the drawbacks of SLR (1) parser?

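Ans: An SLR (1) parser places reduce actions using only the FOLLOW sets, which ignore the left context of the state. For many grammars this produces multiple entries (shift-reduce or reduce-reduce conflicts) in the SLR parsing table even though the grammar is unambiguous, leaving the parser in a non-deterministic situation, which is not allowed.

So SLR is the least powerful of the LR parsers: SLR (1) grammars constitute a small subset of the context free grammars.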

f. What do you means by porting of a compiler?

Ans: Porting is the process of moving code from one platform to another while making sure that it also works on the target platform.

High level languages are designed to be portable, that is, programs written in a high level language can be run on any computer that has a compiler or interpreter for that particular language.

To make porting of a compiler practical, the compiler must be modular and support separate compilation, so that only the machine dependent parts have to be changed for the new platform.

g. Describe the structure of LL parser?

Ans:

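[Figure: model of an LL parser: a stack of grammar symbols (X, Y, Z with the end marker $ at the bottom) and an input buffer such as X+Y$.]

The main constituents of an LL parser are a stack, which holds grammar symbols, and an input buffer, which contains the input string followed by $. The parser program drives the parse by consulting a parse table.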

h. Describe the various data structures used to create a symbol table?

Ans: The various data structures used to create a symbol table are:

Unordered List: An unordered list enters each name sequentially as it is declared.

Ordered List: In an ordered list the names are kept in order (for example alphabetical order), so that they can be searched quickly.

Binary Tree: A binary search tree combines the fast search time of an ordered array, O (log n) time on the average, with the insertion ease of a linked list.

Hash Table: From the efficiency point of view hash tables are the best method. Hashing consists of computing a numerical value for the identifier, perhaps from some combination of the ASCII codes of its characters or even its bit code, and then applying one of the techniques used for hashing numbers to obtain the table position.

Stack: A stack is also a data structure for a symbol table, where a pointer is kept to the top of the stack for each block. Names are pushed onto the stack as they are encountered. When a block is completed, that portion of the stack and the pointer to it are removed, so that the names of the containing block are again on top.

i. Distinguish between syntax and semantics of a programming language? Explain which

parts of a compiler are primarily concerned with each?

Ans: Syntax of a programming language is the form of its expressions, statements and program

modules.

Semantic of a programming language is the meaning given to the various syntactic

structures.

The front end of the compiler is primarily concerned with both: the syntax analyzer (parser) deals with the syntax, and the semantic analyzer deals with the semantics of the programming language.

j. What is the major functioning of the five main stages of a compiler?

Ans:


LEXICAL ANALYZER: This module has the task of separating the continuous string of characters into distinct groups that make sense. Such a group is called a token. A token may be composed of a single character or a sequence of characters. The sequence of characters that forms a token is called a lexeme.

SYNTAX ANALYZER: This is the module in which the overall structure is identified; it involves an understanding of the order in which the symbols in a program may appear. In the process of analyzing each sentence, the parser builds a tree structure, the parse tree.

SEMANTIC ANALYZER: The semantic analyzer gathers the type information and checks the tree produced by the syntax analyzer for semantic errors. This phase also generates a tree called the annotated tree.

INTERMEDIATE CODE GENERATION: After passing through the above three phases the source program passes through intermediate code generation, where it is converted into a compact form using one of the following three methods:

Three Address Code

Quadruple

Postfix notation

CODE OPTIMIZATION: This is an optional phase which improves the intermediate code so that the resulting target code runs faster and uses memory more effectively. If the code is already optimized then no further optimization is required.

TARGET CODE GENERATION: The final phase of the compiler is the generation of the target code, consisting normally of relocatable machine code or assembly code. Memory locations are selected for each of the variables used in the program.

2.

a) For the following grammar, find the FIRST and FOLLOW sets of each of the non-terminals:

S → aAB / bA / ε

A → aAb / ε

B → bB / c

Ans)

FIRST (S) = {a, b, ε}

FIRST (A) = {a, ε}

FIRST (B) = {b, c}

FOLLOW (S) = {$}, since S is the start symbol.

S → aAB is of the form αAβ with β = B, so FOLLOW (A) includes FIRST (B) = {b, c}.

A → aAb is of the form αAβ with β = b, so FOLLOW (A) includes b.

S → bA ends in A, so FOLLOW (A) includes FOLLOW (S) = {$}.

S → aAB ends in B, so FOLLOW (B) includes FOLLOW (S) = {$}. The production B → bB adds nothing new.

So finally, after combining, the FOLLOW sets of the non-terminals are:

FOLLOW (S) = {$}

FOLLOW (A) = {$, b, c}

FOLLOW (B) = {$}
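These sets can be cross-checked with a small fixed-point computation. The sketch below is an illustration under stated assumptions: the grammar is a dictionary from each non-terminal to a list of alternatives, an empty list stands for ε, every symbol that is not a key is a terminal, and the function names are hypothetical.

```python
EPS = "ε"
GRAMMAR = {
    "S": [["a", "A", "B"], ["b", "A"], []],
    "A": [["a", "A", "b"], []],
    "B": [["b", "B"], ["c"]],
}


def first_of_string(symbols, first):
    """FIRST of a sequence of grammar symbols, given FIRST of the non-terminals."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in first else {sym}   # terminal: itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)            # every symbol can vanish (or the string is empty)
    return result


def first_and_follow(grammar, start):
    first = {nt: set() for nt in grammar}
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")
    changed = True
    while changed:             # repeat until no set grows any more
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                new_first = first_of_string(body, first)
                if not new_first <= first[head]:
                    first[head] |= new_first
                    changed = True
                for i, sym in enumerate(body):
                    if sym not in grammar:
                        continue                     # only non-terminals have FOLLOW
                    tail = first_of_string(body[i + 1:], first)
                    new_follow = tail - {EPS}
                    if EPS in tail:                  # rest can vanish: add FOLLOW(head)
                        new_follow |= follow[head]
                    if not new_follow <= follow[sym]:
                        follow[sym] |= new_follow
                        changed = True
    return first, follow


first, follow = first_and_follow(GRAMMAR, "S")
```

For this grammar the computation gives FIRST (B) = {b, c} and FOLLOW (A) = {$, b, c}, in agreement with the sets derived by hand.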

b) Differentiate between syntax directed definition and syntax directed translation scheme?

Ans)

Syntax directed definition generalizes a context free grammar by associating a set of attributes with

each node in a parse tree. Each attribute gives some information about the node i.e. Syntax

directed definition is a generalization of a CFG in which each grammar symbol has an associated

set of attributes partitioned into two subsets called the synthesized & inherited attributes of that

grammar symbol.

The value of an attribute at a parse tree node is defined by the semantic rule associated

with the production used at that node. Semantic rules set up dependencies between attributes that

will be represented by a graph.

Syntax directed translation schemes indicate the order in which semantic rules are to be

evaluated; they allow some implementation details to be shown. Syntax directed translation is a

method of translating a string into a sequence of actions by attaching one such action to each rule

of a grammar. Thus, parsing a string of the grammar produces a sequence of rule applications and

syntax directed translation provides a simple way to attach semantics to any such syntax.

Syntax directed translation refers to a method of compiler implementation where the

source language translation is completely driven by the parser. The parsing process and parse trees

are used to direct semantic analysis and translation of the source program. This can be a separate

phase of the compiler or we can augment our conventional grammar with information to control

the semantic analysis and translations. Such grammars are called attributed grammars.

c) Test whether the following grammar is LL (1)?

S → aAb

A → cd / ef

Ans)

S → aAb

A → cd / ef

FIRST (S) = {a}

FIRST (A) = {c, e}

FOLLOW (S) = {$}

S → aAb is of the form αAβ with β = b, so

FOLLOW (A) = FIRST (b) = {b}

PREDICTIVE PARSER TABLE:

        a          b    c         d    e         f    $

S       S → aAb

A                       A → cd         A → ef

As there are no multiple entries in the parsing table, this grammar is an LL (1) grammar.

d) Explain the concept of boot strapping in compiler design process?

Ans) A compiler is a complex enough program that we would like to write it in a friendlier

language than assembly language. Even C compilers are written in C. Using the facilities offered

by a language to compile itself is the essence of bootstrapping. For bootstrapping purposes, a

compiler is characterized by three languages: the source language 'S' that it compiles, the target

language 'T' that it generates code for, and the implementation language 'I' that it is written in. We represent the three

languages using a T-diagram, named for its shape. The three languages S, I, and T may all be

quite different. For example, a compiler may run on one machine and produce target code for

another machine. Such a compiler is often called a cross-compiler.

Suppose we write a cross-compiler for a new language L in implementation language S to

generate code for machine N; that is, we create LSN. If an existing compiler for S runs on machine

M and generates code for M, it is characterized by SMM. If LSN is run through SMM, we get

LMN, a compiler from L to N that runs on M.
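The composition of T-diagrams can be written down as a small rule on triples. This is only a sketch: the `Compiler` tuple and the `compile_compiler` function are hypothetical names introduced here to model the (source, implementation, target) description given above.

```python
from typing import NamedTuple


class Compiler(NamedTuple):
    source: str   # language the compiler translates from
    impl: str     # language it is written in (or machine it runs on)
    target: str   # language or machine it generates code for


def compile_compiler(program: Compiler, using: Compiler) -> Compiler:
    """Run the text of `program` through the existing compiler `using`."""
    if program.impl != using.source:
        raise ValueError("the existing compiler does not accept this implementation language")
    # Only the implementation changes: the result now runs wherever
    # the existing compiler's output runs.
    return Compiler(program.source, using.target, program.target)


lsn = Compiler("L", "S", "N")   # new cross-compiler, written in S
smm = Compiler("S", "M", "M")   # existing compiler for S on machine M
lmn = compile_compiler(lsn, smm)
```

Here `lmn` is `Compiler("L", "M", "N")`: the source and target languages of the new compiler are unchanged, and only the language it is implemented in has been translated.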

3.

a) Use T-diagram to describe the steps you would take to create a powerful compiler using a

quick dirty compiler?

b) Define and discuss the objectives of SDTS. What do you mean by underlying source

grammar? Explain with an example.

Ans) Syntax Directed Translation Schemes describe the order and timing of attribute

computation. Syntax directed translation schemes:

Embeds the semantic rules into the grammar

Each semantic rule can only use information computed by already executed semantic

rules

A translation scheme is a convenient way of describing an L-attributed definition.

It explains each production of the CFG according to the following rules:

1. If there is a production of the form X → AB and X.i, A.i, B.i are the inherited

attributes of X, A, B respectively then:

A.i = F(X.i)

B.i = g(X.i, A.i)

where A.i is the inherited attribute of A.

2. If X.s, A.s, B.s are the synthesized attributes then: X.s = F(A.s, B.s)

3. If there is a production X → ε then

X.s = X.i

They are independent to their successors.

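4. The definitions must be written on the right side of the production, enclosed in

braces, with each inherited attribute rule placed before the symbol it belongs to and the

synthesized attribute rule of the head placed at the end, like:

X → {A.i Rule} A {B.i Rule} B {X.s Rule}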

5. If there is no inherited attribute definition then the synthesized definition is

sufficient.

Two main issues of Syntax Directed Translation Schemes are:

Triggering execution of the semantic rules

Managing and accessing attributes value

The underlying source grammar is the context free grammar of the source language to

which all the attributes and semantic rules are attached.

c) Construct the DAG for the following statement

Z = X – Y + X * Y * U – V / W + X + V

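Ans) With the usual precedence (* and / bind tighter than + and –, and all operators are left associative), the three address code is:

t1 = X – Y
t2 = X * Y
t3 = t2 * U
t4 = t1 + t3
t5 = V / W
t6 = t4 – t5
t7 = t6 + X
t8 = t7 + V
Z = t8

[Figure: DAG for the statement. The leaves are X, Y, U, V and W; there is one interior operator node for each of t1 to t8, and the root + node is labelled Z. The leaf X is shared by three operator nodes, and the leaves Y and V by two each.]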
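A DAG can be built from three address statements by creating a node only when no identical node exists yet. The sketch below assumes the usual operator precedence for the statement Z = X – Y + X * Y * U – V / W + X + V; `build_dag` and the tuple layout of the nodes are illustrative choices, not a fixed format.

```python
def build_dag(statements):
    """Build a DAG from (result, op, arg1, arg2) statements.

    Returns (nodes, names). Each node is ("leaf", name) or
    (op, left_index, right_index); names maps every variable or
    temporary to the index of the node holding its current value.
    """
    nodes = []
    index = {}    # node tuple -> position in nodes; this is what shares nodes
    names = {}

    def intern(node):
        if node not in index:
            index[node] = len(nodes)
            nodes.append(node)
        return index[node]

    def operand(name):
        if name not in names:           # first use: create a leaf
            names[name] = intern(("leaf", name))
        return names[name]

    for result, op, a, b in statements:
        names[result] = intern((op, operand(a), operand(b)))
    return nodes, names


tac = [
    ("t1", "-", "X", "Y"),
    ("t2", "*", "X", "Y"),
    ("t3", "*", "t2", "U"),
    ("t4", "+", "t1", "t3"),
    ("t5", "/", "V", "W"),
    ("t6", "-", "t4", "t5"),
    ("t7", "+", "t6", "X"),
    ("Z", "+", "t7", "V"),
]
nodes, names = build_dag(tac)
```

For this statement the DAG has 13 nodes (5 leaves and 8 operator nodes). There is no repeated sub-expression, so the only sharing is at the leaves X, Y and V; a second occurrence of, say, X * Y would reuse the existing node instead of adding a new one.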

4.

a) Describe the contents of a symbol table. How is the symbol table involved in the

interactions between the different components of the compiler and in the error detection?

Give a simple example in each case.

Ans) Exactly what information is stored in the symbol table depends on many things. The

programming language will determine much of the information that is stored, but the target

architecture will also influence what data is stored. In fact some assumptions about how to

produce code can affect what values are stored in the table. Different information will need

to be stored for constants, variables, procedures, enumerations, type definitions and so on.

What follows is a description of various common declarative language constructs and typical

classes of information symbol table would record for those constructs.

CONSTANTS:


Constants are identifiers that represent a fixed value- one that can never be changed. Since

programmers will wish to access these values by name, the name must be stored. Finally,

since the values must be used properly in the type system, type information is also included.

No run time location needs to be stored for constants. These are typically stored right into

the code stream by the compiler at compilation time.

VARIABLES:

Variables are identifiers whose value may change between executions and during a single

execution of a program. They represent the contents of some memory location. The symbol

table needs to record the variable's name as well as its allocated storage space at runtime.

Typically this location is stored as an offset relative to some position.

TYPES (user defined):

A user defined type is typically a conglomeration of 1 or more existing types. Types are

accessed by name and reference a type definition structure. Each structure will record

important information about itself, like its size, the name of its members or its upper or

lower bounds. What information is stored will depend on what type is being defined.

SUBPROGRAMS:

Procedures, functions and methods are named segments of code. Naturally, the symbol table

should record a procedure's name. The type they return, if any, should be noted. When

subprograms are accessed at run time it is typically by their location in the code stream, thus

the location of the code generated for a given procedure should also be recorded. The formal

parameters and local variables of a function are separate identifiers in their own right, and

should be stored in separate records. Thus, they are treated much like the fields of a user

defined record. They are stored as a list of variable records separated, but accessible from,

the main procedure record.

CLASSES:

Classes are abstract data types which restrict access to their members and provide convenient

language level polymorphism. They are really a special case of user defined types, and are

structurally no different. But it may be convenient to store information about classes above

and beyond that required for other user defined types. This includes the location of the

default constructor and destructor, and the address of the virtual function table.

INHERITANCE:

There may be different ways to perform inheritance, and a symbol table record is needed to

keep track of which classes are being inherited and exactly how inheritance is performed. A

compiler might consider whether shared or non – shared inheritance is performed. A

compiler might consider whether keywords public, private and protected modify the

visibility of inherited items, and may be recorded with the inheritance information. A

reference to the participating classes could also be recorded in an inheritance structure.

ARRAYS:

Arrays represent a collection of uniformly typed elements that may be randomly accessed by

index. For each dimension of an array, the compiler will need to know about such things as

the lower boundary of the array, i.e. the lowest valid index, the upper boundary, i.e. the largest

valid index, the index size, index type, the total size and the type of the elements contained.

When many different types can be used to index an array, the index type and the size will

also be recorded. Finally, the total amount of space to be allocated for each dimension of an

array should be stored.

RECORDS:

Records represent a collection of possibly heterogeneous members which can be accessed by

name. The symbol table probably needs to record each of the record's members. The

compiler also needs to know the size of the record, that is, how much space to allocate for all the

members. Each of the fields of the record will probably be a reference to another symbol

table record like a variable or a type which may in turn reference another record or array.

CLASS:

Just like a record, the fields of a class can be conveniently stored in a separate record.

Classes will also store their methods, constructors, destructors and virtual function table in

this complex information structure.

MODULE:

Stores the module size, its name, parent, its members and a time stamp. The time stamp is

used to guarantee at load time that the modules have been compiled in the correct order and

are all up to date.

b) Explain the machine dependent and machine independent code optimization. What are

their advantages?

Ans) Machine independent optimizations can be performed independently of the target

machine for which the compiler is generating code; that is, the optimizations are not tied to

the target machine's specific language or platform. Examples of machine independent

optimizations are: elimination of loop invariant computation, induction variable elimination

and elimination of common sub expression.

Machine dependent optimization requires knowledge of the target machine.

An attempt to generate object code that will utilize the target machine's registers more

efficiently is an example of machine dependent code optimization.

Advantages are:

5.

a) Explain the working principle of operator precedence parsing algorithm. Explain the

parsing action for the input string id1 – id2 / id3 * id4 ↑ id5 – id1 with reference to the

operator precedence relation table given below:

      -     *     /     ↑     id    $
-     >     <     <     <     <     >
*     >     >     <     <     <     >
/     >     >     >     <     <     >
↑     >     >     >     <     <     >
id    >     >     >     >           >

b) What information is recorded in the symbol table of a compiler for a block structured

language? Give examples of how this information is created and/or used at each stage of

compilation.

Ans)

Symbol table is a scratch pad where the compiler stores the information about the objects

in the program such as variables, functions and procedures.

It enables the compiler to do type checking and determine the scope of a variable.

There is no type compatibility constraint or scoping rules at run time.

No type error will occur when the program runs

A type system is said to be strongly typed if it passes only type safe programs

A language is strongly typed if its compiler is strongly typed

Scope rules of a language are used for specifying which declaration of a variable is

associated with a specific occurrence of the variable

Scope rules apply to variables, constants, new type definitions and functions

A set of statements enclosed within blocking symbols (BEGIN and END, '{' and '}', etc.)

is called a block (compound statement)

Blocks nest inside other blocks

Blocks are either disjoint or nested

A block-structured language allows procedures/functions to nest within other

procedures/functions

6.

a) Construct the LL(1) parsing table for the following grammar:

S → aBDh
B → cC
C → bC | ε
D → EF
E → g | ε
F → f | ε

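Ans) S → aBDh
B → cC
C → bC | ε
D → EF
E → g | ε
F → f | ε

FIRST

FIRST (F) = {f, ε}

FIRST (E) = {g, ε}

FIRST (D) = (FIRST (E) – {ε}) U FIRST (F), because E can derive ε

= ({g, ε} – {ε}) U {f, ε}

= {g, f, ε}

FIRST (C) = {b, ε}

FIRST (B) = {c}

FIRST (S) = {a}

FOLLOW

FOLLOW (S) = {$}

S → aBDh is of the form αBβ with β = Dh, so

FOLLOW (B) = FIRST (Dh)

= (FIRST (D) – {ε}) U FIRST (h), because D can derive ε

= {g, f} U {h}

= {g, f, h}

S → aBDh is of the form αBβ with β = h, so

FOLLOW (D) = FIRST (h) = {h}

B → cC is of the form αB, so

FOLLOW (C) = FOLLOW (B) = {g, f, h}

C → bC is of the form αB and adds nothing new to FOLLOW (C).

D → EF is of the form αBβ with β = F, and F can derive ε, so

FOLLOW (E) = (FIRST (F) – {ε}) U FOLLOW (D) = {f} U {h} = {f, h}

D → EF is of the form αB, so

FOLLOW (F) = FOLLOW (D) = {h}

LL (1) PARSING TABLE

     a           c          b          g          f          h          $
S    S → aBDh
B                B → cC
C                           C → bC     C → ε      C → ε      C → ε
D                                      D → EF     D → EF     D → EF
E                                      E → g      E → ε      E → ε
F                                                 F → f      F → ε

A production whose right side can derive ε is also entered under every terminal in the FOLLOW set of its left side; this is why D → EF appears under h. No cell holds more than one entry, so the grammar is LL(1).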
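The FIRST and FOLLOW computations and the table filling can be checked mechanically. The following is a sketch of the standard fixed-point construction for this grammar; the names `GRAMMAR`, `first_of`, `compute_first_follow` and `build_table` are assumptions made for this illustration.

```python
EPS = "ε"

GRAMMAR = {
    "S": [["a", "B", "D", "h"]],
    "B": [["c", "C"]],
    "C": [["b", "C"], [EPS]],
    "D": [["E", "F"]],
    "E": [["g"], [EPS]],
    "F": [["f"], [EPS]],
}
START = "S"


def first_of(seq, first):
    """FIRST of a sequence of grammar symbols."""
    out = set()
    for sym in seq:
        if sym == EPS:
            continue
        f = first[sym] if sym in GRAMMAR else {sym}
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)                    # every symbol could vanish
    return out


def compute_first_follow():
    first = {nt: set() for nt in GRAMMAR}
    follow = {nt: set() for nt in GRAMMAR}
    follow[START].add("$")
    changed = True
    while changed:                  # repeat until no set grows
        changed = False
        for head, bodies in GRAMMAR.items():
            for body in bodies:
                before = len(first[head])
                first[head] |= first_of(body, first)
                changed |= len(first[head]) != before
                for i, sym in enumerate(body):
                    if sym not in GRAMMAR:
                        continue
                    rest = first_of(body[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:
                        new |= follow[head]
                    before = len(follow[sym])
                    follow[sym] |= new
                    changed |= len(follow[sym]) != before
    return first, follow


def build_table(first, follow):
    table = {}
    for head, bodies in GRAMMAR.items():
        for body in bodies:
            f = first_of(body, first)
            lookaheads = (f - {EPS}) | (follow[head] if EPS in f else set())
            for t in lookaheads:
                if (head, t) in table:
                    raise ValueError(f"not LL(1): conflict at ({head}, {t})")
                table[(head, t)] = body
    return table


FIRST, FOLLOW = compute_first_follow()
TABLE = build_table(FIRST, FOLLOW)
```

`build_table` raises an error on the first cell that would receive two productions, so a successful run is itself the LL(1) test. For this grammar it fills 14 cells, including `TABLE[("D", "h")]`, which comes from FOLLOW (D) because EF can derive ε.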

b) Explain how the scope rules and the block structure of a programming language decide

the structure of the symbol table?

Ans) Scoping is one of the applications of the symbol table. There are two types of scopes:

Local

Global

Symbol table decides whether the particular symbol is local or global

For example:

int X;
void Add ( );

int main ( )
{
    int Y;
    Add ( );
}

void Add ( )
{
    int Z;
}

{ } delimits a block, i.e. the life span of a variable declared inside it is limited to that block.

A separate symbol table is created for each block.

Global names are stored in the top block so that they can be accessed by all the blocks.

The table of each block stores the address of the next block's table, so the tables form a linked chain.

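#include <iostream>
using namespace std;

int main ( )
{                              // MAIN BLOCK: x is visible
    int x;
    cout << "enter x";
    cin >> x;
    {                          // BLOCK 1: x and y are visible
        int y;
        cout << "enter y";
        cin >> y;
        {                      // BLOCK 2: x, y and z are visible
            int z;
            cout << "enter z";
            cin >> z;
        }
    }
}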
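One common way to let block structure shape the symbol table is to keep one table per block and link it to the table of the enclosing block. The sketch below models the three nested blocks of the example; the class `SymbolTable` and its method names are hypothetical, chosen only for illustration.

```python
class SymbolTable:
    """One table per block, linked to the table of the enclosing block."""

    def __init__(self, parent=None):
        self.parent = parent
        self.symbols = {}

    def declare(self, name, type_):
        if name in self.symbols:
            raise KeyError(f"{name} already declared in this block")
        self.symbols[name] = type_

    def lookup(self, name):
        """Search this block, then each enclosing block in turn."""
        table = self
        while table is not None:
            if name in table.symbols:
                return table.symbols[name]
            table = table.parent
        raise KeyError(f"{name} is not declared")


main_block = SymbolTable()
main_block.declare("x", "int")
block1 = SymbolTable(parent=main_block)
block1.declare("y", "int")
block2 = SymbolTable(parent=block1)
block2.declare("z", "int")
```

A lookup of `x` from `block2` walks outward through `block1` to `main_block`, while a lookup of `z` from `main_block` fails, which is exactly the scope rule: inner names are invisible to outer blocks.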

7.

a) Construct the SLR parsing table for the following grammar:

E → E + T
E → T
T → T * F
T → F
F → id
L → L , E | E

Ans) Augmented Grammar:

E' → E ------------------ (0)
E → E + T ------------------ (1)
E → T ------------------ (2)
T → T * F ------------------ (3)
T → F ------------------ (4)
F → id ------------------ (5)
L → L , E ------------------ (6)
L → E ------------------ (7)

Item Set I0:
    E' → .E
    E → .E + T
    E → .T
    T → .T * F
    T → .F
    F → .id
    L → .L , E
    L → .E

In item set I0 the symbols to be processed are E, T, F, id, L

PROCESS E
I1 = GOTO (I0, E)
I1: E' → E.
    E → E. + T
    L → E.

PROCESS T
I2 = GOTO (I0, T)
I2: E → T.
    T → T. * F

PROCESS F
I3 = GOTO (I0, F)
I3: T → F.

PROCESS id
I4 = GOTO (I0, id)
I4: F → id.

PROCESS L
I5 = GOTO (I0, L)
I5: L → L. , E

In item set I1 the symbol to be processed is '+'

PROCESS +
I6 = GOTO (I1, +)
I6: E → E + .T
    T → .T * F
    T → .F
    F → .id

In item set I2 the symbol to be processed is '*'

PROCESS *
I7 = GOTO (I2, *)
I7: T → T * .F
    F → .id

In item set I5 the symbol to be processed is ','

PROCESS ,
I8 = GOTO (I5, ,)
I8: L → L , .E
    E → .E + T
    E → .T

In item set I6 the symbols to be processed are T, F, id

PROCESS T
I9 = GOTO (I6, T)
I9: E → E + T.

PROCESS F
Already processed in I3

PROCESS id
Already processed in I4

In item set I7 the symbols to be processed are F, id

PROCESS F
I10 = GOTO (I7, F)
I10: T → T * F.

PROCESS id
Already processed in I4

In item set I8 the symbols to be processed are E, T

PROCESS E
I11 = GOTO (I8, E)
I11: L → L , E.

PROCESS T
Already processed in I2

In item set I9 there are no symbols to be processed.

In item set I10 there are no symbols to be processed.


TRANSITIONS (state numbers reached on each symbol)

        ACTION                        GOTO
State   +     *     ,     id    $     E     T     F     L
I0                        4           1     2     3     5
I1      6
I2            7
I3
I4
I5                  8
I6                        4                 9     3
I7                        4                       10
I8                                    11    2
I9
I10
I11

ACTION

State   +       *       ,       id      $
I0                              S4
I1      S6/R7   R7      R7      R7      accepted
I2      R2      S7/R2   R2      R2      R2
I3      R4      R4      R4      R4      R4
I4      R5      R5      R5      R5      R5
I5                      S8
I6                              S4
I7                              S4
I8
I9      R1      R1      R1      R1      R1
I10     R3      R3      R3      R3      R3
I11     R6      R6      R6      R6      R6

b) What is the objective of intermediate code generation? What is the different form of

intermediate code generated by intermediate code generation phase?

Ans) The objectives of intermediate code generation are:

Ease of re-targeting different machines.

Perform machine independent code optimization.

Create linear representation of a program

An intermediate representation spans the gap between the source and target

languages.

Implementable via syntax directed translation, so can be folded into the parsing

process.

The different forms of intermediate code generated by intermediate code generation

phase are:

Syntax trees

Post fix notation

Three address code

Quadruples

Triples

Indirect Triple

8.

a) What is the objective of intermediate code generation? Generate the three address code

for the following code segment:

Main ( )

{

int a = 1;

int b[10];

while (a<= 10)

b[a] = 2 ** a;

}

Ans)

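Three Address Code (assuming 4-byte integers, so that element b[a] lies at byte offset a * 4 from the base address of b):

(1) a = 1
(2) if a <= 10 goto (4)
(3) goto (8)
(4) t1 = a * 4
(5) t2 = 2 ** a
(6) b[t1] = t2
(7) goto (2)
(8) exit

The source loop does not modify a, so the translated loop likewise contains no increment; statement (7) simply returns to the test at (2).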

b) Find the canonical collection of sets of LR (1) items:

S → AaAb
S → BbBa
A → ε
B → ε

Ans) Augmented Grammar:

S' → S ------------------------------- (0)
S → AaAb ------------------------------- (1)
S → BbBa ------------------------------- (2)
A → ε ------------------------------- (3)
B → ε ------------------------------- (4)

Closure rule: for an item [A → α.Bβ, a], add the item [B → .γ, b] for every production B → γ and every b in FIRST (βa).

I0: S' → .S, $
    S → .AaAb, $      (from S' → .S, $: α = ε, B = S, β = ε, a = $, so FIRST (βa) = FIRST ($) = {$})
    S → .BbBa, $
    A → ., a          (from S → .AaAb, $: β = aAb, so FIRST (aAb$) = {a})
    B → ., b          (from S → .BbBa, $: β = bBa, so FIRST (bBa$) = {b})

In item set I0 the symbols to be processed are S, A, B

PROCESS S
I1 = GOTO (I0, S)
I1: S' → S., $

PROCESS A
I2 = GOTO (I0, A)
I2: S → A.aAb, $

PROCESS B
I3 = GOTO (I0, B)
I3: S → B.bBa, $

In item set I2 the symbol to be processed is 'a'

PROCESS a
I4 = GOTO (I2, a)
I4: S → Aa.Ab, $      (α = Aa, B = A, β = b, a = $, so FIRST (βa) = FIRST (b$) = {b})
    A → ., b

In item set I3 the symbol to be processed is 'b'

PROCESS b
I5 = GOTO (I3, b)
I5: S → Bb.Ba, $      (α = Bb, B = B, β = a, a = $, so FIRST (βa) = FIRST (a$) = {a})
    B → ., a

In item set I4 the symbol to be processed is 'A'

PROCESS A
I6 = GOTO (I4, A)
I6: S → AaA.b, $

In item set I5 the symbol to be processed is 'B'

PROCESS B
I7 = GOTO (I5, B)
I7: S → BbB.a, $

In item set I6 the symbol to be processed is 'b'

PROCESS b
I8 = GOTO (I6, b)
I8: S → AaAb., $

In item set I7 the symbol to be processed is 'a'

PROCESS a
I9 = GOTO (I7, a)
I9: S → BbBa., $

LR (1) PARSING TABLE

State   a     b     $          A     B     S
I0      R3    R4               2     3     1
I1                  Accepted
I2      S4
I3            S5
I4            R3               6
I5      R4                           7
I6            S8
I7      S9
I8                  R1
I9                  R2

c) Write the quadruples, triples and indirect triples for the following expression:

X[i] := Y

X:= Y[i]

Ans)

QUADRUPLES

X[i] := Y

t1 = X[i]
t2 = Y
t1 = t2

Operator   Operand 1   Operand 2   Result
[]         X           i           t1
=          Y                       t2
=          t2                      t1

X := Y[i]

t1 = Y[i]
t2 = X
t2 = t1

Operator   Operand 1   Operand 2   Result
[]         Y           i           t1
=          X                       t2
=          t1                      t2

TRIPLE

X[i] := Y

(0)   []   X     i
(1)   =    Y     (0)

X := Y[i]

(0)   []   Y     i
(1)   =    (0)   X

INDIRECT TRIPLE

X[i] := Y

Statement   Pointer
(0)         (100)
(1)         (200)

(100)   []   X     i
(200)   =    Y     (100)

X := Y[i]

Statement   Pointer
(0)         (100)
(1)         (200)

(100)   []   Y       i
(200)   =    (100)   X

1. What is a compiler?

A compiler is a program that reads a program written in one language, the source language, and translates it into

an equivalent program in another language, the target language. The compiler reports to its user the presence of

errors in the source program.

2. What are the two parts of a compilation? Explain briefly.

Analysis and Synthesis are the two parts of compilation.

The analysis part breaks up the source program into constituent pieces and creates an intermediate

representation of the source program.

The synthesis part constructs the desired target program from the intermediate representation.

3. List the subparts or phases of analysis part.

Analysis consists of three phases:

Linear Analysis.

Hierarchical Analysis.

Semantic Analysis.

4. Depict diagrammatically how a language is processed.

5. What is linear analysis?

Linear analysis is one in which the stream of characters making up the source program is read from left to right

and grouped into tokens that are sequences of characters having a collective meaning.

Also called lexical analysis or scanning.

6. Find the no. of tokens in the following code segment.

float fun(char *s)

/* Find a zero */

{

if(!strcmp(s,"0"))

return 0;

}

Ans-No of tokens is 22

7. Find the no. of tokens in the following code segments.

(a) printf("i=%d,&i=%d",i,&i);

(b) int max(i,j)

int i,j;

{return (i>j?i:j);}

Ans-(a)10

(b)25
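The counts in questions 6 and 7 can be reproduced with a small scanner. This is a simplified sketch, not a full C lexer: it assumes comments and white space produce no tokens, a string literal is one token, and every operator or punctuation mark is a single character (so an operator such as `<=` would be counted as two). The names `TOKEN_RE` and `tokens` are illustrative.

```python
import re

# Comments and white space are skipped; every other match is one token.
TOKEN_RE = re.compile(r'''
    (?P<skip>/\*.*?\*/|\s+)                     # comment or white space
  | (?P<string>"(?:\\.|[^"\\])*")               # string literal
  | (?P<ident>[A-Za-z_]\w*)                     # identifier or keyword
  | (?P<number>\d+)                             # integer constant
  | (?P<punct>[-+*/%<>=!&|^~?:;,.(){}\[\]])     # one-character operator or punctuation
''', re.VERBOSE | re.DOTALL)


def tokens(source):
    """Return the list of lexemes in source, in order."""
    out = []
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character {source[pos]!r}")
        if m.lastgroup != "skip":
            out.append(m.group())
        pos = m.end()
    return out


Q6 = 'float fun(char *s)\n/* Find a zero */\n{\nif(!strcmp(s,"0"))\nreturn 0;\n}'
Q7A = 'printf("i=%d,&i=%d",i,&i);'
Q7B = 'int max(i,j)\nint i,j;\n{return (i>j?i:j);}'
```

With these assumptions `len(tokens(Q6))` is 22, `len(tokens(Q7A))` is 10 and `len(tokens(Q7B))` is 25, matching the answers above. Note that the whole format string in (a) is a single token even though it contains commas and an ampersand.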

8. What is a symbol table?

A symbol table is a data structure containing a record for each identifier, with fields for the attributes of the

identifier. The data structure allows us to find the record for each identifier quickly and to store or retrieve data

from that record quickly.

Whenever an identifier is detected by a lexical analyzer, it is entered into the symbol table. The attributes of an

identifier cannot be determined by the lexical analyzer.

9. Mention some of the cousins of a compiler.

Cousins of the compiler are:

Preprocessors

Assemblers

Loaders and Link-Editors

10. List the phases that constitute the front end of a compiler.

The front end consists of those phases or parts of phases that depend primarily on the source language and are

largely independent of the target machine. These include

Lexical and Syntactic analysis

The creation of symbol table

Semantic analysis

Generation of intermediate code

A certain amount of code optimization can be done by the front end as well. Also includes error handling that

goes along with each of these phases.

11. Mention the back-end phases of a compiler.

The back end of compiler includes those portions that depend on the target machine and generally those

portions do not depend on the source language, just the intermediate language. These include

(i)Code optimization

(ii)Code generation, along with error handling and symbol- table operations.

12. Define compiler-compiler.

Systems to help with the compiler-writing process have often been referred to as compiler-compilers,

compiler-generators or translator-writing systems.

Largely they are oriented around a particular model of languages, and they are suitable for generating

compilers of languages of a similar model.

13. List the various compiler construction tools.

The following is a list of some compiler construction tools:

Parser generators

Scanner generators

Syntax-directed translation engines

Automatic code generators

Data-flow engines

14. Differentiate tokens, patterns, lexeme.

Tokens- Sequence of characters that have a collective meaning.(group of characters with

logical meaning).

Patterns- There is a set of strings in the input for which the same token is produced as output.

This set of strings is described by a rule called a pattern associated with the token(rule for

group of characters to form tokens).

Lexeme- A sequence of characters in the source program that is matched by the pattern for a

token.(Actual character stream that represent the token).

15. List the operations on languages.

Union – L U M ={s | s is in L or s is in M}

Concatenation – LM ={st | s is in L and t is in M}

Kleene Closure – L* (zero or more concatenations of L)

Positive Closure – L+ ( one or more concatenations of L)
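For finite languages these operations can be tried out with sets of strings. The sketch below is illustrative only: the function names are assumptions, and because L* is infinite whenever L contains a non-empty string, the closures are cut off at a chosen number of concatenations.

```python
def union(L, M):
    return L | M


def concat(L, M):
    return {s + t for s in L for t in M}


def kleene(L, max_repeats):
    """Strings of L* built from at most max_repeats strings of L."""
    result = {""}                  # zero concatenations give the empty string
    layer = {""}
    for _ in range(max_repeats):
        layer = concat(layer, L)
        result |= layer
    return result


def positive(L, max_repeats):
    """Strings of L+ built from between 1 and max_repeats strings of L."""
    return concat(L, kleene(L, max_repeats - 1))
```

For example, with L = {"a", "b"} and M = {"0", "1"}, `concat(L, M)` is {"a0", "a1", "b0", "b1"}, and `kleene({"a"}, 2)` is {"", "a", "aa"}. The positive closure differs from the Kleene closure only in that the empty string is left out unless it belongs to L itself.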

16. Write a regular expression for an identifier.

An identifier is defined as a letter followed by zero or more letters or digits.

The regular expression for an identifier is given as

letter (letter | digit)*
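The same definition can be written as a pattern for Python's `re` module. This sketch assumes that "letter" means A to Z or a to z and "digit" means 0 to 9; `IDENTIFIER` and `is_identifier` are names chosen for the example.

```python
import re

# letter (letter | digit)*
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def is_identifier(text):
    """True if the whole of text matches letter (letter | digit)*."""
    return IDENTIFIER.fullmatch(text) is not None
```

`is_identifier("count1")` is true, while `is_identifier("1count")` is false because the first character must be a letter. Under this definition an underscore is not a letter, so `a_b` is rejected; languages such as C extend the letter class to include it.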

17. Mention the various notational shorthands for representing regular expressions.

(i)One or more instances (+)

(ii)Zero or one instance (?)

(iii)Character classes ([abc] where a,b,c are alphabet symbols denotes the regular expressions a | b

| c.)

(iv)Non regular sets

18. What is the function of a hierarchical analysis?

Hierarchical analysis is one in which the tokens are grouped hierarchically into nested collections with

collective meaning.

Also termed as Parsing.

19. What does a semantic analysis do?

Semantic analysis is one in which certain checks are performed to ensure that components of a program

fit together meaningfully.

Mainly performs type checking.

20. List the various error recovery strategies for a lexical analysis.

Possible error recovery actions are:

(i)Panic mode recovery

(ii)Deleting an extraneous character

(iii)Inserting a missing character

(iv)Replacing an incorrect character by a correct character

(v)Transposing two adjacent characters

1. What are the benefits of intermediate code generation?

A compiler for different machines can be created by attaching a different back end for each machine to the

existing front end.

A compiler for different source languages can be created by providing different front ends for the

corresponding source languages to the existing back end.

A machine independent code optimizer can be applied to the intermediate code in order to optimize

the code generation.

2. What are the various types of intermediate code representation?

There are mainly three types of intermediate code representations.

Syntax tree

Postfix

Three address code

3. Define backpatching.

Backpatching is the activity of filling in the unspecified target labels of jump statements using appropriate semantic

actions during the code generation process. In the semantic actions the functions used are

mklist(i), merge_list(p1,p2) and backpatch(p,i).

4. Mention the functions that are used in backpatching.

(i) mklist(i) creates a new list containing only i. The index i is passed as an argument to this function, where i is

an index into the array of quadruples.

(ii)merge_list(p1,p2) this function concatenates two lists pointed by p1 and p2. It returns the

pointer to the concatenated list.

(iii)backpatch(p,i) inserts i as target label for the statement pointed by pointer p.
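The three functions can be sketched over a plain list of quadruples. In this illustration a quadruple is a four-element list whose last field holds either a result name or a jump target, and `None` marks a target that is not yet known; the helper `emit` and this layout are assumptions, not part of the definitions above.

```python
quads = []   # each quadruple: [op, arg1, arg2, result_or_target]


def emit(op, arg1=None, arg2=None, target=None):
    """Append a quadruple and return its index."""
    quads.append([op, arg1, arg2, target])
    return len(quads) - 1


def mklist(i):
    return [i]


def merge_list(p1, p2):
    return p1 + p2


def backpatch(p, i):
    """Fill i in as the jump target of every quadruple on list p."""
    for q in p:
        quads[q][3] = i


# if a < b then x := 1
true_list = mklist(emit("if<", "a", "b"))   # quad 0, target unknown
false_list = mklist(emit("goto"))           # quad 1, target unknown
backpatch(true_list, len(quads))            # true exit: the next quad
emit(":=", "1", None, "x")                  # quad 2
backpatch(false_list, len(quads))           # false exit: just past the body
```

After the two calls to `backpatch`, quadruple 0 jumps to 2 and quadruple 1 jumps to 3. The jumps were generated before their targets existed, which is the situation backpatching is designed for in one-pass code generation.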

5. What is the intermediate code representation for the expression a or b and not c?

The intermediate code representation for the expression a or b and not c is the three address sequence

t1 := not c

t2 := b and t1

t3 := a or t2

6. What are the various methods of implementing three address statements?

The three address statements can be implemented using the following methods.

Quadruples : a structure with at most four fields: operator (OP), arg1, arg2 and result.

Triples : a structure with three fields: operator, arg1 and arg2. The use of temporary variables is avoided by

referring to the position of the triple that computes the value.

Indirect triples : a separate list of pointers to the triples is kept, and the order of the pointers, rather than

the order of the triples themselves, gives the order of the statements.
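The relationship between the three forms can be shown on a small example. The sketch below converts quadruples for a := b + c * d into triples and then adds the pointer list of indirect triples; `to_triples` and the tuple layouts are assumptions made for illustration, with an integer argument standing for a reference to another triple.

```python
# Quadruples for  a := b + c * d  as (op, arg1, arg2, result)
quads = [
    ("*", "c", "d", "t1"),
    ("+", "b", "t1", "t2"),
    (":=", "t2", None, "a"),
]


def to_triples(quads):
    """Replace each temporary by the index of the triple that computes it."""
    position = {}                  # temporary name -> triple index
    triples = []
    for op, arg1, arg2, result in quads:
        a1 = position.get(arg1, arg1)
        a2 = position.get(arg2, arg2)
        if op == ":=":
            triples.append((op, result, a1))
        else:
            position[result] = len(triples)
            triples.append((op, a1, a2))
    return triples


triples = to_triples(quads)
# Indirect triples: a separate list of pointers gives the execution order.
statements = list(range(len(triples)))
```

Here `triples` is `[("*", "c", "d"), ("+", "b", 0), (":=", "a", 1)]`: the temporaries t1 and t2 have disappeared. With plain triples, moving a statement changes every reference to it; with indirect triples an optimizer can reorder `statements` and leave the triples untouched.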

7. Give the syntax-directed definition for if-else statement.

1. S → if E then S1

E.true := new_label()

E.false := S.next

S1.next := S.next

S.code := E.code || gen_code(E.true ':') || S1.code

2. S → if E then S1 else S2

E.true := new_label()

E.false := new_label()

S1.next := S.next

S2.next := S.next

S.code := E.code || gen_code(E.true ':') || S1.code || gen_code('go to', S.next) ||

gen_code(E.false ':') || S2.code

8. Distinguish between compile time and run time environments .

Compile time environment includes

a)Declaration of variables.

b)Scope of variables.

c)Definition of procedures.

Run time environment includes

a)Binding of variables.

b)Life time of variables.

c)Activation of procedures.

9. Write the procedure to generate TAC.

a)Convert to postfix form.

b)Use the procedure of evaluation of the expression to get the three address code.

Ex.: a*(b+c)/(d+e)

Postfix: abc+*de+/

TAC:
t1=b+c
t2=a*t1
t3=d+e
t4=t2/t3
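Step (b) is the usual stack evaluation of a postfix string, except that each operator emits a statement instead of computing a value. A minimal sketch, assuming single-character operands and only the binary operators + - * /; the function name `postfix_to_tac` is illustrative.

```python
def postfix_to_tac(postfix):
    """Evaluate a postfix string symbolically, one statement per operator."""
    stack, code = [], []
    for ch in postfix:
        if ch in "+-*/":
            right = stack.pop()    # the right operand is on top
            left = stack.pop()
            temp = f"t{len(code) + 1}"
            code.append(f"{temp}={left}{ch}{right}")
            stack.append(temp)     # the temporary stands for the result
        else:
            stack.append(ch)       # operand
    return code
```

`postfix_to_tac("abc+*de+/")` returns `["t1=b+c", "t2=a*t1", "t3=d+e", "t4=t2/t3"]`, the same four statements as in the worked example.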

10. How you will evaluate the attributes in L-attributed definition.

a)Traverse the parse tree depth first, from left to right.

b)Evaluate inherited attribute when a node is visited for the first time.

c)Evaluate synthesized attribute when a node is visited for last time.

General evaluation order is i/p string→parse tree→dependency graph→evaluation order.

1. Define parser.

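A parser (syntax analyzer) takes the tokens produced by the lexical analyzer and groups them hierarchically into nested

collections with collective meaning, checking that the token stream can be generated by the grammar of the source language.

This hierarchical analysis is also termed parsing.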

2. Mention the basic issues in parsing.

There are two important issues in parsing.

Specification of syntax

Representation of input after parsing.

3. Why lexical and syntax analyzers are separated out?

Reasons for separating the analysis phase into lexical and syntax analyzers:

Simpler design.

Compiler efficiency is improved.

Compiler portability is enhanced.

4. Define a context free grammar.

A context free grammar G is a collection of the following

V is a set of non terminals

T is a set of terminals

S is a start symbol

P is a set of production rules

G can be represented as G = (V,T,S,P)

Production rules are given in the following form

Non terminal → (V U T)*

5. Briefly explain the concept of derivation.

Derivation from S means generation of string w from S. For constructing derivation two things are

important.

i) Choice of non terminal from several others.

ii) Choice of rule from production rules for corresponding non terminal.

Instead of choosing the arbitrary non terminal one can choose

i) either leftmost derivation – the leftmost non terminal in a sentential form is replaced at each step

ii) or rightmost derivation – the rightmost non terminal in a sentential form is replaced at each step

6. Define ambiguous grammar.

A grammar G is said to be ambiguous if it generates more than one parse tree for some sentence of

language L(G).

i.e. the sentence has more than one leftmost derivation or more than one rightmost derivation.

7. What is an operator precedence parser?

An operator precedence parser is a shift-reduce parser for operator grammars that uses precedence relations between

terminals to find handles. A grammar is said to be an operator grammar if it possesses the following properties:

1. No production has ε on its right side.

2. No production has two adjacent non terminals on its right side.

8. List the properties of LR parser.

1. LR parsers can be constructed to recognize most of the programming languages

for which the context free grammar can be written.

2. The class of grammar that can be parsed by LR parser is a superset of class of

grammars that can be parsed using predictive parsers.

3. LR parsers work using non backtracking shift reduce technique yet it is

efficient one.

9. Mention the types of LR parser.

(i)SLR parser- simple LR parser

(ii)LALR parser- lookahead LR parser

(iii)Canonical LR parser

10. What are the problems with top down parsing?

The following are the problems associated with top down parsing:

(i)Backtracking is costly and slow.

(ii)Left recursion may lead to an infinite loop.

(iii)Left factoring may lead to ambiguity(Dangling else problem).

(iv)Debugging is difficult.

11. Write the algorithm for FIRST and FOLLOW.

FIRST

1. If X is terminal, then FIRST(X) IS {X}.

2. If X → ε is a production, then add ε to FIRST(X).

3. If X is non terminal and X → Y1Y2..Yk is a production, then place a in FIRST(X) if for some i, a is

in FIRST(Yi) and ε is in all of FIRST(Y1),…,FIRST(Yi-1); if ε is in all of FIRST(Y1),…,FIRST(Yk), then add ε to FIRST(X).

FOLLOW

1. Place $ in FOLLOW(S),where S is the start symbol and $ is the input right endmarker.

2. If there is a production A → αBβ, then everything in FIRST(β) except for ε is placed in

FOLLOW(B).

3. If there is a production A → αB, or a production A→ αBβ where FIRST(β) contains ε , then

everything in FOLLOW(A) is in FOLLOW(B).

12. List the advantages and disadvantages of operator precedence parsing.

Advantages

This type of parsing is simple to implement.

Disadvantages

1. The operator like minus has two different precedence (unary and binary).Hence it is hard to handle tokens

like minus sign.

2. This kind of parsing is applicable to only small class of grammars.

13. What is dangling else problem?

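The dangling else problem is an ambiguity: in a statement of the form if E1 then if E2 then S1 else S2, the else can be matched with either then. It is usually resolved by matching each else with the closest unmatched then. The ambiguous grammar is shown below: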

stmt → if expr then stmt

| if expr then stmt else stmt

| other

14. Write short notes on YACC.

YACC is an automatic tool for generating the parser program.

YACC stands for Yet Another Compiler Compiler which is basically the utility available from UNIX.

Basically YACC is LALR parser generator.

It can report conflict or ambiguities in the form of error messages.

15. What is meant by handle pruning?

A rightmost derivation in reverse can be obtained by handle pruning.

If w is a sentence of the grammar at hand, then w = γn, where γn is the nth right-sentential form of some

as yet unknown rightmost derivation

S = γ0 => γ1…=> γn-1 => γn = w

16. Define LR(0) items.

An LR(0) item of a grammar G is a production of G with a dot at some position of the right side. Thus,

production A → XYZ yields the four items

A→.XYZ

A→X.YZ

A→XY.Z

A→XYZ.
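Enumerating the items of a production is just a matter of placing the dot at every position. A small sketch, assuming each grammar symbol is a single character; `lr0_items` is a name chosen for the example.

```python
def lr0_items(head, body):
    """All LR(0) items of the production head → body, with '.' as the dot."""
    return [f"{head}→{body[:i]}.{body[i:]}" for i in range(len(body) + 1)]
```

`lr0_items("A", "XYZ")` gives the four items listed above, and a production with an empty right side, `lr0_items("A", "")`, gives the single item `A→.`.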

17. What is meant by viable prefixes?

The set of prefixes of right sentential forms that can appear on the stack of a shift-reduce parser are called

viable prefixes. An equivalent definition of a viable prefix is that it is a prefix of a right sentential form that

does not continue past the right end of the rightmost handle of that sentential form.

18. Define handle.

A handle of a string is a substring that matches the right side of a production, and whose reduction to the

nonterminal on the left side of the production represents one step along the reverse of a rightmost

derivation.

A handle of a right – sentential form γ is a production A→β and a position of γ where the string β may be

found and replaced by A to produce the previous right-sentential form in a rightmost derivation of γ. That is

, if S =>αAw =>αβw,then A→β in the position following α is a handle of αβw.

19. What are kernel & non-kernel items?

Kernel items, which include the initial item, S' → .S, and all items whose dots are not at the left end.

Non-kernel items, which have their dots at the left end.

20. What is phrase level error recovery?

Phrase level error recovery is implemented by filling in the blank entries in the predictive parsing table

with pointers to error routines. These routines may change, insert, or delete symbols on the input and

issue appropriate error messages. They may also pop from the stack.

21. Differentiate between top down and bottom up parser.

Top down parser (TDP)                              Bottom up parser (BUP)

(i) It creates the parse tree starting from        (i) It creates the parse tree starting from
    the root and proceeds to the children.             the children and proceeds to the root.

(ii) It uses leftmost derivation.                  (ii) It uses the reverse of rightmost derivation.

(iii) Problem: when a non terminal has more        (iii) Problem: when a handle is detected it is
    than one alternative then it should have           reduced.
    criteria to decide the right choice.

(iv) Parsing table size is small.                  (iv) Parsing table size is bigger than TDP.

(v) Less power.                                    (v) High power.

(vi) Error detection is easy.                      (vi) Error detection is difficult.

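22. Under which conditions can a predictive parser be constructed for a grammar?

The grammar must be free from left recursion and should be left factored. More precisely, the grammar must be LL(1): its predictive parsing table must contain no multiply defined entries.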

1. Mention the properties that a code generator should possess.

The code generator should produce correct, high-quality code; that is, the generated code should make effective use of the resources of the target machine.

The code generator itself should run efficiently.

2. List the terminologies used in basic blocks.

Define and use: the three-address statement a := b + c is said to define a and to use b and c.

Live and dead: a name in a basic block is said to be live at a given point if its value is used after that point in the program. A name is said to be dead at a given point if its value is never used after that point in the program.

3. What is a flow graph?

A flow graph is a directed graph in which flow-of-control information is added to the basic blocks.

The nodes of the flow graph are the basic blocks.

The block whose leader is the first statement is called the initial block.

There is a directed edge from block B1 to block B2 if B2 can immediately follow B1 in some execution sequence: either there is a jump from the last statement of B1 to the first statement of B2, or B2 immediately follows B1 in program order and B1 does not end in an unconditional jump. We then say that B1 is a predecessor of B2.

4. What is a DAG? Mention its applications.

A directed acyclic graph (DAG) is a useful data structure for implementing transformations on basic blocks.

A DAG is used in:

Determining the common sub-expressions.

Determining which names are used inside the block but computed outside the block.

Determining which statements of the block could have their computed value used outside the block.

Simplifying the list of quadruples by eliminating the common sub-expressions and not performing assignments of the form x := y unless they are necessary.

5. Define peephole optimization.

Peephole optimization is a simple and effective technique for locally improving target code. It improves the performance of the target program by examining a short sequence of target instructions (the peephole) and replacing it with a shorter or faster sequence.
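As a rough illustration of redundant-instruction elimination, the sketch below scans a list of instructions and drops a `MOV a,R0` that immediately follows `MOV R0,a`. The tuple representation, the source-then-destination operand order and the function name `remove_redundant_moves` are assumptions made for this example, and it assumes the second instruction is not the target of a jump.

```python
def remove_redundant_moves(instructions):
    """Drop a MOV that only undoes the MOV immediately before it.

    Each instruction is a tuple (opcode, source, destination).
    """
    result = []
    for instr in instructions:
        if result:
            prev = result[-1]
            is_reverse_move = (
                prev[0] == "MOV"
                and instr[0] == "MOV"
                and prev[1] == instr[2]   # previous source is the new destination
                and prev[2] == instr[1]   # previous destination is the new source
            )
            if is_reverse_move:
                continue  # the value is already in place, so skip this instruction
        result.append(instr)
    return result


code = [("MOV", "R0", "a"), ("MOV", "a", "R0"), ("ADD", "b", "R0")]
print(remove_redundant_moves(code))
```

A real peephole optimizer would slide a window over the code repeatedly, since one replacement can expose another.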

6. List the characteristics of peephole optimization.

Redundant instruction elimination

Flow of control optimization

Algebraic simplification

Dead code elimination

Use of machine idioms

7. How do you calculate the cost of an instruction?

The cost of an instruction is one plus the added costs associated with its source and destination addressing modes.

Instruction              Cost

MOV R0,R1                1

MOV R1,M                 2

SUB 5(R0),*10(R1)        3
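The costs in this table can be reproduced with a small calculation. The sketch below assumes the usual textbook convention for added costs: 0 for register and indirect-register operands, and 1 for every other mode (absolute memory address, indexed, indirect indexed, literal). These mode costs and the helper names are assumptions, not something the notes spell out.

```python
import re

# Register (R0) and indirect-register (*R0) operands add no cost.
REGISTER = re.compile(r"\*?R\d+")


def added_cost(operand):
    """Added cost of one operand's addressing mode (assumed convention)."""
    return 0 if REGISTER.fullmatch(operand.strip()) else 1


def instruction_cost(instruction):
    """Cost = 1 for the instruction itself + added cost of each operand."""
    _, _, operand_text = instruction.strip().partition(" ")
    operands = [op for op in operand_text.split(",") if op.strip()]
    return 1 + sum(added_cost(op) for op in operands)


for text in ["MOV R0,R1", "MOV R1,M", "SUB 5(R0),*10(R1)"]:
    print(text, instruction_cost(text))
```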

8. What is a basic block? Define leader used in basic block and give one example.

A basic block is a sequence of consecutive statements in which flow of control enters at the beginning and leaves at the end, without halting or branching except at the end.

Leader

a) The first statement is a leader.

b) The target of a conditional or unconditional jump is a leader.

c) A statement that immediately follows a conditional or unconditional jump is a leader.

The statements from a leader up to, but not including, the next leader form a basic block.

Ex.

int fact(int x)
{
    int f = 1;
    for (int i = 2; i <= x; i++)
        f = f * i;
    return f;
}

The TAC for the above code is

1. f=1

2. i=2

3. if(i>x) goto 8

4. f=f*i

5. t1=i+1

6. i=t1

7. goto 3

8. goto calling program

Here the leaders are statements 1, 3, 4 and 8. Block B1 consists of statements 1 and 2. Block B2 consists of only statement 3. Block B3 runs from statement 4 to statement 7. Block B4 consists of only statement 8.
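The three leader rules can be checked mechanically on the TAC above. The following is a minimal sketch: the list-of-pairs encoding (statement text, jump target or `None`) and the names `find_leaders` and `basic_blocks` are assumptions made for illustration, and statement 8 is modelled as having no numeric target because it leaves the function.

```python
def find_leaders(code):
    """Return the sorted statement numbers (1-based) that are leaders."""
    leaders = {1}                                   # rule (a): first statement
    for number, (_, target) in enumerate(code, start=1):
        if target is not None:
            leaders.add(target)                     # rule (b): target of a jump
            if number < len(code):
                leaders.add(number + 1)             # rule (c): statement after a jump
    return sorted(leaders)


def basic_blocks(code):
    """Group statement numbers into blocks, each running up to the next leader."""
    leaders = find_leaders(code)
    bounds = leaders + [len(code) + 1]
    return [list(range(bounds[k], bounds[k + 1])) for k in range(len(leaders))]


tac = [
    ("f = 1", None),
    ("i = 2", None),
    ("if (i > x) goto 8", 8),
    ("f = f * i", None),
    ("t1 = i + 1", None),
    ("i = t1", None),
    ("goto 3", 3),
    ("goto calling program", None),
]
print(find_leaders(tac))
print(basic_blocks(tac))
```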

9. How would you represent the following equation using DAG?

a:=b*-c+b*-c
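The figure for this answer is not reproduced in these notes. As a hedged sketch of how such a DAG could be built, the code below first assumes the expression has been broken into three-address statements (the temporaries `t1` to `t4`, the operator name `uminus` and the function `build_dag` are illustrative choices), then reuses an existing node whenever the same operator is applied to the same children.

```python
def build_dag(statements):
    """Build a DAG for a basic block given as (target, op, operands) triples."""
    nodes = []     # node id -> ("leaf", name) or (op, child_id, ...)
    index = {}     # node key -> node id, so identical nodes are shared
    current = {}   # name -> id of the node currently holding its value

    def node_for(key):
        if key not in index:
            index[key] = len(nodes)
            nodes.append(key)
        return index[key]

    def operand(name):
        if name in current:
            return current[name]
        return node_for(("leaf", name))

    for target, op, operands in statements:
        key = (op,) + tuple(operand(name) for name in operands)
        current[target] = node_for(key)
    return nodes, current


# a := b * -c + b * -c
statements = [
    ("t1", "uminus", ["c"]),
    ("t2", "*", ["b", "t1"]),
    ("t3", "uminus", ["c"]),
    ("t4", "*", ["b", "t3"]),
    ("a", "+", ["t2", "t4"]),
]
nodes, current = build_dag(statements)
for node_id, node in enumerate(nodes):
    print(node_id, node)
```

In this sketch `t3` shares the node of `t1` and `t4` shares the node of `t2`, so the `+` node for `a` has the same `*` node as both of its children.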

10. Give some examples of SDT.

(1) To store type information in the symbol table.

(2) To build a syntax tree.

(3) To issue error messages.

(4) To perform consistency checks such as type checking, parameter checking, etc.

(5) To generate intermediate code.

1. Mention the issues to be considered while applying the techniques for code optimization.

The optimization must preserve the meaning (semantics) of the source program.

The improvement in program efficiency must be achieved without changing the algorithm of the program.

2. What are the basic goals of code movement?

To reduce the size of the code, i.e. to improve space complexity.

To reduce the frequency of execution of code, i.e. to improve time complexity.

3. What do you mean by machine dependent and machine independent optimization?

Machine-dependent optimization is based on the characteristics of the target machine: its instruction set, addressing modes and registers are exploited to produce efficient target code. This also includes peephole optimization.

Machine-independent optimization is based on the characteristics of the programming language: it uses appropriate program structures and efficient arithmetic properties to reduce execution time. This includes loop optimization (code motion, loop jamming, loop unrolling), dead code elimination, common sub-expression elimination, constant propagation, constant folding, strength reduction, etc.

4. What are the different data flow properties?

Available expressions

Reaching definitions

Live variables

Busy variables

5. Eliminate left recursion and left factor the following grammar.

E→aba|abba|Eb|EbE

Ans: Elimination of left recursion

E → abaE1 | abbaE1
E1 → bE1 | bEE1 | ε

Left factoring

E → abA
A → aE1 | baE1
E1 → bB | ε
B → E1 | EE1
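The standard rule for immediate left recursion (A → Aα | β becomes A → βA1, A1 → αA1 | ε) can be sketched as follows. The encoding of each alternative as a list of symbols, with the empty list standing for ε, and the function name are assumptions made for this example.

```python
def eliminate_immediate_left_recursion(nonterminal, alternatives, new_name):
    """Rewrite A -> A alpha | beta as A -> beta A1, A1 -> alpha A1 | epsilon.

    Each alternative is a list of symbols; the empty list stands for epsilon.
    """
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == nonterminal]
    others = [alt for alt in alternatives if not alt or alt[0] != nonterminal]
    if not recursive:
        return {nonterminal: alternatives}      # nothing to eliminate
    return {
        nonterminal: [beta + [new_name] for beta in others],
        new_name: [alpha + [new_name] for alpha in recursive] + [[]],
    }


# E -> aba | abba | Eb | EbE
grammar = eliminate_immediate_left_recursion(
    "E",
    [["a", "b", "a"], ["a", "b", "b", "a"], ["E", "b"], ["E", "b", "E"]],
    "E1",
)
for head, alts in grammar.items():
    print(head, "->", " | ".join("".join(alt) or "ε" for alt in alts))
```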

6. Eliminate left recursion in more than one level.

S→Aa|b

A→Ac|Sd|ε

Ans: Substitute the productions of S into the second production of A. We get

S → Aa | b
A → Ac | Aad | bd | ε

Elimination of left recursion

S → Aa | b
A → bdA1 | A1
A1 → cA1 | adA1 | ε
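The substitution step that exposes the indirect recursion can be sketched as below. The list-of-symbols encoding (empty list for ε) and the name `substitute` are illustrative assumptions; the result still has to go through elimination of immediate left recursion afterwards.

```python
def substitute(alternatives, target, target_alternatives):
    """Replace a leading `target` symbol by each of its own alternatives."""
    result = []
    for alt in alternatives:
        if alt and alt[0] == target:
            # A -> S delta becomes A -> gamma delta for every S -> gamma
            result.extend(gamma + alt[1:] for gamma in target_alternatives)
        else:
            result.append(alt)
    return result


s_alternatives = [["A", "a"], ["b"]]          # S -> Aa | b
a_alternatives = [["A", "c"], ["S", "d"], []]  # A -> Ac | Sd | ε
expanded = substitute(a_alternatives, "S", s_alternatives)
print(" | ".join("".join(alt) or "ε" for alt in expanded))
```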

7. What is dynamic scoping?

In dynamic scoping, a use of a non-local variable refers to the non-local data declared in the most recently called and still active procedure. Therefore, new bindings are set up for local names each time a procedure is called. With dynamic scoping, symbol tables may be required at run time.
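A small simulation can make the lookup rule concrete. In this sketch, which is only an illustration and not how any particular language implements it, each procedure activation pushes a dictionary of its local bindings, and a non-local name is resolved by searching from the most recent activation backwards. The names `call_stack`, `lookup`, `call`, `show` and `main_prog` are all hypothetical.

```python
call_stack = []   # one dictionary of local bindings per active procedure


def lookup(name):
    """Resolve a name in the most recently called, still active procedure."""
    for frame in reversed(call_stack):
        if name in frame:
            return frame[name]
    raise NameError(name)


def call(local_bindings, body):
    """Activate a procedure: push its bindings, run it, then pop them."""
    call_stack.append(dict(local_bindings))
    try:
        return body()
    finally:
        call_stack.pop()


def show():
    return lookup("x")     # x is non-local to show


def main_prog():
    # The outer procedure declares x = 1; the second call goes through
    # an intermediate procedure that declares its own x = 2.
    return call({"x": 1}, lambda: (
        call({}, show),
        call({"x": 2}, lambda: call({}, show)),
    ))


print(main_prog())
```

The same `show` procedure sees a different `x` depending on who called it, which is why the bindings have to be kept at run time.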

9. What is code motion?

Code motion is an optimization technique that decreases the amount of code in a loop. The transformation applies to an expression that yields the same result regardless of how many times the loop is executed (a loop-invariant expression). Such an expression is placed before the loop.

Ex.- while(i<100)

{

x=i*sin(A)/sin(B);

}

Can be written as

t=sin(A)/sin(B);

while(i<100)

{

x=i*t;

}

10. What are the properties of optimizing compiler?

The compiler should produce the minimum amount of target code for the given source code.

There should not be any unreachable code.

Dead code should be completely removed from the program.

An optimizing compiler should apply the following code-improving transformations:

i) common sub-expression elimination

ii) dead code elimination

iii) code movement

iv) strength reduction

11. What are the various ways to pass a parameter in a function?

Call by value

Call by reference

Copy-restore

Call by name

12. Suggest a suitable approach for computing hash function.

(i) Using the hash function, we should obtain the exact location of a name in the symbol table.

(ii) The hash function should distribute names uniformly over the symbol table.

(iii) The hash function should produce a minimum number of collisions. A collision is a situation where the hash function gives the same location for storing different names.
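One possible approach, shown here only as a sketch, is a polynomial hash over the characters of the name, reduced modulo a prime table size, with collisions resolved by chaining. The multiplier 31, the table size 211 and the names `hash_name` and `SymbolTable` are assumptions chosen for the example.

```python
def hash_name(name, table_size=211):
    """Map a name to a bucket index in the range 0 .. table_size - 1."""
    h = 0
    for ch in name:
        h = (h * 31 + ord(ch)) % table_size
    return h


class SymbolTable:
    """Hash table with chaining: colliding names share a bucket list."""

    def __init__(self, size=211):
        self.size = size
        self.buckets = [[] for _ in range(size)]

    def insert(self, name, info):
        bucket = self.buckets[hash_name(name, self.size)]
        for entry in bucket:
            if entry[0] == name:
                entry[1] = info        # name already present: update it
                return
        bucket.append([name, info])

    def lookup(self, name):
        for entry_name, info in self.buckets[hash_name(name, self.size)]:
            if entry_name == name:
                return info
        return None


table = SymbolTable()
table.insert("count", "int")
table.insert("total", "float")
print(table.lookup("count"), table.lookup("total"), table.lookup("missing"))
```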

13. What is the difference between S-attributed and L-attributed definitions?

S-attributed definition:

1. Uses synthesized attributes only.
2. Semantic rules are placed at the end of the production.
3. Translations are carried out during bottom-up parsing.

L-attributed definition:

1. Allows both synthesized and inherited attributes. Each inherited attribute can inherit only from the parent or from siblings to its left.
2. Semantic actions can be placed anywhere on the right-hand side of productions.
3. The translation is carried out by traversing the parse tree depth first, left to right.

14. What is dead code elimination?

Dead code elimination is the process of detecting useless code (code whose results are never used) and removing it during optimization.
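Within a single basic block, this can be sketched as a backward pass that keeps an assignment only if its target is live afterwards. The pair encoding (target, operands), the `live_out` parameter and the function name are assumptions for illustration, and the sketch assumes statements have no side effects.

```python
def eliminate_dead_code(block, live_out):
    """Remove assignments whose target is never used afterwards.

    block    : list of (target, operands) pairs in program order
    live_out : names whose values are needed after the block
    """
    live = set(live_out)
    kept = []
    for target, operands in reversed(block):
        if target in live:
            kept.append((target, operands))
            live.discard(target)      # this statement defines the target
            live.update(operands)     # and makes its operands live
        # otherwise the statement is dead and is dropped
    kept.reverse()
    return kept


block = [
    ("t1", ("a", "b")),
    ("t2", ("c", "d")),   # t2 is never used, so this statement is dead
    ("x", ("t1",)),
]
print(eliminate_dead_code(block, live_out={"x"}))
```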

15. Draw the DAG for the following basic block

t1 = a + b

t2 = c + d

t3 = e - t2

X = t1 - t3

16. What is interprocedural analysis? Why is interprocedural analysis essential (what are the applications of interprocedural analysis)?

A data-flow analysis that tracks information across procedure boundaries is said to be interprocedural. Many analyses, such as points-to analysis, can only be done in a meaningful way if they are interprocedural.

Applications of interprocedural analysis are

(i)Virtual method invocation.

(ii)Pointer alias analysis.

(iii)Parallelization.

(iv)Detection of software errors and vulnerabilities.

(v)SQL injection

(vi)Buffer overflow.

17. What is a call site? What is a call graph?

Programs call procedures at certain points referred to as call sites.

A call graph for a program is a bipartite graph with nodes for call sites and nodes for procedures. An edge goes from a call-site node to a procedure node if that procedure may be called at the site.

18. What do you mean by flow sensitivity and context sensitivity?

A data-flow analysis that produces facts that depend on location in the program is said to be flow-sensitive. If the analysis produces facts that depend on the history of procedure calls, it is said to be context-sensitive. A data-flow analysis can be flow-sensitive, context-sensitive, both, or neither.

19. What is Datalog? What are Datalog rules? What is a Datalog program?

Datalog is a language that uses a Prolog-like notation, but whose semantics is far simpler than that of Prolog. The elements of Datalog are atoms of the form p(X1, X2, ..., Xn). Here

(i) p is a predicate: a symbol that represents a type of statement such as "a definition reaches the beginning of a block."

(ii) X1, X2, ..., Xn are terms such as variables and constants.

Rules are a way of expressing logical inferences. The form of a rule is

H :- B1 & B2 & ... & Bn

The components are as follows:

(i) H and B1, B2, ..., Bn are literals: either atoms or negated atoms.

(ii) H is the head and B1, B2, ..., Bn form the body of the rule.

(iii) Each of the Bi's is sometimes called a subgoal of the rule.

A Datalog program is a collection of rules.
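To show how rules of this form are applied, the sketch below evaluates one small, made-up program by repeatedly firing its rules until no new facts appear. The predicates `edge` and `path`, the two rules in the comments and the function name are illustrative assumptions, and the loop is a naive fixed-point computation rather than an efficient Datalog engine.

```python
def transitive_closure(edges):
    """Naive evaluation of the two rules:

        path(X, Y) :- edge(X, Y)
        path(X, Y) :- path(X, Z) & edge(Z, Y)
    """
    path = set(edges)                 # first rule: every edge is a path
    changed = True
    while changed:                    # repeat until no rule adds a new fact
        changed = False
        for (x, z) in list(path):
            for (z2, y) in edges:
                if z == z2 and (x, y) not in path:
                    path.add((x, y))  # second rule fires
                    changed = True
    return path


edge_facts = {(1, 2), (2, 3), (3, 4)}
print(sorted(transitive_closure(edge_facts)))
```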

20. What is BDD(Binary Decision Diagram)?

A BDD is a representation of a Boolean function by a rooted DAG. The interior nodes correspond to Boolean variables and have two children, low (representing truth value 0) and high (representing 1). The leaves are labelled 0 and 1. A truth assignment makes the represented function true if and only if the path from the root, in which we go to the low child if the variable at a node is 0 and to the high child otherwise, leads to the 1 leaf.
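The path-following rule can be sketched in a few lines. The encoding used here, an interior node as a tuple (variable, low child, high child) and the leaves as the integers 0 and 1, is an assumption made for the example, as is the sample function x AND y.

```python
def evaluate(node, assignment):
    """Follow the path chosen by the assignment until a leaf (0 or 1) is reached."""
    while not isinstance(node, int):
        variable, low, high = node
        node = high if assignment[variable] else low
    return node


# BDD for the function x AND y:
# if x is 0 the result is 0; otherwise the result is the value of y.
x_and_y = ("x", 0, ("y", 0, 1))

for x in (0, 1):
    for y in (0, 1):
        print(x, y, evaluate(x_and_y, {"x": x, "y": y}))
```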

21. Explain the concept of bootstrapping in compiler design.

Bootstrapping is the process of writing a compiler in the source language that it compiles. For bootstrapping, a compiler can be characterized by three languages: the source language (S), the target language (T), and the implementation language (I). The implementation language is the language in which the compiler is written.

The three languages S, I, and T can be quite different. A compiler that runs on one machine and produces target code for another machine is called a cross-compiler.

22. What do you mean by run time storage allocation?

Ans:

The runtime storage might be subdivided into

- Target code

- Data objects

- Stack to keep track of procedure activation

- Heap to keep all other information

23. What do you mean by postfix translations?
