Code Generation

Upload: lila-slader

Post on 14-Dec-2015


2

Code Generation

• The target machine

• Runtime environment

• Basic blocks and flow graphs

• Instruction selection

• Instruction selector generator

• Register allocation

• Peephole optimization

3

The Target Machine

• A byte addressable machine with four bytes to a word and n general purpose registers

• Two-address instructions
– op source, destination

• Six addressing modes

  mode                form     address                      added cost
  absolute            M        M                            1
  register            R        R                            0
  indexed             c(R)     c + contents(R)              1
  indirect register   *R       contents(R)                  0
  indirect indexed    *c(R)    contents(c + contents(R))    1
  literal             #c       c                            1

4

Examples

MOV R0, M
MOV 4(R0), M
MOV *R0, M
MOV *4(R0), M
MOV #1, R0

5

Instruction Costs

• Cost of an instruction = 1 + costs of source and destination addressing modes

• This cost corresponds to the length (in words) of the instruction

• Minimizing instruction length also tends to minimize instruction execution time

6

Examples

  instruction               cost
  MOV R0, R1                1
  MOV R0, M                 2
  MOV #1, R0                2
  MOV 4(R0), *12(R1)        3
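The cost rule can be checked mechanically. The following is a minimal sketch, assuming operands are written exactly as in the addressing-mode table with numeric constants; the function names `addressing_mode` and `instruction_cost` are hypothetical helpers introduced for illustration.

```python
import re

# Added cost of each addressing mode, as listed for the target machine.
MODE_COST = {
    "absolute": 1,
    "register": 0,
    "indexed": 1,
    "indirect register": 0,
    "indirect indexed": 1,
    "literal": 1,
}

def addressing_mode(operand: str) -> str:
    """Classify an operand such as 'R0', '4(R0)', '*R1', '*12(R1)', '#1' or 'M'."""
    op = operand.replace(" ", "")
    if op.startswith("#"):
        return "literal"
    if re.fullmatch(r"\*R\d+", op):
        return "indirect register"
    if re.fullmatch(r"\*-?\d+\(R\d+\)", op):
        return "indirect indexed"
    if re.fullmatch(r"R\d+", op):
        return "register"
    if re.fullmatch(r"-?\d+\(R\d+\)", op):
        return "indexed"
    # Assumption: anything else is a memory name, i.e. an absolute address.
    return "absolute"

def instruction_cost(instruction: str) -> int:
    """Cost = 1 + added costs of the source and destination addressing modes."""
    _, operands = instruction.split(None, 1)
    return 1 + sum(MODE_COST[addressing_mode(o)] for o in operands.split(","))

for ins in ["MOV R0, R1", "MOV R0, M", "MOV #1, R0", "MOV 4(R0), *12(R1)"]:
    print(ins, instruction_cost(ins))
```

The four printed costs are 1, 2, 2, and 3, matching the examples on this slide.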

7

An Example

Consider a := b + c

1. MOV b, R0
   ADD c, R0
   MOV R0, a

2. MOV b, a
   ADD c, a

3. R0, R1, R2 contain the addresses of a, b, c
   MOV *R1, *R0
   ADD *R2, *R0

4. R1, R2 contain the values of b, c
   ADD R2, R1
   MOV R1, a

8

Instruction Selection

• Code skeleton

  x := y + z        a := b + c        d := a + e
  MOV y, R0         MOV b, R0         MOV a, R0
  ADD z, R0         ADD c, R0         ADD e, R0
  MOV R0, x         MOV R0, a         MOV R0, d

• Multiple choices for a := a + 1

  MOV a, R0         INC a
  ADD #1, R0
  MOV R0, a

9

Register Allocation

• Register allocation: select the set of variables that will reside in registers

• Register assignment: pick the specific register that a variable will reside in

• The problem is NP-complete

10

An Example

t := a + b        t := a + b
t := t * c        t := t + c
t := t / d        t := t / d

MOV a, R1         MOV a, R0
ADD b, R1         ADD b, R0
MUL c, R0         ADD c, R0
DIV d, R0         SRDA R0, 32
MOV R1, t         DIV d, R0
                  MOV R1, t

11

Runtime Environments

• A translation needs to relate the static source text of a program to the dynamic actions that must occur at runtime to implement the program

• Essentially, the relationship between names and data objects

• The runtime support system consists of routines that manage the allocation and deallocation of data objects

12

Activations

• A procedure definition associates an identifier (name) with a statement (body)

• Each execution of a procedure body is an activation of the procedure

• An activation tree depicts the way control enters and leaves activations

13

An Example

program sort(input, output);
  var a: array [0..10] of integer;
  procedure readarray;
    var i: integer;
    begin for i := 1 to 9 do read(a[i]) end;
  function partition(y, z: integer): integer;
    var i, j, x, v: integer;
    begin … end;
  procedure quicksort(m, n: integer);
    var i: integer;
    begin
      if (n > m) then begin
        i := partition(m, n);
        quicksort(m, i-1);
        quicksort(i+1, n)
      end
    end;
begin
  a[0] := -9999; a[10] := 9999;
  readarray;
  quicksort(1, 9)
end.

14

An Example

Activation tree (s = sort, r = readarray, q = quicksort, p = partition):

s
  r
  q(1,9)
    p(1,9)
    q(1,3)
      p(1,3)
      q(1,0)
      q(2,3)
        p(2,3)
        q(2,1)
        q(3,3)
    q(5,9)
      p(5,9)
      q(5,5)
      q(7,9)
        p(7,9)
        q(7,7)
        q(9,9)

15

Scope

• A declaration associates information with a name

• Scope rules determine which declaration of a name applies

• The portion of the program to which a declaration applies is called the scope of that declaration

16

Bindings of Names

• The same name may denote different data objects (storage locations) at runtime

• An environment is a function that maps a name to a storage location

• A state is a function that maps a storage location to the value held there

name --(environment)--> storage location --(state)--> value

17

Static and Dynamic Notions

  Static                          Dynamic
  Definition of a procedure       Activation of a procedure
  Declaration of a name           Binding of a name
  Scope of a declaration          Lifetime of a binding

18

Storage Organization

• Target code: static

• Static data objects: static

• Dynamic data objects: heap

• Automatic data objects: stack

Figure: run-time memory subdivided into code, static data, stack, and heap.

19

Activation Records

returned value

actual parameters

optional control link

optional access link

machine status

local data

temporary data

stack

20

Activation Records

Figure: two consecutive activation records on the stack, each consisting of
– returned value and parameters
– links and machine status
– local and temporary data
The frame pointer and the stack pointer mark the topmost record.

21

Declarations

P → {offset := 0} D
D → D “;” D
D → id “:” T {enter(id.name, T.type, offset);
              offset := offset + T.width}
T → integer {T.type := integer; T.width := 4}
T → float {T.type := float; T.width := 8}
T → array “[” num “]” of T1
              {T.type := array(num.val, T1.type);
               T.width := num.val × T1.width}
T → “*” T1 {T.type := pointer(T1.type); T.width := 4}
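The effect of these semantic actions on a declaration list can be sketched as follows. This is an illustrative sketch only: types are encoded as hypothetical tuples such as `("array", 10, ("integer",))`, and `lay_out` plays the role of the `enter` calls plus the running `offset`.

```python
def width(t):
    """Width in bytes of a type: integer 4, float 8, pointer 4, array = count * element width."""
    kind = t[0]
    if kind == "integer":
        return 4
    if kind == "float":
        return 8
    if kind == "pointer":          # ("pointer", target_type)
        return 4
    if kind == "array":            # ("array", count, element_type)
        return t[1] * width(t[2])
    raise ValueError(f"unknown type {t!r}")

def lay_out(decls):
    """Assign each declared name the next available relative address.

    Returns (table, total_width), where table maps name -> (type, offset).
    """
    offset = 0                     # P -> {offset := 0} D
    table = {}
    for name, t in decls:          # D -> id ":" T
        table[name] = (t, offset)  # enter(id.name, T.type, offset)
        offset += width(t)         # offset := offset + T.width
    return table, offset

decls = [
    ("i", ("integer",)),
    ("x", ("float",)),
    ("a", ("array", 10, ("integer",))),
    ("p", ("pointer", ("float",))),
]
table, total = lay_out(decls)
for name, (t, off) in table.items():
    print(name, off)
print("total width", total)
```

For this declaration list the offsets are 0, 4, 12, and 52, and the total width is 56 bytes.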

22

Nested Procedures

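P → D
D → D “;” D | id “:” T | proc id “;” D “;” S

Figure: symbol tables for the nested procedures of program sort. Each table has a header that points to the table of the enclosing procedure.
– sort (header: nil): a, x, readarray, exchange, quicksort
– readarray (header: sort): i
– exchange (header: sort)
– quicksort (header: sort): k, v, partition
– partition (header: quicksort): i, j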

23

Symbol Table Handling

• Operations
– mktable(previous): creates a new table and returns a pointer to the table
– enter(table, name, type, offset): creates a new entry for name in the table
– addwidth(table, width): records the cumulative width of entries in the header
– enterproc(table, name, newtable): creates a new entry for procedure name in the table

• Stacks
– tblptr: pointers to symbol tables
– offset: the next available relative address

24

Declarations

P → M D {addwidth(top(tblptr), top(offset));
         pop(tblptr); pop(offset)}
M → ε {t := mktable(nil); push(t, tblptr); push(0, offset)}
D → D “;” D
D → proc id “;” N D “;” S
        {t := top(tblptr); addwidth(t, top(offset));
         pop(tblptr); pop(offset);
         enterproc(top(tblptr), id.name, t)}
D → id “:” T {enter(top(tblptr), id.name, T.type, top(offset));
              top(offset) := top(offset) + T.width}
N → ε {t := mktable(top(tblptr)); push(t, tblptr); push(0, offset)}

25

Records

T → record D end

T → record L D end {T.type := record(top(tblptr));
                    T.width := top(offset);
                    pop(tblptr); pop(offset)}
L → ε {t := mktable(nil); push(t, tblptr); push(0, offset)}

26

Basic Blocks

• A basic block is a sequence of consecutive statements in which control enters at the beginning and leaves at the end without halt or possibility of branching except at the end

27

An Example

(1) prod := 0
(2) i := 1
(3) t1 := 4 * i
(4) t2 := a[t1]
(5) t3 := 4 * i
(6) t4 := b[t3]
(7) t5 := t2 * t4
(8) t6 := prod + t5
(9) prod := t6
(10) t7 := i + 1
(11) i := t7
(12) if i <= 20 goto (3)

28

Control Flow Graphs

• A (control) flow graph is a directed graph

• The nodes in the graph are basic blocks

• There is an edge from B1 to B2 iff B2 immediately follows B1 in some execution sequence

– there is a jump from B1 to B2

– B2 immediately follows B1 in program text

• B1 is a predecessor of B2, B2 is a successor of B1

29

An Example

B0:
(1) prod := 0
(2) i := 1

B1:
(3) t1 := 4 * i
(4) t2 := a[t1]
(5) t3 := 4 * i
(6) t4 := b[t3]
(7) t5 := t2 * t4
(8) t6 := prod + t5
(9) prod := t6
(10) t7 := i + 1
(11) i := t7
(12) if i <= 20 goto (3)

Edges: B0 → B1 and B1 → B1

30

Construction of Basic Blocks

• Determine the set of leaders
– the first statement is a leader
– the target of a jump is a leader
– any statement immediately following a jump is a leader

• For each leader, its basic block consists of the leader and all statements up to but not including the next leader or the end of the program
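The two steps above can be sketched as follows. The sketch assumes statements are strings numbered from 1 and that every jump is written with the `goto (n)` form used in the example; `find_leaders` and `basic_blocks` are hypothetical names.

```python
import re

def find_leaders(stmts):
    """Return the sorted 1-based statement numbers that are leaders."""
    if not stmts:
        return []
    leaders = {1}                                   # the first statement
    for i, s in enumerate(stmts, start=1):
        m = re.search(r"goto \((\d+)\)", s)
        if m:
            leaders.add(int(m.group(1)))            # the target of a jump
            if i < len(stmts):
                leaders.add(i + 1)                  # the statement after a jump
    return sorted(leaders)

def basic_blocks(stmts):
    """Partition statement numbers into blocks: a leader up to the next leader."""
    leaders = find_leaders(stmts)
    bounds = leaders + [len(stmts) + 1]
    return [list(range(bounds[k], bounds[k + 1])) for k in range(len(leaders))]

program = [
    "prod := 0", "i := 1",
    "t1 := 4 * i", "t2 := a[t1]", "t3 := 4 * i", "t4 := b[t3]",
    "t5 := t2 * t4", "t6 := prod + t5", "prod := t6",
    "t7 := i + 1", "i := t7", "if i <= 20 goto (3)",
]
print(find_leaders(program))
print(basic_blocks(program))
```

For the dot-product example the leaders are statements 1 and 3, which gives one block for statements 1 to 2 and one for statements 3 to 12.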

31

Representation of Basic Blocks

• Each basic block is represented by a record consisting of
– a count of the number of statements
– a pointer to the leader
– a list of predecessors
– a list of successors

32

DAG Representation of Blocks

• Easy to determine:

• common subexpressions

• names used in the block but evaluated outside the block

• names whose values could be used outside the block

33

DAG Representation of Blocks

• Leaves labeled by unique identifiers

• Interior nodes labeled by operator symbols

• Interior nodes optionally given a sequence of identifiers, having the value represented by the nodes

34

An Example

(1) t1 := 4 * i
(2) t2 := a[t1]
(3) t3 := 4 * i
(4) t4 := b[t3]
(5) t5 := t2 * t4
(6) t6 := prod + t5
(7) prod := t6
(8) t7 := i + 1
(9) i := t7
(10) if i <= 20 goto (1)

Figure: dag for the block (operator, attached identifiers, children):
  *    t1, t3      4, i0
  []   t2          a, t1
  []   t4          b, t1
  *    t5          t2, t4
  +    t6, prod    prod0, t5
  +    t7, i       i0, 1
  <=   (1)         t7, 20

35

Constructing a DAG

• Consider x := y op z. Other statements can be handled similarly

• If node(y) is undefined, create a leaf labeled y and let node(y) be this leaf. If node(z) is undefined, create a leaf labeled z and let node(z) be that leaf

36

Constructing a DAG

• Determine if there is a node labeled op whose left child is node(y) and whose right child is node(z). If not, create such a node. Let n be the node found or created.

• Delete x from the list of attached identifiers for node(x). Append x to the list of attached identifiers for the node n and set node(x) to n
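The construction steps for x := y op z can be sketched as below. This is a minimal illustration, not a full implementation: the class name `DagBuilder` and its fields are hypothetical, constants such as 4 are treated as ordinary leaf labels, and only the binary form is handled.

```python
class DagBuilder:
    """Builds a dag for a basic block, one statement x := y op z at a time."""

    def __init__(self):
        self.nodes = []       # node id -> (label, left id, right id); leaves use None children
        self.attached = []    # node id -> identifiers attached to that node
        self.node_of = {}     # identifier -> id of the node currently holding its value
        self.interior = {}    # (op, left id, right id) -> node id

    def _node(self, name):
        # If node(name) is undefined, create a leaf labeled name.
        if name not in self.node_of:
            self.nodes.append((name, None, None))
            self.attached.append([])
            self.node_of[name] = len(self.nodes) - 1
        return self.node_of[name]

    def assign(self, x, y, op, z):
        left, right = self._node(y), self._node(z)
        key = (op, left, right)
        # Reuse an existing node for op(left, right), or create one.
        if key not in self.interior:
            self.nodes.append(key)
            self.attached.append([])
            self.interior[key] = len(self.nodes) - 1
        n = self.interior[key]
        # Delete x from the identifiers attached to its old node, then attach it to n.
        if x in self.node_of and x in self.attached[self.node_of[x]]:
            self.attached[self.node_of[x]].remove(x)
        self.attached[n].append(x)
        self.node_of[x] = n
        return n

b = DagBuilder()
n1 = b.assign("t1", "4", "*", "i")
n3 = b.assign("t3", "4", "*", "i")   # same operator and children: common subexpression
print(n1 == n3, b.attached[n1])
```

Because both statements compute 4 * i from the same operand nodes, they share one node, and t1 and t3 end up attached to it.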

37

Reconstructing Quadruples

• Evaluate the interior nodes in topological order

• Assign the evaluated value to one of the node's attached identifiers x, preferring one whose value is needed outside the block

• If there is no attached identifier, create a new temp to hold the value

• If there are additional attached identifiers y1, y2, …, yk whose values are also needed outside the block, add

y1 := x, y2 := x, …, yk := x

38

An Example

(1) t1 := 4 * i
(2) t2 := a[t1]
(3) t3 := b[t1]
(4) t4 := t2 * t3
(5) prod := prod + t4
(6) i := i + 1
(7) if i <= 20 goto (1)

Figure: the same dag over the leaves 4, i0, a, b, prod0, 1, and 20, with only prod and i kept as attached identifiers.

39

Generating Code From DAGs

t1 := a + b
t2 := c + d
t3 := e - t2
t4 := t1 - t3

(1) MOV a, R0
(2) ADD b, R0
(3) MOV c, R1
(4) ADD d, R1
(5) MOV R0, t1
(6) MOV e, R0
(7) SUB R1, R0
(8) MOV t1, R1
(9) SUB R0, R1
(10) MOV R1, t4

Figure: dag for t4 = t1 - t3, with t1 = a0 + b0, t3 = e0 - t2, and t2 = c0 + d0.

Only R0 and R1 available

40

Rearranging the Order

t2 := c + d
t3 := e - t2
t1 := a + b
t4 := t1 - t3

(1) MOV c, R0
(2) ADD d, R0
(3) MOV e, R1
(4) SUB R0, R1
(5) MOV a, R0
(6) ADD b, R0
(7) SUB R1, R0
(8) MOV R0, t4

Figure: the same dag for t4 = t1 - t3, with t1 = a0 + b0, t3 = e0 - t2, and t2 = c0 + d0.

41

A Heuristic Ordering for DAG

• Attempt as far as possible to make the evaluation of a node immediately follow the evaluation of its leftmost argument

42

Node Listing Algorithm

while unlisted interior nodes remain do begin
  select an unlisted node n, all of whose parents have been listed;
  list n;
  while the leftmost child m of n has no unlisted parents and is not a leaf do begin
    list m;
    n := m;
  end
end

43

An Example

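Figure: dag with interior nodes numbered 1 to 7 (operator, children):
  1   *   2, 3
  2   +   6, 4
  3   -   4, e0
  4   *   5, 7
  5   -   6, c0
  6   +   a0, b0
  7   +   d0, e0

Evaluation order (the reverse of the node listing):

t7 := d + e
t6 := a + b
t5 := t6 - c
t4 := t5 * t7
t3 := t4 - e
t2 := t6 + t4
t1 := t2 * t3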
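The node listing algorithm can be tried on this example with a short sketch. The dictionary encoding of the dag and the tie-breaking rule (take the first eligible node in the given order) are assumptions made for illustration; the algorithm itself leaves the selection open.

```python
def node_listing(children, order):
    """List interior nodes so that a node follows its leftmost child when evaluated in reverse.

    children: interior node -> tuple of children, leftmost first; names absent
              from the dict are leaves.
    order:    sequence of interior nodes used to break ties during selection.
    """
    parents = {n: set() for n in children}
    for n, kids in children.items():
        for k in kids:
            if k in children:
                parents[k].add(n)

    listed, done = [], set()
    while len(done) < len(children):
        # Select an unlisted node all of whose parents have been listed.
        n = next(c for c in order if c not in done and parents[c] <= done)
        listed.append(n)
        done.add(n)
        # Follow leftmost children while they have no unlisted parents and are not leaves.
        while True:
            m = children[n][0]
            if m not in children or m in done or not parents[m] <= done:
                break
            listed.append(m)
            done.add(m)
            n = m
    return listed

dag = {
    1: (2, 3), 2: (6, 4), 3: (4, "e"), 4: (5, 7),
    5: (6, "c"), 6: ("a", "b"), 7: ("d", "e"),
}
listing = node_listing(dag, order=[1, 2, 3, 4, 5, 6, 7])
print(listing)
print(list(reversed(listing)))
```

The listing comes out as 1, 2, 3, 4, 5, 6, 7, so the evaluation order is 7, 6, 5, 4, 3, 2, 1, which agrees with the sequence t7, t6, …, t1 above.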

44

Generating Code From Trees

• There exists an algorithm that determines the optimal order in which to evaluate statements in a block when the dag representation of the block is a tree

• Optimal order here means the order that yields the shortest instruction sequence

45

Optimal Ordering for Trees

• Label each node of the tree bottom-up with an integer denoting the fewest number of registers required to evaluate the tree with no stores of intermediate results

• Generate code during a tree traversal by first evaluating the operand requiring more registers

46

The Labeling Algorithm

if n is a leaf then
  if n is the leftmost child of its parent then label(n) := 1
  else label(n) := 0
else begin
  let n1, n2, …, nk be the children of n ordered by label so that
    label(n1) ≥ label(n2) ≥ … ≥ label(nk);
  label(n) := max over 1 ≤ i ≤ k of (label(ni) + i - 1)
end

47

An Example

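Figure: labeled tree for t4 = t1 - t3 (label in parentheses):
  t4 (2)
    t1 (1)
      a (1)
      b (0)
    t3 (2)
      e (1)
      t2 (1)
        c (1)
        d (0)

For binary interior nodes with child labels l1 and l2:

label(n) = max(l1, l2)   if l1 ≠ l2
label(n) = l1 + 1        if l1 = l2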
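The binary case of the labeling rule can be sketched in a few lines. The tuple encoding of trees, with leaves as strings and interior nodes as (op, left, right), is an assumption made for this illustration.

```python
def label(node, leftmost=True):
    """Number of registers needed to evaluate node without storing intermediate results."""
    if isinstance(node, str):            # leaf: 1 if it is a leftmost child, else 0
        return 1 if leftmost else 0
    _, left, right = node
    l1, l2 = label(left, True), label(right, False)
    return max(l1, l2) if l1 != l2 else l1 + 1

t1 = ("+", "a", "b")
t2 = ("+", "c", "d")
t3 = ("-", "e", t2)
t4 = ("-", t1, t3)
for name, tree in [("t1", t1), ("t2", t2), ("t3", t3), ("t4", t4)]:
    print(name, label(tree))
```

This reproduces the labels in the example: 1 for t1 and t2, and 2 for t3 and t4.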

48

Code Generation From a Labeled Tree

• Use a stack rstack to allocate registers R0, R1, …, R(r-1)

• The value of a tree is always computed in the top register on rstack

• The function swap(rstack) interchanges the top two registers on rstack

• Use a stack tstack to allocate temporary memory locations T0, T1, ...

49

Cases Analysis

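Case 0: n is a leaf representing operand name
Case 1: n = op(n1, n2), where n2 is a leaf representing operand name
Case 2: n = op(n1, n2) with label(n1) < label(n2)
Case 3: n = op(n1, n2) with label(n2) ≤ label(n1)
Case 4: n = op(n1, n2) with both labels ≥ r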

50

The Function gencode

procedure gencode(n);
begin
  if n is a left leaf representing operand name and n is the leftmost child of its parent then
    print 'MOV' || name || ',' || top(rstack)
  else if n is an interior node with operator op, left child n1, and right child n2 then
    if label(n2) = 0 then /* case 1 */
    else if 1 ≤ label(n1) < label(n2) and label(n1) < r then /* case 2 */
    else if 1 ≤ label(n2) ≤ label(n1) and label(n2) < r then /* case 3 */
    else /* case 4, both labels ≥ r */
end

51

The Function gencode

/* case 1 */
begin
  let name be the operand represented by n2;
  gencode(n1);
  print op || name || ',' || top(rstack)
end

/* case 2 */
begin
  swap(rstack);
  gencode(n2);
  R := pop(rstack);
  gencode(n1);
  print op || R || ',' || top(rstack);
  push(rstack, R);
  swap(rstack);
end

52

The Function gencode

/* case 3 */
begin
  gencode(n1);
  R := pop(rstack);
  gencode(n2);
  print op || top(rstack) || ',' || R;   /* n1 is in R, n2 in top(rstack): R := R op top */
  push(rstack, R);
end

/* case 4 */
begin
  gencode(n2);
  T := pop(tstack);
  print 'MOV' || top(rstack) || ',' || T;
  gencode(n1);
  push(tstack, T);
  print op || T || ',' || top(rstack);
end
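The four cases can be put together in a runnable sketch. Several details are assumptions made for illustration: trees are tuples (op, left, right) with string leaves, operators are already machine mnemonics such as ADD and SUB, and rstack and tstack are Python lists whose last element is the top. In case 3 the sketch emits op top(rstack), R, since the left operand is held in R; this is the operand order that produces SUB R0, R1 for the subtree e - (c + d).

```python
def label(node, leftmost=True):
    """Register-need label of a node (binary trees only)."""
    if isinstance(node, str):
        return 1 if leftmost else 0
    _, left, right = node
    l1, l2 = label(left, True), label(right, False)
    return max(l1, l2) if l1 != l2 else l1 + 1

def gencode(node, rstack, tstack, out, r):
    """Append to out the code that leaves the value of node in the top register."""
    if isinstance(node, str):                       # case 0: left leaf
        out.append(f"MOV {node}, {rstack[-1]}")
        return
    op, n1, n2 = node
    l1, l2 = label(n1, True), label(n2, False)
    if l2 == 0:                                     # case 1: right operand is a leaf
        gencode(n1, rstack, tstack, out, r)
        out.append(f"{op} {n2}, {rstack[-1]}")
    elif 1 <= l1 < l2 and l1 < r:                   # case 2: evaluate n2 first
        rstack[-1], rstack[-2] = rstack[-2], rstack[-1]
        gencode(n2, rstack, tstack, out, r)
        reg = rstack.pop()
        gencode(n1, rstack, tstack, out, r)
        out.append(f"{op} {reg}, {rstack[-1]}")
        rstack.append(reg)
        rstack[-1], rstack[-2] = rstack[-2], rstack[-1]
    elif 1 <= l2 <= l1 and l2 < r:                  # case 3: evaluate n1 first
        gencode(n1, rstack, tstack, out, r)
        reg = rstack.pop()
        gencode(n2, rstack, tstack, out, r)
        out.append(f"{op} {rstack[-1]}, {reg}")
        rstack.append(reg)
    else:                                           # case 4: spill n2 to a temporary
        gencode(n2, rstack, tstack, out, r)
        temp = tstack.pop()
        out.append(f"MOV {rstack[-1]}, {temp}")
        gencode(n1, rstack, tstack, out, r)
        tstack.append(temp)
        out.append(f"{op} {temp}, {rstack[-1]}")

def generate(tree, r):
    """Generate code for tree on a machine with registers R0 .. R(r-1)."""
    rstack = [f"R{i}" for i in reversed(range(r))]  # R0 on top
    tstack = ["T1", "T0"]                           # T0 on top; two temporaries assumed
    out = []
    gencode(tree, rstack, tstack, out, r)
    return out

t4 = ("SUB", ("ADD", "a", "b"), ("SUB", "e", ("ADD", "c", "d")))
print("\n".join(generate(t4, r=2)))
```

With two registers no spill is needed; with `r=1` the same tree goes through case 4 and stores intermediate results in T0.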

53

An Example

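Figure: the labeled tree for t4 = t1 - t3, with t1 = a + b, t3 = e - t2, and t2 = c + d (labels: t4 2, t1 1, t3 2, t2 1, a 1, b 0, e 1, c 1, d 0).

Each call shows the contents of rstack (top at the right) and the case that applies:

gencode(t4) [R1, R0] /* 2 */
  gencode(t3) [R0, R1] /* 3 */
    gencode(e) [R0, R1] /* 0 */
      print MOV e, R1
    gencode(t2) [R0] /* 1 */
      gencode(c) [R0] /* 0 */
        print MOV c, R0
      print ADD d, R0
    print SUB R0, R1
  gencode(t1) [R0] /* 1 */
    gencode(a) [R0] /* 0 */
      print MOV a, R0
    print ADD b, R0
  print SUB R1, R0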

54

Common Subexpressions

• Nodes with more than one parent in a dag are called shared nodes

• Optimal code generation for dags is NP-complete, both on a one-register machine and on a machine with an unlimited number of registers

55

Partitioning a DAG into Trees

• Partition a dag into a set of trees by finding for each root and shared node n, the maximal subtree with n as root that includes no other shared nodes, except as leaves

• Determine a code generation ordering for the trees

• Generate code for each tree using the algorithms for generating code from trees

56

An Example

Figure: the dag with interior nodes 1 to 7 is partitioned into trees rooted at node 1 and at the shared nodes 4 and 6. In each tree, other shared nodes appear only as leaves.

57

Dynamic Programming Code Generation

• The dynamic programming algorithm applies to a broad class of register machines with complex instruction sets

• The machine has r interchangeable registers

• The machine has instructions of the form Ri = E, where E is any expression containing operators, registers, and memory locations. If E involves registers, then Ri must be one of them

58

Dynamic Programming

• The dynamic programming algorithm partitions the problem of generating optimal code for an expression into sub-problems of generating optimal code for the sub-expressions of the given expression

Figure: an expression tree with root + and subtrees T1 and T2.

59

Contiguous Evaluation

• We say a program P evaluates a tree T contiguously if

• it first evaluates those subtrees of T that need to be computed into memory

• it then evaluates the subtrees of the root in either order

• it finally evaluates the root

60

Optimally Contiguous Program

• For the machines defined above, given any program P to evaluate an expression tree T, we can find an equivalent program P' such that
– P' is of no higher cost than P
– P' uses no more registers than P
– P' evaluates the tree in a contiguous fashion

• This implies that every expression tree can be evaluated optimally by a contiguous program

61

Dynamic Programming Algorithm

• Phase 1: compute bottom-up for each node n of the expression tree T an array C of costs, in which the ith component C[i] is the optimal cost of computing the subtree S rooted at n into a register, assuming i registers are available for the computation. C[0] is the optimal cost of computing the subtree S into memory

62

Dynamic Programming Algorithm

• To compute C[i] at node n, consider each machine instruction R := E whose expression E matches the subexpression rooted at node n

• Determine the costs of evaluating the operands of E by examining the cost vectors at the corresponding descendants of n

63

Dynamic Programming Algorithm

• For those operands of E that are registers, consider all possible orders in which the corresponding subtrees of T can be evaluated into registers

• In each ordering, the first subtree corresponding to a register operand can be evaluated using i available registers, the second using i-1 registers, and so on

64

Dynamic Programming Algorithm

• For node n, add in the cost of the instruction R := E that was used to match node n

• The value C[i] is then the minimum cost over all possible orders

• At each node, store the instruction used to achieve the best cost for C[i] for each i

• The smallest cost in the vector gives the minimum cost of evaluating T

65

Dynamic Programming Algorithm

• Phase 2: traverse T and use the cost vectors to determine which subtrees of T must be computed into memory

• Phase 3: traverse T and use the cost vectors and associated instructions to generate the final target code

66

An Example

Consider a machine with two registers R0 and R1 and instructions
  Ri := Mj
  Mi := Ri
  Ri := Rj
  Ri := Ri op Rj
  Ri := Ri op Mj

Figure: expression tree for (a - b) + c * (d / e), annotated with cost vectors (C[0], C[1], C[2]):
  +  (8, 8, 7)
    -  (3, 2, 2)
      a  (0, 1, 1)
      b  (0, 1, 1)
    *  (5, 5, 4)
      c  (0, 1, 1)
      /  (3, 2, 2)
        d  (0, 1, 1)
        e  (0, 1, 1)

67

An Example

Figure: expression tree for (a - b) + c * (d / e), annotated with cost vectors (C[0], C[1], C[2]):
  +  (8, 8, 7)
    -  (3, 2, 2)
      a  (0, 1, 1)
      b  (0, 1, 1)
    *  (5, 5, 4)
      c  (0, 1, 1)
      /  (3, 2, 2)
        d  (0, 1, 1)
        e  (0, 1, 1)

R0 := c
R1 := d
R1 := R1 / e
R0 := R0 * R1
R1 := a
R1 := R1 - b
R1 := R1 + R0
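Phase 1 for this example machine can be sketched as below. The sketch is specialized to the five instruction forms of the example, assumes every instruction costs 1, and encodes trees as tuples (op, left, right) with string leaves that denote memory operands; `cost_vector` is a hypothetical name.

```python
def cost_vector(node, r=2):
    """Return [C[0], C[1], ..., C[r]] for the subtree rooted at node.

    C[i] (i >= 1) is the cost of computing the subtree into a register with
    i registers available; C[0] is the cost of computing it into memory.
    """
    if isinstance(node, str):
        # A leaf is already in memory; loading it takes one Ri := Mj.
        return [0] + [1] * r
    _, left, right = node
    cl, cr = cost_vector(left, r), cost_vector(right, r)
    c = [0] * (r + 1)
    for i in range(1, r + 1):
        # Ri := Ri op Mj: left operand in a register, right operand in memory.
        candidates = [cl[i] + cr[0] + 1]
        if i >= 2:
            # Ri := Ri op Rj, trying both evaluation orders of the operands.
            candidates.append(cl[i] + cr[i - 1] + 1)   # left first, then right
            candidates.append(cr[i] + cl[i - 1] + 1)   # right first, then left
        c[i] = min(candidates)
    # Into memory: compute into a register as cheaply as possible, then Mi := Ri.
    c[0] = min(c[1:]) + 1
    return c

tree = ("+", ("-", "a", "b"), ("*", "c", ("/", "d", "e")))
print(cost_vector(tree))
```

For the tree above this yields the root vector (8, 8, 7), and the subtrees a - b and c * (d / e) yield (3, 2, 2) and (5, 5, 4), as in the figure. The minimum, 7, corresponds to the seven-instruction program shown.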

68

Code Generator Generators

• A tool to automatically construct the instruction selection phase of a code generator

• Such tools may use tree grammars or context free grammars to describe the target machines

• Register allocation will be implemented as a separate mechanism

69

Tree Rewriting

a[i] := b + 1

Figure: intermediate-code tree for the statement, written in prefix form:

  :=(ind(+(+(consta, regsp), ind(+(consti, regsp)))), +(memb, const1))

70

Tree Rewriting

• The code is generated by reducing the input tree into a single node using a sequence of tree-rewriting rules

• Each tree-rewriting rule is of the form
    replacement ← template { action }
– replacement is a single node
– template is a tree
– action is a code fragment

• A set of tree-rewriting rules is called a tree-translation scheme

71

An Example

regi ← +(regi, regj)   { ADD Rj, Ri }

Each tree template represents a computation performed by the sequence of machine instructions emitted by the associated action

72

Tree Rewriting Rules

(1) regi ← constc                       { MOV #c, Ri }
(2) regi ← mema                         { MOV a, Ri }
(3) mem ← :=(mema, regi)                { MOV Ri, a }
(4) mem ← :=(ind(regi), regj)           { MOV Rj, *Ri }
(5) regi ← ind(+(constc, regj))         { MOV c(Rj), Ri }

73

Tree Rewriting Rules

(6) regi ← +(regi, ind(+(constc, regj)))   { ADD c(Rj), Ri }
(7) regi ← +(regi, regj)                   { ADD Rj, Ri }
(8) regi ← +(regi, const1)                 { INC Ri }

74

An Example

Tree: :=(ind(+(+(consta, regsp), ind(+(consti, regsp)))), +(memb, const1))

Rule (1) applies to the leaf consta:
{ MOV #a, R0 }

75

An Example

Tree: :=(ind(+(+(reg0, regsp), ind(+(consti, regsp)))), +(memb, const1))

Rule (7) applies to +(reg0, regsp):
{ ADD SP, R0 }

76

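An Example

Tree: :=(ind(+(reg0, ind(+(consti, regsp)))), +(memb, const1))

Two rules apply here. Rule (5) matches ind(+(consti, regsp)):
{ MOV i(SP), R1 }
Rule (6) matches the larger subtree +(reg0, ind(+(consti, regsp))):
{ ADD i(SP), R0 }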

77

An Example

Tree: :=(ind(reg0), +(memb, const1))

Rule (2) applies to the leaf memb:
{ MOV b, R1 }

78

An Example

Tree: :=(ind(reg0), +(reg1, const1))

Rule (8) applies to +(reg1, const1):
{ INC R1 }

79

An Example

Tree: :=(ind(reg0), reg1)

Rule (4) applies to the whole tree:
{ MOV R1, *R0 }

80

Tree Pattern Matching

• The tree pattern matching algorithm can be implemented by extending the multiple-keyword pattern matching algorithm

• Each tree template is represented by a set of strings, each of which represents a path from the root to a leaf

• Each rule is associated with cost information

• The dynamic programming algorithm can be used to select an optimal sequence of matches

81

Semantic Predicates

regi ← +(regi, constc)   { if c = 1 then
                             INC Ri
                           else
                             ADD #c, Ri }

The general use of semantic actions and predicates can provide greater flexibility and ease of description than a purely grammatical specification

82

Graph Coloring

• In the first pass, target machine instructions are selected as though there were an infinite number of symbolic registers

• In the second pass, physical registers are assigned to symbolic registers using graph coloring algorithms

• During the second pass, if a register is needed when all available registers are used, some of the used registers must be spilled

83

Interference Graph

• For each procedure, a register-interference graph is constructed

• The nodes in the graph are symbolic registers

• An edge connects two nodes if one is live at a point where the other is defined

84

K-Colorable Graphs

• A graph is said to be k-colorable if each node can be assigned one of the k colors such that no two adjacent nodes have the same color

• A color represents a register

• The problem of determining whether a graph is k-colorable is NP-complete

85

A Graph Coloring Algorithm

• Remove a node n and its edges if it has fewer than k neighbors

• Repeat the removing step above until we end up with the empty graph or a graph in which each node has k or more adjacent nodes

• In the latter case, a node is selected and spilled by deleting that node and its edges, and the removing step above continues

86

A Graph Coloring Algorithm

• The nodes in the graph can be colored in the reverse order in which they are removed

• Each node can be assigned a color not assigned to any of its neighbors

• Spilled nodes can be assigned any color
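The removal and coloring steps can be sketched as follows. This is an illustrative sketch under stated assumptions: the graph is a dictionary from node to a symmetric neighbor set, colors are the integers 0 to k-1, the spill candidate is chosen as the node with the most remaining neighbors (the slides do not fix a choice), and spilled nodes are simply reported rather than given a color.

```python
def color_graph(adj, k):
    """Color an interference graph with k colors by repeated node removal.

    adj: node -> set of neighbors (symmetric, no self loops).
    Returns (coloring, spilled): coloring maps each non-spilled node to a
    color in range(k); spilled lists the nodes chosen for spilling.
    """
    graph = {n: set(ns) for n, ns in adj.items()}   # working copy
    removed, spilled = [], []
    while graph:
        # Remove a node with fewer than k neighbors if one exists.
        n = next((v for v in graph if len(graph[v]) < k), None)
        if n is None:
            # Every node has k or more neighbors: pick a spill candidate.
            n = max(graph, key=lambda v: len(graph[v]))
            spilled.append(n)
        else:
            removed.append(n)
        for m in graph.pop(n):
            graph[m].discard(n)

    coloring = {}
    for n in reversed(removed):                     # color in reverse removal order
        used = {coloring[m] for m in adj[n] if m in coloring}
        coloring[n] = next(c for c in range(k) if c not in used)
    return coloring, spilled

adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(color_graph(adj, 3))
print(color_graph(adj, 2))
```

A free color always exists for a removed node, because it had fewer than k neighbors left at the moment it was removed. With three colors the small graph above needs no spill; with two colors one node of the triangle is spilled.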

87

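An Example

Figure: a five-node interference graph (nodes 1 to 5) is reduced step by step. Nodes 1, 2, 3, and 4 are removed in that order, leaving node 5.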

88

An Example

Figure: the nodes are colored with R, G, and B in the reverse order of removal: 5 = R, 4 = G, 3 = B, 2 = R, 1 = G.

89

Peephole Optimization

• Improve the performance of the target program by examining and transforming a short sequence of target instructions

• May need repeated passes over the code

• Can also be applied directly after intermediate code generation

90

Examples

• Redundant loads and stores
    MOV R0, a
    MOV a, R0

• Algebraic simplification
    x := x + 0
    x := x * 1

• Constant folding
    x := 2 + 3   →   x := 5
    y := x + 3   →   y := 8
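The first pattern, a load that immediately follows a store of the same register to the same location, can be removed with a one-instruction window. This is a minimal sketch: instructions are encoded as hypothetical tuples (label, op, source, destination), and a labeled instruction is never removed, since a jump could reach it without executing the preceding store.

```python
def remove_redundant_loads(code):
    """Drop 'MOV a, R' when it directly follows 'MOV R, a' and carries no label."""
    out = []
    for label, op, src, dst in code:
        if out and label is None and op == "MOV":
            _, prev_op, prev_src, prev_dst = out[-1]
            # The previous instruction stored src-of-prev into dst-of-prev;
            # loading it straight back into the same register is redundant.
            if prev_op == "MOV" and (prev_src, prev_dst) == (dst, src):
                continue
        out.append((label, op, src, dst))
    return out

code = [
    (None, "MOV", "R0", "a"),
    (None, "MOV", "a", "R0"),     # redundant: R0 already holds a
    (None, "ADD", "b", "R0"),
]
for ins in remove_redundant_loads(code):
    print(ins)
```

Running the sketch on the three-instruction sequence leaves only the store and the ADD. Repeating such passes until nothing changes is one way to realize the "repeated passes" mentioned earlier.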

91

Examples

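• Unreachable code

    #define debug 0
    if (debug) (print debugging information)

  is translated to

    if 0 <> 1 goto L1
    print debugging information
    L1:

  which, after evaluating the constant condition, becomes

    if 1 goto L1
    print debugging information
    L1:

  The jump is always taken, so the print statements are unreachable and can be removed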

92

Examples

• Flow-of-control optimization

    goto L1                      goto L2
    …                     →      …
    L1: goto L2                  L1: goto L2

    goto L1                      if a < b goto L2
    …                     →      goto L3
    L1: if a < b goto L2         …
    L3:                          L3:

93

Examples

• Reduction in strength: replace expensive operations by cheaper ones
– x² → x * x
– fixed-point multiplication and division by a power of 2 → shift
– floating-point division by a constant → floating-point multiplication by a constant

94

Examples

• Use of machine idioms: hardware instructions for certain specific operations
– auto-increment and auto-decrement addressing mode (push or pop stack in parameter passing)