introduction to compilation

34
1 Introduction to Compilation Aaron Bloomfield CS 415 Fall 2005

Upload: dima

Post on 11-Jan-2016

28 views

Category:

Documents


4 download

DESCRIPTION

Introduction to Compilation. Aaron Bloomfield CS 415 Fall 2005. Interpreters & Compilers. Interpreter A program that reads a source program and produces the results of executing that program Compiler - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Introduction to Compilation

1

Introduction to Compilation

Aaron Bloomfield

CS 415

Fall 2005

Page 2: Introduction to Compilation

2

Interpreters & Compilers

• Interpreter– A program that reads a source program and

produces the results of executing that program

• Compiler– A program that translates a program from one

language (the source) to another (the target)

Page 3: Introduction to Compilation

3

Common Issues

• Compilers and interpreters both must read the input – a stream of characters – and “understand” it; analysis

w h i l e ( k < l e n g t h ) { <nl> <tab> i f ( a [ k ] > 0

) <nl> <tab> <tab>{ n P o s + + ; } <nl> <tab> }

Page 4: Introduction to Compilation

4

Interpreter

• Interpreter– Execution engine– Program execution interleaved with analysis

running = true;while (running) { analyze next statement; execute that statement;}

– May involve repeated analysis of some statements (loops, functions)

Page 5: Introduction to Compilation

5

Compiler

• Read and analyze entire program• Translate to semantically equivalent program in

another language– Presumably easier to execute or more efficient– Should “improve” the program in some fashion

• Offline process– Tradeoff: compile time overhead (preprocessing

step) vs execution performance

Page 6: Introduction to Compilation

6

Typical Implementations

• Compilers– FORTRAN, C, C++, Java, COBOL, etc. etc.– Strong need for optimization, etc.

• Interpreters– PERL, Python, awk, sed, sh, csh, postscript

printer, Java VM– Effective if interpreter overhead is low relative

to execution cost of language statements

Page 7: Introduction to Compilation

7

Hybrid approaches

• Well-known example: Java– Compile Java source to byte codes – Java Virtual

Machine language (.class files)– Execution

• Interpret byte codes directly, or• Compile some or all byte codes to native code

– (particularly for execution hot spots)– Just-In-Time compiler (JIT)

• Variation: VS.NET– Compilers generate MSIL– All IL compiled to native code before execution

Page 8: Introduction to Compilation

8

Compilers: The Big picture

Source code

Compiler

Assembly code

AssemblerObject code(machine code)

Fully-resolved objectcode (machine code)

Executable image

Linker

Loader

Page 9: Introduction to Compilation

9

Idea: Translate in Steps

• Series of program representations

• Intermediate representations optimized for program manipulations of various kinds (checking, optimization)

• Become more machine-specific, less language-specific as translation proceeds

Page 10: Introduction to Compilation

10

Structure of a Compiler

• First approximation– Front end: analysis

• Read source program and understand its structure and meaning

– Back end: synthesis• Generate equivalent target language program

Source TargetFront End Back End

Page 11: Introduction to Compilation

11

Implications

• Must recognize legal programs (& complain about illegal ones)

• Must generate correct code• Must manage storage of all variables• Must agree with OS & linker on target format

Source TargetFront End Back End

Page 12: Introduction to Compilation

12

More Implications

• Need some sort of Intermediate Representation (IR)

• Front end maps source into IR• Back end maps IR to target machine code

Source TargetFront End Back End

Page 13: Introduction to Compilation

13

Standard Compiler StructureSource code(character stream)

Lexical analysis

ParsingToken stream

Abstract syntax tree

Intermediate Code Generation

Intermediate code

Optimization

Code generationAssembly code

Intermediate code

Front end(machine-independent)

Back end(machine-dependent)

Page 14: Introduction to Compilation

14

Front End

• Split into two parts– Scanner: Responsible for converting character

stream to token stream• Also strips out white space, comments

– Parser: Reads token stream; generates IR

• Both of these can be generated automatically– Source language specified by a formal grammar– Tools read the grammar and generate scanner &

parser (either table-driven or hard coded)

Scanner Parsersource tokens IR

Page 15: Introduction to Compilation

15

Tokens

• Token stream: Each significant lexical chunk of the program is represented by a token– Operators & Punctuation: {}[]!+-=*;: …– Keywords: if while return goto– Identifiers: id & actual name– Constants: kind & value; int, floating-point

character, string, …

Page 16: Introduction to Compilation

16

Scanner Example

• Input text// this statement does very littleif (x >= y) y = 42;

• Token Stream

– Note: tokens are atomic items, not character strings

IF LPAREN ID(x) GEQ ID(y)

RPAREN ID(y) BECOMES INT(42) SCOLON

Page 17: Introduction to Compilation

17

Parser Output (IR)

• Many different forms– (Engineering tradeoffs)

• Common output from a parser is an abstract syntax tree– Essential meaning of the program without the

syntactic noise

Page 18: Introduction to Compilation

18

Parser Example

• Token Stream Input • Abstract Syntax Tree

IF LPAREN ID(x)

GEQ ID(y) RPAREN

ID(y) BECOMES

INT(42) SCOLON

ifStmt

>=

ID(x) ID(y)

assign

ID(y) INT(42)

Page 19: Introduction to Compilation

19

Static Semantic Analysis

• During or (more common) after parsing– Type checking– Check for language requirements like “declare

before use”, type compatibility– Preliminary resource allocation– Collect other information needed by back end

analysis and code generation

Page 20: Introduction to Compilation

20

Back End

• Responsibilities– Translate IR into target machine code– Should produce fast, compact code– Should use machine resources effectively

• Registers• Instructions• Memory hierarchy

Page 21: Introduction to Compilation

21

Back End Structure

• Typically split into two major parts with sub phases– “Optimization” – code improvements

• May well translate parser IR into another IR

– Code generation• Instruction selection & scheduling• Register allocation

Page 22: Introduction to Compilation

22

The Result

• Input

if (x >= y) y = 42;

• Output

mov eax,[ebp+16]

cmp eax,[ebp-8]

jl L17

mov [ebp-8],42

L17:

Page 23: Introduction to Compilation

23

Optimized Codes4addq $16,0,$0mull $16,$0,$0addq $16,1,$16mull $0,$16,$0mull $0,$16,$0ret $31,($26),1

Unoptimized Code

Example (Output assembly code)

lda $30,-32($30)stq $26,0($30)stq $15,8($30)bis $30,$30,$15bis $16,$16,$1stl $1,16($15)lds $f1,16($15)sts $f1,24($15)ldl $5,24($15)bis $5,$5,$2s4addq $2,0,$3ldl $4,16($15)mull $4,$3,$2ldl $3,16($15)addq $3,1,$4mull $2,$4,$2ldl $3,16($15)addq $3,1,$4mull $2,$4,$2stl $2,20($15)ldl $0,20($15)br $31,$33

$33:bis $15,$15,$30ldq $26,0($30)ldq $15,8($30)addq $30,32,$30ret $31,($26),1

Page 24: Introduction to Compilation

24

Compilation in a Nutshell 1

Source code(character stream)

Lexical analysis

Parsing

Token stream

Abstract syntax tree(AST)

Semantic Analysis

if (b == 0) a = b;

if ( b ) a = b ;0==

if==

b 0

=

a b

if

==

int b int 0

=

int alvalue

int b

boolean

Decorated ASTint

;

;

Page 25: Introduction to Compilation

25

Compilation in a Nutshell 2

Intermediate Code Generation

Optimization

Code generation

if

==

int b int 0

=

int alvalue

int b

boolean int;

CJUMP ==

MEM

fp 8

+

CONST MOVE

0 MEM MEM

fp 4 fp 8

NOP

+ +

CJUMP ==

CONST

MOVE

0 DX

CX

NOP

CX CMP CX, 0

CMOVZ DX,CX

Page 26: Introduction to Compilation

26

Why Study Compilers? (1)

• Compiler techniques are everywhere– Parsing (little languages, interpreters)– Database engines– AI: domain-specific languages– Text processing

• Tex/LaTex -> dvi -> Postscript -> pdf

– Hardware: VHDL; model-checking tools– Mathematics (Mathematica, Matlab)

Page 27: Introduction to Compilation

27

Why Study Compilers? (2)

• Fascinating blend of theory and engineering– Direct applications of theory to practice

• Parsing, scanning, static analysis

– Some very difficult problems (NP-hard or worse)

• Resource allocation, “optimization”, etc.• Need to come up with good-enough solutions

Page 28: Introduction to Compilation

28

Why Study Compilers? (3)

• Ideas from many parts of CSE– AI: Greedy algorithms, heuristic search– Algorithms: graph algorithms, dynamic

programming, approximation algorithms– Theory: Grammars DFAs and PDAs, pattern

matching, fixed-point algorithms– Systems: Allocation & naming, synchronization,

locality– Architecture: pipelines & hierarchy management,

instruction set use

Page 29: Introduction to Compilation

29

Programming Language Specs

• Since the 1960s, the syntax of every significant programming language has been specified by a formal grammar– First done in 1959 with BNF (Backus-Naur

Form or Backus-Normal Form) used to specify the syntax of ALGOL 60

– Borrowed from the linguistics community (Chomsky?)

Page 30: Introduction to Compilation

30

Grammar for a Tiny Language

• program ::= statement | program statement• statement ::= assignStmt | ifStmt• assignStmt ::= id = expr ;• ifStmt ::= if ( expr ) stmt• expr ::= id | int | expr + expr• Id ::= a | b | c | i | j | k | n | x | y | z• int ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Page 31: Introduction to Compilation

31

Productions• The rules of a grammar are called productions• Rules contain

– Nonterminal symbols: grammar variables (program, statement, id, etc.)

– Terminal symbols: concrete syntax that appears in programs (a, b, c, 0, 1, if, (, …)

• Meaning of nonterminal ::= <sequence of terminals and nonterminals>

In a derivation, an instance of nonterminal can be replaced by the sequence of terminals and nonterminals on the right of the production

• Often, there are two or more productions for a single nonterminal – can use either at different times

Page 32: Introduction to Compilation

32

Alternative Notations

• There are several syntax notations for productions in common use; all mean the same thingifStmt ::= if ( expr ) stmt

ifStmt if ( expr ) stmt

<ifStmt> ::= if ( <expr> ) <stmt>

Page 33: Introduction to Compilation

33

Example derivation

a = 1 ;

if ( a + 1 )

b = 2 ;

program ::= statement | program statementstatement ::= assignStmt | ifStmtassignStmt ::= id = expr ;ifStmt ::= if ( expr ) stmtexpr ::= id | int | expr + exprid ::= a | b | c | i | j | k | n | x | y | zint ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

program

program

ID(a) expr

stmt

stmt

assign

int (1)

ifStmt

expr stmt

expr expr

int (1)ID(a) ID(b) expr

assign

int (2)

Page 34: Introduction to Compilation

34

Parsing

• Parsing: reconstruct the derivation (syntactic structure) of a program

• In principle, a single recognizer could work directly from the concrete, character-by-character grammar

• In practice this is never done