國立台灣大學 資訊工程學系 薛智文 [email protected] cwhsueh/ 98 spring compiler...

28
國國國國國國 國國國國國國 薛薛薛 [email protected] http://www.csie.ntu.edu.tw/~cwhsue h/ 98 Spring Compiler TH 234, DTH 103

Upload: bruno-watson

Post on 14-Dec-2015

240 views

Category:

Documents


6 download

TRANSCRIPT

Page 1: 國立台灣大學 資訊工程學系 薛智文 cwhsueh@csie.ntu.edu.tw cwhsueh/ 98 Spring Compiler TH 234, DTH 103

國立台灣大學資訊工程學系

薛智文[email protected]

http://www.csie.ntu.edu.tw/~cwhsueh/98 Spring

Compiler

TH 234, DTH 103

Page 2: 國立台灣大學 資訊工程學系 薛智文 cwhsueh@csie.ntu.edu.tw cwhsueh/ 98 Spring Compiler TH 234, DTH 103

資工系網媒所 NEWS實驗室

17:36 /272

Page 3: 國立台灣大學 資訊工程學系 薛智文 cwhsueh@csie.ntu.edu.tw cwhsueh/ 98 Spring Compiler TH 234, DTH 103

資工系網媒所 NEWS實驗室

17:36 /273

Why Study Compilers?

1. Excellent software-engineering example --- theory meets practice.

2. Essential software tool.3. Influences hardware design, e.g., RISC, VLIW.4. Tools (mostly “optimization”) for enhancing

software reliability and security.

Page 4: 國立台灣大學 資訊工程學系 薛智文 cwhsueh@csie.ntu.edu.tw cwhsueh/ 98 Spring Compiler TH 234, DTH 103

資工系網媒所 NEWS實驗室

17:36 /274

Compilers & Architecture

Modern architectures have very complex structures, especially opportunities for parallel execution.Sequential programs can only make effective use of these features via an optimizing compiler.Hardware question: If we implemented this, could a compiler use it?

Page 5: 國立台灣大學 資訊工程學系 薛智文 cwhsueh@csie.ntu.edu.tw cwhsueh/ 98 Spring Compiler TH 234, DTH 103

資工系網媒所 NEWS實驗室

17:36 /275

Software Reliability

Optimization technology (data-flow analysis) used in:

Lock/unlock errors.

Buffers not range-checked.

Memory Leaks.

SQL injection bugs..

Page 6: 國立台灣大學 資訊工程學系 薛智文 cwhsueh@csie.ntu.edu.tw cwhsueh/ 98 Spring Compiler TH 234, DTH 103

資工系網媒所 NEWS實驗室

17:36 /276

What this Course Offers?

Compiler methodology for both compiler implementation and related applications.

Theoretical framework.

Key algorithms.

Hands-on experience.

Nongoal: build a complete optimizing compiler.

Page 7: 國立台灣大學 資訊工程學系 薛智文 cwhsueh@csie.ntu.edu.tw cwhsueh/ 98 Spring Compiler TH 234, DTH 103

資工系網媒所 NEWS實驗室

17:36 /277

Course Outline

Part 1 --- Introduction.Part 2 --- Scanner.Part 3 --- Parser.Part 4 --- Syntax-Directed Translation.Part 5 --- Symbol Table. Part 6 --- Intermediate Code Generation. Part 7 --- Run Time Storage Organization.Part 8 --- Optimization. Part 9 --- How to Write a Compiler.Part 10 --- A Simple Code Generation (PSEUDO) Example.

Page 8: 國立台灣大學 資訊工程學系 薛智文 cwhsueh@csie.ntu.edu.tw cwhsueh/ 98 Spring Compiler TH 234, DTH 103

資工系網媒所 NEWS實驗室

17:36 /278

Introduction

Compiler is one of language processors.

input

Compiler

source program

target program output

Page 9: 國立台灣大學 資訊工程學系 薛智文 cwhsueh@csie.ntu.edu.tw cwhsueh/ 98 Spring Compiler TH 234, DTH 103

資工系網媒所 NEWS實驗室

17:36 /279

What is a Compiler?Definitions:

The software system translates description of computations into a program executable by a computer.Source and target must be equivalent!

Compiler writing spans:programming languages;machine architecture;language theory;algorithms and data structures;software engineering.

History:1950: the first FORTRAN compiler took 18 man-years;now: using software tools, can be done in a few months as a student’s project.

input

Compiler

source program

target program output

Page 10: 國立台灣大學 資訊工程學系 薛智文 cwhsueh@csie.ntu.edu.tw cwhsueh/ 98 Spring Compiler TH 234, DTH 103

資工系網媒所 NEWS實驗室

17:36 /2710

An Interpreter

inputInterpretersource program

output

Page 11: 國立台灣大學 資訊工程學系 薛智文 cwhsueh@csie.ntu.edu.tw cwhsueh/ 98 Spring Compiler TH 234, DTH 103

資工系網媒所 NEWS實驗室

17:36 /2711

A Hybrid Compiler

Translator

source program

input

Virtual Machine

intermediate programoutput

Page 12: 國立台灣大學 資訊工程學系 薛智文 cwhsueh@csie.ntu.edu.tw cwhsueh/ 98 Spring Compiler TH 234, DTH 103

資工系網媒所 NEWS實驗室

17:36 /2712

A Language-Processing Systemsource program

library filesrelocatable object files

Preprocessor

modified source program

Compiler

target assembly program

Assembler

relocatable machine code

Linker/Loader

target machine code

Page 13: 國立台灣大學 資訊工程學系 薛智文 cwhsueh@csie.ntu.edu.tw cwhsueh/ 98 Spring Compiler TH 234, DTH 103

資工系網媒所 NEWS實驗室

17:36 /2713

Applications

Computer language compilers.

Translator: from one format to another.query interpreter

text formatter

silicon compiler

infix notation postfix notation:

pretty printers

· · ·

Software productivity tools.

3 + 5 – 6 * 6 3 5 + 6 6 * –

Page 14: 國立台灣大學 資訊工程學系 薛智文 cwhsueh@csie.ntu.edu.tw cwhsueh/ 98 Spring Compiler TH 234, DTH 103

資工系網媒所 NEWS實驗室

17:36 /2714

Relations with Computational Theory

Computational theory:a set of grammar rules ≡ the definition of a particular machine.

also equivalent to a set of languages recognized by this machine.

a type of machines: a family of machines with a given set of operations, or capabilities;

power of a type of machines

≡ the set of languages that can be recognized by this type of machines.

Page 15: 國立台灣大學 資訊工程學系 薛智文 cwhsueh@csie.ntu.edu.tw cwhsueh/ 98 Spring Compiler TH 234, DTH 103

資工系網媒所 NEWS實驗室

17:36 /2715

A Language-Processing Systemsource program

library filesrelocatable object files

Preprocessor

modified source program

Compiler

target assembly program

Assembler

relocatable machine code

Linker/Loader

target machine code

Page 16: 國立台灣大學 資訊工程學系 薛智文 cwhsueh@csie.ntu.edu.tw cwhsueh/ 98 Spring Compiler TH 234, DTH 103

資工系網媒所 NEWS實驗室

17:36 /2716

Phases of a Compilercharacter stream

Lexical Analyzer (scanner)token stream

Machine-Independent Code Optimizeroptimized intermediate representation

Code Generatorrelocatable machine code

Machine-Dependent Code Optimizer

target-machine code

Semantic Analyzerannotated abstract-syntax tree

Syntax Analyzer (parser)abstract-syntax tree

Intermediate Code Generatorintermediate representation

SymbolTable

ErrorHandling

front-end

back-end

Page 17: 國立台灣大學 資訊工程學系 薛智文 cwhsueh@csie.ntu.edu.tw cwhsueh/ 98 Spring Compiler TH 234, DTH 103

資工系網媒所 NEWS實驗室

17:37 /2717

Lexical Analyzer (Scanner)

Actions:Reads characters from the source program;Groups characters into lexemeslexemes , i.e., sequences of characters that “go together”, following a given patternpattern ;Each lexeme corresponds to a tokentoken .

the scanner returns the next token, plus maybe some additional information, to the parser;

The scanner may also discover lexical errors, i.e., erroneous characters.

The definitions of what a lexemelexeme, tokentoken or badbad charactercharacter is depend on the definition of the source language.

Page 18: 國立台灣大學 資訊工程學系 薛智文 cwhsueh@csie.ntu.edu.tw cwhsueh/ 98 Spring Compiler TH 234, DTH 103

資工系網媒所 NEWS實驗室

17:37 /2718

Scanner Example for CLexeme: C statement

position = initial + rate * 60;

(Lexeme) position = initial + rate * 60 ; < id,1> <=> <id,2> <+> <id,3> < * > <60> <;> (Token) ID ASSIGN ID PLUS ID TIME INT SEMI-COL

Arbitrary number of blanks (white spaces) between lexemes.Erroneous sequence of characters, that are not parts of comments, for the C language:

control characters@2abc

1 position …

2 initial …

3 rate …

Symbol Table

Page 19: 國立台灣大學 資訊工程學系 薛智文 cwhsueh@csie.ntu.edu.tw cwhsueh/ 98 Spring Compiler TH 234, DTH 103

資工系網媒所 NEWS實驗室

17:37 /2719

Syntax Analyzer (Parser)

Actions:Group tokens into grammatical phrasesgrammatical phrases , to discover the underlying structure of the sourceFind syntax errorssyntax errors , e.g., the following C source line:

(Lexeme) index = 12 * ; (Token) ID ASSIGN INT TIMES SEMI-COL Every token is legal, but the sequence is erroneous!Every token is legal, but the sequence is erroneous!

May find some static semantic errorsstatic semantic errors , e.g., use of undeclared variables or multiple declared variables.May generate code, or build some intermediate representation of the source program, such as an abstract-syntax tree.

Page 20: 國立台灣大學 資訊工程學系 薛智文 cwhsueh@csie.ntu.edu.tw cwhsueh/ 98 Spring Compiler TH 234, DTH 103

資工系網媒所 NEWS實驗室

17:37 /2720

Source code: position = initial + rate * 60 < id,1> <=> <id,2> <+> <id,3> < * > <60>

Abstract-syntax tree:

interior nodes of the tree are OPERATORS;a node’s children are its OPERANDS;each subtree forms a logical unitlogical unit .the subtree with * at its root shows that * has higher precedence than +, the operation “rate * 60” must be performed as a unit, not “initial + rate”.

Where is ”;”?

Parser Example for C

=+

*<id,3>

< id,1><id,2>

60

1 position …

2 initial …

3 rate …

Symbol Table

Page 21: 國立台灣大學 資訊工程學系 薛智文 cwhsueh@csie.ntu.edu.tw cwhsueh/ 98 Spring Compiler TH 234, DTH 103

資工系網媒所 NEWS實驗室

17:37 /2721

Semantic AnalyzerActions:

Check for more static semantic errors, e.g., type errorstype errors .

May annotate and/or change the abstract syntax tree.

=+

*<id,3>

< id,1><id,2>

60

=+

*<id,3>

< id,1><id,2>

int_to_float

60

1 position …

2 initial …

3 rate …

Symbol Table

Page 22: 國立台灣大學 資訊工程學系 薛智文 cwhsueh@csie.ntu.edu.tw cwhsueh/ 98 Spring Compiler TH 234, DTH 103

資工系網媒所 NEWS實驗室

17:37 /2722

Intermediate Code GeneratorActions: translate from abstract-syntax trees to intermediate codes.One choice for intermediate code is 3-address code :

Each statement containsat most 3 operands;in addition to “=”, i.e., assignment, at most one operator.

An ”easy” and “universal” format that can be translated into most assembly languages.

t1 = int_to_float(60)

t2 = id3 * t1

t3 = id2 + t2

id1 = t3

=+

*<id,3>

< id,1><id,2>

int_to_float

60

Page 23: 國立台灣大學 資訊工程學系 薛智文 cwhsueh@csie.ntu.edu.tw cwhsueh/ 98 Spring Compiler TH 234, DTH 103

資工系網媒所 NEWS實驗室

17:37 /2723

Optimizer

Improve the efficiency of intermediate code.Goal may be to make code run fasterfaster , and/or to use the least number of registers · · ·

Current trends:to obtain smaller, but maybe slower, equivalent code for embedded systems;to reduce power consumption.

t1 = id3 * 60.0

id1 = id2 + t1

t1 = int_to_float(60)

t2 = id3 * t1

t3 = id2 + t2

id1 = t3

Page 24: 國立台灣大學 資訊工程學系 薛智文 cwhsueh@csie.ntu.edu.tw cwhsueh/ 98 Spring Compiler TH 234, DTH 103

資工系網媒所 NEWS實驗室

17:37 /2724

Code GenerationA compiler may generate

pure machine codes (machine dependent assembly language) directly, which is rarewhich is rare now ;virtual machine code.

Example:PASCAL compiler P-code interpreter executionSpeed is roughly 4 times slower than running directly generated machine codes.

Advantages:simplify the job of a compiler;decrease the size of the generated code: 1/3 for P-code1/3 for P-code ;can be run easily on a variety of platforms

P-machine is an ideal general machine whose interpreter can be written easily;divide and conquer;recent example: JAVA and Byte-code.

Page 25: 國立台灣大學 資訊工程學系 薛智文 cwhsueh@csie.ntu.edu.tw cwhsueh/ 98 Spring Compiler TH 234, DTH 103

資工系網媒所 NEWS實驗室

17:37 /2725

Code Generation Example

LDF R2, id3

MULF R2, R2, #60.0

LDF R1, id2

ADDF R1, R1, R2

STF id1, R1

t1 = id3 * 60.0

id1 = id2 + t1

Page 26: 國立台灣大學 資訊工程學系 薛智文 cwhsueh@csie.ntu.edu.tw cwhsueh/ 98 Spring Compiler TH 234, DTH 103

資工系網媒所 NEWS實驗室

17:37 /2726

Practical Considerations (1/2)

Preprocessing phase:macro substitution:

#define MAXC 10

rational preprocessing: add new features for old languages.

BASIC

C C ++

compiler directives:#include <stdio.h>

non-standard language extensions.adding parallel primitives

Page 27: 國立台灣大學 資訊工程學系 薛智文 cwhsueh@csie.ntu.edu.tw cwhsueh/ 98 Spring Compiler TH 234, DTH 103

資工系網媒所 NEWS實驗室

17:37 /2727

Practical Considerations (2/2)

Passes of compilingA pass reads an input file and writes an output file.First pass reads the text file once.May need to read the text one more time for any forward addressed objects, i.e., anything that is used before its declaration.Example: C language

goto error_handling;

· · ·

error_handling:

· · ·

Page 28: 國立台灣大學 資訊工程學系 薛智文 cwhsueh@csie.ntu.edu.tw cwhsueh/ 98 Spring Compiler TH 234, DTH 103

資工系網媒所 NEWS實驗室

17:37 /2728

Reduce Number of PassesEach pass takes I/O time.Back-patchingBack-patching : leave a blank slot for missing information, and fill in the empty slot when the information becomes available.Example: C language

when a label is usedif it is not defined before, save a tracetrace into the to-be-processed table

label name corresponds to LABEL TABLE[i]code generated: GOTO LABEL TABLE[i]

when a label is definedcheck known labels for redefined labelsif it is not used before, save a trace into the to-be-processed tableif it is used before, then find its trace and fill the current address into the trace

Time and space trade-off !Time and space trade-off !