國立台灣大學 資訊工程學系 薛智文 [email protected] cwhsueh/ 98 spring compiler...
TRANSCRIPT
國立台灣大學資訊工程學系
http://www.csie.ntu.edu.tw/~cwhsueh/98 Spring
Compiler
TH 234, DTH 103
資工系網媒所 NEWS實驗室
17:36 /272
資工系網媒所 NEWS實驗室
17:36 /273
Why Study Compilers?
1. Excellent software-engineering example --- theory meets practice.
2. Essential software tool.3. Influences hardware design, e.g., RISC, VLIW.4. Tools (mostly “optimization”) for enhancing
software reliability and security.
資工系網媒所 NEWS實驗室
17:36 /274
Compilers & Architecture
Modern architectures have very complex structures, especially opportunities for parallel execution.Sequential programs can only make effective use of these features via an optimizing compiler.Hardware question: If we implemented this, could a compiler use it?
資工系網媒所 NEWS實驗室
17:36 /275
Software Reliability
Optimization technology (data-flow analysis) used in:
Lock/unlock errors.
Buffers not range-checked.
Memory Leaks.
SQL injection bugs..
資工系網媒所 NEWS實驗室
17:36 /276
What this Course Offers?
Compiler methodology for both compiler implementation and related applications.
Theoretical framework.
Key algorithms.
Hands-on experience.
Nongoal: build a complete optimizing compiler.
資工系網媒所 NEWS實驗室
17:36 /277
Course Outline
Part 1 --- Introduction.Part 2 --- Scanner.Part 3 --- Parser.Part 4 --- Syntax-Directed Translation.Part 5 --- Symbol Table. Part 6 --- Intermediate Code Generation. Part 7 --- Run Time Storage Organization.Part 8 --- Optimization. Part 9 --- How to Write a Compiler.Part 10 --- A Simple Code Generation (PSEUDO) Example.
資工系網媒所 NEWS實驗室
17:36 /278
Introduction
Compiler is one of language processors.
input
Compiler
source program
target program output
資工系網媒所 NEWS實驗室
17:36 /279
What is a Compiler?Definitions:
The software system translates description of computations into a program executable by a computer.Source and target must be equivalent!
Compiler writing spans:programming languages;machine architecture;language theory;algorithms and data structures;software engineering.
History:1950: the first FORTRAN compiler took 18 man-years;now: using software tools, can be done in a few months as a student’s project.
input
Compiler
source program
target program output
資工系網媒所 NEWS實驗室
17:36 /2710
An Interpreter
inputInterpretersource program
output
資工系網媒所 NEWS實驗室
17:36 /2711
A Hybrid Compiler
Translator
source program
input
Virtual Machine
intermediate programoutput
資工系網媒所 NEWS實驗室
17:36 /2712
A Language-Processing Systemsource program
library filesrelocatable object files
Preprocessor
modified source program
Compiler
target assembly program
Assembler
relocatable machine code
Linker/Loader
target machine code
資工系網媒所 NEWS實驗室
17:36 /2713
Applications
Computer language compilers.
Translator: from one format to another.query interpreter
text formatter
silicon compiler
infix notation postfix notation:
pretty printers
· · ·
Software productivity tools.
3 + 5 – 6 * 6 3 5 + 6 6 * –
資工系網媒所 NEWS實驗室
17:36 /2714
Relations with Computational Theory
Computational theory:a set of grammar rules ≡ the definition of a particular machine.
also equivalent to a set of languages recognized by this machine.
a type of machines: a family of machines with a given set of operations, or capabilities;
power of a type of machines
≡ the set of languages that can be recognized by this type of machines.
資工系網媒所 NEWS實驗室
17:36 /2715
A Language-Processing Systemsource program
library filesrelocatable object files
Preprocessor
modified source program
Compiler
target assembly program
Assembler
relocatable machine code
Linker/Loader
target machine code
資工系網媒所 NEWS實驗室
17:36 /2716
Phases of a Compilercharacter stream
Lexical Analyzer (scanner)token stream
Machine-Independent Code Optimizeroptimized intermediate representation
Code Generatorrelocatable machine code
Machine-Dependent Code Optimizer
target-machine code
Semantic Analyzerannotated abstract-syntax tree
Syntax Analyzer (parser)abstract-syntax tree
Intermediate Code Generatorintermediate representation
SymbolTable
ErrorHandling
front-end
back-end
資工系網媒所 NEWS實驗室
17:37 /2717
Lexical Analyzer (Scanner)
Actions:Reads characters from the source program;Groups characters into lexemeslexemes , i.e., sequences of characters that “go together”, following a given patternpattern ;Each lexeme corresponds to a tokentoken .
the scanner returns the next token, plus maybe some additional information, to the parser;
The scanner may also discover lexical errors, i.e., erroneous characters.
The definitions of what a lexemelexeme, tokentoken or badbad charactercharacter is depend on the definition of the source language.
資工系網媒所 NEWS實驗室
17:37 /2718
Scanner Example for CLexeme: C statement
position = initial + rate * 60;
(Lexeme) position = initial + rate * 60 ; < id,1> <=> <id,2> <+> <id,3> < * > <60> <;> (Token) ID ASSIGN ID PLUS ID TIME INT SEMI-COL
Arbitrary number of blanks (white spaces) between lexemes.Erroneous sequence of characters, that are not parts of comments, for the C language:
control characters@2abc
1 position …
2 initial …
3 rate …
Symbol Table
資工系網媒所 NEWS實驗室
17:37 /2719
Syntax Analyzer (Parser)
Actions:Group tokens into grammatical phrasesgrammatical phrases , to discover the underlying structure of the sourceFind syntax errorssyntax errors , e.g., the following C source line:
(Lexeme) index = 12 * ; (Token) ID ASSIGN INT TIMES SEMI-COL Every token is legal, but the sequence is erroneous!Every token is legal, but the sequence is erroneous!
May find some static semantic errorsstatic semantic errors , e.g., use of undeclared variables or multiple declared variables.May generate code, or build some intermediate representation of the source program, such as an abstract-syntax tree.
資工系網媒所 NEWS實驗室
17:37 /2720
Source code: position = initial + rate * 60 < id,1> <=> <id,2> <+> <id,3> < * > <60>
Abstract-syntax tree:
interior nodes of the tree are OPERATORS;a node’s children are its OPERANDS;each subtree forms a logical unitlogical unit .the subtree with * at its root shows that * has higher precedence than +, the operation “rate * 60” must be performed as a unit, not “initial + rate”.
Where is ”;”?
Parser Example for C
=+
*<id,3>
< id,1><id,2>
60
1 position …
2 initial …
3 rate …
Symbol Table
資工系網媒所 NEWS實驗室
17:37 /2721
Semantic AnalyzerActions:
Check for more static semantic errors, e.g., type errorstype errors .
May annotate and/or change the abstract syntax tree.
=+
*<id,3>
< id,1><id,2>
60
=+
*<id,3>
< id,1><id,2>
int_to_float
60
1 position …
2 initial …
3 rate …
Symbol Table
資工系網媒所 NEWS實驗室
17:37 /2722
Intermediate Code GeneratorActions: translate from abstract-syntax trees to intermediate codes.One choice for intermediate code is 3-address code :
Each statement containsat most 3 operands;in addition to “=”, i.e., assignment, at most one operator.
An ”easy” and “universal” format that can be translated into most assembly languages.
t1 = int_to_float(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
=+
*<id,3>
< id,1><id,2>
int_to_float
60
資工系網媒所 NEWS實驗室
17:37 /2723
Optimizer
Improve the efficiency of intermediate code.Goal may be to make code run fasterfaster , and/or to use the least number of registers · · ·
Current trends:to obtain smaller, but maybe slower, equivalent code for embedded systems;to reduce power consumption.
t1 = id3 * 60.0
id1 = id2 + t1
t1 = int_to_float(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
資工系網媒所 NEWS實驗室
17:37 /2724
Code GenerationA compiler may generate
pure machine codes (machine dependent assembly language) directly, which is rarewhich is rare now ;virtual machine code.
Example:PASCAL compiler P-code interpreter executionSpeed is roughly 4 times slower than running directly generated machine codes.
Advantages:simplify the job of a compiler;decrease the size of the generated code: 1/3 for P-code1/3 for P-code ;can be run easily on a variety of platforms
P-machine is an ideal general machine whose interpreter can be written easily;divide and conquer;recent example: JAVA and Byte-code.
資工系網媒所 NEWS實驗室
17:37 /2725
Code Generation Example
LDF R2, id3
MULF R2, R2, #60.0
LDF R1, id2
ADDF R1, R1, R2
STF id1, R1
t1 = id3 * 60.0
id1 = id2 + t1
資工系網媒所 NEWS實驗室
17:37 /2726
Practical Considerations (1/2)
Preprocessing phase:macro substitution:
#define MAXC 10
rational preprocessing: add new features for old languages.
BASIC
C C ++
compiler directives:#include <stdio.h>
non-standard language extensions.adding parallel primitives
資工系網媒所 NEWS實驗室
17:37 /2727
Practical Considerations (2/2)
Passes of compilingA pass reads an input file and writes an output file.First pass reads the text file once.May need to read the text one more time for any forward addressed objects, i.e., anything that is used before its declaration.Example: C language
goto error_handling;
· · ·
error_handling:
· · ·
資工系網媒所 NEWS實驗室
17:37 /2728
Reduce Number of PassesEach pass takes I/O time.Back-patchingBack-patching : leave a blank slot for missing information, and fill in the empty slot when the information becomes available.Example: C language
when a label is usedif it is not defined before, save a tracetrace into the to-be-processed table
label name corresponds to LABEL TABLE[i]code generated: GOTO LABEL TABLE[i]
when a label is definedcheck known labels for redefined labelsif it is not used before, save a trace into the to-be-processed tableif it is used before, then find its trace and fill the current address into the trace
Time and space trade-off !Time and space trade-off !