A Brief Introduction to Compiler Optimization Techniques
DESCRIPTION
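Most people's impression of compilers stops at the parsing theory and implementation taught in class, and that scares many of them away from compiler technology. But parsing is only the start of compilation. The most interesting part of compiler technology is the optimization done in the middle and back ends, whose seemingly mysterious techniques can make programs run much faster. This talk introduces some basic compiler optimizations, such as Propagation, Dead Code Elimination, Inline, Common Subexpression Elimination, and Loop Unrolling. It uses a few small LLVM tools to observe what these optimizations do, as a stepping stone toward understanding how a compiler works.
TRANSCRIPT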
Self-introduction
Professional odd-job worker, Andes Compiler Team
Compiler?
Compilation Flow
[1] Compilers: Principles, Techniques, and Tools (2nd Edition) p.5
Undergraduate compiler courses usually cover only the parser,
plus a bare-bones code generator
But the really fun, almost magical parts of a compiler are in the optimizer
With optimization, programs can become both smaller and faster!
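• This talk does not go into much advanced theory; it introduces the concepts and explains them through examples
• LLVM is used as the tool for explanation and demonstration
Crash Course on the Basics
• Basic Block
• Control Flow Graph
• Static Single Assignment Form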
Basic Block
• A section of code with a single entry point and a single exit point
• http://en.wikipedia.org/wiki/Basic_block
Control Flow Graph
• CFG for short; simply put, it is the program's flowchart
• http://en.wikipedia.org/wiki/Control_flow_graph
Basic Block
int foo (int n)
{
  int ret;
  if (n > 10)
    ret = n * 2;
  else
    ret = n + 2;
  return ret;
}
Basic blocks:
[B1] int ret; if (n > 10)
[B2] ret = n * 2;
[B3] ret = n + 2;
[B4] return ret;
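The blocks above were split by eye. The usual textbook way to find them mechanically is to mark "leaders": the first instruction, every branch target, and every instruction that directly follows a branch or a return. Each block then runs from one leader up to the next. Below is a minimal C sketch of that rule. It assumes a hypothetical toy instruction format: `Instr`, `OpKind`, and the hand-lowered `foo_code` are made up for illustration, and a real compiler works on its own IR.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy instruction format: just enough to find block boundaries. */
typedef enum { OP_PLAIN, OP_BRANCH, OP_JUMP, OP_RETURN } OpKind;

typedef struct {
    OpKind kind;
    int target;        /* index of the jump target for OP_BRANCH / OP_JUMP */
    const char *text;  /* human-readable form, for reference only */
} Instr;

/* Sets leaders[i] to true when instruction i starts a basic block and
   returns the number of basic blocks. */
int find_leaders(const Instr *code, int n, bool *leaders) {
    for (int i = 0; i < n; ++i)
        leaders[i] = false;
    if (n > 0)
        leaders[0] = true;                     /* rule 1: the first instruction */
    for (int i = 0; i < n; ++i) {
        OpKind k = code[i].kind;
        if (k == OP_BRANCH || k == OP_JUMP) {
            assert(code[i].target >= 0 && code[i].target < n);
            leaders[code[i].target] = true;    /* rule 2: every jump target */
        }
        if (k != OP_PLAIN && i + 1 < n)
            leaders[i + 1] = true;             /* rule 3: whatever follows a jump or return */
    }
    int blocks = 0;
    for (int i = 0; i < n; ++i)
        if (leaders[i])
            ++blocks;
    return blocks;
}

/* foo() from the slide, lowered by hand: the if becomes a conditional
   branch to the else part, and the then part jumps over it. */
static const Instr foo_code[] = {
    { OP_BRANCH,  3, "if !(n > 10) goto 3" },
    { OP_PLAIN,  -1, "ret = n * 2" },
    { OP_JUMP,    4, "goto 4" },
    { OP_PLAIN,  -1, "ret = n + 2" },
    { OP_RETURN, -1, "return ret" },
};
```

For `foo_code`, `find_leaders` returns 4, with leaders at indices 0, 1, 3 and 4. These are the four blocks drawn on the slide; the `goto 4` at index 2 closes the `ret = n * 2` block.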
CFG
[B1] int ret; if (n > 10)   ->  B2 (true), B3 (false)
[B2] ret = n * 2;           ->  B4
[B3] ret = n + 2;           ->  B4
[B4] return ret;
Basic Block
int sum (int n)
{
  int ret = 0;
  int i;
  for (i = 0; i < n; ++i)
    ret += i;
  return ret;
}
Basic blocks:
[B1] int ret = 0; int i;
[B2] i = 0;
[B3] i < n;
[B4] ret += i;
[B5] ++i
[B6] return ret
CFG
[B1] int ret = 0; int i;   ->  B2
[B2] i = 0;                ->  B3
[B3] i < n;                ->  B4 (true), B6 (false)
[B4] ret += i;             ->  B5
[B5] ++i                   ->  B3
[B6] return ret
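In a CFG, a loop appears as an edge that jumps back to an earlier block. One common way to find such edges is a depth-first search from the entry block that flags any edge into a block still on the current DFS path (a "back edge"). Below is a minimal C sketch. It assumes a hypothetical `Block` struct and uses the six boxes of `sum()` exactly as drawn above. For unstructured CFGs, a back edge found this way does not always correspond to a natural loop, so treat the code as an illustration only.

```c
#include <assert.h>

#define MAX_BLOCKS 16
#define MAX_SUCCS 2   /* each block ends in at most a two-way branch */

typedef struct {
    const char *label;
    int nsucc;              /* number of successor blocks */
    int succ[MAX_SUCCS];    /* indices of the successor blocks */
} Block;

enum { WHITE, GRAY, BLACK };  /* not visited / on the DFS path / finished */

/* Depth-first search from block b.  An edge into a GRAY block points back to
   a block that is still on the current DFS path: that is a back edge. */
static int dfs(const Block *cfg, int b, int *color, int (*back)[2], int count) {
    color[b] = GRAY;
    for (int k = 0; k < cfg[b].nsucc; ++k) {
        int s = cfg[b].succ[k];
        if (color[s] == GRAY) {
            back[count][0] = b;
            back[count][1] = s;
            ++count;
        } else if (color[s] == WHITE) {
            count = dfs(cfg, s, color, back, count);
        }
    }
    color[b] = BLACK;
    return count;
}

/* Stores every back edge reachable from entry block 0 in back[] and returns
   how many there are; back[] needs room for n * MAX_SUCCS edges. */
int find_back_edges(const Block *cfg, int n, int (*back)[2]) {
    int color[MAX_BLOCKS];
    assert(n <= MAX_BLOCKS);
    for (int i = 0; i < n; ++i)
        color[i] = WHITE;
    return n > 0 ? dfs(cfg, 0, color, back, 0) : 0;
}

/* The six boxes of sum() as drawn on the slide. */
static const Block sum_cfg[] = {
    { "int ret = 0; int i;", 1, {1} },
    { "i = 0;",              1, {2} },
    { "i < n;",              2, {3, 5} },  /* true: body, false: exit */
    { "ret += i;",           1, {4} },
    { "++i",                 1, {2} },     /* back to the loop test */
    { "return ret",          0, {0} },
};
```

For `sum_cfg`, the only back edge found is `++i` (index 4) to `i < n` (index 2), which is exactly the loop.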
Static Single Assignment
• Tag every variable with a version number
• Each value is assigned (written) exactly once
• http://en.wikipedia.org/wiki/Static_single_assignment_form
SSA
int foo ()
{
  int ret;
  ret = 10;
  ret = 20;
  return ret;
}
becomes
int foo ()
{
  int ret;
  ret1 = 10;
  ret2 = 20;
  return ret2;
}
Every assignment gets its own version number.
Once tagged, you can tell immediately which expression's result each use refers to.
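For straight-line code with no branches, this versioning is just a counter per variable: each use takes the current version, and each assignment bumps it. Below is a minimal C sketch under that assumption. The `Operand`/`Stmt`/`Versions` types and the single-letter variables are hypothetical simplifications, not how LLVM stores its IR.

```c
#include <assert.h>

/* Hypothetical operand: a single-letter variable, or an integer literal when var == 0. */
typedef struct { char var; int lit; } Operand;
/* Hypothetical straight-line statement: dst = lhs + rhs. */
typedef struct { char dst; Operand lhs, rhs; } Stmt;
/* Output: the version each statement defines and the versions it reads
   (0 = a literal, or a variable never assigned before). */
typedef struct { int dst_ver, lhs_ver, rhs_ver; } Versions;

static int use_version(const int *cur, Operand op) {
    if (op.var == 0)
        return 0;                          /* literal: no version */
    assert(op.var >= 'a' && op.var <= 'z');
    return cur[op.var - 'a'];
}

/* Renames straight-line code into SSA form.  Without branches there is
   never a merge point, so no phi is needed. */
void ssa_rename(const Stmt *code, int n, Versions *out) {
    int cur[26] = {0};   /* latest version of each variable 'a'..'z' */
    for (int i = 0; i < n; ++i) {
        assert(code[i].dst >= 'a' && code[i].dst <= 'z');
        /* Rename the uses first, so that "x = x + y" reads the previous x ... */
        out[i].lhs_ver = use_version(cur, code[i].lhs);
        out[i].rhs_ver = use_version(cur, code[i].rhs);
        /* ... then every assignment creates a brand-new version. */
        out[i].dst_ver = ++cur[code[i].dst - 'a'];
    }
}

/* x = 1 + 2;  y = x + 3;  x = x + y;
   becomes x1 = 1 + 2;  y1 = x1 + 3;  x2 = x1 + y1; */
static const Stmt example[] = {
    { 'x', {0, 1},   {0, 2} },
    { 'y', {'x', 0}, {0, 3} },
    { 'x', {'x', 0}, {'y', 0} },
};
```

Code with branches needs Φ, as the next slides show.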
SSA
int foo (int n)
{
  int ret;
  if (n > 10)
    ret = n * 2;
  else
    ret = n + 2;
  return ret;
}
int foo (int n)
{
  int ret;
  if (n > 10)
    ret1 = n * 2;
  else
    ret2 = n + 2;
  return ret?;
}
Where two branches of the program merge, we cannot tell which definition the value comes from.
int foo (int n)
{
  int ret;
  if (n > 10)
    ret1 = n * 2;
  else
    ret2 = n + 2;
  ret3 = Φ(ret1, ret2);
  return ret3;
}
This is what Φ (phi) handles: it defines a value that depends on the path the program took to get here
and gives that value a new version number.
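C has no Φ, but its meaning can be emulated so the SSA version can be checked against the original. The sketch below is hand-written, and the names `foo_ssa` and `pred` are hypothetical. `pred` records which predecessor block ran, and the Φ picks the value that came in along that edge.

```c
#include <assert.h>

/* foo() exactly as on the slide. */
int foo(int n) {
    int ret;
    if (n > 10)
        ret = n * 2;
    else
        ret = n + 2;
    return ret;
}

/* The same function written in SSA style by hand.  C has no phi, so the
   merge is emulated: "pred" records which predecessor block ran, and the
   phi picks the value that arrived along that edge. */
int foo_ssa(int n) {
    int ret1 = 0, ret2 = 0;  /* in real SSA each exists only on its own path */
    int pred;                /* 1 = came from the then-block, 2 = from the else-block */
    if (n > 10) {
        ret1 = n * 2;
        pred = 1;
    } else {
        ret2 = n + 2;
        pred = 2;
    }
    int ret3 = (pred == 1) ? ret1 : ret2;   /* ret3 = Φ(ret1, ret2) */
    return ret3;
}
```

For a simple if/else like this one, LLVM's -simplifycfg can turn such a phi into a `select`, as the DCE examples later show.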
LLVM
• A handy, fun, and lately very popular compiler. To install:
– sudo apt-get install llvm clang xdot
– sudo yum install llvm clang python-xdot python-setuptools
xdot is for viewing the graphs
python-setuptools: well... Fedora's package dependencies aren't set up properly, so xdot's dependency has to be listed by hand
– Not on apt-get or yum? Then I'll assume you're an expert who can figure it out yourself XD
– Windows!? Rumor has it the official site has an installer?
– Building it yourself is recommended; otherwise some debugging features are missing
LLVM IR
• v = operation type op1, op2, ..., opN
– %sum = add i32 %op1, %op2
%sum: the result
add: the operation
i32: the type
%op1, %op2: the operands
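An empty LLVM function
define void @empty() {
  ret void
}
define: the keyword that starts a function definition
void: the return type
@empty: @ + the function name
(): the parameter list
ret void: return + type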
An LLVM function with one parameter
define void @arg1(i32 %a) {
  ret void
}
(i32 %a): the parameter list, with one parameter named %a
An LLVM function with one parameter that it returns directly
define i32 @arg1(i32 %a) {
  ret i32 %a
}
i32: the return value is an i32
ret i32 %a: return + type + return value
An LLVM function with one parameter that returns the parameter plus ten
define i32 @arg1(i32 %a) {
  %t = add i32 %a, 10
  ret i32 %t
}
%t = add i32 %a, 10: add 10 to %a and put the result in %t
LLVM IR
• SSA-based IR
– %sum = add i32 %op1, %op2
– %sum = mul i32 %op1, %op2
– error: multiple definition of local value named 'sum'
SSA!?
• SSA form is friendly to compilers, but writing SSA form by hand is not very intuitive for normal people...
– Except for those used to functional programming... XD
• Inserting PHI nodes by hand is even more of a pain
alloca
• Used to create local variables
– The allocated space lives on the stack
• Its usage is somewhat like C's malloc, but the concept is quite different
alloca
define void @foo() {
  %var = alloca i32
  ret void
}
%var: the location allocated for the given type; it can be thought of as an i32*
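alloca
• Every access must go through load/store
– But during optimization, it is turned into a register unless memory is actually needed (via the mem2reg pass)
• If it is an array, or its address must be taken, it may not be possible to turn it into a register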
alloca/store
define void @foo() {
  %var = alloca i32
  store i32 10, i32* %var
  ret void
}
i32 10: the value to store and its type
i32* %var: the type and the destination address to store to
alloca/load
define void @foo() {
  %var = alloca i32
  store i32 10, i32* %var
  %t0 = load i32* %var
  ret void
}
%t0: the value that was read back
i32* %var: the type and the address to read from
LLVM/Clang
• Today's talk uses only the following two tools:
– clang: turns C into LLVM IR
– opt: the tool for running optimizations and observing the results
View CFG by LLVM
• clang foo.c -S -emit-llvm
• opt foo.ll -view-cfg
int foo(int a, int b)
{
  if (a > b)
    return a;
  else
    return b;
}
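View CFG by LLVM
There are quite a few junk instructions, but turning on optimization while we are still observing
would get in the way of learning
opt foo.ll -O1 -view-cfg
After optimization, only three instructions in a single BB are left...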
Notes on using opt (1/3)
• The order of the options matters!!
opt foo.ll -view-cfg -O1
Show the CFG first, then optimize
opt foo.ll -O1 -view-cfg
Optimize first, then look at the CFG
Notes on using opt (2/3)
• Options can be given more than once
opt foo.ll -view-cfg -O1 -view-cfg
Show the CFG first,
then optimize,
and finally look at the CFG again
Notes on using opt (3/3)
• Options can be repeated, and so can optimizations
opt foo.ll -O1 -view-cfg -O1 -view-cfg
Optimize,
then optimize again
mem2reg
• mem2reg: removes unnecessary alloca instructions and their load/store
• and makes the program look more like SSA form
mem2reg
opt foo.ll -mem2reg -view-cfg
A phi node appears! And the alloca and load/store instructions
are all gone
Compiler Optimization
Propagation
• Propagation: passing values along
– Constant Propagation
– Copy Propagation
Constant Propagation
int foo(int a)
{
  int magic_num = 10;
  return a + magic_num;
}
becomes
int foo(int a)
{
  int magic_num = 10;
  return a + 10;
}
opt foo.ll -mem2reg -view-cfg
This optimization is so basic that it gets done along the way during mem2reg
Never litter your code with damn magic numbers because you think the second version is faster!!!!
Copy Propagation
b = a;
c = b;
becomes
b = a;
c = a;
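Copy propagation can be sketched in a few lines: remember which variables are plain copies, and rewrite each use to the original source. This minimal C version assumes SSA-form straight-line code over single-letter variables, so a recorded copy can never be invalidated by a later assignment. The `Stmt` encoding is hypothetical.

```c
#include <assert.h>

/* Hypothetical three-address statement over single-letter variables in SSA
   form (each variable is assigned at most once):
     op '=' : dst = a        (a copy; b is unused)
     op '+' : dst = a + b */
typedef struct { char dst, op, a, b; } Stmt;

/* Follows a chain of copies back to its source: b = a; c = b  gives  c -> a. */
static char resolve(const char *copy_of, char v) {
    assert(v >= 'a' && v <= 'z');
    while (copy_of[v - 'a'] != 0)
        v = copy_of[v - 'a'];
    return v;
}

/* Copy propagation, in place: every use is replaced by the original value. */
void copy_propagate(Stmt *code, int n) {
    char copy_of[26] = {0};   /* copy_of[x] = source of the copy x, or 0 */
    for (int i = 0; i < n; ++i) {
        assert(code[i].op == '=' || code[i].op == '+');
        assert(code[i].dst >= 'a' && code[i].dst <= 'z');
        code[i].a = resolve(copy_of, code[i].a);
        if (code[i].op == '+')
            code[i].b = resolve(copy_of, code[i].b);
        else
            copy_of[code[i].dst - 'a'] = code[i].a;  /* dst is just another name for a */
    }
}
```

Running `copy_propagate` on `b = a; c = b; d = c + b;` gives `b = a; c = a; d = a + a;`. The two copies no longer have any readers, which is exactly the kind of leftover that DCE cleans up.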
Constant Folding
• Constant Folding: fold the constants!
– If all operands are constants, compute the result ahead of time!
• a = 123 + 456
– a = 579
• Programs do not necessarily contain many constant expressions like this, but they gradually appear after Constant Propagation
Constant Folding
a = 10; b = 100 + a
Constant Propagation:
a = 10; b = 100 + 10
Constant Folding:
a = 10; b = 110
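The two steps above can be combined into one forward pass over straight-line SSA code. Known constants are substituted into operands (propagation). Whenever both operands become literals, the result is computed on the spot (folding), which in turn makes the destination a known constant for later statements. Below is a minimal C sketch; the `Operand`/`Stmt` encoding is hypothetical and only `+` is supported.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical operand: a single-letter variable, or an integer literal when var == 0. */
typedef struct { char var; int lit; } Operand;
/* Hypothetical statement in straight-line SSA code:
     op '=' : dst = x
     op '+' : dst = x + y */
typedef struct { char dst, op; Operand x, y; } Stmt;

/* Constant propagation: a variable known to hold a constant becomes that literal. */
static Operand propagate(const bool *known, const int *value, Operand op) {
    if (op.var != 0 && known[op.var - 'a']) {
        Operand lit = { 0, value[op.var - 'a'] };
        return lit;
    }
    return op;
}

/* One forward sweep that interleaves the two optimizations, in place. */
void const_prop_fold(Stmt *code, int n) {
    bool known[26] = { false };
    int value[26] = { 0 };
    for (int i = 0; i < n; ++i) {
        Stmt *s = &code[i];
        assert(s->op == '=' || s->op == '+');
        assert(s->dst >= 'a' && s->dst <= 'z');
        s->x = propagate(known, value, s->x);
        if (s->op == '+') {
            s->y = propagate(known, value, s->y);
            if (s->x.var == 0 && s->y.var == 0) {   /* constant folding */
                s->x.lit += s->y.lit;
                s->op = '=';
            }
        }
        if (s->op == '=' && s->x.var == 0) {       /* dst now holds a known constant */
            known[s->dst - 'a'] = true;
            value[s->dst - 'a'] = s->x.lit;
        }
    }
}
```

Given `a = 10; b = 100 + a; c = b + d;` (where `d` is unknown), the sweep leaves `a = 10; b = 110; c = 110 + d;`.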
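• Where would a program get so many constants to play with!?
• Propagation and Folding are simple little tricks; they are most effective when combined with other optimizations!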
• In LLVM these basic optimizations are done along the way, so they are hard to observe on their own...
• Copy/Constant Propagation is basically handled as part of mem2reg
Observing Constant Folding
• Constant Folding, on the other hand, can be done with LLVM's Constant Propagation pass
define i32 @folding() {
  %t = add i32 10, 20
  ret i32 %t
}
opt -S cfolding.ll -constprop
define i32 @folding() {
  ret i32 30
}
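Function Inline
• Inline: an "in-line function"? An "embedded function"? (the two usual Chinese renderings)
• The idea is to copy the body of the function into the caller
• This saves the cost of the call and exposes more optimization opportunities!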
Inline + Propagation
• After inlining, what used to be parameter passing becomes plain copying, which enables:
– Copy Propagation
– Constant Propagation
Inline + Propagation
int add(int a, int b)
{
  return a + b;
}
int foo(int n)
{
  int sum = 0;
  int i, t;
  for (i = 0; i < n; ++i) {
    t = add(10, 20);
    sum = add(sum, i);
    sum = add(sum, t);
  }
  return sum;
}
clang -emit-llvm -S inline.c
opt inline.ll -mem2reg -S
define i32 @add(i32 %a, i32 %b) {
  %1 = add i32 %a, %b
  ret i32 %1
}
define i32 @foo(i32 %n) {
  br label %1
; <label>:1
  %sum.0 = phi i32 [ 0, %0 ], [ %6, %7 ]
  %i.0 = phi i32 [ 0, %0 ], [ %8, %7 ]
  %2 = icmp slt i32 %i.0, %n
  br i1 %2, label %3, label %9
; <label>:3
  %4 = call i32 @add(i32 10, i32 20)
  %5 = call i32 @add(i32 %sum.0, i32 %i.0)
  %6 = call i32 @add(i32 %5, i32 %4)
  br label %7
; <label>:7
  %8 = add i32 %i.0, 1
  br label %1
; <label>:9
  ret i32 %sum.0
}
Inline + Propagation
opt inline.ll -mem2reg -inline -S
define i32 @foo(i32 %n) {
  br label %1
; <label>:1
  %sum.0 = phi i32 [ 0, %0 ], [ %5, %6 ]
  %i.0 = phi i32 [ 0, %0 ], [ %7, %6 ]
  %2 = icmp slt i32 %i.0, %n
  br i1 %2, label %3, label %8
; <label>:3
  %4 = add i32 %sum.0, %i.0
  %5 = add i32 %4, 30
  br label %6
; <label>:6
  %7 = add i32 %i.0, 1
  br label %1
; <label>:8
  ret i32 %sum.0
}
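To see in C terms what inlining plus propagation achieves, here is `foo` next to a hand-inlined version. `foo_inlined` is a hypothetical name, written by hand to mirror the IR above rather than generated by a tool. Each `add(x, y)` call is replaced by `x + y`, and `add(10, 20)` folds to 30.

```c
#include <assert.h>

/* The original code from the slide. */
int add(int a, int b) { return a + b; }

int foo(int n) {
    int sum = 0;
    int i, t;
    for (i = 0; i < n; ++i) {
        t = add(10, 20);
        sum = add(sum, i);
        sum = add(sum, t);
    }
    return sum;
}

/* foo() after inlining, written by hand: the parameter passing became plain
   copies (a = 10, b = 20, ...), so propagation and folding turned t into the
   literal 30, matching "%5 = add i32 %4, 30" in the IR. */
int foo_inlined(int n) {
    int sum = 0;
    for (int i = 0; i < n; ++i) {
        sum = sum + i;
        sum = sum + 30;
    }
    return sum;
}
```

Both functions return the same value for every non-negative `n` that is small enough to avoid overflow.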
DCE
• DCE: Dead Code Elimination
• After the optimizations introduced so far, redundant code gradually appears, along with branch conditions that obviously can never be true
DCE
int foo(){ a = 5; if (a > 10) b = 10; else b = 20; return b;}
Constant Propagation:
int foo(){ a = 5; if (5 > 10) b = 10; else b = 20; return b;}
Constant Folding:
int foo(){ a = 5; if (false) b = 10; else b = 20; return b;}
DCE:
int foo(){ b = 20; return b;}
Constant Propagation:
int foo(){ return 20;}
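The "unused result" half of DCE can be sketched as a backward walk. Starting from what the function returns, a statement is kept only if some later statement (or the return) still needs its result, and its operands then become needed in turn. Removing never-taken branches is the other half, which LLVM hands to -simplifycfg. Below is a minimal C sketch for straight-line code. It assumes a hypothetical `Stmt` encoding and no side effects; calls or stores would have to be kept regardless.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical straight-line statement dst = a op b over single-letter
   variables; an operand of 0 stands for a literal.  Assumes no statement
   has side effects (no calls, no stores). */
typedef struct { char dst, a, b; } Stmt;

static void mark_used(bool *needed, char v) {
    if (v != 0) {
        assert(v >= 'a' && v <= 'z');
        needed[v - 'a'] = true;
    }
}

/* Walks backward from the return value.  live[i] becomes true only if the
   result of statement i is still needed below it; returns the live count. */
int dead_code_elim(const Stmt *code, int n, char ret_var, bool *live) {
    bool needed[26] = { false };
    mark_used(needed, ret_var);
    int kept = 0;
    for (int i = n - 1; i >= 0; --i) {
        int d = code[i].dst - 'a';
        assert(d >= 0 && d < 26);
        live[i] = needed[d];
        if (!live[i])
            continue;                 /* nobody reads this result: dead */
        needed[d] = false;            /* uses above here see an older definition */
        mark_used(needed, code[i].a);
        mark_used(needed, code[i].b);
        ++kept;
    }
    return kept;
}

/* a = 5;  b = 20;  c = a + b;  d = b + 1;  return d; */
static const Stmt example[] = {
    { 'a', 0,   0 },
    { 'b', 0,   0 },
    { 'c', 'a', 'b' },
    { 'd', 'b', 0 },
};
```

In `example`, `c = a + b` is dead. Once it is gone, nothing reads `a` either, so only `b = 20` and `d = b + 1` survive. The backward walk catches both dead statements in a single sweep.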
Observing DCE with LLVM (1/5)
int foo()
{
  int a;
  int b;
  a = 5;
  if (a > 10)
    b = a + 10;
  else
    b = a + 20;
  return b;
}
clang -S -emit-llvm dce.c
define i32 @foo() {
entry:
  %a = alloca i32
  %b = alloca i32
  store i32 5, i32* %a
  %0 = load i32* %a
  %cmp = icmp sgt i32 %0, 10
  br i1 %cmp, label %if.then, label %if.else
if.then:
  %1 = load i32* %a
  %add = add i32 %1, 10
  store i32 %add, i32* %b
  br label %if.end
if.else:
  %2 = load i32* %a
  %add1 = add i32 %2, 20
  store i32 %add1, i32* %b
  br label %if.end
if.end:
  %3 = load i32* %b
  ret i32 %3
}
Observing DCE with LLVM (2/5)
opt dce.ll -mem2reg -S
define i32 @foo() {
entry:
  %cmp = icmp sgt i32 5, 10
  br i1 %cmp, label %if.then, label %if.else
if.then:
  %add = add i32 5, 10
  br label %if.end
if.else:
  %add1 = add i32 5, 20
  br label %if.end
if.end:
  %b.0 = phi i32 [ %add, %if.then ], [ %add1, %if.else ]
  ret i32 %b.0
}
Observing DCE with LLVM (3/5)
opt dce.ll -mem2reg -constprop -S
After -constprop:
define i32 @foo() {
entry:
  br i1 false, label %if.then, label %if.else
if.then:
  br label %if.end
if.else:
  br label %if.end
if.end:
  %b.0 = phi i32 [ 15, %if.then ], [ 25, %if.else ]
  ret i32 %b.0
}
Observing DCE with LLVM (4/5)
opt dce.ll -mem2reg -constprop -dce -S
After -dce:
define i32 @foo() {
entry:
  br i1 false, label %if.then, label %if.else
if.then:
  br label %if.end
if.else:
  br label %if.end
if.end:
  %b.0 = phi i32 [ 15, %if.then ], [ 25, %if.else ]
  ret i32 %b.0
}
Looks like nothing changed??
LLVM leaves CFG simplification to the -simplifycfg pass
Observing DCE with LLVM (5/5)
opt dce.ll -mem2reg -constprop -simplifycfg -S
After -simplifycfg:
define i32 @foo() {
entry:
  ret i32 25
}
Observing DCE with LLVM, take 2 (1/2)
opt dce.ll -mem2reg -simplifycfg -S
After -simplifycfg (starting from the -mem2reg output):
define i32 @foo() {
entry:
  %cmp = icmp sgt i32 5, 10
  %add = add i32 5, 10
  %add1 = add i32 5, 20
  %b.0 = select i1 %cmp, i32 %add, i32 %add1
  ret i32 %b.0
}
Observing DCE with LLVM, take 2 (2/2)
opt dce.ll -mem2reg -simplifycfg -constprop -S
After -constprop:
define i32 @foo() {
entry:
  ret i32 25
}
CSE
• CSE: Common Subexpression Elimination
– Share whatever can be shared!
a = b * c + g;
d = b * c * e;
becomes
t = b * c;
a = t + g;
d = t * e;
Observing CSE with LLVM (1/2)
int foo(int b, int c, int g, int e)
{
  int a = b * c + g;
  int d = b * c * e;
  return a + d;
}
clang -emit-llvm -S cse.c
opt cse.ll -mem2reg -S
define i32 @foo(i32 %b, i32 %c, i32 %g, i32 %e) {
entry:
  %mul = mul i32 %b, %c
  %add = add i32 %mul, %g
  %mul1 = mul i32 %b, %c
  %mul2 = mul i32 %mul1, %e
  %add3 = add i32 %add, %mul2
  ret i32 %add3
}
Observing CSE with LLVM (2/2)
opt cse.ll -mem2reg -early-cse -S
After -early-cse, %mul1 is gone and %mul2 reuses %mul:
define i32 @foo(i32 %b, i32 %c, i32 %g, i32 %e) {
entry:
  %mul = mul i32 %b, %c
  %add = add i32 %mul, %g
  %mul2 = mul i32 %mul, %e
  %add3 = add i32 %add, %mul2
  ret i32 %add3
}
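A local CSE can be sketched by remembering every expression already computed in the block and turning a repeated one into a copy of the earlier result. This minimal C version makes several simplifying assumptions:
- The code is SSA-form straight-line code with single-letter variables, using a hypothetical `Stmt` encoding.
- It does a simple quadratic search, where a production pass would normally use hashing.
- The program is the `foo` example lowered by hand, with `p` and `q` standing for `%mul` and `%mul1`.

```c
#include <assert.h>

/* Hypothetical three-address statement on single-letter SSA variables:
     op '+' or '*' : dst = a op b
     op '='        : dst = a      (a copy; b is 0) */
typedef struct { char dst, op, a, b; } Stmt;

/* Local CSE inside one basic block.  If "a op b" was already computed into
   some variable, the recomputation becomes a copy of that variable.  Because
   the code is in SSA form, a and b cannot change in between, so the earlier
   result is still valid.  Returns the number of statements replaced. */
int local_cse(Stmt *code, int n) {
    int replaced = 0;
    for (int i = 0; i < n; ++i) {
        Stmt *s = &code[i];
        if (s->op == '=')
            continue;
        assert(s->op == '+' || s->op == '*');
        for (int j = 0; j < i; ++j) {
            const Stmt *e = &code[j];
            /* + and * are commutative, so b * c also matches c * b. */
            int same = e->op == s->op &&
                       ((e->a == s->a && e->b == s->b) ||
                        (e->a == s->b && e->b == s->a));
            if (same) {
                s->op = '=';
                s->a = e->dst;
                s->b = 0;
                ++replaced;
                break;
            }
        }
    }
    return replaced;
}
```

After `local_cse`, `q = b * c` becomes `q = p`. A copy propagation step would then rewrite `d = q * e` into `d = p * e`, which is the shape of the -early-cse output above.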
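Loop Unroll
• Loop Unroll: expanding a loop
– On most architectures, jump instructions are more expensive than ordinary arithmetic instructions
– After unrolling, the loop index may turn from a variable into constants
sum = 0;
for (i = 0; i < 3; ++i)
  sum = sum + i;
becomes
sum = 0;
sum = sum + 0;
sum = sum + 1;
sum = sum + 2;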
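The example above unrolls completely because its trip count (3) is a constant. When the trip count is only known at run time, a loop is usually unrolled by a fixed factor instead, with a remainder loop for the leftovers. Below is a minimal hand-written C sketch with an assumed unroll factor of 4; the function names are hypothetical.

```c
#include <assert.h>

/* The rolled loop: one compare-and-branch per element. */
int sum_rolled(const int *v, int n) {
    int sum = 0;
    for (int i = 0; i < n; ++i)
        sum += v[i];
    return sum;
}

/* The same loop unrolled by hand by a factor of 4: one compare-and-branch
   per four elements, plus a remainder loop for the n % 4 leftovers, since
   n is not a compile-time constant here. */
int sum_unrolled4(const int *v, int n) {
    int sum = 0;
    int i = 0;
    for (; i + 3 < n; i += 4) {
        sum += v[i];
        sum += v[i + 1];
        sum += v[i + 2];
        sum += v[i + 3];
    }
    for (; i < n; ++i)           /* remainder (epilogue) loop */
        sum += v[i];
    return sum;
}
```

Both versions compute the same sum. The unrolled one executes the loop's compare-and-branch roughly once per four elements instead of once per element.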
Observing Loop Unroll with LLVM (1/8)
int add(int a, int b)
{
  return a + b;
}
int foo()
{
  int sum = 0;
  int i;
  for (i = 0; i < 3; ++i)
    sum = add(sum, i);
  return sum;
}
clang -emit-llvm -S for.c
opt for.ll -mem2reg -S
define i32 @add(i32 %a, i32 %b) {
entry:
  %add = add i32 %a, %b
  ret i32 %add
}
define i32 @foo() {
entry:
  br label %for.cond
for.cond:
  %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.inc ]
  %sum.0 = phi i32 [ 0, %entry ], [ %call, %for.inc ]
  %cmp = icmp slt i32 %i.0, 3
  br i1 %cmp, label %for.body, label %for.end
for.body:
  %call = call i32 @add(i32 %sum.0, i32 %i.0)
  br label %for.inc
for.inc:
  %inc = add i32 %i.0, 1
  br label %for.cond
for.end:
  ret i32 %sum.0
}
Observing Loop Unroll with LLVM (2/8)
opt for.ll -mem2reg -loop-unroll -S
After -loop-unroll:
define i32 @foo() {
entry:
  br label %for.cond
for.cond:
  %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.inc ]
  %sum.0 = phi i32 [ 0, %entry ], [ %call, %for.inc ]
  %cmp = icmp slt i32 %i.0, 3
  br i1 %cmp, label %for.body, label %for.end
for.body:
  %call = call i32 @add(i32 %sum.0, i32 %i.0)
  br label %for.inc
for.inc:
  %inc = add i32 %i.0, 1
  br label %for.cond
for.end:
  %sum.0.lcssa = phi i32 [ %sum.0, %for.cond ]
  ret i32 %sum.0.lcssa
}
The loop doesn't seem to get unrolled????
Observing Loop Unroll with LLVM (3/8)
$ opt -mem2reg -S for.ll -loop-unroll -debug
Args: opt -mem2reg -S for.ll -loop-unroll -debug
Loop Unroll: F[foo] Loop %for.cond
  Loop Size = 8
  Can't unroll; loop not terminated by a conditional branch.
It is complaining that the Loop Unroll pass does not recognize this loop!?
Observing Loop Unroll with LLVM (4/8)
opt for.ll -mem2reg -loop-rotate -S
After -loop-rotate:
define i32 @foo() {
entry:
  br label %for.body
for.body:
  %sum.02 = phi i32 [ 0, %entry ], [ %call, %for.inc ]
  %i.01 = phi i32 [ 0, %entry ], [ %inc, %for.inc ]
  %call = call i32 @add(i32 %sum.02, i32 %i.01)
  br label %for.inc
for.inc:
  %inc = add i32 %i.01, 1
  %cmp = icmp slt i32 %inc, 3
  br i1 %cmp, label %for.body, label %for.end
for.end:
  %sum.0.lcssa = phi i32 [ %call, %for.inc ]
  ret i32 %sum.0.lcssa
}
Flip over, loop!
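What -loop-rotate does to `foo` can be mirrored in C. The top-tested `for` loop becomes a one-time guard plus a bottom-tested `do`/`while`. As a result, the block that jumps back to the loop body also ends with the conditional exit branch, which is the shape the unroller was asking for. Below is a hand-written sketch using `sum()` from the earlier slides; the function names are hypothetical.

```c
#include <assert.h>

/* sum() from the earlier slide: the exit test "i < n" sits at the top, and
   the block that jumps back (++i) ends in an unconditional branch. */
int sum_top_tested(int n) {
    int ret = 0;
    for (int i = 0; i < n; ++i)
        ret += i;
    return ret;
}

/* The same loop rotated by hand: a one-time guard in front, then a
   bottom-tested loop whose back-jumping block ends in the exit test. */
int sum_rotated(int n) {
    int ret = 0;
    int i = 0;
    if (i < n) {                 /* guard: does the body run at least once? */
        do {
            ret += i;
            ++i;
        } while (i < n);         /* conditional branch that ends the loop */
    }
    return ret;
}
```

In the `foo` example, the trip count is the constant 3, so the guard is known to be true and disappears. That is why the rotated IR above branches straight from `entry` to `for.body`.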
Observing Loop Unroll with LLVM (5/8)
opt for.ll -mem2reg -view-cfg -loop-rotate -view-cfg -S
Flip over, loop! (the CFG before and after -loop-rotate)
Observing Loop Unroll with LLVM (6/8)
opt for.ll -mem2reg -loop-rotate -loop-unroll -view-cfg -S
(the CFG after -loop-unroll)
Observing Loop Unroll with LLVM (7/8)
opt for.ll -mem2reg -loop-rotate -loop-unroll -simplifycfg -view-cfg -S
Before -simplifycfg:
define i32 @foo() {
entry:
  br label %for.body
for.body:
  %call = call i32 @add(i32 0, i32 0)
  br label %for.inc
for.inc:
  %call.1 = call i32 @add(i32 %call, i32 1)
  br label %for.inc.1
for.inc.1:
  %call.2 = call i32 @add(i32 %call.1, i32 2)
  br label %for.inc.2
for.inc.2:
  ret i32 %call.2
}
After -simplifycfg:
define i32 @foo() {
entry:
  %call = call i32 @add(i32 0, i32 0)
  %call.1 = call i32 @add(i32 %call, i32 1)
  %call.2 = call i32 @add(i32 %call.1, i32 2)
  ret i32 %call.2
}
Observing Loop Unroll with LLVM (8/8)
opt for.ll -mem2reg -loop-rotate -loop-unroll -simplifycfg \
    -inline -constprop -S
After -inline:
define i32 @foo() {
entry:
  %add.i = add i32 1, 2
  ret i32 %add.i
}
After -constprop:
define i32 @foo() {
entry:
  ret i32 3
}
Compiler Optimization
• Different compiler optimizations interact with one another
• Their order also affects the final result
LLVM
• The list of available passes can be seen with opt -help
Overview of GCC Optimization Passes
$ gcc a.c -fdump-tree-all -fdump-rtl-all -O3
$ ls a.c.*
a.c.001t.tu a.c.059t.forwprop2 a.c.093t.sink a.c.138t.copyrename4 a.c.204r.outof_cfglayout
a.c.003t.original a.c.060t.objsz1 a.c.096t.loop a.c.139t.crited2 a.c.205r.split1
a.c.004t.gimple a.c.061t.alias a.c.097t.loopinit a.c.141t.uncprop1 a.c.206r.subreg2
a.c.006t.omplower a.c.062t.retslot a.c.098t.lim1 a.c.142t.local-pure-const2 a.c.208r.mode_sw
a.c.007t.lower a.c.063t.fre2 a.c.099t.copyprop5 a.c.168t.nrv a.c.209r.asmcons
a.c.010t.eh a.c.064t.copyprop3 a.c.100t.dceloop1 a.c.169t.optimized a.c.213r.ira
a.c.011t.cfg a.c.065t.mergephi2 a.c.101t.unswitch a.c.170r.expand a.c.214r.reload
a.c.015t.ssa a.c.066t.vrp1 a.c.102t.sccp a.c.171r.vregs a.c.215r.postreload
a.c.017t.inline_param1 a.c.067t.dce1 a.c.104t.ldist a.c.172r.into_cfglayout a.c.216r.gcse2
a.c.018t.einline a.c.068t.cdce a.c.105t.copyprop6 a.c.173r.jump a.c.217r.split2
a.c.019t.early_optimizations a.c.069t.cselim a.c.111t.ivcanon a.c.174r.subreg1 a.c.218r.ree
a.c.020t.copyrename1 a.c.070t.ifcombine a.c.113t.ifcvt a.c.175r.dfinit a.c.221r.pro_and_epilogue
a.c.021t.ccp1 a.c.071t.phiopt1 a.c.114t.vect a.c.176r.cse1 a.c.222r.dse2
a.c.022t.forwprop1 a.c.072t.tailr2 a.c.115t.dceloop3 a.c.177r.fwprop1 a.c.223r.csa
a.c.023t.ealias a.c.073t.ch a.c.116t.pcom a.c.178r.cprop1 a.c.224r.jump2
a.c.024t.esra a.c.075t.cplxlower1 a.c.117t.cunroll a.c.179r.pre a.c.225r.peephole2
a.c.025t.fre1 a.c.076t.sra a.c.118t.slp a.c.181r.cprop2 a.c.226r.ce3
a.c.026t.copyprop1 a.c.077t.copyrename3 a.c.120t.ivopts a.c.184r.ce1 a.c.228r.cprop_hardreg
a.c.027t.mergephi1 a.c.078t.dom1 a.c.121t.lim3 a.c.185r.reginfo a.c.229r.rtl_dce
a.c.028t.cddce1 a.c.079t.isolate-paths a.c.122t.loopdone a.c.186r.loop2 a.c.230r.compgotos
a.c.029t.eipa_sra a.c.080t.phicprop1 a.c.123t.veclower21 a.c.187r.loop2_init a.c.231r.bbro
a.c.030t.tailr1 a.c.081t.dse1 a.c.125t.reassoc2 a.c.188r.loop2_invariant a.c.233r.split4
a.c.031t.switchconv a.c.082t.reassoc1 a.c.126t.slsr a.c.189r.loop2_unswitch a.c.234r.sched2
a.c.033t.profile_estimate a.c.083t.dce2 a.c.127t.dom2 a.c.192r.loop2_done a.c.236r.stack
a.c.034t.local-pure-const1 a.c.084t.forwprop3 a.c.128t.phicprop2 a.c.194r.cprop3 a.c.237r.alignments
a.c.035t.fnsplit a.c.085t.phiopt2 a.c.129t.vrp2 a.c.195r.cse2 a.c.239r.mach
a.c.036t.release_ssa a.c.086t.strlen a.c.130t.cddce2 a.c.196r.dse1 a.c.240r.barriers
a.c.037t.inline_param2 a.c.087t.ccp3 a.c.132t.dse2 a.c.197r.fwprop2 a.c.244r.shorten
a.c.054t.copyrename2 a.c.088t.copyprop4 a.c.133t.forwprop4 a.c.199r.init-regs a.c.245r.nothrow
a.c.055t.ccp2 a.c.089t.sincos a.c.134t.phiopt3 a.c.200r.ud_dce a.c.246r.dwarf2
a.c.056t.copyprop2 a.c.090t.bswap a.c.135t.fab1 a.c.201r.combine a.c.247r.final
a.c.057t.cunrolli a.c.091t.crited1 a.c.136t.widening_mul a.c.202r.ce2 a.c.248r.dfinish
a.c.058t.phiprop a.c.092t.pre a.c.137t.tailc a.c.203r.bbpart a.c.249t.statistics
Dump files from 165 passes in total!
Propagation
28 of the 165 passes do propagation!
Inline
3 of the 165 passes do inlining!
117
DCE
$ gcc a.c -fdump-tree-all -fdump-rtl-all -O3
$ ls a.c.*
(same dump-file listing as on the previous slide)
13 of the 165 passes are DCE passes!
118
CSE
$ gcc a.c -fdump-tree-all -fdump-rtl-all -O3
$ ls a.c.*
(same dump-file listing as on the previous slide)
4 of the 165 passes are CSE passes!
119
Unroll
$ gcc a.c -fdump-tree-all -fdump-rtl-all -O3
$ ls a.c.*
(same dump-file listing as on the previous slide)
2 of the 165 passes are Unroll passes!
120
Propagation + DCE + CSE + Inline + Unroll
(same dump-file listing as on the previous slides)
50 of the 165 passes!
121
Propagation + DCE + CSE + Inline + Unroll
(same dump-file listing as on the previous slides)
50 of the 165 passes!
After this talk, you can say you kinda understand about a third of GCC already!!!
122
Machine-Dependent Compiler Optimization
123
Machine Dependent Compiler Optimization
• Register Allocation
• Instruction Scheduling
• Peephole Optimization
124
Advanced Compiler Optimization
125
Advanced Compiler Optimization
• Loop Optimization
• Interprocedural Optimization
• Auto-Vectorization
• Auto-Parallelization
126
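Summary
• Compiler optimization is a lot of fun, but be sure to study some of the basic theory before you start playing with it
• LLVM is an excellent bridge between theory and practice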
127
Commercial Break
Andes Mountains
128
Commercial Break
Andes Mountains
Beautiful mountains, beautiful water, beautifully boring. Leave work on time. Great atmosphere.
129
Commercial Break
Andes Mountains
Beautiful mountains, beautiful water, beautifully boring. Leave work on time. Great atmosphere.
Open Source++
130
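Commercial Break
Andes Mountains
Beautiful mountains, beautiful water, beautifully boring. Leave work on time. Great atmosphere.
The Toolchain team is always hiring~
Open Source++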
131