COSCUP 2014: The Arms Race in the Warring States Era of Open Source Compilers

Post on 21-Apr-2017


The Arms Race in the Warring States Era of Open Source Compilers

COSCUP'2014

Date : July 19th, 2014 Kito Cheng

kito.cheng@gmail.com

2

About Me

Professional odd-job man on the Compiler Team at the "Andes Mountains"

3

Open Source Compilers Enter the Warring States Era

4

yum update -y
sudo apt-get upgrade

Upgrading the compiler?

5

War

Arms race

Technology advances fast!

8

Compiler Technology Advances

Easier debugging!

Programs run faster

Downstream and surrounding industries flourish

10

Downstream Industries Flourish

GNU camp              LLVM camp
ld.bfd / ld.gold      lld / mclinker
gdb                   lldb
as / objdump          MC layer in LLVM
libstdc++             libc++
libgcc                libcompiler-rt

12

binutils vs MC Layer

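• The libraries inside binutils are hard to use, and reusing them outside binutils is quite difficult.

• LLVM's MC layer lets the assembler and disassembler be used as libraries with little effort.

• This opens the door to an integrated toolchain.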

13

libstdc++ vs libc++

• In the past, libstdc++ was the only C++ standard library available on Linux
  – Some of its implementations draw criticism, such as its baffling std::string...

• libc++ offers a new choice that is faster and uses less memory!

• STLPort says:
  – STLPort lacks an underlying C++ runtime library

18

Surrounding Industries Flourish

• VM/JIT
• New programming languages
• Analysis/debugging tools

19

VM/JIT

The tragedy of Kaffe VM: a painstakingly hand-built JIT that ran slower than the interpreter

Just in time

Just too late!

24

Pyston

FTL: WebKit's LLVM-based JIT

25

New Programming Language

• If you want superb native execution speed...
  – Write your own code generator
  – Emit C code and feed it to a compiler
  – Hook into an existing compiler

• In the past, when GCC was the only option, reuse was hard...
  – Complex architecture and little documentation
    • Few people can read GIMPLE, GCC's IR XD
  – Strict license: GPLv3

• Now LLVM takes care of all the back-end chores!

33

Rust

34

Analysis/Debugging Tools

• youcompleteme

• clang static analyzer

35

Debugging Tools in the Compiler

• Debugging tools
  – Address-sanitizer
  – Undefined-sanitizer
  – Thread-sanitizer

36

Address-sanitizer

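• A fast memory error detector

• Catch it early, treat it early :)

• Faster and easier to use than Valgrind!
  – The trade-off is that the program must be recompiled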

37

Out-of-bounds access

int main(int argc, char **argv) {
  int stack_array[100];
  stack_array[1] = 0;
  return stack_array[argc + 100];  // BOOM
}

=================================================================
==28706==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fff61e1f344 at pc 0x4a5dfb bp 0x7fff61e1f170 sp 0x7fff61e1f168
READ of size 4 at 0x7fff61e1f344 thread T0
    #0 0x4a5dfa in main /home/kito/test.cpp:4
    #1 0x7ff11a8a1d64 in __libc_start_main (/lib64/libc.so.6+0x21d64)
    #2 0x404c98 (/home/kito/a.out+0x404c98)

Address 0x7fff61e1f344 is located in stack of thread T0 at offset 436 in frame
    #0 0x4a5d29 in main /home/kito/test.cpp:1

  This frame has 1 object(s):
    [32, 432) 'stack_array' <== Memory access at offset 436 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow /home/kito/test.cpp:4 main
Shadow bytes around the buggy address:
  0x10006c3bbe10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
...
=>0x10006c3bbe60: 00 00 00 00 00 00 00 00[f4]f4 f3 f3 f3 f3 00 00
...
  0x10006c3bbeb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
...
==28706==ABORTING

38

Use after free

#include <stdlib.h>
int main() {
  int *a = malloc(sizeof(int));
  *a = 100;
  free(a);
  return *a;
}

==12254==ERROR: AddressSanitizer: heap-use-after-free on address 0x60200000eff0 at pc 0x4a5db4 bp 0x7fff3ff57520 sp 0x7fff3ff57518
READ of size 4 at 0x60200000eff0 thread T0
    #0 0x4a5db3 in main /home/kito/coscup2014/use-after-free.c:6
    #1 0x3c52221d64 in __libc_start_main (/lib64/libc.so.6+0x3c52221d64)
    #2 0x404c98 (/home/kito/coscup2014/a.out+0x404c98)

0x60200000eff0 is located 0 bytes inside of 4-byte region [0x60200000eff0,0x60200000eff4)
freed by thread T0 here:
    #0 0x476c79 in __interceptor_free /home/kito/gcc/gcc-src/libsanitizer/asan/asan_malloc_linux.cc:63
    #1 0x4a5d7c in main /home/kito/coscup2014/use-after-free.c:5
    #2 0x3c52221d64 in __libc_start_main (/lib64/libc.so.6+0x3c52221d64)

previously allocated by thread T0 here:
    #0 0x476f19 in __interceptor_malloc /home/kito/gcc/gcc-src/libsanitizer/asan/asan_malloc_linux.cc:73
    #1 0x4a5d2b in main /home/kito/coscup2014/use-after-free.c:3
    #2 0x3c52221d64 in __libc_start_main (/lib64/libc.so.6+0x3c52221d64)

39

Mismatched free/delete/delete[]

int main() {
  int *arr = new int[10];
  delete arr;
  return 0;
}

=================================================================
==12421==ERROR: AddressSanitizer: alloc-dealloc-mismatch (operator new [] vs operator delete) on 0x60400000dfd0
    #0 0x478219 in operator delete(void*) /home/kito/gcc/gcc-src/libsanitizer/asan/asan_new_delete.cc:85
    #1 0x4a5efb in main /home/kito/coscup2014/mismatch-delete.cpp:3
    #2 0x3c52221d64 in __libc_start_main (/lib64/libc.so.6+0x3c52221d64)
    #3 0x404e58 (/home/kito/coscup2014/a.out+0x404e58)

0x60400000dfd0 is located 0 bytes inside of 40-byte region [0x60400000dfd0,0x60400000dff8)
allocated by thread T0 here:
    #0 0x477e29 in operator new[](unsigned long) /home/kito/gcc/gcc-src/libsanitizer/asan/asan_new_delete.cc:55
    #1 0x4a5eeb in main /home/kito/coscup2014/mismatch-delete.cpp:2
    #2 0x3c52221d64 in __libc_start_main (/lib64/libc.so.6+0x3c52221d64)
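One way to clear the alloc-dealloc-mismatch report is to pair new[] with delete[], or to let a container own the buffer so no manual delete is needed. The sketch below is only an illustration; the function names sum_with_array_delete and sum_with_vector are invented for this example and do not appear in the slides.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: allocates with new[] and releases with the matching delete[].
int sum_with_array_delete(std::size_t n) {
  int *arr = new int[n];
  int sum = 0;
  for (std::size_t i = 0; i < n; ++i) {
    arr[i] = static_cast<int>(i);
    sum += arr[i];
  }
  delete[] arr;  // array form, matching the new[] above
  return sum;
}

// Hypothetical helper: std::vector frees its storage itself, so a mismatch cannot occur.
int sum_with_vector(std::size_t n) {
  std::vector<int> v(n);
  int sum = 0;
  for (std::size_t i = 0; i < n; ++i) {
    v[i] = static_cast<int>(i);
    sum += v[i];
  }
  return sum;
}
```

Both helpers fill the buffer with 0..n-1 and return the sum, so they can be swapped for one another when checking a build under Address-sanitizer.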

40

Address-sanitizer

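                                                                  Valgrind    AddressSanitizer
Heap out-of-bounds access                                         Y           Y
Stack out-of-bounds access                                        N           Y
Global variable out-of-bounds access                              N           Y
Use after free (free/delete)                                      Y           Y
Use after return (e.g. returning a pointer to a local variable)   N           Y (partial)
Reads of uninitialized values                                     Y           N
Mismatched free/delete/delete[]                                   Y           Y
Overhead                                                          10x-30x     1.5x-3x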

41

Undefined-Sanitizer

• Undefined behavior can lurk anywhere in a program

• But normal people do not pore over the standard to learn what counts as undefined behavior...
  – The C99 list of UB runs to ten-plus pages

• You only find out it is undefined behavior once something spooky happens!!!!!
  – That said, the sanitizer does not actually cover every kind of UB... XD

46

Divide by 0

int main(int argc, const char *argv[]){
  return argc/0;
}

div0.cpp:2:14: runtime error: division by zero
Floating point exception

47

Dereference Null Pointer

int main(int argc, const char *argv[]){
  int *a = nullptr;
  return *a;
}

derefnull.cpp:3:11: runtime error: load of null pointer of type 'int'
Segmentation fault

48

Shift

int main(int argc, const char *argv[]){
  return argc >> 32;
}

shift.cpp:2:15: runtime error: shift exponent 32 is too large for 32-bit type 'int'
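A shift count that is negative, or at least as large as the bit width of the left operand, is undefined behavior, which is what the report above flags. A minimal sketch of a guarded right shift follows; checked_shift_right is a hypothetical helper name, not something from the slides or from any library.

```cpp
#include <climits>

// Hypothetical helper: refuses to shift when the count is out of range for int.
// Returns true and stores the result in *out only when the shift is well defined.
bool checked_shift_right(int value, int count, int *out) {
  const int width = static_cast<int>(sizeof(int) * CHAR_BIT);
  if (count < 0 || count >= width) {
    return false;  // this is the case the sanitizer reports
  }
  *out = value >> count;
  return true;
}
```

Returning a status instead of silently clamping keeps the caller aware that the requested shift had no defined result.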

49

Signed Integer Overflow

#include <limits.h>
int main(int argc, const char *argv[]){
  int a = INT_MAX;
  return a + argc;
}

overflow.cpp:4:14: runtime error: signed integer overflow: 2147483647 + 1 cannot be represented in type 'int'
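Signed overflow can be avoided by testing the operands before adding. The sketch below shows one portable way to do that check; checked_add is a hypothetical helper introduced here for illustration, not part of the talk.

```cpp
#include <climits>

// Hypothetical helper: performs a + b only when the result fits in int.
// Returns false (leaving *out untouched) when the addition would overflow.
bool checked_add(int a, int b, int *out) {
  if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
    return false;  // the case reported as "signed integer overflow"
  }
  *out = a + b;
  return true;
}
```

The comparisons are arranged so that INT_MAX - b and INT_MIN - b themselves never overflow.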

50

Thread-Sanitizer

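• A race condition detector

• Simple and easy to use! Treat it early!

• The trouble with race conditions is that they are hard to reproduce; Thread-Sanitizer can detect them with specialized algorithms.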

51

Race Condition

#include <pthread.h>
#include <stdio.h>

int Global;

void *Thread1(void *x) {
  Global++;
  return NULL;
}

void *Thread2(void *x) {
  Global--;
  return NULL;
}

int main() {
  pthread_t t[2];
  pthread_create(&t[0], NULL, Thread1, NULL);
  pthread_create(&t[1], NULL, Thread2, NULL);
  pthread_join(t[0], NULL);
  pthread_join(t[1], NULL);
}

52

Race Condition


==================
WARNING: ThreadSanitizer: data race (pid=21757)
  Write of size 4 at 0x7ffa3e002ef4 by thread T2:
    #0 Thread2 /home/kito/coscup2014/race.c:12 (race+0x0000000c1a75)

  Previous write of size 4 at 0x7ffa3e002ef4 by thread T1:
    #0 Thread1 /home/kito/coscup2014/race.c:7 (race+0x0000000c1a05)

  Location is global 'Global' of size 4 at 0x7ffa3e002ef4 (race+0x000000e03ef4)

  Thread T2 (tid=21760, running) created by main thread at:
    #0 pthread_create /home/kito/llvm/src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:842 (race+0x00000005e1f1)
    #1 main /home/kito/coscup2014/race.c:19 (race+0x0000000c1b03)

  Thread T1 (tid=21759, finished) created by main thread at:
    #0 pthread_create /home/kito/llvm/src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:842 (race+0x00000005e1f1)
    #1 main /home/kito/coscup2014/race.c:18 (race+0x0000000c1ad9)

SUMMARY: ThreadSanitizer: data race /home/kito/coscup2014/race.c:12 Thread2
==================
ThreadSanitizer: reported 1 warnings

53

Race Condition

#include <pthread.h>
#include <stdio.h>

int Global;
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *Thread1(void *x) {
  pthread_mutex_lock(&lock);
  Global++;
  pthread_mutex_unlock(&lock);
  return NULL;
}

void *Thread2(void *x) {
  Global--;
  return NULL;
}

int main() {
  pthread_t t[2];
  pthread_create(&t[0], NULL, Thread1, NULL);
  pthread_create(&t[1], NULL, Thread2, NULL);
  pthread_join(t[0], NULL);
  pthread_join(t[1], NULL);
}

54

Race Condition

==================
WARNING: ThreadSanitizer: data race (pid=21765)
  Write of size 4 at 0x7feaa10dcf20 by thread T2:
    #0 Thread2 /home/kito/coscup2014/race-2.c:15 (race-2+0x0000000c1ad5)

  Previous write of size 4 at 0x7feaa10dcf20 by thread T1 (mutexes: write M0):
    #0 Thread1 /home/kito/coscup2014/race-2.c:9 (race-2+0x0000000c1a57)

  Location is global 'Global' of size 4 at 0x7feaa10dcf20 (race-2+0x000000e03f20)

  Mutex M0 (0x7feaa10dcef8) created at:
    #0 pthread_mutex_lock /home/kito/llvm/src/projects/compiler-rt/lib/sanitizer_common/sanitizer_common_interceptors.inc:2956 (race-2+0x00000007f260)
    #1 Thread1 /home/kito/coscup2014/race-2.c:8 (race-2+0x0000000c1a37)

  Thread T2 (tid=21768, running) created by main thread at:
    #0 pthread_create /home/kito/llvm/src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:842 (race-2+0x00000005e231)
    #1 main /home/kito/coscup2014/race-2.c:22 (race-2+0x0000000c1b63)

  Thread T1 (tid=21767, finished) created by main thread at:
    #0 pthread_create /home/kito/llvm/src/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:842 (race-2+0x00000005e231)
    #1 main /home/kito/coscup2014/race-2.c:21 (race-2+0x0000000c1b39)

SUMMARY: ThreadSanitizer: data race /home/kito/coscup2014/race-2.c:15 Thread2
==================
ThreadSanitizer: reported 1 warnings
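The report above shows that locking in only one thread does not remove the race: every access to the shared variable has to hold the same mutex. A minimal sketch of that fix, keeping the pthread style of the slide, might look as follows. The wrapper run_both_threads and the name global_lock are hypothetical, added only so the result can be checked.

```cpp
#include <pthread.h>

static int Global = 0;
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

static void *Thread1(void *) {
  pthread_mutex_lock(&global_lock);
  Global++;
  pthread_mutex_unlock(&global_lock);
  return nullptr;
}

static void *Thread2(void *) {
  pthread_mutex_lock(&global_lock);  // the lock the second slide version left out
  Global--;
  pthread_mutex_unlock(&global_lock);
  return nullptr;
}

// Hypothetical wrapper: runs both threads to completion and returns the final value.
int run_both_threads() {
  pthread_t t[2];
  pthread_create(&t[0], nullptr, Thread1, nullptr);
  pthread_create(&t[1], nullptr, Thread2, nullptr);
  pthread_join(t[0], nullptr);
  pthread_join(t[1], nullptr);
  return Global;  // safe to read: both threads have been joined
}
```

With both writes protected by the same mutex, one increment and one decrement leave Global at its starting value regardless of scheduling.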

55

GCC Advances Rapidly!

56

Improved Error Diagnostics

class T {
public:
  int a;
}
#include <vector>

(A semicolon is missing after the class definition.)

Feed it to gcc 4.4 and it spews 132 lines of error messages XD

In file included from /home/kito/gcc-workspace/native-4.4/lib/gcc/x86_64-unknown-linux-gnu/4.4.7/../../../../include/c++/4.4.7/cstddef:44,
                 from /home/kito/gcc-workspace/native-4.4/lib/gcc/x86_64-unknown-linux-gnu/4.4.7/../../../../include/c++/4.4.7/bits/stl_algobase.h:61,
                 from /home/kito/gcc-workspace/native-4.4/lib/gcc/x86_64-unknown-linux-gnu/4.4.7/../../../../include/c++/4.4.7/vector:61,
                 from test.cpp:5:
/home/kito/gcc-workspace/native-4.4/lib/gcc/x86_64-unknown-linux-gnu/4.4.7/include/stddef.h:149: error: two or more data types in declaration of ‘ptrdiff_t’
...
/home/kito/gcc-workspace/native-4.4/lib/gcc/x86_64-unknown-linux-gnu/4.4.7/../../../../include/c++/4.4.7/bits/vector.tcc:629: error: there are no arguments to ‘difference_type’ that depend on a template parameter, so a declaration of ‘difference_type’ must be available

Feed it to gcc 4.9:

test.cpp:4:1: error: expected ‘;’ after class definition
 }
 ^

How thoughtful: it not only adds color but also points at where the error is (I hear clang has had this for a long time?)

59

More examples:

http://web.archive.org/web/20120622065456/http://people.redhat.com/bkoz/diagnostics/diagnostics.html

https://gcc.gnu.org/wiki/ClangDiagnosticsComparison

http://tinyurl.com/cxxdiagcmp

http://tinyurl.com/clangcmp

60

* Sanitizer

• First implemented in Clang/LLVM

• Support has been added to GCC step by step since 4.8!

61

Major LTO Improvements

• Greatly reduced memory usage and compile time

• Can now build large projects!
  – Firefox
  – Linux Kernel

62

libgccjit.so

• gcc has a JIT engine too!!

• But the implementation is a bit of a hack:
  – It quietly calls gcc + dlopen

65

Major Internal Overhaul

• Internal implementation language switched to C++

• Better readability

• Faster compilation

66

LLVM Catches Up Fast!

67

Improved GCC Compatibility

• Compatible compiler options

• Compatible language extensions

68

Named register variables

• Some system software uses this extension to pin variables to specific registers, which can substantially reduce the amount of inline asm

• Without support for this syntax, some code will not compile, or must be modified before it compiles
  – The biggest example is the Linux Kernel

• Support recently landed in trunk and is expected in 3.5

register int *foo asm ("a5");

72

OpenMP Support

• A long-awaited feature

• A handy way to parallelize a program quickly

int main(int argc, char *argv[]) {
  const int N = 100000;
  int i, a[N];

#pragma omp parallel for
  for (i = 0; i < N; i++)
    a[i] = 2 * i;

  return 0;
}

73

Steadily Maturing

• Version numbers used to be just 1.x, 2.x, 3.x

• Bug fixes only went forward and were never backported
  – Bug? Fix in ToT!

• After 3.4, point releases such as 3.4.1 and 3.4.2 started to appear!

77

Performance versus gcc: some wins, some losses!

http://vmakarov.fedorapeople.org/spec/2014/llvmgcc64.html

78

The Chaos of War

79

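Compiler Options Are Not Fully Compatible

• Basic, commonly used options are largely shared
  – -O -I -l -D -u -fPIC -Wl, -Wa, ...

• Some open source projects enable optimizations or parameters that gcc does not turn on by default in order to squeeze out performance

• Clang (~3.4.x) only prints a warning for unsupported options; the return code is unaffected

• Clang (trunk) reports an error for unsupported options and aborts!
  – Apart from a whitelist of options it ignores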

82

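Inconsistent Warning Criteria

• Some picky people like to compile with -Wall -Werror
  – Or even add -Wextra

• After switching to clang, they find it is even pickier than gcc in some places...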

83

Code Won't Compile!?

• "But my code works with gcc!"

• The default standards for gcc/g++ are gnu89/gnu++98, not c89/c++98
  – Many GNU extensions are supported by default

• The default standard for clang is c99!
  – Many common GNU extensions are supported
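Whether the compiler is in a GNU dialect or a strict ISO dialect can be checked from inside the code: GCC and Clang define the macro __STRICT_ANSI__ when a strict mode such as -std=c++98 or -ansi is selected, and leave it undefined in the gnu modes. The sketch below is only an illustration; the function name gnu_extensions_enabled is invented for this example.

```cpp
// Hypothetical helper: reports which dialect family this translation unit was built with.
// __STRICT_ANSI__ is predefined by GCC and Clang in strict ISO modes (for example -std=c++98).
bool gnu_extensions_enabled() {
#ifdef __STRICT_ANSI__
  return false;  // strict ISO mode: GNU-specific extensions may be rejected or warned about
#else
  return true;   // gnu dialect (for example -std=gnu++98)
#endif
}
```

Building the same file with -std=gnu++98 and then with -std=c++98 is a quick way to see which mode a build system is actually passing.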

86

GCC and LLVM collaboration

• GNU Tools Cauldron 2014
  – July 18-20, 2014 at Cambridge

With LLVM mature enough to feature as the default toolchain in some Unix distributions, and with the inherent (and profitable) share of solutions, ideas and code between the two, we need to start talking at a more profound level. There will always be problems that can't be included in any standard (language, extension, or machine-specific) and are intrinsic to the compilation infrastructure. For those, and other common problems, we need common solutions to at least both LLVM and GCC, but ideally any open source (and even closed source) toolchain. In this BoF session, we shall discuss to what extent this collaboration can take us, how we should start and what are the next steps to make this happen.

89

Commercial Break

The "Andes Mountains"

Great mountains, great water, great boredom; leave work on time; nice atmosphere

Open Source++

The toolchain team is always hiring ~
