Optimizing Android Performance with GCC Compiler

Mar-12-2010, Fri

• Name - Geunsik Lim
• e-Mail - leemgs.at.gmail.com
• Nick - invain (인베인)
• Blog - http://blog.naver.com/invain/

This document may be freely modified and redistributed; when reusing the material, the source must be credited in the lower-right corner.

CONTENTS

Optimization Strategies for the lightweight android

Android Toolchain Roadmap

Building Android Toolchain

GA Search For Compiler Options

Thoughtful abstraction & specifications

Profile-Guided Optimization

FDO Illustration & Performance

Lightweight IPO (LIPO)

Redundancy Elimination

Optimizing Dalvik Memory Management

Observation of WebView Bench & Fhourstones

Experimental Result

Systematic Optimizations

Android Technology Session

Reference: GCC internals manual, Shih-wei Liao’s Paper, Dan Kegel’s crosstool, Fedora11 documentation(SMP)

http://leemgs.fedorapeople.org

5th Korea Android Conference

Performance Optimization

What is Optimization?

• In mathematics and computer science, optimization (also called mathematical programming) refers to choosing the best element from some set of available alternatives.

• This means solving problems in which one seeks to minimize or maximize a real function by systematically choosing the values of real or integer variables from within an allowed set.

• The first optimization technique, known as steepest descent, goes back to Gauss (mathematician and scientist).

• Studies in optimizing: code size, performance, power.

[Figure: embedded s/w size, time axis 2000 to 2030]

Where is a Hole for Optimization?

• Application? (Application framework, Application)
• Middleware? (Dalvik, Core/Func lib)
• OS Kernel? (Linux)
• Hardware? (Snapdragon, S5PC1XX)

7 Optimization Strategies 1/2

1) Data-driven tool deployment:
   - Regularly evaluate and then leverage the winner among optimizing toolchains.

2) Judicious abstraction & specifications: a fundamental methodology
   - Visibility of a function should match the API spec in the programmer's design.
   - Tradeoff in splitting into Java and native code: this interface affects performance.
   - PacketVideo (= Opencore/OpenMax; multimedia framework): the semiconductor industry looks for APIs to differentiate.

3) Systematic parameter setting: a key driver in performance/size

7 Optimization Strategies 2/2

4) Profile-guided optimizations: a useful methodology
   - Feedback-Directed Optimizations (FDO): Build-Run-Build with our arm-xxx-eabi-gcc.
   - Class loading profiler (aka preload profiler): Zygote's preloading. Trade-off between boot-up time and app init time.

5) Scope-enhancing optimizations:
   - Interprocedural optimizations via arm-xxx-eabi-gcc -fripa.
   - In the current implementation, -fripa only turns on cross-module inlining analysis.

6) Redundancy elimination: Identical Code Folding (ICF)

7) Memory management optimization in Dalvik in the interest of time.

Data-driven tool deployment

• Analyze the tool candidates:

gcc-4.4.1 (open source)   Without Code Sourcery 2009Q3
gcc-4.4.0 (google)        Android's toolchain for eclair
gcc-4.3.3 (cs)            Code Sourcery 2009Q1
gcc-4.3.3 (open source)   Without Code Sourcery 2009Q1
gcc-4.3.1 (google)        Android toolchain for eclair
gcc-4.2.1 (google)        Android toolchain for Donut

[Figure panels: Size improvement on Dream phone; Speedup on Dream phone (run 100X)]

Google tracks 13 numbers daily; there is space to show only 4 here.

Cost competitiveness, product differentiation

Source: Google

Analyze 6 Toolchains

• Based on Google Android perflab benchmark results:
   - Baseline: Donut (ver 1.6)'s toolchain, gcc-4.2.1
   - Size:
       Both gcc-4.4.X: 17.8% improvement
       Both gcc-4.3.3 and the gcc-4.3.3 Code Sourcery version: 15% better
       gcc-4.3.1: 3% improvement
   - Performance: no significant variance among the 6 toolchains; gcc-4.4.3's size benefit comes with no performance penalty.

• Code Sourcery for ARM doesn't have a significant performance/size benefit over Android's version of gcc.
   - Code Sourcery's strength: addressing ARM's hardware errata early. We have to port the fixes to gcc-4.4.3.

• gcc-4.4.3 wins: the toolchain moved to 4.4.3, skipping 4.3.

Android Toolchain Roadmap

• All pieces from open source: GCC, binutils, gdb, gmp, mpfr
   - Patches for bug fixing and optimization
• Take patches from upstream
• Submit our patches to upstream
• Also, native developers can use the Android NDK
   http://developer.android.com/sdk/android-2.1.html (API Level 7, Jan 2010)

S/W \ Branch   cupcake   donut   eclair (armv7)   kandroid
gcc            4.2.1     4.2.1   4.4.0            4.4.3
binutils       2.17      2.17    2.19             2.20
gdb            6.6       6.6     6.6              7.0.1
gmp            4.2.2     4.2.2   4.2.4            4.3.2
mpfr           2.3.0     2.3.0   2.4.1            2.4.2

Latest Android Toolchain

• Google changed the default cross-compiler on Nov-16-2009.
• The default architecture is still armv5te for compatibility.

Building Android Toolchain (1/2)

• Android uses the Bionic C library
   - BSD license: keeps the GPL out of the user's sphere for Android Market.
   - Smaller and faster than glibc and uClibc (reduces size dramatically):
       glibc 2.11:       /lib/libc.so          1,208,224 bytes
       uClibc 0.9.30:    /lib/libc.so            424,235 bytes
       Bionic (eclair):  /system/lib/libc.so     243,948 bytes
   - Bionic has built-in support for important Android-specific services, e.g., system properties and logging.
   - Very limited support for POSIX, C++, etc.

• If you need libstdc++-v3:
   - Enable libstdc++-v3 when configuring the toolchain.
   - Statically link in the necessary components.
   - /system/lib/libstdc++.so (5,124 bytes)

Building Android Toolchain (2/2)

• Barebone-style building: inside the Android tree
   - Specify all system and Bionic header file paths, shared library paths, libgcc.a, crtbegin_*.o, crtend_*.o, etc.

• Standalone-style building: latest prebuilt gcc-4.4.0 toolchain
   - Convenient for native developers:
       arm-xxx-eabi-gcc -mandroid --sysroot=<path-to-sysroot> hello.c -o hello
     (<path-to-sysroot> is a pre-compiled copy of Bionic)
   - Download:
       Old) http://android.git.kernel.org/?p=platform/prebuilt.git;a=tree;f=linux-x86/toolchain;h=1cf27fca792be850f7b18e0c76762787c7b5c8c9;hb=4b06260a916be762d0dd1b93e97306f1b90e3889
       Now) http://android.git.kernel.org/pub/?C=M;O=D

Thread API List

• The Bionic library includes the POSIX C thread library in the /system/lib/libc.so file (./bionic/libc/include/pthread.h).

• Android's POSIX thread API doesn't support pthread_rwlock_***, pthread_rwlockattr_***, pthread_barrier_***, pthread_barrierattr_***, or pthread_spin_*** from the POSIX 1003.1J-2000 standard.

• The Android toolchain includes the GDB utility, which uses /system/lib/lib_thread_db.so for thread debugging of Android applications.

[Figure: thread functions according to Bionic (eclair)]

How to compile android source faster 1/2

• Utilize your multi-core Linux desktop to build Android.

• The purpose of the "make" utility (maintained by Paul Smith) is to determine automatically which pieces of a large program need to be recompiled, and to issue the commands to recompile them.

• The `-j' or `--jobs' option tells make to execute many commands simultaneously.

F11-invain#> vi build-android-kernel.sh
    #!/bin/bash
    # created by invain for the best performance when compiling kernel source.
    # count the "cores" lines in /proc/cpuinfo (one per logical CPU)
    realnum=$(grep -c cores /proc/cpuinfo)
    # job count = cores + 20% of cores, rounded to the nearest integer
    let bestnum=$realnum+$(printf %.0f $(echo "$realnum*0.2" | bc))
    schedtool -B -n 1 -e make -j "$bestnum" uImage

• This Bash shell script picks a job count for a fast build; the same rule applies when compiling the full Android sources.
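The job-count rule in the script (cores plus 20% of cores, rounded) is simple arithmetic and can be checked in isolation. The following is a minimal C sketch of that rule only; the function name `best_job_count` is hypothetical and not part of the original script.

```c
#include <assert.h>

/* Job count used by the script: cores plus 20% of cores, rounded to the
   nearest integer. cores * 0.2 equals cores / 5, and (cores + 2) / 5
   rounds that quotient to nearest using integer arithmetic only. */
int best_job_count(int cores)
{
    assert(cores > 0);   /* caller must pass a positive core count */
    return cores + (cores + 2) / 5;
}
```

For a quad-core machine this yields 5, and for eight cores it yields 10.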

How to compile android source faster 2/2

• Evaluation when compiling the full Android sources.

Tested on Intel Core i5 Lynnfield 750 (quad-core @ 2.66 GHz) by DeolPool:
    time make -j4   : 19m 10s
    time make -j5   : 18m 52s  (recommendation)
    time make -j8   : 19m 15s
    time make -j64  : 19m 54s

ConnectBot

Tested on Intel Core2 Quad Yorkfield Q9400 (quad-core @ 2.66 GHz) by invain:
    time make -j4   : 22m 49s
    time make -j5   : 22m 31s  (recommendation)
    time make -j8   : 28m 47s
    time make -j64  : 51m 19s

How to confirm 32-bit/64-bit for the CPU & Linux

• CPU core specification

    [invain@fedora11 ~]$ grep flag /proc/cpuinfo
    flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe lm constant_tsc arch_perfmon pebs bts pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 lahf_lm tpr_shadow vnmi flexpriority

• The lm flag is an abbreviation of "Long Mode" (64-bit).

• Linux kernel information

    [invain@fedora11 ~]$ uname -a
    Linux invain 2.6.33-rt4-smp #1 SMP Tue Feb 26 23:11:04 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux
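Checking for the lm flag by eye is error-prone because other flags such as lahf_lm contain the same letters. A minimal C sketch of an exact-token check follows; `has_cpu_flag` is a hypothetical helper name, and the flags string is assumed to be the space-separated text after the colon in /proc/cpuinfo.

```c
#include <assert.h>
#include <string.h>

/* Return 1 if the space-separated flags string contains `flag` as a
   whole token, 0 otherwise. Substring hits such as "lahf_lm" are rejected. */
int has_cpu_flag(const char *flags, const char *flag)
{
    assert(flags != NULL && flag != NULL);
    size_t len = strlen(flag);
    const char *p = flags;
    while (len > 0 && (p = strstr(p, flag)) != NULL) {
        int starts = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        int ends = p[len] == '\0' || p[len] == ' ' || p[len] == '\n';
        if (starts && ends)
            return 1;
        p += len;   /* skip this partial match and keep searching */
    }
    return 0;
}
```

Calling `has_cpu_flag(flags, "lm")` on the flags line shown above would report a 64-bit capable CPU.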

Thoughtful abstraction & specifications

1. Goal: Visibility of a function should match the API spec in the programmer's design.

2. Solution: First, systematically apply the 5 steps. Fundamentally, go through the APIs of each library and consciously decide what should be "public" and what shouldn't.

3. Result: ~500 KB savings for Opencore libs

4. Key: All hidden functions can be garbage collected if unused locally.

5. Toolchain options: -ffunction-sections, -Wl,--gc-sections, -fvisibility=hidden

The 5 steps (GCC spells the public visibility "default"):

   1) Add -fvisibility=hidden in linux-arm.mk and Android.mk.
   2) In *.h, add __attribute__((visibility("default"))) to the public function declarations.
   3) Build: invain@fedora11$> make -j <???>
   4) Read the link errors, e.g.:
        /tmp/GoOgLe.o: In function foo
        Bar.c: undefined reference to "baz"
   5) Mark the missing symbol as public, e.g.:
        __attribute__((visibility("default")))
        int baz;
      Repeat from step 3 until there is no failure.
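The visibility workflow above can be illustrated with a small C file. This is a sketch under stated assumptions: `EXPORT` is a hypothetical macro name, `foo`, `twice`, and the value of `baz` are made up for illustration, and GCC's name for public visibility is "default". When the file is built with -fvisibility=hidden, only the symbols marked `EXPORT` remain visible outside the shared library.

```c
#include <assert.h>

/* EXPORT is a hypothetical macro name; "default" is GCC's public visibility. */
#define EXPORT __attribute__((visibility("default")))

/* Part of the library API: stays visible even under -fvisibility=hidden. */
EXPORT int baz = 42;
EXPORT int foo(int x);

/* Internal helper: hidden when built with -fvisibility=hidden, and a
   candidate for removal by -ffunction-sections plus -Wl,--gc-sections
   if nothing in the library uses it. */
int twice(int x)
{
    return 2 * x;
}

int foo(int x)
{
    assert(x >= 0);   /* illustrative precondition only */
    return twice(x) + baz;
}
```

Keeping the attribute behind one macro makes it easy to audit which declarations in a header are deliberately public.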

Parameter Setting

• Parameter setting is a key driver in performance/size optimizations.

• Case study: for the Android tree, find the best:
   - Compiler parameters
   - Compiler options

• Parameter space exploration via genetic algorithm (GA).

Genetic algorithm (GA): a search technique used in computing to find exact or approximate solutions to optimization and search problems. Ref: http://www.genetic-programming.com

GA Search For Compiler Options

Optimization target   Fitness function
Performance           Inverse of execution time
Size                  Inverse of code size

Search loop:
   1. Initialization: initialize a population of randomly generated option sets.
   2. Selection: drop a portion of the option sets that build binaries with lower fitness values.
   3. Reproduction: produce new option sets by crossover and mutation of the remaining ones.
   4. Termination: an expected result is reached, or we don't have enough time for searching. Otherwise go back to Selection.
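The four GA phases can be sketched in a few dozen lines of C. This is an illustration under loud assumptions, not the search tool used for Android: each option set is a bitmask of six hypothetical on/off options, the fitness function is a made-up stand-in for "inverse of code size" (a real search would build and measure a binary), and termination is a fixed generation budget.

```c
#include <assert.h>

#define POP 8      /* population size (assumed) */
#define NBITS 6    /* number of on/off options (assumed) */

/* Toy fitness standing in for "inverse of code size": a hypothetical
   base size minus a made-up saving for each enabled option. */
double fitness(unsigned opts)
{
    static const int saving[NBITS] = { 5, -3, 8, 2, -6, 4 };
    int size = 100;
    for (int b = 0; b < NBITS; b++)
        if (opts & (1u << b))
            size -= saving[b];
    return 1.0 / size;
}

/* Small deterministic generator so that runs are repeatable. */
static unsigned next_rand(unsigned *state)
{
    *state = *state * 1103515245u + 12345u;
    return (*state >> 16) & 0x7fffu;
}

/* Sort the population by descending fitness (insertion sort). */
static void sort_by_fitness(unsigned pop[POP])
{
    for (int i = 1; i < POP; i++) {
        unsigned key = pop[i];
        int j = i - 1;
        while (j >= 0 && fitness(pop[j]) < fitness(key)) {
            pop[j + 1] = pop[j];
            j--;
        }
        pop[j + 1] = key;
    }
}

/* Run the search and return the fittest option set found. */
unsigned ga_search(unsigned seed, int generations)
{
    assert(generations >= 0);
    unsigned pop[POP];
    unsigned mask = (1u << NBITS) - 1u;

    /* Initialization: random option sets. */
    for (int i = 0; i < POP; i++)
        pop[i] = next_rand(&seed) & mask;

    /* Termination: here simply a fixed generation budget. */
    for (int g = 0; g < generations; g++) {
        /* Selection: keep the fitter half, drop the rest. */
        sort_by_fitness(pop);
        /* Reproduction: refill the dropped half from the survivors. */
        for (int i = POP / 2; i < POP; i++) {
            unsigned a = pop[next_rand(&seed) % (POP / 2)];
            unsigned b = pop[next_rand(&seed) % (POP / 2)];
            unsigned cut = (1u << (next_rand(&seed) % NBITS)) - 1u;
            unsigned child = (a & cut) | (b & ~cut & mask);  /* crossover */
            child ^= 1u << (next_rand(&seed) % NBITS);       /* mutation: flip one bit */
            pop[i] = child;
        }
    }
    sort_by_fitness(pop);
    return pop[0];
}
```

Because the fitter half always survives, the best fitness never decreases from one generation to the next; that design choice trades diversity for a simple monotonic guarantee.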

Options That Control Optimization

These options control various sorts of optimizations.

• "-O0": Reduce compilation time and make debugging produce the expected results. This is the default.

• "-O1": Optimizing compilation takes somewhat more time, and a lot more memory for a large function.

• "-O2": Optimize even more. GCC performs nearly all supported optimizations that do not involve a space-speed tradeoff. For Kernel/App.

• "-O3": Turns on all optimizations specified by -O2 and also turns on the -finline-functions, -funswitch-loops, -fpredictive-commoning, -fgcse-after-reload and -ftree-vectorize options.

• "-Os": Optimize for size. -Os enables all -O2 optimizations that do not typically increase code size. It also performs further optimizations designed to reduce code size.

Reduce Code Size by Option Search

• We search for the configuration that reduces size the most, using the compiler option search approach.

• Android default inline options:
    -finline-functions
    -fno-inline-functions-called-once

• Options that we found:
    -finline
    -fno-inline-functions
    -finline-functions-called-once
    --param max-inline-insns-auto=62
    --param inline-unit-growth=0
    --param large-unit-insns=0
    --param inline-call-cost=4

Native system image size (unit: byte):

GCC-4.2.1                          23,839,291
GCC-4.4.3                          23,027,032
GCC-4.4.3 (tuned inline options)   22,087,436

Profile-Guided Optimization: Toolchain enables FDO (Feedback-Directed Optimization)

[Figure: control-flow example. tmp1 and tmp2 are defined, then tmp3 is defined; afterwards one path uses tmp1 and the other uses tmp2. The allocator must spill tmp1 or tmp2 before defining tmp3.]

    tmp1 = . . .
    tmp2 = . . .
    . . .
    tmp3 = . . .
    path A: . . . = tmp1    . . . = tmp1
    path B: . . . = tmp2    . . . = tmp2

Instrumentation Based FDO

1. Build twice.
2. Find representative input.
3. Instrumentation run: 2~3X slower, but this perturbation is OK because threading in Android is not that time sensitive (after all, ARM11 or Cortex-A8 core).
4. One profile per file, dumped at application exit.

Flow:
   1) arm-xxx-eabi-gcc -fprofile-generate=./profile . . .   ->  instrumented binary
   2) Run the instrumented binary on representative input data  ->  profile.zip
   3) arm-xxx-eabi-gcc -fprofile-use=./profile.zip . . .   ->  optimized binary with FDO

http://gcc.gnu.org/onlinedocs/gcc-4.4.3/gcc.pdf (Page 102)
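To make the build-run-build cycle concrete, here is a small C function of the kind that benefits from profile feedback, with the two compile passes written as comments. This is a sketch under assumptions: the file name count.c and the function `count_above` are hypothetical, arm-xxx-eabi-gcc is the slide's placeholder for the real cross-compiler name, and the profile directory follows GCC's documented -fprofile-generate=path and -fprofile-use=path forms.

```c
#include <assert.h>

/*
 * Two-pass build for this file (assumed to be saved as count.c):
 *
 *   1. arm-xxx-eabi-gcc -O2 -fprofile-generate=./profile -c count.c
 *   2. link, then run the instrumented program on representative input;
 *      profile data is written when the program exits
 *   3. arm-xxx-eabi-gcc -O2 -fprofile-use=./profile -c count.c
 */

/* Count the values above a threshold. Whether the branch is usually
   taken depends on the input, which is exactly what the profile records. */
int count_above(const int *values, int n, int threshold)
{
    assert(values != 0 && n >= 0);
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (values[i] > threshold)
            count++;
    }
    return count;
}
```

The quality of the result depends on how representative the training input in step 2 is of real use.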

FDO Performance

• Global hotness for ARM (HOT_BB_COUNT_FRACTION, branch prediction routine for the GNU compiler, gcc-4.4.x/gcc/predict.c)

• 1% improvement on Android's skia library, as shown below; smaller effects on smaller Android benchmarks.

Content \ Work            default      fdo-default   fdo-modified
Size of libskia (bytes)   7,879,646    7,396,032     7,319,668
Size reduction            0.00%        6.14%         7.11%
Stdev (over 100 runs)     0.28         0.63          0.26
Speedup                   1            0.98          0.97

Source: Google
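The size-reduction row follows directly from the libskia sizes. A minimal C sketch of the calculation, with the hypothetical helper name `size_reduction_percent`:

```c
#include <assert.h>

/* Percentage by which `size` is smaller than `baseline`. */
double size_reduction_percent(long baseline, long size)
{
    assert(baseline > 0);
    return 100.0 * (double)(baseline - size) / (double)baseline;
}
```

With the baseline 7,879,646 bytes, the fdo-default size 7,396,032 gives about 6.14% and the fdo-modified size 7,319,668 gives about 7.11%, matching the table.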

Scope-Enhancing Optimization: Inter-Procedural Optimizations (IPO)

• Optimization opportunity: decided by the scope of the code the compiler can see.

• Scope is limited mainly by artificial source boundaries.

• IPO enhances the scope.

parent.c:
    int foo(int i, int j)
    {
        return bar (i,j) + bar (j,i);
    }

child.c:
    int bar(int i, int j)
    {
        return i - j;
    }
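The parent.c/child.c pair shows why scope matters: once bar is visible inside foo, the sum (i - j) + (j - i) cancels. The following C sketch places both functions in one file and writes out the form a compiler is allowed to reach after inlining; `foo_after_inlining` and `check_same` are hypothetical names added for illustration, and whether a given GCC version performs this exact simplification is not claimed here.

```c
#include <assert.h>

/* child.c */
int bar(int i, int j)
{
    return i - j;
}

/* parent.c, as written on the slide */
int foo(int i, int j)
{
    return bar(i, j) + bar(j, i);
}

/* What foo may become once bar is inlined across the module boundary:
   (i - j) + (j - i) simplifies to 0, so both calls disappear. */
int foo_after_inlining(int i, int j)
{
    (void)i;
    (void)j;
    return 0;
}

/* Sanity check that both forms agree for a given input. */
void check_same(int i, int j)
{
    assert(foo(i, j) == foo_after_inlining(i, j));
}
```

Without cross-module visibility the compiler must emit two real calls to bar, because it cannot see what bar does.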

Problem with Traditional IPO

• Parameter setting is a key driver in performance/size optimizations.

• Case study: for the Android tree, find the best: compiler parameters

CMI: Cross Module Inlining

Solution: Profile Feedback Based Lightweight IPO (LIPO)

• To get the best potential out of IPO:
   - Integrate IPO with FDO, seamlessly!
   - perf (IPO + FDO) > perf (IPO) + perf (FDO)

• Move Inter-Procedural Analysis (IPA) to the end of the training run execution, into the binary: make global decisions earlier!

• Write IPA results into the profile.

• During profile-use compilation:
   - Compile each file, as usual, with the augmented profile.
   - Read the additional IPA results.
   - Pull in auxiliary modules and extend the scope.

☞ Memo: http://gcc.gnu.org/wiki/LightweightIpo

LIPO Improves Performance: Use -fripa

• LIPO targets C/C++: Android uses C/C++ (except for some assembly code).

• Baseline: FDO enabled

• Degradations are in the noise range.

We just got the ARM version of LIPO to work:
   - Run: f11#> arm-xxx-eabi-gcc -fprofile-generate=/data/local/profile -fripa -mandroid
   - Replace -fprofile-generate with -fprofile-use at the end of optimization.

Performance Evaluation Result

SPEC2000 on x86   Improvement   SPEC2006 on x86   Improvement
177.mesa          1.33%         433.milc          1.75%
164.gzip          3.83%         447.dealII        2.03%
175.vpr           3.43%         453.povray        12.74%
253.perlbmk       1.94%         445.gobmk         1.26%
254.gap           3.76%         458.sjeng         5.45%
255.vortex        21.12%        464.h264ref       8.51%
252.eon           3.42%         473.astar         0.72%

The Standard Performance Evaluation Corporation (SPEC) is a non-profit corporation formed to establish, maintain and endorse a standardized set of relevant benchmarks that can be applied to the newest generation of high-performance computers. (http://www.spec.org/)

Redundancy Elimination: Identical Code Folding (ICF)

• Identify identical functions and merge them at link time.

• Implemented in the binutils gold linker. Triggered with the option --icf.

• Debug support available through call tables.

• ICF on gold yields 5% on x86-64 binaries.

• We are still getting the gold linker to work with Android ARM. We estimate ~5% further Android size reduction on top of garbage collection. Stay tuned.
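A typical folding candidate is a pair of functions that differ only in names and types but compile to the same instructions. The C sketch below shows such a pair; the struct and function names are hypothetical, and it is an assumption (true on common ABIs, not guaranteed) that both bodies produce identical machine code. For the linker to see each function as a separate foldable unit, the file would usually be compiled with -ffunction-sections.

```c
#include <assert.h>

struct point { int x, y; };
struct size  { int w, h; };

/* Two functions with different names and parameter types whose bodies do
   the same work: load two adjacent ints and add them. */
int point_sum(const struct point *p) { return p->x + p->y; }
int size_sum(const struct size *s)   { return s->w + s->h; }

/* Caller using both; after folding, both calls can reach the same code. */
int sum_both(const struct point *p, const struct size *s)
{
    assert(p != 0 && s != 0);
    return point_sum(p) + size_sum(s);
}
```

Folding keeps one copy of the code and points both symbols at it, which is why code that compares function addresses needs care when ICF is enabled.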

Optimizing Memory Management

• Each Dalvik (by Dan Bornstein) virtual machine has its own heap.

• Dalvik uses the dlmalloc API to manage its heap:
   - Allocate memory with mspace_calloc
   - Release memory with mspace_free

[Figure: Dalvik and the Dalvik heap. New object: mspace_calloc. Release object: mspace_free.]

Various Headrooms for Memory Management Optimizations

• Various headrooms for memory management optimizations. Many allocated objects have the same size.

High-ratio object sizes in WebViewBench:

Size   Count    Ratio
24     16,435   34.40%
20     5,464    11.40%
36     4,474    9.40%
. . .  . . .    . . .

Object allocation log in WebViewBench:

. . .

[Ljava/util/HashMap$Entry;:24

Ljava/util/HashMap$Entry;:24

Landroid/webkit/PerfChecker;:16

Landroid/webkit/LoadListener;:156

Landroid/webkit/ByteArrayBuilder;:20

Ljava/util/LinkedList;:20

Ljava/util/LinkedList$Link;:20

Ljava/util/LinkedList;:20

Ljava/util/LinkedList$Link;:20

Ljava/lang/String;:24

Ljava/lang/String;:24

Landroid/webkit/FrameLoader;:48

Ljava/lang/String;:24

. . .

Observation of WebView Bench

• The per-size ratios for allocation and release are almost the same.

Observation of Fhourstones (FreeBSD benchmarks)

• This integer benchmark solves positions in the game of connect-4, as played on a vertical 7x6 board.

• The ratio of size = 44 is extremely high in this case.

• http://homepages.cwi.nl/~tromp/c4/Fhourstones.tar.gz

Many Objects Alloc/Released in Short Time

• Optimization: add a buffer cache of memory chunks.

Buffer Cache: Release

[Figure: Dalvik releases a String object (size = 24). Instead of going back to the Dalvik heap, the memory chunk (size = 24) is kept in the buffer cache.]

Buffer Cache: Allocate

[Figure: Dalvik needs a String object (size = 24) and asks the buffer cache: "Do you have a memory chunk with size = 24?" The buffer cache hands back a cached memory chunk (size = 24) without going to the Dalvik heap.]
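The release and allocate paths of the buffer cache can be sketched as a per-size stack of recycled chunks. This is an illustration under stated assumptions, not Dalvik's implementation: plain calloc and free stand in for mspace_calloc and mspace_free, and the names `cache_alloc`, `cache_release`, `MAX_SIZE`, and `SLOTS` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SIZE 64   /* largest cached chunk size in bytes (assumed) */
#define SLOTS 16      /* cached chunks per size (assumed; the slides test 16,384 and 65,536) */

static void *cache[MAX_SIZE + 1][SLOTS];
static int cached[MAX_SIZE + 1];

/* Release: park the chunk in the cache for its size if a slot is free,
   otherwise hand it back to the heap. */
void cache_release(void *chunk, size_t size)
{
    assert(chunk != NULL);
    if (size <= MAX_SIZE && cached[size] < SLOTS)
        cache[size][cached[size]++] = chunk;
    else
        free(chunk);
}

/* Allocate: reuse a cached chunk of the same size, else ask the heap. */
void *cache_alloc(size_t size)
{
    if (size <= MAX_SIZE && cached[size] > 0) {
        void *chunk = cache[size][--cached[size]];
        memset(chunk, 0, size);   /* keep calloc semantics on reuse */
        return chunk;
    }
    return calloc(1, size);
}
```

Releasing a 24-byte chunk and then requesting 24 bytes returns the same chunk without touching the heap, which is the fast path the two figures describe. The sketch is not thread-safe; a real allocator cache would need locking or per-thread caches.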

Experimental Result 1/2

• Release performance improvement in Fhourstones

• Allocation performance improvement in Fhourstones

[Figures: both charts plot results against buffer cache slots: No Pool, 16,384, 65,536. Source: Google]

Experimental Result 2/2

• Release performance improvement in WebViewBench

• Allocation performance improvement in WebViewBench

[Figures: both charts plot results against buffer cache slots: No Pool, 16,384, 65,536. Source: Google]

Summary

• Systematic Optimizations

1. Toolchain: regularly evaluate and leverage. E.g., leverage the newest lightweight IPO and ICF.

2. There is no substitute for thoughtful abstraction & specifications.

3. Systematic parameter setting: a key driver of performance.

4. Data-driven: profile it.

5. Optimizing memory management time for Android/Dalvik is important.

THANKS

Quiz#1) Throughput according to /init daemon

./android-2.1/system/core/sh/init.c (minimal bootable environment)

• For the /init executable, the ancestor of all processes on the Android platform, which of the following gives the most ideal QuickBoot at power-on?

1) Running an init that was built statically

2) Running an init that was built as shared (dynamic)

3) Running an init that was built as shared and then pre-linked

4) Running an init that was built statically when the Toolbox source is small, and as shared when the source is large

Quiz#2) License Issue of C++ standard lib

• The C++ standard library (/system/lib/libstdc++.so) used in the Android platform's rootfs is under the GPL license. If so, must the source of user-space code (e.g., *.apk) that links against functions in this library be fully disclosed when a customer requests it?

1) Of course. If a customer requests it, the commercial application must be disclosed.

2) Officially Android is under the Apache License, so it does not have to be disclosed.

3) Even though the C++ standard lib is GPL, the application source does not have to be disclosed to customers if the full text of the exception clause is printed in the product manual.

4) There is an obligation to disclose to purchasers of the application, but not in response to requests from non-purchasers.

5) The application vendor only needs to change their phone number quickly and go into hiding (隱遁) for a while.

Quiz#3) How to get the maximum free memory

• Look at the available RAM in the figure below: Before (54MB) and After (93MB). That is roughly a twofold difference. What is the cause?

[Figure: Before / After]