how to select superinstructions for ruby

Post on 23-Feb-2016

How to select superinstructions for Ruby

ZAKIROV Salikh*, CHIBA Shigeru*, and SHIBAYAMA Etsuya**

* Tokyo Institute of Technology, Dept. of Mathematical and Computing Sciences

** Tokyo University, Information Technology Center

Ruby

• Dynamic language
• Becoming popular recently
• Numeric benchmarks 100-1000 times slower than equivalent program in C

[Figure: benchmark comparison chart; numeric benchmarks marked in red]

* http://shootout.alioth.debian.org/2

Interpreter optimization efforts

• Many techniques to optimize interpreters have been proposed
  – Threaded interpretation
  – Stack top caching
  – Pipelining
  – Superinstructions (focus of this presentation)
• Superinstructions
  – Merge code of operations executed in sequence

Superinstructions (contrived example)

PUSH: // put <imm> argument on stack
    stack[sp++] = *pc++;
    goto **pc++;

ADD: // add two topmost values on stack
    sp--;
    stack[sp-1] += stack[sp];
    goto **pc++;

Dispatch eliminated:

PUSH_ADD: // add <imm> to stack top
    stack[sp++] = *pc++;
    //goto **pc++;
    sp--;
    stack[sp-1] += stack[sp];
    goto **pc++;

Optimizations applied:

PUSH_ADD: // add <imm> to stack top
    stack[sp-1] += *pc++;
    goto **pc++;
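The contrived example can be tried out with a small self-contained interpreter. This is only a sketch under stated assumptions: it uses a portable `switch` loop instead of the `goto **pc++` threaded dispatch shown on the slide, and the opcode numbering, the `run` function and the dispatch counter are hypothetical names introduced for illustration.

```c
#include <assert.h>

/* Hypothetical opcode numbering, for illustration only. */
enum { OP_PUSH, OP_ADD, OP_PUSH_ADD, OP_HALT };

/* Interprets `code` and returns the value left on top of the stack.
 * `*dispatches` receives the number of VM operations dispatched. */
long run(const long *code, long *dispatches)
{
    long stack[16];
    int sp = 0;
    const long *pc = code;

    *dispatches = 0;
    for (;;) {
        ++*dispatches;
        switch (*pc++) {
        case OP_PUSH:               /* put <imm> argument on stack */
            assert(sp < 16);
            stack[sp++] = *pc++;
            break;
        case OP_ADD:                /* add two topmost values on stack */
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_PUSH_ADD:           /* superinstruction: add <imm> to stack top */
            assert(sp >= 1);
            stack[sp - 1] += *pc++;
            break;
        case OP_HALT:
            assert(sp >= 1);
            return stack[sp - 1];
        default:
            assert(0 && "unknown opcode");
            return 0;
        }
    }
}
```

Running the program PUSH 2, PUSH 3, ADD, HALT and the fused program PUSH 2, PUSH_ADD 3, HALT through `run` gives the same result, with one dispatch fewer in the fused version.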

Superinstructions (effects)

• Effects
  1. Reduce dispatch overhead
     a. Eliminate some jumps
     b. Provide more context for indirect branch predictor by replicating indirect jump instructions
  2. Allow more optimizations within VM op

Good for reducing dispatch overhead

Superinstructions help when:
• VM operations are small (~10 hwop/vmop)
• Dispatch overhead is high (~50%)

Examples of successful use in prior research
• ANSI C interpreter: 2-3 times improvement (Proebsting 1995)
• Ocaml: more than 50% improvement (Piumarta 1998)
• Forth: 20-80% improvement (Ertl 2003)

Ruby does not fit well

Superinstructions help when:
• VM operations are small (~10 hwop/vmop)
• Dispatch overhead is high (~50%)

BUT hardware profiling data on Intel Core 2 Duo shows:
• 60-140 hardware ops per VM op
• Only 1-3% misprediction overhead on interpreter dispatch

Superinstructions for Ruby

• We experimentally evaluated the effect of “naive” superinstructions on Ruby
  – Superinstructions are selected statically
  – Combinations of length 2 that occur frequently in the training run are selected as superinstructions
  – Training run uses the same benchmark
  – Superinstructions are constructed by concatenating C source code, with C compiler optimizations applied
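The static selection step can be sketched as a pair-frequency count over an opcode trace from the training run. This is a minimal illustration, not the tooling used in the study: `NUM_OPS`, `struct pair` and `select_pairs` are hypothetical names, the trace is assumed to be an array of small integer opcodes, and overlapping occurrences are counted as they appear.

```c
#include <assert.h>
#include <string.h>

#define NUM_OPS 8   /* hypothetical size of the VM instruction set */

struct pair { int first, second; unsigned long count; };

/* Counts every adjacent opcode pair in a training trace and writes the
 * `k` most frequent pairs into `out`, most frequent first.
 * Returns the number of pairs written (pairs never seen are skipped). */
int select_pairs(const int *trace, int n, struct pair *out, int k)
{
    static unsigned long freq[NUM_OPS][NUM_OPS];
    int written = 0;

    memset(freq, 0, sizeof freq);
    for (int i = 0; i + 1 < n; i++) {
        assert(trace[i] >= 0 && trace[i] < NUM_OPS);
        assert(trace[i + 1] >= 0 && trace[i + 1] < NUM_OPS);
        freq[trace[i]][trace[i + 1]]++;
    }

    while (written < k) {
        struct pair best = { 0, 0, 0 };
        for (int a = 0; a < NUM_OPS; a++)
            for (int b = 0; b < NUM_OPS; b++)
                if (freq[a][b] > best.count) {
                    best.first = a;
                    best.second = b;
                    best.count = freq[a][b];
                }
        if (best.count == 0)
            break;                          /* no more pairs were observed */
        freq[best.first][best.second] = 0;  /* do not pick it again */
        out[written++] = best;
    }
    return written;
}
```

Each selected pair would then become one superinstruction whose body is the concatenation of the two original instruction bodies.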

Naive superinstructions effect on Ruby

[Figure: normalized execution time vs. number of superinstructions used, 4 benchmarks]

• Limited benefit
• Unpredictable effects

Branch mispredictions

[Figure: normalized execution time vs. number of superinstructions used; 2 benchmarks: mandelbrot and spectral_norm]

Branch mispredictions, reordered

[Figure: normalized execution time vs. number of superinstructions used, reordered by execution time; 2 benchmarks: mandelbrot and spectral_norm]

So why is Ruby slow?

• Profile of numeric benchmarks
  – Garbage collection takes significant time
  – Boxed floating point values dominate allocation

Floating point value boxing

Typical Ruby 1.9 VM operation:

OPT_PLUS:
    VALUE a = *(sp-2);
    VALUE b = *(sp-1);
    /* ... */
    if (CLASS_OF(a) == Float && CLASS_OF(b) == Float) {
        sp--;
        *(sp-1) = NEW_FLOAT(DOUBLE_VALUE(a) + DOUBLE_VALUE(b));
    } else {
        CALL(1/*argnum*/, PLUS, a);
    }
    goto **pc++;

New “box” object is allocated on each operation

Proposal: use superinstructions for boxing optimization

• 2 operations per allocation instead of 1

OPT_MULT_OPT_PLUS:
    VALUE a = *(sp-3);
    VALUE b = *(sp-2);
    VALUE c = *(sp-1);
    /* ... */
    if (CLASS_OF(a) == Float && CLASS_OF(b) == Float && CLASS_OF(c) == Float) {
        sp-=2;
        *(sp-1) = NEW_FLOAT(DOUBLE_VALUE(a) + DOUBLE_VALUE(b)*DOUBLE_VALUE(c));
    } else {
        CALL(1/*argnum*/, MULT/*method*/, b/*receiver*/);
        CALL(1/*argnum*/, PLUS/*method*/, a/*receiver*/);
    }
    goto **pc++;

Boxing of intermediate result eliminated
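The allocation saving can be checked with a toy model of boxed floats. This is a sketch under assumptions, not Ruby VM code: `Box`, `new_float` and the `allocations` counter are hypothetical stand-ins for the real `VALUE` representation and `NEW_FLOAT`, and the class checks and the slow path through `CALL` are left out.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a boxed Ruby Float; not the real VALUE layout. */
typedef struct { double value; } Box;

static unsigned long allocations;   /* counts boxes created so far */

Box *new_float(double d)
{
    Box *b = malloc(sizeof *b);
    assert(b != NULL);
    b->value = d;
    allocations++;
    return b;
}

/* Separate operations: each one boxes its own result. */
Box *opt_mult(const Box *a, const Box *b)
{
    return new_float(a->value * b->value);
}

Box *opt_plus(const Box *a, const Box *b)
{
    return new_float(a->value + b->value);
}

/* Superinstruction: a + b*c with the intermediate product kept unboxed. */
Box *opt_mult_opt_plus(const Box *a, const Box *b, const Box *c)
{
    return new_float(a->value + b->value * c->value);
}
```

Computing a + b*c with `opt_mult` followed by `opt_plus` creates two boxes, while `opt_mult_opt_plus` creates one, which is the "2 operations per allocation" effect.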

Implementation

• VM operations that handle floating point values directly:
  – opt_plus
  – opt_minus
  – opt_mult
  – opt_div
  – opt_mod
• We implemented all 25 combinations of length 2
  – Based on Ruby 1.9.1
  – Using existing Ruby infrastructure for superinstructions with some modifications

Limitations

• Coding style-sensitive
• Not applicable to other types (e.g. Fixnum, Bignum, String)
  – Fixnum is already unboxed
  – Bignum and String cannot be unboxed
• Sequences of 3 arithmetic instructions or longer are virtually non-existent
  – No occurrences in the benchmarks

Evaluation

• Methodology
  – Median time of 30 runs
• Reduction in allocation
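Taking the median of repeated runs can be sketched as follows. The slides do not say how the median of an even number of samples was computed; averaging the two middle samples is an assumption here, and `median_time` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdlib.h>

/* Orders two doubles for qsort without risking overflow or truncation. */
static int cmp_double(const void *pa, const void *pb)
{
    double a = *(const double *)pa;
    double b = *(const double *)pb;
    return (a > b) - (a < b);
}

/* Sorts `times` in place and returns the median of its `n` samples.
 * For even `n` (such as 30) the two middle samples are averaged,
 * which is an assumption of this sketch. */
double median_time(double *times, size_t n)
{
    assert(n > 0);
    qsort(times, n, sizeof *times, cmp_double);
    if (n % 2 == 1)
        return times[n / 2];
    return (times[n / 2 - 1] + times[n / 2]) / 2.0;
}
```

The median is less sensitive than the mean to a few unusually slow runs, which is a common reason for preferring it in benchmark reporting.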

Results

• Up to 22% benefit on numeric benchmarks
• No slowdown on other benchmarks

Example: mandelbrot tweak

ITER.times do
- tr = zrzr - zizi + cr
+ tr = cr + (zrzr - zizi)
- ti = 2.0*zr*zi + ci
+ ti = ci + 2.0*zr*zi

• Slight modification produces 20% difference in performance
  – 4 of 9 arithmetic instructions get merged into 2 superinstructions
  – 24% reduction in float allocation

[Figure: normalized execution time]
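Why operand order matters can be illustrated by looking at the postfix instruction order of each expression. The sketch below assumes a simplified stack machine in which every variable or constant is a single `LOAD` and operands are evaluated left to right; real Ruby 1.9 instruction sequences contain other instructions, and `enum insn`, the four arrays and `count_fusable_pairs` are hypothetical names.

```c
#include <assert.h>

/* Hypothetical instruction kinds: LOAD pushes a variable or constant,
 * the others are the float arithmetic operations. */
enum insn { LOAD, PLUS, MINUS, MULT };

/* tr = zrzr - zizi + cr */
const enum insn tr_before[] = { LOAD, LOAD, MINUS, LOAD, PLUS };
/* tr = cr + (zrzr - zizi) */
const enum insn tr_after[]  = { LOAD, LOAD, LOAD, MINUS, PLUS };
/* ti = 2.0*zr*zi + ci */
const enum insn ti_before[] = { LOAD, LOAD, MULT, LOAD, MULT, LOAD, PLUS };
/* ti = ci + 2.0*zr*zi */
const enum insn ti_after[]  = { LOAD, LOAD, LOAD, MULT, LOAD, MULT, PLUS };

/* Scans a postfix sequence left to right and returns how many
 * non-overlapping pairs of adjacent arithmetic instructions it contains,
 * i.e. how many length-2 superinstructions could replace them. */
int count_fusable_pairs(const enum insn *code, int n)
{
    int pairs = 0;
    for (int i = 0; i + 1 < n; i++) {
        if (code[i] != LOAD && code[i + 1] != LOAD) {
            pairs++;
            i++;    /* skip the second half of the merged pair */
        }
    }
    return pairs;
}
```

Under these assumptions neither original line contains two adjacent arithmetic instructions, while each rewritten line ends with one such pair (MINUS, PLUS and MULT, PLUS), giving the 2 superinstructions that absorb 4 arithmetic instructions.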

Discussion of alternative approaches

• Faster GC would improve performance as well
  – Superinstructions still apply, but with reduced benefit
• Type inference
  – Would allow specializing expressions and eliminating boxing
  – Interoperability with dynamic code is an issue
• Dynamic specialization
  – Topic for further research

Related work: Tagged values

• Use lower bits of pointers to trigger alternative handling
• Embed floating point value into higher bits
• Limited to 64-bit platforms, as Ruby uses double precision 64 bit floating point arithmetic
  – Our approach has the same effect on 32 and 64 bit platforms
• Allows eliminating the majority of boxed floats
• Provides 28-35% benefit (on the same benchmarks)

* Sasada 2008
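The general idea of tagging can be sketched with a deliberately simplified encoding. This is not the encoding from the cited work: the tag value, the rule for which doubles are embeddable, and the names `TAG_MASK`, `TAG_FLOAT`, `float_embed`, `is_embedded_float` and `float_extract` are all assumptions made for illustration. The sketch also assumes that `double` is a 64-bit IEEE 754 value and that heap pointers have their two low bits clear.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t VALUE;             /* hypothetical 64-bit tagged word */

#define TAG_MASK  ((VALUE)3)        /* two low bits hold the tag */
#define TAG_FLOAT ((VALUE)2)        /* assumed tag for an embedded float */

/* Returns 1 and stores the tagged word if `d` can be embedded without
 * loss, i.e. the two lowest bits of its bit pattern are zero.
 * Returns 0 otherwise; a VM would then fall back to a boxed float. */
int float_embed(double d, VALUE *out)
{
    uint64_t bits;
    assert(sizeof(double) == sizeof(uint64_t));
    memcpy(&bits, &d, sizeof bits);
    if (bits & TAG_MASK)
        return 0;
    *out = bits | TAG_FLOAT;
    return 1;
}

/* Tests the tag bits only; aligned pointers never carry this tag. */
int is_embedded_float(VALUE v)
{
    return (v & TAG_MASK) == TAG_FLOAT;
}

/* Clears the tag and reinterprets the remaining bits as a double. */
double float_extract(VALUE v)
{
    uint64_t bits = v & ~TAG_MASK;
    double d;
    assert(is_embedded_float(v));
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

In this simplified scheme a value such as 1.5 is stored directly in the word, while a value whose bit pattern uses the two low mantissa bits would still need a box.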

Related work: Lazy boxing

• Java-like language with generics over value-types
• Boxing needed to avoid duplication of template instantiation code for primitive types
• Lazy optimization works by allocating boxed objects in the stack frame, and moving to heap as needed
• Relies on static compiler analysis for escape path detection, and runtime checks

* Owen 2004

Related work: Superinstructions

• Superinstructions used for code compression
  – ANSI C hybrid compiler-interpreter
  – Trimedia code compression system
  – Superinstructions chosen statically to minimize code size
• Superinstructions used to reduce dispatch overhead
  – Forth, Ocaml
  – Superinstructions chosen dynamically

* Piumarta 1998
* Proebsting 1995
* Hoogerbrugge 1999
* Ertl 2003

Conclusion

• Naive approach to superinstructions does not produce substantial benefit for Ruby

• Floating point value boxing overhead is a problem for Ruby

• Superinstructions provide some help (up to 22%)

Future work
• Eliminate float boxing further
  – Specializing computation loop
