JVM Virtual Method Invoking Optimization Based on CAM Table
page 1
JVM virtual method invoking optimization based on CAM table
Decide our own destiny; innovation builds the future. www.loongson.cn
Songsong Cai
Institute of Computing Technology, Chinese Academy of Sciences
28/7/2011
page 2
Outline
Introduction
Monomorphic Inline Caching in HotSpot
Hardware Design of CAM Used in Virtual Method Call
SW/HW Co-design Virtual Method Invoking Mechanism
Experimental Results and Analysis
Conclusions
References
page 3
Methods in Java Programming Language (1)
Class (static) method
- A class method does not require an instance.
- Class methods use static binding.
- When the JVM calls a class method, it selects the method to call based on the type of the object reference, which is usually known at compile time.
page 4
Methods in Java Programming Language (2)
Instance (virtual) method
- An instance method needs an instance.
- Instance methods use dynamic binding.
- When calling an instance method, the JVM selects the method to call based on the actual class of the object, which is known only at run time.
- The type information becomes available only when the JVM reaches the call site.
- Dynamic resolution is generally translated into an indirect jump, which can often lead to a pipeline stall.
- Instance methods account for a large share of calls: in SPECjvm98, for example, a virtual method call occurs every 12-40 bytecodes.
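The difference between static and dynamic binding can be illustrated with a minimal sketch. The class and method names below (`Shape`, `Circle`, `kind`, `name`) are hypothetical and chosen only for illustration; they do not come from the slides.

```java
// Hypothetical classes for illustration only.
class Shape {
    static String kind() { return "Shape.kind"; }   // class (static) method
    String name() { return "Shape.name"; }          // instance (virtual) method
}

class Circle extends Shape {
    static String kind() { return "Circle.kind"; }  // hides Shape.kind, does not override it
    @Override
    String name() { return "Circle.name"; }         // overrides Shape.name
}

class BindingDemo {
    // Bound at compile time to Shape.kind.
    static String staticCall() {
        return Shape.kind();
    }

    // Bound at run time from the actual class of the receiver.
    static String virtualCall(Shape s) {
        return s.name();
    }

    public static void main(String[] args) {
        Shape s = new Circle();
        System.out.println(staticCall());
        System.out.println(virtualCall(s));
    }
}
```

The static call never looks at an object, while the instance call picks `Circle.name` even though the variable is declared as `Shape`.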
page 5
Types of method invocation in Java
invokestatic: handles static method invocation
- The constant pool entry contains a symbolic reference to the target method; the JVM pops the parameters and executes the target method.
invokevirtual: handles virtual method invocation
- The JVM needs to pop the object reference and the parameters before executing the target method.
invokespecial: handles instance method invocation that is bound directly (instance initialization methods and superclass methods)
invokeinterface: handles virtual method invocation through an interface reference
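As a rough guide, the following sketch marks which invocation instruction javac typically emits for each kind of call. The names (`Runner`, `Base`, `Derived`) are hypothetical, and the comments describe the usual compiler output rather than anything stated on the slides.

```java
// Hypothetical types for illustration only.
interface Runner {
    int run();
}

class Base implements Runner {
    static int twice(int x) { return 2 * x; }
    public int run() { return 1; }
}

class Derived extends Base {
    @Override
    public int run() {
        return super.run() + 10;   // invokespecial: superclass method
    }
}

class InvokeKindsDemo {
    static int demo() {
        int a = Base.twice(3);     // invokestatic
        Base b = new Derived();    // new, then invokespecial on the constructor
        int c = b.run();           // invokevirtual: receiver has a class type
        Runner r = b;
        int d = r.run();           // invokeinterface: receiver has an interface type
        return a + c + d;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Running `javap -c` on the compiled classes is one way to inspect the emitted instructions.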
page 6
The percentage of virtual method invocation
[Figure: The percentage of virtual method invocation in the SPECjvm2008 benchmark. Vertical axis: 0% to 100%. Series: all method invocation; virtual method invocation.]
page 7
Related Work : Inline Caching
Origin
- The receiver type seen at a given call site does not change frequently.
- Given this locality, the type can be cached at the call site.
Kinds
- Monomorphic inline caching
  - Stores the target method and the corresponding type value inline at the call site.
  - For each virtual method call, it compares the type values and jumps to the cached target method, rather than searching for the target among the many candidate methods.
- Polymorphic inline caching
  - Target methods for different types are recorded at the same call site.
  - The type value of the current call is compared with the stored types in turn; once a matching type is found, the program jumps to the corresponding target method.
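The two kinds of inline caching can be modeled in software as follows. This is a minimal sketch, not HotSpot's implementation: `slowLookup` is a hypothetical stand-in for the full virtual method lookup, and a real inline cache patches machine code at the call site instead of keeping fields in an object.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical software model of inline caching; all names are illustrative.
class InlineCacheSketch {
    // Stand-in for the full (slow) virtual method lookup.
    static Function<Object, String> slowLookup(Class<?> type) {
        String label = type.getSimpleName() + "#describe";
        return receiver -> label;
    }

    // Monomorphic: one cached (type, target) pair per call site.
    static class MonoCallSite {
        private Class<?> cachedType;
        private Function<Object, String> cachedTarget;
        int hits;
        int misses;

        String invoke(Object receiver) {
            Class<?> type = receiver.getClass();
            if (type == cachedType) {
                hits++;
            } else {
                misses++;
                cachedType = type;
                cachedTarget = slowLookup(type);
            }
            return cachedTarget.apply(receiver);
        }
    }

    // Polymorphic: several (type, target) pairs, compared in turn.
    static class PolyCallSite {
        private final List<Class<?>> types = new ArrayList<>();
        private final List<Function<Object, String>> targets = new ArrayList<>();
        int hits;
        int misses;

        String invoke(Object receiver) {
            Class<?> type = receiver.getClass();
            for (int i = 0; i < types.size(); i++) {
                if (types.get(i) == type) {
                    hits++;
                    return targets.get(i).apply(receiver);
                }
            }
            misses++;
            Function<Object, String> target = slowLookup(type);
            types.add(type);
            targets.add(target);
            return target.apply(receiver);
        }
    }

    public static void main(String[] args) {
        Object[] receivers = {"a", "b", "c", 1, "d"};
        MonoCallSite mono = new MonoCallSite();
        PolyCallSite poly = new PolyCallSite();
        for (Object r : receivers) {
            mono.invoke(r);
            poly.invoke(r);
        }
        System.out.println("mono hits=" + mono.hits + " misses=" + mono.misses);
        System.out.println("poly hits=" + poly.hits + " misses=" + poly.misses);
    }
}
```

In this model the monomorphic site misses again when the type switches back, while the polymorphic site keeps both types at the price of a longer comparison chain.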
page 8
Shortcomings of Inline Caching
Monomorphic inline caching
- It cannot handle the case where methods of several different types are called frequently at the same call site.
Polymorphic inline caching
- Although polymorphic inline caching can solve the problem above, its complex implementation incurs additional cost.
page 9
Solution: Software and Hardware Co-design
Hardware (CAM table)
- We design and implement a CAM (content-addressable memory) table to index and search for the virtual target method.
- The CAM table is implemented in hardware and can be managed by software.
Software (efficient algorithm)
- With the CAM table, we optimize virtual method invocation so that the target method can be resolved easily.
- The program can jump to the target method directly, rather than resolving the virtual method dynamically at run time.
page 10
Thesis Contributions
System architecture
- We present a Java Virtual Machine system with high-performance virtual method invocation. The JVM is simple but efficient.
CAM lookup table
- We design and implement a hardware CAM lookup table to help resolve virtual methods. The target method can be easily resolved with the CAM table.
Efficient algorithm for virtual method invocation
- With the CAM table, we present a virtual method invoking algorithm based on software and hardware co-design. The algorithm attains relatively high performance on virtual method invocation.
page 11
Outline
Monomorphic Inline Caching in HotSpot
page 12
The Virtual Method Invocation in HotSpot
HotSpot
- The core of the open-source OpenJDK 6 project
- Standard and stable
- Virtual method invocation uses optimized monomorphic inline caching
page 13
The State Transition of Monomorphic Inline Caching
[Figure: state transition diagram of monomorphic inline caching. States: uninitialized, monomorphic, polymorphic. Transition labels: "dynamic resolution succeeds the first time"; "the type comparison misses at the call site"; "the type comparison misses many times"; "number of virtual method calls".]
page 14
A bad case
    // One call site, toString(), sees a different receiver type on every iteration.
    var values = [1, "a", 2, "b", 3, "c", 4, "d"];
    for (var i in values) {
        document.write(values[i].toString());
    }
When a program always calls target methods with different types, the performance loss can be very large.
Although such an extreme case is rare, the receiver type at a virtual method call changes quite commonly, so the performance loss caused by virtual method call overhead is serious.
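The same pattern can be written in Java. The sketch below uses hypothetical names (`BadCaseSketch`, `concatAll`, `typeChanges`) and simply counts how often the receiver type changes between consecutive calls at the single `toString()` call site; each change would defeat a one-entry cache.

```java
// Hypothetical Java rendering of the bad case; names are illustrative.
class BadCaseSketch {
    // One call site whose receiver alternates between Integer and String.
    static String concatAll(Object[] values) {
        StringBuilder out = new StringBuilder();
        for (Object v : values) {
            out.append(v.toString());
        }
        return out.toString();
    }

    // Counts consecutive calls whose receiver type differs from the previous one.
    static int typeChanges(Object[] values) {
        int changes = 0;
        for (int i = 1; i < values.length; i++) {
            if (values[i].getClass() != values[i - 1].getClass()) {
                changes++;
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        Object[] values = {1, "a", 2, "b", 3, "c", 4, "d"};
        System.out.println(concatAll(values));
        System.out.println(typeChanges(values));
    }
}
```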
page 15
Outline
Hardware Design of CAM Used in Virtual Method Call
page 16
The Structure of CAM Table
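[Figure: structure of the CAM table. 64 entries, indexed 0 to 63. Each entry has three fields, with widths as labeled in the figure: ASID (8), CAM_value (40), RAM_value (64). The figure annotates the table with: the PC of the current method call instruction XOR the type of the method.]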
page 17
Operating Instructions of CAM Table
Instructions
- CAMPI: look up the CAM table according to the index
- CAMPV: look up the CAM table according to the value
- CAMWI: write the CAM table according to the index
Usage
- All CAM entries can be written by the instruction CAMWI, and the RAM value can be read by the instruction CAMRI.
- The instructions CAMPI and CAMPV are used to look up the CAM.
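A software model may help make the table and its operations concrete. The sketch below is an assumption-laden illustration, not the hardware design: it assumes the labeled widths are bit widths, adds a valid flag that the slides do not mention, and uses hypothetical method names that only loosely mirror CAMWI (write by index), CAMRI (read RAM value by index), and CAMPV (look up by value). The exact semantics of CAMPI are not clear from the slides, so it is not modeled.

```java
// Hypothetical software model of the 64-entry CAM table; not the actual hardware.
class CamTableModel {
    static final int ENTRIES = 64;
    static final long CAM_MASK = (1L << 40) - 1;   // assumes CAM_value is 40 bits wide

    private final boolean[] valid = new boolean[ENTRIES];   // assumed; not shown on the slide
    private final int[] asid = new int[ENTRIES];
    private final long[] camValue = new long[ENTRIES];
    private final long[] ramValue = new long[ENTRIES];

    // Key built from the call-site PC XOR the type, truncated to the CAM width.
    static long key(long callSitePc, long typeId) {
        return (callSitePc ^ typeId) & CAM_MASK;
    }

    // CAMWI-like: write one entry selected by index.
    void writeByIndex(int index, int asidValue, long cam, long ram) {
        valid[index] = true;
        asid[index] = asidValue & 0xFF;             // assumes ASID is 8 bits wide
        camValue[index] = cam & CAM_MASK;
        ramValue[index] = ram;
    }

    // CAMRI-like: read the RAM value of one entry selected by index.
    long readRamByIndex(int index) {
        return ramValue[index];
    }

    // CAMPV-like: search by value; returns the matching index or -1.
    // Hardware would compare all entries in parallel; the loop is a software stand-in.
    int probeByValue(int asidValue, long cam) {
        for (int i = 0; i < ENTRIES; i++) {
            if (valid[i] && asid[i] == (asidValue & 0xFF) && camValue[i] == (cam & CAM_MASK)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        CamTableModel cam = new CamTableModel();
        long k = key(0x1000L, 0x2AL);
        cam.writeByIndex(5, 1, k, 0xCAFEL);
        int index = cam.probeByValue(1, k);
        System.out.println(index + " -> 0x" + Long.toHexString(cam.readRamByIndex(index)));
    }
}
```

Here the RAM value plays the role of the target method address that a successful lookup returns.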
page 18
Evaluation of CAM Entry Number
[Figure: evaluation of the number of CAM entries. Vertical axis: 0.00% to 100.00%. Series: 16 entries, 32 entries, 64 entries, 128 entries.]
page 19
Outline
SW/HW Co-design Virtual Method Invoking Mechanism
page 20
Flow Diagram of the Virtual Method Invoking Mechanism
1. Look up the CAM, using the PC of the call site XOR the type of the method.
   - Hit: jump to the target method and execute.
   - Miss: go to step 2.
2. Perform the type comparison of the inline cache mechanism, done in software in HotSpot.
   - Hit: (1) jump to the target method and execute; (2) fill the CAM table.
   - Miss: go to step 3.
3. Perform basic dynamic method invocation.
   - Resolved successfully: (1) jump to the target method and execute; (2) refill the inline cache at the call site.
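The three-level flow on this slide can be sketched in software as follows. This is a simplified model under stated assumptions: the CAM is an unbounded map rather than a 64-entry hardware table, type identifiers are plain numbers, and all names (`CoDesignDispatchSketch`, `Path`, `resolve`) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the CAM -> inline cache -> dynamic resolution flow.
class CoDesignDispatchSketch {
    enum Path { CAM_HIT, INLINE_CACHE_HIT, DYNAMIC_RESOLVE }

    // Stand-in for the CAM: key (PC XOR type) -> target.
    private final Map<Long, String> cam = new HashMap<>();
    // Stand-in for the monomorphic inline cache: call-site PC -> cached type and target.
    private final Map<Long, Long> cachedType = new HashMap<>();
    private final Map<Long, String> cachedTarget = new HashMap<>();

    String lastTarget;

    // Stand-in for basic dynamic method resolution.
    static String resolve(long typeId) {
        return "method@type" + typeId;
    }

    Path invoke(long pc, long typeId) {
        long key = pc ^ typeId;

        // Step 1: CAM lookup.
        String target = cam.get(key);
        if (target != null) {
            lastTarget = target;
            return Path.CAM_HIT;
        }

        // Step 2: software type comparison of the inline cache.
        Long cached = cachedType.get(pc);
        if (cached != null && cached == typeId) {
            target = cachedTarget.get(pc);
            cam.put(key, target);            // fill the CAM table
            lastTarget = target;
            return Path.INLINE_CACHE_HIT;
        }

        // Step 3: basic dynamic method invocation, then refill the inline cache.
        target = resolve(typeId);
        cachedType.put(pc, typeId);
        cachedTarget.put(pc, target);
        lastTarget = target;
        return Path.DYNAMIC_RESOLVE;
    }

    public static void main(String[] args) {
        CoDesignDispatchSketch d = new CoDesignDispatchSketch();
        long pc = 0x400L;
        long[] types = {1, 1, 1, 2, 1, 2, 2};
        for (long t : types) {
            System.out.println("type " + t + ": " + d.invoke(pc, t));
        }
    }
}
```

In this model, once both types have passed through the inline cache, later calls of either type hit the CAM even though the one-entry inline cache holds only the most recent type.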
page 21
Comparison between These Three Dynamic Resolutions
[Figure: three ways of resolving the call X.foo().]
- X.foo() with CAM: CAM hit -> foo(){...}
- X.foo() with inline cache: CAM miss -> typeof(x) != cached(x) ? -> foo(){...}
- X.foo() with the basic dynamic method resolution: dynamic method invocation
page 22
Outline
Experimental Results and Analysis
page 23
Evaluation Platform
Software
- HotSpot
Hardware
- Loongson-3 processor
- 4-core high-performance general-purpose processor
- The CAM table is implemented in the processor
- After adding the CAM table, the whole processor area increases by less than 5‰ and the power consumption increases by less than 1‰; the cost is negligible.
page 24
Virtual_Test Evaluation (1)
    public class InlineCache {
        static Animal[] animal = new Animal[8];

        static {
            animal[0] = new Animal();
            animal[1] = animal[0];
            animal[2] = new Dog();
            animal[3] = animal[2];
            animal[4] = new Cat();
            animal[5] = animal[4];
        }

        public static void main(String[] argv) {
            int i = 0;
            // Only indices 0..5 are used, so the receiver type cycles through Animal, Dog, and Cat.
            while (i++ < 1000000) {
                run(i % 6);
            }
        }

        public static void run(int i) {
            int ret;
            ret = animal[i].run();   // the virtual call site under test
        }
    }

    class Animal {
        public Animal() { }

        public int run() {
            int a = 12;
            // The slide places println("animal is running.") after the return; it is unreachable and omitted here.
            return a;
        }
    }

    class Dog extends Animal {
        public Dog() { super(); }

        public int run() {
            int b = 0;
            int c = 23;
            int d = b + c;
            // println("dog is running.") followed the return on the slide; omitted as unreachable.
            return d;
        }
    }

    class Cat extends Animal {
        public Cat() { super(); }

        public int run() {
            int a = 345;
            int b = a * 12;
            // println("cat is running.") followed the return on the slide; omitted as unreachable.
            return b;
        }
    }
page 25
Virtual_Test Evaluation (2)
Virtual test                     Original    Optimized
Inline cache hit rate            13.3%       76.4%
Run time of program (seconds)    36.098      30.250
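The run times in this table are consistent with the 16.2% improvement stated in the conclusions if the improvement is read as the relative reduction in run time. The small check below makes that arithmetic explicit; the class and method names are hypothetical, and the interpretation of "improvement" is an assumption.

```java
import java.util.Locale;

// Hypothetical helper that recomputes the improvement from the table values.
class SpeedupCheck {
    // Relative reduction in run time, in percent.
    static double reductionPercent(double before, double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(String[] args) {
        double reduction = reductionPercent(36.098, 30.250);
        System.out.println(String.format(Locale.ROOT, "run time reduction: %.1f%%", reduction));
        System.out.println(String.format(Locale.ROOT, "hit rate gain: %.1f points", 76.4 - 13.3));
    }
}
```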
page 26
SPECjvm98 Evaluation
[Figure: SPECjvm98 evaluation. Vertical axis: 0.00% to 160.00%. Series: slowest before optimization, fastest before optimization, slowest after optimization, fastest after optimization.]
page 27
Outline
Conclusions
page 28
Conclusions
Problem
- The performance loss caused by dynamic method resolution at virtual method calls has long been an important reason for the poor performance of the Java language.
Solution
- Design and implement the hardware CAM lookup table.
- Present a virtual method call mechanism based on hardware and software co-design.
Performance improvement
- In the case where multiple target method types occur frequently at the same call site:
  - the virtual hit rate increases from 13.3% to 76.4%
  - the performance of the program improves by 16.2%
- The performance of SPECjvm98 improves by 6.4% on average.
No. 10 Kexueyuan South Road, Zhongguancun, Haidian District, Beijing 100190, China
Thanks!