Java at Scale, Dallas JUG, October 2013


Upload: azul-systems-inc

Post on 10-May-2015


DESCRIPTION

Title: Java at Scale - What Works and What Doesn't Work Nearly so Well

Speaker: Matt Schuetze, Product Manager, Azul Systems

Abstract: Java gets used everywhere and for everything due to its efficiency, portability, the productivity it offers developers, and the platform it provides for application frameworks and non-Java languages. But all is not perfect; developers both benefit from and struggle against Java's greatest strength: its memory management. In this session, Matt will describe where Java needs help, the challenges it presents to developers who need to provide reliable performance, the reasons those challenges exist, and how developers have traditionally worked around them. He will then discuss where Zing fits in the spectrum of use cases where large memory and predictable performance are the dominant application requirements.

TRANSCRIPT

Page 1: Java at Scale, Dallas JUG, October 2013

Java at Scale: Performance & GC

Matt Schuetze, Product Manager

Presented to Dallas JUG, October 2013

© 2013 Azul Systems 2

Where is Java Working?

• On the server
  ─ Enterprise applications: business rules
  ─ Monolithic & distributed computing

• On the client
  ─ Fat client computing
  ─ Thin client, browser-based

• Embedded
  ─ Android apps

What is Java’s Appeal?

• Portable
  ─ Write once, run anywhere (after testing everywhere)

• Productive
  ─ No bad features: no multiple inheritance, no operator overloading
  ─ Do the Right Thing philosophy (vs. C++ Do the Efficient Thing)
  ─ Memory management reduces opportunities for error

• Efficient
  ─ Interpreter → JIT compilation → Dynamic recompilation

• Generic
  ─ Scala, Clojure, JRuby & more use the Java runtime
  ─ Bytecode is the new target architecture (ANDF)

• Scalable
  ─ Small to large platforms

Parkinson’s Law Applied to Software

• Hardware grows with Moore’s Law
  ─ Transistor counts double roughly every 18 months
  ─ Memory size grows around 100x every 10 years

• Application sizes grow with hardware
  ─ 1980: 100 KB data on ¼ – ½ MB server
  ─ 1990: 10 MB data on 16 – 32 MB server
  ─ 2000: 1 GB data on 2 – 4 GB server
  ─ 2010: 100 GB data on 256 GB server
  ─ (In-memory data size. Bigger data is cached or distributed.)
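
The two growth rates on this slide agree with each other and with the application-size table: doubling every 18 months compounds to roughly 100x per decade, the same 100x step the table shows from 1980 to 1990 to 2000 to 2010. A quick check (arithmetic only, not from the slide):

```latex
\frac{10 \times 12\ \text{months}}{18\ \text{months}} \approx 6.67\ \text{doublings}, \qquad 2^{120/18} = 2^{6.67} \approx 102 \approx 100\times
```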

Big Memory Servers are the Standard

• Retail prices, major web server store (US $, Jan 2013)
  ─ 24 vCore, 128 GB server $5K
  ─ 24 vCore, 256 GB server $8K
  ─ 32 vCore, 384 GB server $14K
  ─ 48 vCore, 512 GB server $19K
  ─ 64 vCore, 1 TB server $36K

• Cheap (< $1/GB/Month), and roughly linear to ~1 TB

• 10s to 100s of GB/sec of memory bandwidth

Has Java Kept Up? How Scalable is it?

• How big is your Java heap? (> 0.5 GB, > 1 GB, > 2 GB, > 4 GB, > 10 GB, > 20 GB, > 50 GB, > 100 GB)

• Hardly anyone runs over 4 GB

Large Heaps are a Rarity

• Survey of heap sizes for Plumbr memory leak detector
  ─ Source: http://plumbr.eu/blog/most-popular-memory-configurations

Why So Few Big JVMs on Big Servers?

• Java performance gets worse with heap size
  ─ Pause frequency varies with application activity
  ─ Pause duration varies with amount to scan/copy

[Chart: ehCache with a 10 GB cache in a 29 GB heap, on a 48 GB, 16-core Ubuntu server]

Think in Terms of Service Levels

• What are the requirements (percentiles & worst case)?
  ─ Need to think beyond averages & standard deviations
  ─ GC pauses don’t fit a bell curve
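
To make the point concrete, here is a small Java sketch that summarizes latency samples with nearest-rank percentiles instead of an average. The class name, method names, and sample data are hypothetical, not from the talk: the made-up data mixes 995 fast requests with 5 that stalled behind a 1.5 second pause.

```java
import java.util.Arrays;
import java.util.Locale;

/** Summarizes latency samples by percentile rather than by mean and standard deviation. */
class LatencyPercentiles {

    /** Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it. */
    static long percentile(long[] samples, double p) {
        if (samples.length == 0) throw new IllegalArgumentException("no samples");
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length); // 1-based rank
        return sorted[Math.min(Math.max(rank, 1), sorted.length) - 1];
    }

    static double mean(long[] samples) {
        return Arrays.stream(samples).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        // Hypothetical response times in ms: 995 fast requests, 5 that hit a 1.5 s GC pause.
        long[] ms = new long[1000];
        Arrays.fill(ms, 0, 995, 10);
        Arrays.fill(ms, 995, 1000, 1500);
        System.out.printf(Locale.ROOT, "mean=%.2f p50=%d p99=%d p99.9=%d max=%d%n",
                mean(ms), percentile(ms, 50), percentile(ms, 99),
                percentile(ms, 99.9), percentile(ms, 100));
    }
}
```

On this data the mean (about 17 ms) and the 99th percentile (10 ms) both look fine; only the 99.9th percentile and the maximum reveal the 1.5 second stall. Production tools usually record into a histogram rather than sorting every sample.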

A Classic Look at Application Response

• Key assumption: response time is a function of load
  ─ Source: IBM CICS server documentation, “understanding response times”

Java Response Has a Different Look

• Pauses may track with load, but not in as obvious a way
  ─ Source: ZOHO QEngine White Paper: performance testing report analysis

A Few Realities About GC

• First the good:
  ─ GC is very efficient, much better than malloc()
  ─ Dead objects cost nothing to collect
  ─ GC will find all the dead objects without help, even cyclic graphs

• Now the bad:
  ─ GC really does stop for ~1 second per GB of live objects
    ─ You can change when it happens, not if*
  ─ You can still have memory leaks
    ─ Hold on to objects so GC can’t release them
  ─ No pauses in a 20-minute test doesn’t mean they’re gone
    ─ “You can pay me now, or you can pay me later.”

* We’ll talk about that later…
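
A minimal sketch of the “hold on to objects” leak, assuming a hypothetical session cache keyed by string: a map held in a static field keeps every value strongly reachable, so no collector can free it. One common fix, shown here, bounds the cache using `LinkedHashMap` in access order with `removeEldestEntry` (both standard JDK API); the class and method names are illustrative only.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class CacheLeakDemo {
    // Leaky: every key ever seen stays strongly reachable from a static root,
    // so the GC can never reclaim these values, however good the collector is.
    static final Map<String, byte[]> LEAKY = new HashMap<>();

    /** LRU map that evicts its least recently used entry once it exceeds maxEntries. */
    static <K, V> Map<K, V> boundedLru(int maxEntries) {
        return new LinkedHashMap<K, V>(16, 0.75f, true) { // true = iterate in access order
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    public static void main(String[] args) {
        Map<String, byte[]> bounded = boundedLru(100);
        for (int i = 0; i < 10_000; i++) {
            String key = "session-" + i;              // hypothetical session ids
            LEAKY.put(key, new byte[1024]);           // grows without bound
            bounded.put(key, new byte[1024]);         // never holds more than 100 entries
        }
        System.out.println("leaky=" + LEAKY.size() + " bounded=" + bounded.size());
    }
}
```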

How Does a Garbage Collector Work?

• Three phases to GC:
  ─ Identify the live objects
    ─ Start with stack & statics, flag everything we reach
  ─ Reclaim resources held by dead objects
    ─ Anything we didn’t flag in the 1st phase
  ─ Periodically relocate live objects (defrag)
    ─ Move objects together, correct references (remap)

• Sample implementations:
  ─ Mark/sweep/compact for old generation
    ─ Three separate passes, minimal extra heap
  ─ Copying collector for new generation
    ─ Move as we flag, do it all in one pass
    ─ Requires 2x heap
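
The three phases can be sketched on a toy heap. This is an illustration under simplifying assumptions (objects live in numbered slots, references are slot indices, a single root array stands in for stacks and statics), not how any production JVM lays out memory. It marks from the roots, drops everything unmarked, slides survivors to the bottom of the heap, and remaps every reference and root through a forwarding table. The unreachable cycle between B and D is reclaimed without any help.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

/** Toy heap: slot i holds an object whose fields are the slot indices of other objects. */
class ToyMarkSweepCompact {
    static final class Obj {
        final String name;
        final int[] refs; // "addresses" (slot indices) of referenced objects
        Obj(String name, int... refs) { this.name = name; this.refs = refs; }
    }

    /** Phase 1: flag everything reachable from the roots; the marked[] check makes cycles safe. */
    static boolean[] mark(Obj[] heap, int[] roots) {
        boolean[] marked = new boolean[heap.length];
        Deque<Integer> work = new ArrayDeque<>();
        for (int r : roots) work.push(r);
        while (!work.isEmpty()) {
            int addr = work.pop();
            if (marked[addr]) continue;
            marked[addr] = true;
            for (int ref : heap[addr].refs) work.push(ref);
        }
        return marked;
    }

    /** Phases 2 and 3: drop unmarked objects, slide survivors down, remap every reference. */
    static Obj[] sweepAndCompact(Obj[] heap, boolean[] marked, int[] roots) {
        int[] forward = new int[heap.length];
        Arrays.fill(forward, -1);
        int next = 0;
        for (int addr = 0; addr < heap.length; addr++) {
            if (marked[addr]) forward[addr] = next++; // new address for each survivor
        }
        Obj[] compacted = new Obj[heap.length];       // unmarked objects are simply not copied
        for (int addr = 0; addr < heap.length; addr++) {
            if (!marked[addr]) continue;
            Obj old = heap[addr];
            int[] newRefs = new int[old.refs.length];
            for (int i = 0; i < newRefs.length; i++) newRefs[i] = forward[old.refs[i]]; // remap
            compacted[forward[addr]] = new Obj(old.name, newRefs);
        }
        for (int i = 0; i < roots.length; i++) roots[i] = forward[roots[i]]; // roots move too
        return compacted;
    }

    public static void main(String[] args) {
        // A(0) -> C(2) -> E(4); B(1) and D(3) reference each other but nothing references them.
        Obj[] heap = { new Obj("A", 2), new Obj("B", 3), new Obj("C", 4), new Obj("D", 1), new Obj("E") };
        int[] roots = {0}; // e.g. a local variable on a thread stack pointing at A
        Obj[] after = sweepAndCompact(heap, mark(heap, roots), roots);
        for (int i = 0; i < after.length && after[i] != null; i++) {
            System.out.println(i + ": " + after[i].name + " -> " + Arrays.toString(after[i].refs));
        }
    }
}
```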

Generational GC

Basic assumption: most objects die young

• Use copying collector on new objects
  ─ Scan small % of heap, need small space for copy area
  ─ Reclaim the most space for the least effort
  ─ Move objects that live long enough to old generation(s)

• Collect old gen as it fills up
  ─ Much less frequent, likely higher cost, lower benefit

• Requires a Remembered Set (e.g. via Card Marking)
  ─ Track references from outside into new gen
  ─ Use as roots for new gen collector scan

• Don’t absolutely need 2x memory for new gen GC
  ─ Can overflow into old gen space
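
A toy card table shows how a remembered set lets a young collection avoid scanning the whole old generation. Everything here is a hypothetical simplification: a flat byte-addressed heap whose old generation sits below `youngStart`, 512-byte cards (treat the size as illustrative), and a write barrier that records only old-to-young stores. A production barrier may instead dirty the card on every reference store and filter during the scan.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Toy card table: one dirty bit per 512-byte card of a simulated heap. */
class ToyCardTable {
    static final int CARD_SHIFT = 9; // 2^9 = 512-byte cards (assumed)
    final BitSet dirty = new BitSet();
    final long youngStart;           // addresses >= youngStart belong to the young generation

    ToyCardTable(long youngStart) { this.youngStart = youngStart; }

    /** Write barrier, conceptually run on every reference store "*fieldAddr = targetAddr". */
    void onReferenceStore(long fieldAddr, long targetAddr) {
        boolean fieldInOld = fieldAddr < youngStart;
        boolean targetInYoung = targetAddr >= youngStart;
        if (fieldInOld && targetInYoung) {
            dirty.set((int) (fieldAddr >>> CARD_SHIFT)); // remember the card holding the field
        }
    }

    /** A young GC scans only these cards for extra roots, not the entire old generation. */
    int[] cardsToScan() { return dirty.stream().toArray(); }

    void clearAfterYoungGc() { dirty.clear(); }

    public static void main(String[] args) {
        ToyCardTable cards = new ToyCardTable(1 << 20); // old gen = first 1 MB (assumed)
        cards.onReferenceStore(4_096, 2_000_000);       // old -> young: card 4096 / 512 = 8
        cards.onReferenceStore(5_000, 6_000);           // old -> old: ignored
        cards.onReferenceStore(1_500_000, 1_600_000);   // young -> young: ignored
        System.out.println(Arrays.toString(cards.cardsToScan()));
    }
}
```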

GC Terminology

• Concurrent vs. Parallel
  ─ A concurrent collector does GC while the application runs
  ─ A parallel collector uses multiple CPU cores to perform GC
  ─ A collector may be neither, one, or both

• Concurrent vs. Stop-The-World
  ─ A STW collector pauses the application during part of GC
  ─ A STW collector is not concurrent; it may be parallel

• Incremental
  ─ An incremental collector does its work in discrete chunks
  ─ Probably STW, with big gaps between increments

GC Terminology 2

• Precise vs. Conservative
  ─ A conservative collector doesn’t know every object reference, or doesn’t know whether some values are references or not
    ─ Can’t relocate objects if it can’t tell a ref from a value
  ─ A precise collector knows & can process every reference
    ─ Required to move objects
    ─ Compiler provides semantic information for the collector
    ─ Java relies on precise collection

• Safepoints
  ─ Places in execution (point or range) where the collector can identify every reference in a thread’s execution stack
  ─ We bring a thread to a safepoint and keep it there during GC
    ─ Might mean pausing the thread, might not (e.g. JNI)
  ─ Safepoints need to be reached frequently
  ─ Global safepoints apply to all threads (STW)

Typical GC Combinations

• New generation
  ─ Usually a copying collector
  ─ Usually monolithic, stop-the-world

• Old generation
  ─ Usually Mark/Sweep/Compact
  ─ May be stop-the-world, or concurrent, or mostly concurrent, or incremental stop-the-world, or mostly incremental stop-the-world

• Mostly means not always
  ─ Fall back to monolithic stop-the-world (i.e. big pauses)

The Good Little Architect – A Moral Tale

A good architect must be able to impose her architectural choices on her projects

• Once upon a time, Azul met an app with 18 sec pauses
  ─ App had 10s of millions of object finalizations every GC cycle
  ─ Back then, reference processing was a stop-the-world event

• Every class in the project had a finalizer
  ─ All the finalizers did was null every reference field
    ─ In theory, saves the GC from following pointers
    ─ Right for C++ reference counting, oh so wrong for Java

• Two morals:
  ─ Know the cost of your actions (learn the underlying system)
  ─ Just because it doesn’t cost now doesn’t mean it won’t later
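
In code, the first moral usually comes down to this: don't write finalizers to tidy up heap references. A tracing collector never visits unreachable objects anyway, and a finalizer only delays reclaiming them, since each finalizable object must be discovered, queued and finalized first. If a class owns a non-memory resource, release it deterministically with `AutoCloseable` and try-with-resources (available since Java 7). The classes below are hypothetical illustrations.

```java
import java.util.ArrayList;
import java.util.List;

class NoFinalizers {
    // Anti-pattern from the story (do not do this):
    //   @Override protected void finalize() { customer = null; items = null; }
    // Nulling fields of an unreachable object saves the collector nothing, and the
    // finalizer itself forces extra reference processing before the memory is freed.

    /** Plain data class: no finalizer, so dead instances cost the collector nothing extra. */
    static final class Order {
        final String customer;
        final List<String> items = new ArrayList<>();
        Order(String customer) { this.customer = customer; }
    }

    /** A class owning a non-memory resource releases it deterministically instead. */
    static final class AuditLog implements AutoCloseable {
        private boolean open = true;

        void write(String line) {
            if (!open) throw new IllegalStateException("closed");
            System.out.println("audit: " + line);
        }

        boolean isOpen() { return open; }

        @Override
        public void close() { open = false; } // a real one might flush and close a file here
    }

    public static void main(String[] args) {
        Order order = new Order("acme");
        order.items.add("widget");
        try (AuditLog log = new AuditLog()) { // closed automatically, even if write() throws
            log.write(order.customer + " ordered " + order.items);
        }
    }
}
```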

Oracle HotSpot GC Options

• Parallel GC
  ─ New Gen: monolithic STW copying
  ─ Old Gen: monolithic STW mark/sweep/compact

• Concurrent Mark Sweep (CMS)
  ─ New Gen: monolithic STW copying
  ─ Old Gen: mostly concurrent non-compacting
    ─ Mostly concurrent marking (multipass)
    ─ Concurrent sweeping
    ─ No compaction: free list, no object movement
    ─ Fallback is monolithic STW mark/sweep/compact

Oracle HotSpot GC Options 2

• Garbage First (G1GC)
  ─ New Gen: monolithic STW copying
  ─ Old Gen:
    ─ Mostly concurrent marker
      ─ STW to catch up on mutations, reference processing
      ─ Track inter-region relationships in remembered sets
    ─ STW mostly incremental compactor
      ─ Compact regions that can be done in limited time
      ─ Delay compaction of popular objects & regions
      ─ Goal: “avoid, as much as possible, having a full GC”
    ─ Fallback is monolithic STW mark/sweep/compact
      ─ Required for compacting popular objects & regions
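
To check which of these collectors a given JVM is actually running, the standard `java.lang.management` API lists the active collectors with their cumulative collection counts and times. A small sketch: the class name is hypothetical, collector names vary by JVM vendor and version, and CMS (`-XX:+UseConcMarkSweepGC`) was deprecated in JDK 9 and removed in JDK 14.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/**
 * Reports the collectors selected by the running JVM. Try different flags, e.g.:
 *   java -XX:+UseParallelGC GcInfo
 *   java -XX:+UseG1GC GcInfo
 */
class GcInfo {
    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Names are JVM-specific, such as "PS Scavenge" or "G1 Young Generation".
            lines.add(String.format("%s: %d collections, %d ms total, pools=%s",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime(),
                    String.join(",", gc.getMemoryPoolNames())));
        }
        return lines;
    }

    public static void main(String[] args) {
        describeCollectors().forEach(System.out::println);
    }
}
```

The totals are cumulative, so they say how much time went to GC but not how it was distributed; the worst single pause still has to come from GC logs or a pause meter.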

Where Do Pauses Matter?

• Interactive apps like ecommerce
  ─ Add many seconds to a transaction & maybe lose a customer
  ─ Batch apps care about start-to-finish time, not transactions

• Big data apps
  ─ Travel site wants to keep hotel inventory in memory
  ─ Search app wants to keep entire index in memory

• Efficiency & management
  ─ More work from fewer JVM instances

• Low latency apps
  ─ Financial apps process data as it arrives
  ─ Latency budgets from a few msecs down to < 1 msec
  ─ Requires low latency OS & significant tuning

Characterizing GC Pauses

• Frequency relates to activity
  ─ Object creation rate
  ─ Object mutation rate

• Severity relates to memory size
  ─ The more we examine & copy, the longer it takes
  ─ New gen is usually not the problem (yet)

• Not how much GC overhead, but where it happens

Limits to GC Overhead

• Worst case: no empty memory = 100% GC
  ─ GC runs hard all the time, reclaiming nothing

• Best case: infinite empty memory = 0% GC
  ─ Just keep creating objects, never collecting

• In between, GC follows 1/x curve as memory grows

[Chart: GC CPU (0% to 100%) vs. heap size, starting at the live set size]
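
A simple back-of-the-envelope model (an assumption, not from the slide) reproduces that curve. Suppose each collection costs work proportional to the live set L (c per live byte), and the application does a units of work per byte it allocates. A collection is needed each time the free space H − L is used up, so per cycle the application does a(H − L) work and the collector cL, giving a GC share of CPU of:

```latex
f(H) = \frac{c\,L}{c\,L + a\,(H - L)}, \qquad f(L) = 1, \qquad \lim_{H \to \infty} f(H) = 0, \qquad f(H) \approx \frac{c}{a} \cdot \frac{L}{H - L} \quad (H \gg L)
```

In this model a bigger heap makes collections rarer but not cheaper: each cycle still has to process all of L.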

How to Measure Pauses

• Identify the magnitude of the problem
  ─ jHiccup: free software from Azul’s CTO (jhiccup.com)
    ─ Does minimal work & records time to complete
    ─ Long delays indicate JVM wasn’t letting apps run
  ─ Run against your application
    ─ Results should map well to GC logs
    ─ Results will not include app inefficiencies
  ─ Run against idle JVM
    ─ Identify pauses from OS, VM, power management

• Don’t fix problems until you know where they lie
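
The jHiccup idea can be sketched in a few lines: a thread that does almost nothing except sleep for a fixed interval, recording how much longer than requested each sleep took. Any stall that stops this thread (a GC safepoint, OS scheduling, power management) shows up as overshoot. This is a hypothetical illustration of the technique, not jHiccup's code or output format; the 1 ms interval and sample count are arbitrary choices.

```java
import java.util.Arrays;
import java.util.concurrent.TimeUnit;

/** Hiccup meter sketch: sleep repeatedly and record how far past the requested wake-up each sleep ran. */
class HiccupMeter {

    /** Returns one overshoot value (in ns) per completed sample. */
    static long[] measure(int samples, long intervalMillis) {
        long intervalNanos = TimeUnit.MILLISECONDS.toNanos(intervalMillis);
        long[] hiccupNanos = new long[samples];
        for (int i = 0; i < samples; i++) {
            long start = System.nanoTime();
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt status and stop early
                return Arrays.copyOf(hiccupNanos, i);
            }
            long overshoot = System.nanoTime() - start - intervalNanos;
            hiccupNanos[i] = Math.max(0, overshoot); // clamp small negative timer jitter to zero
        }
        return hiccupNanos;
    }

    public static void main(String[] args) {
        long[] h = measure(5_000, 1); // at least 5 seconds of 1 ms samples
        if (h.length == 0) return;
        Arrays.sort(h);
        System.out.printf("max hiccup %.3f ms, p99.9 %.3f ms%n",
                h[h.length - 1] / 1e6, h[(int) (h.length * 0.999)] / 1e6);
    }
}
```

Run in a background thread inside the application, it approximates the stalls the application experiences; run alone in an otherwise idle JVM, it isolates stalls from the OS, VM and power management, mirroring the two modes on the slide.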

What To Do About Pauses

• Apply creative language (the Marketing solution)
  ─ “Guarantee a worst case of X msec, 99% of the time”
  ─ “Mostly concurrent, mostly incremental”
    ─ i.e. “Will at times exhibit long monolithic STW pauses”
  ─ “Fairly consistent”
    ─ i.e. “Will sometimes show results well outside this range”
  ─ “Typical pauses in the tens of milliseconds”
    ─ i.e. “Some pauses are a lot longer than that”

What To Do About Pauses

• Tune like crazy
  ─ Adjust GC parameters until behavior’s acceptable
  ─ A stopgap, not a solution

• Keep the heap small
  ─ Multiple small instances instead of fewer bigger ones
  ─ Move data out of heap (e.g. external cache)
  ─ Pool your objects (e.g. threads, DB connections)

• Commit ritual murder
  ─ Big heap, kill & restart instance before old gen GC
  ─ Yes, people really do this

• Change your GC
  ─ Move from one that rarely stalls to one that never stalls
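
Pooling means reusing expensive objects rather than allocating and discarding them. The minimal single-threaded pool below is a hypothetical helper, not a JDK API; for the slide's own examples, threads and DB connections, you would normally use `java.util.concurrent.ExecutorService` or a connection-pool library, which add thread safety, validation and timeouts.

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

/** Minimal single-threaded object pool (illustrative only). */
class SimplePool<T> {
    private final ArrayDeque<T> idle = new ArrayDeque<>();
    private final Supplier<T> factory;
    private final int maxIdle;

    SimplePool(Supplier<T> factory, int maxIdle) {
        this.factory = factory;
        this.maxIdle = maxIdle;
    }

    /** Hands out an idle object if one exists; creates a new one only when none is idle. */
    T borrow() {
        T obj = idle.pollFirst();
        return obj != null ? obj : factory.get();
    }

    /** Returns an object for reuse; beyond maxIdle it is dropped and left to the GC. */
    void release(T obj) {
        if (idle.size() < maxIdle) idle.offerFirst(obj);
    }

    public static void main(String[] args) {
        SimplePool<StringBuilder> pool = new SimplePool<>(() -> new StringBuilder(4096), 8);
        for (int i = 0; i < 3; i++) {
            StringBuilder sb = pool.borrow();
            sb.setLength(0); // reset state before reuse
            sb.append("request ").append(i);
            System.out.println(sb);
            pool.release(sb);
        }
    }
}
```

Pooling pays off for objects that are expensive to create. For cheap, short-lived objects it can backfire under a generational collector: pooled objects survive into the old generation, whereas young objects that die quickly are nearly free to collect.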

Making JVM Pauseless: The Hard Parts

• Robust concurrent marking
  ─ References keep changing
  ─ Multipass marking is sensitive to mutation rate
  ─ Weak, Soft, Final references hard to deal with

• Concurrent compaction
  ─ Moving the objects isn’t the problem
  ─ It’s fixing all the references to the moved objects
  ─ How do you handle an app looking at a stale reference?
    ─ If you can’t, remapping is a monolithic STW operation

• New gen collection at scale
  ─ New gen is generally monolithic STW
  ─ Pauses are small because heaps are tiny
  ─ A 100 GB heap means new gen GC has a lot of work
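
One way to handle a stale reference is a read barrier: every load of a reference field checks whether the target has been relocated and, if so, returns the new copy and repairs (“heals”) the field so later loads take the fast path. The single-threaded toy below illustrates that idea under stated assumptions (a forwarding table filled in by a simulated collector, and all loads routed through `loadNext`). The names are hypothetical, and this is not Zing's C4 implementation, which must also make the check cheap and safe against concurrently running application threads.

```java
import java.util.IdentityHashMap;
import java.util.Map;

/** Toy model of relocation with a self-healing read barrier (single-threaded illustration). */
class ToyReadBarrier {
    static final class Node {
        final String payload;
        Node next; // a reference field that can go stale after relocation
        Node(String payload) { this.payload = payload; }
    }

    /** Forwarding table kept by the simulated relocating collector: old copy -> new copy. */
    static final Map<Node, Node> FORWARDING = new IdentityHashMap<>();

    /** Simulated GC move: copy the object and record where it went. Referrers are NOT fixed here. */
    static Node relocate(Node from) {
        Node to = new Node(from.payload);
        to.next = from.next;
        FORWARDING.put(from, to);
        return to;
    }

    /** Read barrier for loading holder.next: follow forwarding if stale, and heal the field. */
    static Node loadNext(Node holder) {
        Node ref = holder.next;
        Node moved = (ref == null) ? null : FORWARDING.get(ref);
        if (moved != null) {
            holder.next = moved; // heal: the next load of this field finds the new copy directly
            return moved;
        }
        return ref;
    }

    public static void main(String[] args) {
        Node a = new Node("a");
        Node b = new Node("b");
        a.next = b;
        Node bMoved = relocate(b);                  // application keeps running; a.next is stale
        System.out.println(loadNext(a) == bMoved);  // barrier returns the relocated copy
        System.out.println(a.next == bMoved);       // and the stale field has been healed
    }
}
```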

Azul’s Zing JVM

• High performance production JVM
  ─ 64-bit Linux on x86
    ─ Red Hat, SuSE, Ubuntu, CentOS
  ─ Maximum heap size: 512 GB
  ─ Elastic memory to prevent out-of-memory failures
    ─ Overdraft protection for your JVM

• Always-on performance & execution monitoring
  ─ System level
  ─ JVM level
  ─ Application level

Azul’s C4 Collector

• Concurrent guaranteed-single-pass marker
  ─ Unaffected by mutation rate
  ─ Concurrent reference processing (weak, soft, final)

• Concurrent compactor
  ─ Moves objects without pausing your application
  ─ Remaps references without pausing your application
  ─ Can relocate entire generation (new/old) in every GC cycle

• Concurrent, compacting old generation

• Concurrent, compacting new generation

• No stop-the-world fallback. Ever.

Remember This Slide?

• Java performance gets worse with heap size
  ─ Pause frequency varies with application activity
  ─ Pause duration varies with amount to scan/copy

[Chart: ehCache with a 10 GB cache in a 29 GB heap, on a 48 GB, 16-core Ubuntu server]

Think in Terms of Service Levels

• What are the requirements (percentiles & worst case)?
  ─ Need to think beyond averages & standard deviations
  ─ GC pauses don’t fit a bell curve

In-Memory Computing with Lucene

• Wikipedia English language index in memory
  ─ 132 GB data in 240 GB heap
  ─ Ref: blog.MikeMcCandless.com

In-Memory Computing with Lucene

• Wikipedia English language index in memory
  ─ 132 GB data in 240 GB heap
  ─ Ref: blog.MikeMcCandless.com

Always-on Performance Monitoring

• System level activity: CPU, memory, network

Always-on Performance Monitoring

• JVM activity: CPU & memory

Real Time Execution Analysis

Technical papers

Free trials of Zing VM

Free licenses to OSS committers

www.azulsystems.com

Parallel GC

Concurrent Mark/Sweep

G1GC

Zing C4