g1gc

48
G1GC 介介 2016/12/24 kojilin

Upload: koji-lin

Post on 15-Apr-2017

474 views

Category:

Software


0 download

TRANSCRIPT

Page 1: G1GC

G1GC 介紹2016/12/24

kojilin

Page 2: G1GC

關於我•koji lin•TWJUG member•LINE Fukuoka• Software Engineer•現在主要都在寫後端

Page 3: G1GC

G1GC•Java SE 7 開始•Java SE 9 成為預設的 GC (JEP 248)•減少 pause time•但可能伴隨較低的 throughput

•比起 CMS ,比較不需要去調整參數比較結果• CMS 也被認為較不適合非常大 heap 的情況

Page 4: G1GC

•heap 用 region 分割•region 的大小會根據 Xmx 變動

Page 5: G1GC
Page 6: G1GC

•region 有分 young gen 和 old gen

OO

O

E

E

E SS

Page 7: G1GC

•Young GC 發生時•包含所有的 young region

OO

O

E

E

E SS

Page 8: G1GC

•Young GC 發生時•依照狀況變成 Survivor 或 Old

OO

O

SS

SO

Page 9: G1GC

51.078: [GC pause (G1 Evacuation Pause) (young)51.078: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 0, predicted base time: 29.00 ms, remaining time: 171.00 ms, target pause time: 200.00 ms]51.078: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 1079 regions, survivors: 42 regions, predicted young region time: 170.99 ms]51.078: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 1079 regions, survivors: 42 regions, old: 0 regions, predicted pause time: 199.99 ms, target pause time: 200.00 ms], 0.1092812 secs] … [Eden: 2158.0M(2158.0M)->0.0B(1796.0M) Survivors: 84.0M->102.0M

Heap: 2242.0M(4096.0M)->102.0M(4096.0M)] [Times: user=0.35 sys=0.03, real=0.11 secs]

Page 10: G1GC

Collection Set•Young GC•包含所有 young 世代的 region

Page 11: G1GC

old gen > InitiatingHeapOccupancyPercent

Page 12: G1GC

InitiatingHeapOccupancyPercent

•決定 concurrent marking 開始時機•找出適合回收的 old gen region

•-XX:InitiatingHeapOccupancyPercent=45•超過總 heap 45% 時開始

Page 13: G1GC

24963.770: [G1Ergonomics (Concurrent Cycles) request concurrent cycle initiation, reason: occupancy higher than threshold, occupancy: 3013607424 bytes, allocation request: 0 bytes, threshold: 3006477040 bytes (70.00 %), source: end of GC]

Page 14: G1GC

Concurrent Marking•Initial marking(STW)•Root region scanning•Concurrent marking•Final mark(STW)•Clean up(part STW)

Page 15: G1GC

Initial Marking•STW•用來 marking heap 內所有直接被 root

objects 直接參照到的物件•跟著 young gc 一起做

Page 16: G1GC

Root region scanning•Concurrently•將前一步 young gc 後的 survivor regions 參照到的物件 mark 起來•所以被稱作 root regions•必須在下一次 gc 前完成,不然又會出現新的

survivor region

Page 17: G1GC

Concurrent Marking•Concurrently•Live data accounting•可以透過 ConcGCThreads 調整效率

Page 18: G1GC

Remark•STW•完成 live data accounting•可以透過 ParallelGCThreads 調整效率

Page 19: G1GC

Cleanup•STW•找出有沒有完全都是垃圾的 region•依照垃圾清除效率排序 old regions

•Concurrent•清空沒有 live data 的 region

Page 20: G1GC

Concurrent Marking2016-11-30T23:12:17.318+0900: 24975.061: [GC pause (G1 Evacuation Pause) (young) (initial-mark)…24975.097: [GC concurrent-root-region-scan-start]24975.115: [GC concurrent-root-region-scan-end, 0.0183740 secs]24975.115: [GC concurrent-mark-start]24975.544: [GC concurrent-mark-end, 0.4295200 secs]24975.545: [GC remark 2016-11-30T23:12:17.803+0900: 24975.545: [Finalize Marking, 0.0004352 secs] 24975.586: [GC cleanup 2909M->2763M(4096M), 0.0184095 secs]24975.604: [GC concurrent-cleanup-start]24975.604: [GC concurrent-cleanup-end, 0.0002595 secs]

Page 21: G1GC

•Mixed GC 發生時•All young gen + some old

OO

O

SS

SO

E E

E

Page 22: G1GC

•Mixed GC 發生時•All young gen + some old

OO

O

SS

SO

E E

E

Page 23: G1GC

•Mixed GC 發生時•All young gen + some old

O

SOS

O

SO

Page 24: G1GC

24988.167: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 11490, predicted base time: 11.34 ms, remaining time: 188.66 ms, target pause time: 200.00 ms] 24988.167: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 92 regions, survivors: 10 regions, predicted young region time: 17.58 ms] 24988.167: [G1Ergonomics (CSet Construction) finish adding old regions to CSet, reason: predicted time is too high, predicted time: 1.28 ms, remaining time: 0.00 ms, old: 169 regions, min: 169 regions] 24988.167: [G1Ergonomics (CSet Construction) added expensive regions to CSet, reason: old CSet region num not reached min, old: 169 regions, expensive: 26 regions, min: 169 regions, remaining time: 0.00 ms] 24988.167: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 92 regions, survivors: 10 regions, old: 169 regions, predicted pause time: 232.41 ms, target pause time: 200.00 ms] 24988.205: [G1Ergonomics (Mixed GCs) continue mixed GCs, reason: candidate old regions available, candidate old regions: 1179 regions, reclaimable: 2358748744 bytes (54.92 %), threshold: 4.00 %]

Page 25: G1GC

Collection Set•Young GC•包含所有 young 世代的 region

•Mixed GC•包含所有 young 世代的 region 加上一部份垃圾比例最多的 old 世代的 region

Page 26: G1GC

25000.324: [G1Ergonomics (Mixed GCs) continue mixed GCs, reason: candidate old regions available, candidate old regions: 185 regions, reclaimable: 311893632 bytes (7.26 %), threshold: 4.00 %]

25006.772: [G1Ergonomics (Mixed GCs) do not continue mixed GCs, reason: reclaimable percentage not over threshold, candidate old regions: 113 regions, reclaimable: 170151528 bytes (3.96 %), threshold: 4.00 %]

Page 27: G1GC

Parameters for mixed gc•InitiatingHeapOccupancyPercent = 45•G1HeapWastePercent = 5•可以浪費的比例, mixed gc 會決定如果垃圾沒超過這個比例就不做•主要是為了讓 mixed collection cycle 可以快一點結束

•G1MixedGCCountTarget = 8•決定一次 concurrent mark 後最多跑幾次 mixed gc•所有對象 old regions / 8 = 每次 mixed gc CSet 最小數量

•G1MixedGCLiveThresholdPercent = 85•尋找預計回收 region 時, live data 低於此值才會被當作對象

Page 28: G1GC

Humongous Object•Object < 50% of region size•會放在 young region

•Object >= 50% of region size•直接放在 old region ,並且 region 內剩餘的空間不會再被使用

Page 29: G1GC

[G1Ergonomics (Concurrent Cycles) request concurrent cycle initiation, reason: requested by GC cause, GC cause: G1 Humongous Allocation]

Page 30: G1GC

•加大 region size•但要注意因為加大而造成 region 數減少的效能影響

•找出 root cause•代表有物件佔著連續的記憶體• e.g. 很大的陣列,超多欄位的類別

Page 31: G1GC

Evacuation Failure•Survivor 或 old 不夠時• to-space exhausted

•出現數次後接著就會出現 Full GC•如果是 survivor•增加 G1ReservePercent

Page 32: G1GC

•-Xms16g -Xmx16g -XX:+UseG1GC• InitiatingHeapOccupancyPercent: 45•G1HeapWastePercent: 5•G1MixedGCLiveThresholdPercent: 85•G1HeapRegionSize: 8MB

Page 33: G1GC

•-Xms16g -Xmx16g -XX:+UseG1GC• InitiatingHeapOccupancyPercent: 45•G1HeapWastePercent: 5•G1MixedGCLiveThresholdPercent: 85•G1HeapRegionSize: 8MB

16GB x 0.45 = 7.2GB

Page 34: G1GC

•-Xms16g -Xmx16g -XX:+UseG1GC• InitiatingHeapOccupancyPercent: 45•G1HeapWastePercent: 5•G1MixedGCLiveThresholdPercent: 85•G1HeapRegionSize: 8MB

Full GC

Page 35: G1GC

•因為 old gen 擠壓到 young gen 的空間, young gen collection 惡化

Page 36: G1GC

•為什麼 Full GC 可以清到只剩 < 500MB ? 但是 mixed GC 無法?•加上 log 觀察

-XX:+PrintAdaptiveSizePolicy -XX:+PrintGC-XX:+PrintGCDetails

Page 37: G1GC

32631.357: [G1Ergonomics (Concurrent Cycles) request concurrent cycle initiation, reason: occupancy higher than threshold, occupancy: 7725907968 bytes, allocation request: 10047216 bytes, threshold: 7730941095 bytes (45.00 %), source: concurrent humongous allocation] ....32654.979: [G1Ergonomics (Mixed GCs) start mixed GCs, reason: candidate old regions available, candidate old regions: 457 regions, reclaimable: 2956666176 bytes (17.21 %), threshold: 5.00 %], 0.1106810 secs] .... [Eden: 6680.0M(6680.0M)->0.0B(536.0M) Survivors: 344.0M->280.0M Heap: 14.0G(16.0G)->7606.6M(16.0G)] [Times: user=2.31 sys=0.01, real=0.11 secs] [GC pause (G1 Evacuation Pause) (mixed) 32656.876: [G1Ergonomics (CSet Construction) finish adding old regions to CSet, reason: old CSet region num reached max, old: 205 regions, max: 205 regions] 32656.876: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 67 regions, survivors: 35 regions, old: 205 regions, predicted pause time: 173.84 ms, target pause time: 200.00 ms] 32656.992: [G1Ergonomics (Mixed GCs) continue mixed GCs, reason: candidate old regions available, candidate old regions: 252 regions, reclaimable: 1321193600 bytes (7.69 %), threshold: 5.00 %] [Eden: 536.0M(536.0M)->0.0B(720.0M) Survivors: 280.0M->96.0M Heap: 8142.6M(16.0G)->6029.9M(16.0G)] [Times: user=2.49 sys=0.01, real=0.12 secs] [GC pause (G1 Evacuation Pause) (mixed) ... 32659.727: [G1Ergonomics (CSet Construction) finish adding old regions to CSet, reason: reclaimable percentage not over threshold, old: 66 regions, max: 205 regions, reclaimable: 857822432 bytes (4.99 %), threshold: 5.00 %] 32659.727: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 90 regions, survivors: 12 regions, old: 66 regions, predicted pause time: 120.51 ms, target pause time: 200.00 ms]

Page 38: G1GC

... 32659.727: [G1Ergonomics (CSet Construction) finish adding old regions to CSet, reason: reclaimable percentage not over threshold, old: 66 regions, max: 205 regions, reclaimable: 857822432 bytes (4.99 %), threshold: 5.00 %] 32659.727: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 90 regions, survivors: 12 regions, old: 66 regions, predicted pause time: 120.51 ms, target pause time: 200.00 ms] 32659.785: [G1Ergonomics (Mixed GCs) do not continue mixed GCs, reason: reclaimable percentage not over threshold, candidate old regions: 186 regions, reclaimable: 857822432 bytes (4.99 %), threshold: 5.00 %] [Eden: 720.0M(720.0M)->0.0B(9064.0M) Survivors: 96.0M->64.0M Heap: 6749.9M(16.0G)->5572.0M(16.0G)] [Times: user=1.20 sys=0.00, real=0.06 secs]

Page 39: G1GC

•發現 humongous allocation ,尤其在逼近 old gen 使用到近滿時會頻繁觸發 mixed GC.

•找到沒設計好的 http rest api ,會取得完整後先轉換成 byte[] -> json.

•底層使用 okio 的 Buffer ,會將大的 byte[] 轉換成 byte[] 的 linked list.

•但仍不是 root cause.

Page 40: G1GC

•調整看看 VM 參數• InitiatingHeapOccupancyPercent = 30•G1HeapWastePercent = 3•G1MixedGCLiveThresholdPercent = 90• SoftRefLRUPolicyMSPerMB = 50

•不太有用而且因為提早做而降低了 throughput•超過 30 % 後總是觸發 concurrent marking

Page 41: G1GC

•最後沒招了, heap dump 看看

Page 42: G1GC

•最後沒招了, heap dump 看看

Page 43: G1GC
Page 44: G1GC
Page 45: G1GC

•在我們的環境下,問題出在 lettuce 使用的 LatencyUtils 的 LatencyStats & PauseTracker(Weakreference)

•lettuce 使用 LatencyUtils 來做 command tracking ,並且每 10 分鐘會清掉過去的 LatencyStats 的 reference*https://github.com/mp911de/lettuce/wiki/Command-Latency-Metrics

Page 46: G1GC
Page 47: G1GC

該不該用 G1GC ?•照理上沒問題,但 ElasticSearch & Lucene 尚未建議切換•G1GC 有些 bug 會造成 index 毀損

•套用後還是要記得觀察結果•避免過度的調整參數•盡量使用 MaxGCPausMillis , Xmx , Xms

Page 48: G1GC

Java 9•預設就會是 G1GC•以前的 JVM log 被廢除,參數跟格式都改變• -Xlog:[<selection>]:[<output>]:

[<decorators>]e.g. -Xlog:gc*:file=gc.log