In this presentation, we introduce HotSpot's Garbage-First collector (G1GC) as the most suitable collector for latency-sensitive applications running in large-memory environments. We first discuss G1GC's internal operations and tuning opportunities, covering flags that set desired GC pause targets, change adaptive GC thresholds, and adjust GC activities at runtime. We then present several HBase case studies, using Java heaps as large as 100GB, that show how best to tune applications to remove unpredictable, protracted GC pauses.
3. Motivation
HBase SLAs are driven by query response times
Unpredictable and lengthy (seconds or longer) stop-the-world garbage collection
burdens applications (keeping them out of mission-critical use)
Frequent GC hurts throughput
Growing memory capacity (200+GB per node) is not uncommon
A more efficient GC algorithm is needed to tame 100GB+ Java heaps
10. G1 Collector Promises
Concurrent marking
Natural compaction
Simplified tuning
Low and predictable latency
Not yet there, but getting closer…
11. Garbage First Collector Architecture Overview
• Heap is divided to ~2K non-contiguous regions of eden, survivor,
and old spaces, region size can be 1MB, 2MB, 4MB, 8MB, 16MB,
32MB.
• Humongous regions are old regions for large objects that are larger
than ½ of region size. Humongous objects are stored in contiguous
regions in heap.
• Number of regions in eden and survivor can be changed between
GC’s.
• Young GC: multi-threaded, low stop-the-world pauses
• Concurrent marking and clean up: mostly parallel, multi-threaded
• Mixed GC: multi-threaded, incremental GC, collects and compacts
heap partially, more expensive stop-the-world pauses
• Full GC: Serial, collects and compacts entire heap
11
* Picture from: http://www.oracle.com/webfolder/technetwork/tutorials/obe/java/G1GettingStarted/index.html
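As a rough sketch of the region sizing described above (an assumption based on the stated ~2K-region target, not the exact HotSpot heuristic), the region size for a given heap can be estimated by dividing the heap by 2048 and rounding down to a power of two between 1MB and 32MB; objects larger than half that size are humongous:

```shell
# Hypothetical sketch: estimate the G1 region size for a 100GB heap, assuming
# heap_size / 2048 rounded down to a power of two clamped to [1MB, 32MB].
heap_mb=102400                 # -Xmx100g expressed in MB
target=$((heap_mb / 2048))     # aim for roughly 2K regions
region=1
while [ $((region * 2)) -le "$target" ] && [ "$region" -lt 32 ]; do
  region=$((region * 2))
done
echo "region size: ${region}MB"                    # 32MB for a 100GB heap
echo "humongous threshold: objects > $((region / 2))MB"
```

The region size can also be pinned explicitly with -XX:G1HeapRegionSize rather than left to this heuristic.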
12. Garbage First Collector Parameters
G1GC Diagnostic Flags
-XX:+PrintFlagsFinal
Prints all JVM runtime flags when the JVM starts (no
overhead)
-XX:+PrintGCDetails
Prints GC status (a must-have for GC tuning; low overhead)
-XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
Tracks time stamps for each GC activity (low overhead)
-XX:+PrintAdaptiveSizePolicy
Prints information every time the GC decides to change
any setting or hits certain conditions
-XX:+PrintReferenceGC
Prints GC reference processing for each GC
G1GC Performance Flags
-XX:+UseG1GC -Xms100g -Xmx100g
-XX:+ParallelRefProcEnabled
Uses multiple threads in parallel to process references
-XX:MaxGCPauseMillis=100
Sets the desired GC pause target (the default is 200ms)
-XX:ParallelGCThreads=48
Sets the number of parallel GC threads
Recommended: 8 + (#of_logical_processors - 8) * (5/8)
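The recommended ParallelGCThreads formula can be checked with a quick sketch (the 64-CPU host below is a hypothetical example, not from the case study):

```shell
# The slide's recommendation: 8 + (#logical_processors - 8) * 5/8, using
# integer arithmetic; at 8 CPUs or fewer, one GC thread per CPU is assumed.
cpus=64                        # hypothetical host with 64 logical processors
if [ "$cpus" -le 8 ]; then
  gc_threads=$cpus
else
  gc_threads=$((8 + (cpus - 8) * 5 / 8))
fi
echo "-XX:ParallelGCThreads=$gc_threads"       # 43 for 64 logical processors
```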
13. Garbage First Collector Parameters
(Same flags as the previous slide, highlighting -XX:MaxGCPauseMillis=100:
ideally, the only option needed.)
19. G1GC Pause Time Series – G1 Baseline
[Chart: GC pause time (ms) vs. time since the JVM launched, with
-XX:+UseG1GC -Xms100g -Xmx100g -XX:MaxGCPauseMillis=100]
71% of pauses > 100ms
26% > 200ms
11% > 300ms
3 pauses ~400ms
20. How to Improve Garbage Collection Behavior
1. Enable GC logging
2. Look for outliers (long pauses) in the logs
3. Understand the root cause of the long GC pauses
4. Tune the GC command line to avoid or alleviate the symptoms
5. Examine the logs and repeat from step #2 (multiple iterations)
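Step 2 of the loop above can be mechanized with a small shell sketch. The log lines below are fabricated samples in the -XX:+PrintGCDetails "[Times: … real=N.NN secs]" format, and the 200ms threshold is an assumption chosen to match the pause target:

```shell
# Create a small sample log in the PrintGCDetails "[Times: ...]" format.
cat > /tmp/gc_sample.log <<'EOF'
[Times: user=0.40 sys=0.01, real=0.08 secs]
[Times: user=1.90 sys=0.02, real=0.35 secs]
[Times: user=0.55 sys=0.01, real=0.12 secs]
EOF
# Count stop-the-world pauses longer than 200ms -- the outliers to investigate.
outliers=$(grep -oE 'real=[0-9]+\.[0-9]+' /tmp/gc_sample.log \
  | awk -F= '$2 > 0.2 {n++} END {print n+0}')
echo "$outliers pauses over 200ms"
```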
23. Root Causes and Possible Solutions
Mixed GC takes much more time to free up the less-wasted regions
Not cost-efficient
The heap is not full; those regions could be left for the next collection cycle
Exclude the most expensive mixed GCs by increasing the allowed waste percentage
Increase -XX:G1HeapWastePercent to 20% (default 5%)
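A command line applying this change might look like the following, with the heap sizes and pause target carried over from the earlier slides; treat it as a sketch rather than a verified production setting (the trailing … stands for the application's own options):

```shell
java -XX:+UseG1GC -Xms100g -Xmx100g \
     -XX:MaxGCPauseMillis=100 \
     -XX:G1HeapWastePercent=20 \
     -XX:+PrintGCDetails -XX:+PrintAdaptiveSizePolicy \
     ...
```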
24. Examining Logs – Sample log snippet #2
[Eden: 4480.0M(4480.0M)->0.0B(4480.0M) Survivors: 640.0M->640.0M Heap: 70.5G(100.0G)->68.4G(100.0G)]
…
[Eden: 4480.0M(4480.0M)->0.0B(4480.0M) Survivors: 640.0M->640.0M Heap: 70.2G(100.0G)->68.4G(100.0G)]
…
[Eden: 4480.0M(4480.0M)->0.0B(4480.0M) Survivors: 640.0M->640.0M Heap: 70.6G(100.0G)->70.3G(100.0G)]
…
[Eden: 4480.0M(4480.0M)->0.0B(4480.0M) Survivors: 640.0M->640.0M Heap: 71.9G(100.0G)->69.5G(100.0G)]
30GB of heap is unused because the concurrent
marking phase started too early
25. Root Cause and Solution
Concurrent marking started while the heap still had sufficient free space
Causing extra GC pauses
Extra copying of objects that die in the near future
The goal is to increase heap usage
Increase -XX:InitiatingHeapOccupancyPercent, so more of the heap can be used
Increase -XX:ConcGCThreads, so the concurrent marking phase can
complete early enough to avoid a full GC
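These two changes translate into command-line options like the following; the 65% occupancy threshold and 8 concurrent threads are illustrative assumptions, not values taken from the case study (the trailing … stands for the application's own options):

```shell
java -XX:+UseG1GC -Xms100g -Xmx100g \
     -XX:MaxGCPauseMillis=100 \
     -XX:InitiatingHeapOccupancyPercent=65 \
     -XX:ConcGCThreads=8 \
     ...
```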
27. GC log After Tuning – Sample log snippet
[Eden: 4800.0M(4800.0M)->0.0B(4640.0M) Survivors: 640.0M->704.0M Heap: 81.5G(100.0G)->78.6G(100.0G)]
…
[Eden: 4640.0M(4640.0M)->0.0B(5664.0M) Survivors: 640.0M->672.0M Heap: 81.2G(100.0G)->78.3G(100.0G)]
…
[Eden: 4480.0M(4480.0M)->0.0B(4480.0M) Survivors: 640.0M->640.0M Heap: 82.2G(100.0G)->79.6G(100.0G)]
…
[Eden: 4640.0M(4480.0M)->0.0B(7712.0M) Survivors: 640.0M->640.0M Heap: 80.4G(100.0G)->77.4G(100.0G)]
Heap usage increased from
70GB to 80GB without a full GC
28. G1GC Pause Time Series – G1 After Tuning
[Chart: GC pause time (ms) vs. time since the JVM launched]
40% of pauses > 100ms
8% > 200ms
1% > 300ms
1 pause ~340ms
2,189 GC pauses; average 105ms; 99th percentile 310ms
29. G1 Before and After Tuning
[Charts: GC pause time (ms) vs. time since the JVM launched, before and after tuning]
Before: 71% of pauses > 100ms, 26% > 200ms, 11% > 300ms
After: 40% of pauses > 100ms, 8% > 200ms, 1% > 300ms
30. G1 Before and After Tuning
(Same charts as the previous slide.)
60% of pauses are shorter than 100ms
31. G1 Characteristics Before and After Tuning
Cluster GC pause time by GC type (seconds):

                  Mixed GC   Young GC   Cleanup   Remark   Total STW
Before Tuning    1,114,798    446,482    81,322   51,826   1,694,428
After Tuning       519,062    759,254    76,089   39,855   1,394,260

Mixed GC pause time fell by ~53% and total STW pause time by ~18%,
while young GC pause time grew as young collections took over work
from expensive mixed collections.
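The headline reduction follows directly from the total STW numbers on this slide:

```shell
# Total stop-the-world pause time across the cluster, from the chart data.
before=1694428   # seconds, before tuning
after=1394260    # seconds, after tuning
pct=$(awk -v b="$before" -v a="$after" 'BEGIN {printf "%.1f", (b - a) * 100 / b}')
echo "total STW pause time reduced by ${pct}%"   # ~17.7%, i.e. roughly 20%
```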
32. G1GC Characteristics Before and After Tuning
(Same data as the previous slide.)
Young GC replaces expensive mixed GC
33. G1 Characteristics Before and After Tuning
(Same data as the previous slide.)
~20% shorter total application pause time
34. GC Pause Time Comparison
[Chart: GC pause time (ms) vs. time since the JVM launched (seconds) for CMS,
G1 before tuning, and G1 after tuning; reference lines at 100ms, 200ms, 300ms]
35. GC Pause Time Comparison
(Same chart as the previous slide.)
Less than 1% of pauses are over 300ms
36. Conclusion
CMS behaves well until it doesn't (stop-the-world GCs lasting 5+ seconds)
The JDK8u40 G1 collector is a good CMS alternative for large Java heaps (100+GB)
Default G1 settings still do not offer the best performance; tuning is required
A tuned G1 provides low and predictable latencies for HBase running with large heaps
(100+GB)
The G1 collector is constantly improving!