In this presentation, we introduce HotSpot's Garbage-First collector (G1GC) as the most suitable collector for latency-sensitive applications running in large-memory environments. We first discuss G1GC's internal operations and tuning opportunities, covering flags that set desired GC pause targets, change adaptive GC thresholds, and adjust GC activities at runtime. We then present several HBase case studies, using Java heaps as large as 100GB, that show how best to tune applications to remove unpredictable, protracted GC pauses.
3. Motivation
HBase SLAs are driven by query response times
Unpredictable and lengthy (seconds or longer) stop-the-world garbage collection
burdens applications (keeping them out of mission-critical use)
Frequent GC hurts throughput
Growing memory capacity (200+GB per node) is not uncommon
A more efficient GC algorithm is needed to tame 100GB+ Java heaps
10. G1 Collector Promises
Concurrent marking
Natural compaction
Simplified tuning
Low and predictable latency
Not yet there, but getting closer…
11. Garbage First Collector Architecture Overview
• Heap is divided to ~2K non-contiguous regions of eden, survivor,
and old spaces, region size can be 1MB, 2MB, 4MB, 8MB, 16MB,
32MB.
• Humongous regions are old regions for large objects that are larger
than ½ of region size. Humongous objects are stored in contiguous
regions in heap.
• Number of regions in eden and survivor can be changed between
GC’s.
• Young GC: multi-threaded, low stop-the-world pauses
• Concurrent marking and clean up: mostly parallel, multi-threaded
• Mixed GC: multi-threaded, incremental GC, collects and compacts
heap partially, more expensive stop-the-world pauses
• Full GC: Serial, collects and compacts entire heap
11
* Picture from: http://www.oracle.com/webfolder/technetwork/tutorials/obe/java/G1GettingStarted/index.html
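As a rough sketch of the region sizing described above (an assumption based on the stated ~2K-region target, not the exact HotSpot heuristic), the region size for a given heap can be estimated by dividing the heap by 2048 and rounding down to a power of two between 1MB and 32MB; objects larger than half that size are humongous:

```shell
# Hypothetical sketch: estimate the G1 region size for a 100GB heap, assuming
# heap_size / 2048 rounded down to a power of two clamped to [1MB, 32MB].
heap_mb=102400                 # -Xmx100g expressed in MB
target=$((heap_mb / 2048))     # aim for roughly 2K regions
region=1
while [ $((region * 2)) -le "$target" ] && [ "$region" -lt 32 ]; do
  region=$((region * 2))
done
echo "region size: ${region}MB"                    # 32MB for a 100GB heap
echo "humongous threshold: objects > $((region / 2))MB"
```

The region size can also be pinned explicitly with -XX:G1HeapRegionSize rather than left to this heuristic.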
12. Garbage First Collector Parameters
G1GC Diagnostic Flags
-XX:+PrintFlagsFinal
Prints all JVM runtime flags when the JVM starts (no
overhead)
-XX:+PrintGCDetails
Prints GC status (a must-have for GC tuning; low overhead)
-XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
Tracks time stamps for each GC activity (low overhead)
-XX:+PrintAdaptiveSizePolicy
Prints information every time the GC decides to change
any setting or hits certain conditions
-XX:+PrintReferenceGC
Prints GC reference processing for each GC
G1GC Performance Flags
-XX:+UseG1GC -Xms100g -Xmx100g
-XX:+ParallelRefProcEnabled
Uses multiple threads in parallel to process references
-XX:MaxGCPauseMillis=100
Sets the desired GC pause target (the default is 200ms)
-XX:ParallelGCThreads=48
Sets the number of parallel GC threads
Recommended: 8 + (#of_logical_processors - 8) * (5/8)
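The recommended ParallelGCThreads formula can be checked with a quick sketch (the 64-CPU host below is a hypothetical example, not from the case study):

```shell
# The slide's recommendation: 8 + (#logical_processors - 8) * 5/8, using
# integer arithmetic; at 8 CPUs or fewer, one GC thread per CPU is assumed.
cpus=64                        # hypothetical host with 64 logical processors
if [ "$cpus" -le 8 ]; then
  gc_threads=$cpus
else
  gc_threads=$((8 + (cpus - 8) * 5 / 8))
fi
echo "-XX:ParallelGCThreads=$gc_threads"       # 43 for 64 logical processors
```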
13. Garbage First Collector Parameters
(Same flags as the previous slide, highlighting -XX:MaxGCPauseMillis=100:
ideally, the only option needed.)
19. G1GC Pause Time Series – G1 Baseline
[Chart: GC pause time (ms) vs. time since the JVM launched, with
-XX:+UseG1GC -Xms100g -Xmx100g -XX:MaxGCPauseMillis=100]
71% of pauses > 100ms
26% > 200ms
11% > 300ms
3 pauses ~400ms
20. How to Improve Garbage Collection Behavior
1. Enable GC logging
2. Look for outliers (long pauses) in the logs
3. Understand the root cause of the long GC pauses
4. Tune the GC command line to avoid or alleviate the symptoms
5. Examine the logs and repeat from step #2 (multiple iterations)
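Step 2 of the loop above can be mechanized with a small shell sketch. The log lines below are fabricated samples in the -XX:+PrintGCDetails "[Times: … real=N.NN secs]" format, and the 200ms threshold is an assumption chosen to match the pause target:

```shell
# Create a small sample log in the PrintGCDetails "[Times: ...]" format.
cat > /tmp/gc_sample.log <<'EOF'
[Times: user=0.40 sys=0.01, real=0.08 secs]
[Times: user=1.90 sys=0.02, real=0.35 secs]
[Times: user=0.55 sys=0.01, real=0.12 secs]
EOF
# Count stop-the-world pauses longer than 200ms -- the outliers to investigate.
outliers=$(grep -oE 'real=[0-9]+\.[0-9]+' /tmp/gc_sample.log \
  | awk -F= '$2 > 0.2 {n++} END {print n+0}')
echo "$outliers pauses over 200ms"
```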
23. Root Causes and Possible Solutions
Mixed GC takes much more time to free up the less-wasted regions
Not cost-efficient
The heap is not full; those regions could be left for the next collection cycle
Exclude the most expensive mixed GCs by increasing the allowed waste percentage
Increase -XX:G1HeapWastePercent to 20% (default 5%)
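A command line applying this change might look like the following, with the heap sizes and pause target carried over from the earlier slides; treat it as a sketch rather than a verified production setting (the trailing … stands for the application's own options):

```shell
java -XX:+UseG1GC -Xms100g -Xmx100g \
     -XX:MaxGCPauseMillis=100 \
     -XX:G1HeapWastePercent=20 \
     -XX:+PrintGCDetails -XX:+PrintAdaptiveSizePolicy \
     ...
```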
24. Examining Logs – Sample log snippet #2
[Eden: 4480.0M(4480.0M)->0.0B(4480.0M) Survivors: 640.0M->640.0M Heap: 70.5G(100.0G)->68.4G(100.0G)]
…
[Eden: 4480.0M(4480.0M)->0.0B(4480.0M) Survivors: 640.0M->640.0M Heap: 70.2G(100.0G)->68.4G(100.0G)]
…
[Eden: 4480.0M(4480.0M)->0.0B(4480.0M) Survivors: 640.0M->640.0M Heap: 70.6G(100.0G)->70.3G(100.0G)]
…
[Eden: 4480.0M(4480.0M)->0.0B(4480.0M) Survivors: 640.0M->640.0M Heap: 71.9G(100.0G)->69.5G(100.0G)]
30GB of heap is unused because the concurrent
marking phase started too early
25. Root Cause and Solution
Concurrent marking started while the heap still had sufficient free space
Causing extra GC pauses
Extra copying of objects that die in the near future
The goal is to increase heap usage
Increase -XX:InitiatingHeapOccupancyPercent, so more of the heap can be used
Increase -XX:ConcGCThreads, so the concurrent marking phase can
complete early enough to avoid a full GC
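These two changes translate into command-line options like the following; the 65% occupancy threshold and 8 concurrent threads are illustrative assumptions, not values taken from the case study (the trailing … stands for the application's own options):

```shell
java -XX:+UseG1GC -Xms100g -Xmx100g \
     -XX:MaxGCPauseMillis=100 \
     -XX:InitiatingHeapOccupancyPercent=65 \
     -XX:ConcGCThreads=8 \
     ...
```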
27. GC log After Tuning – Sample log snippet
[Eden: 4800.0M(4800.0M)->0.0B(4640.0M) Survivors: 640.0M->704.0M Heap: 81.5G(100.0G)->78.6G(100.0G)]
…
[Eden: 4640.0M(4640.0M)->0.0B(5664.0M) Survivors: 640.0M->672.0M Heap: 81.2G(100.0G)->78.3G(100.0G)]
…
[Eden: 4480.0M(4480.0M)->0.0B(4480.0M) Survivors: 640.0M->640.0M Heap: 82.2G(100.0G)->79.6G(100.0G)]
…
[Eden: 4640.0M(4480.0M)->0.0B(7712.0M) Survivors: 640.0M->640.0M Heap: 80.4G(100.0G)->77.4G(100.0G)]
Heap usage increased from
70GB to 80GB without a full GC
28. G1GC Pause Time Series – G1 After Tuning
[Chart: GC pause time (ms) vs. time since the JVM launched]
40% of pauses > 100ms
8% > 200ms
1% > 300ms
1 pause ~340ms
2,189 GC pauses; average 105ms; 99th percentile 310ms
29. G1 Before and After Tuning
[Charts: GC pause time (ms) vs. time since the JVM launched, before and after tuning]
Before: 71% of pauses > 100ms, 26% > 200ms, 11% > 300ms
After: 40% of pauses > 100ms, 8% > 200ms, 1% > 300ms
30. G1 Before and After Tuning
(Same charts as the previous slide.)
60% of pauses are shorter than 100ms
31. G1 Characteristics Before and After Tuning
Cluster GC pause time by GC type (seconds):

                  Mixed GC   Young GC   Cleanup   Remark   Total STW
Before Tuning    1,114,798    446,482    81,322   51,826   1,694,428
After Tuning       519,062    759,254    76,089   39,855   1,394,260

Mixed GC pause time fell by ~53% and total STW pause time by ~18%,
while young GC pause time grew as young collections took over work
from expensive mixed collections.
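The headline reduction follows directly from the total STW numbers on this slide:

```shell
# Total stop-the-world pause time across the cluster, from the chart data.
before=1694428   # seconds, before tuning
after=1394260    # seconds, after tuning
pct=$(awk -v b="$before" -v a="$after" 'BEGIN {printf "%.1f", (b - a) * 100 / b}')
echo "total STW pause time reduced by ${pct}%"   # ~17.7%, i.e. roughly 20%
```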
32. G1GC Characteristics Before and After Tuning
(Same data as the previous slide.)
Young GC replaces expensive mixed GC
33. G1 Characteristics Before and After Tuning
(Same data as the previous slide.)
~20% shorter total application pause time
34. GC Pause Time Comparison
[Chart: GC pause time (ms) vs. time since the JVM launched (seconds) for CMS,
G1 before tuning, and G1 after tuning; reference lines at 100ms, 200ms, 300ms]
35. GC Pause Time Comparison
(Same chart as the previous slide.)
Less than 1% of pauses are over 300ms
36. Conclusion
CMS behaves well until it doesn't (stop-the-world GCs lasting 5+ seconds)
The JDK8u40 G1 collector is a good CMS alternative for large Java heaps (100+GB)
Default G1 settings still do not offer the best performance; tuning is required
A tuned G1 provides low and predictable latencies for HBase running with large heaps
(100+GB)
The G1 collector is constantly improving!