This is the second edition of the story of how we struggled to meet strict latency requirements in a service implemented in Java, and how we eventually managed to do it.
The most common latency contributors are in-process locking, thread scheduling, I/O, algorithmic inefficiencies and, of course, the garbage collector.
I will share our experience of dealing with these causes and explain what you can do to keep them from affecting production.
On the way to low latency (2nd edition)
1. On the way to low latency
Artem Orobets
Smartling Inc
2. Long story short
We realized that latency is important for us
Our fabulous architecture was supposed to work, but it didn't
The issues we faced along the way
3. Those guys consider 10µs latencies slow
We have only a 100ms threshold
We are not a trading company
21. Context switch problem
• Thread-per-request doesn't work
• Too much overhead on context switching
• Too much overhead on memory
A thread usually takes 256KB to 1MB of memory for its stack space!
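As an illustration (not code from the talk), a fixed-size pool sized to the CPU count avoids paying a fresh stack and extra context switches for every request; the class name and numbers here are made up:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: reuse a small fixed pool instead of spawning a new
// Thread per request (each new Thread costs its own stack, typically
// 256KB-1MB, plus extra context switching).
public class RequestPool {
    static int handle(int requests, int workers) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        AtomicInteger handled = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(requests);
        for (int i = 0; i < requests; i++) {
            pool.submit(() -> {              // queued; no new thread per request
                handled.incrementAndGet();   // stand-in for real request handling
                done.countDown();
            });
        }
        done.await();
        pool.shutdown();
        return handled.get();
    }

    public static void main(String[] args) throws InterruptedException {
        int workers = Runtime.getRuntime().availableProcessors();
        System.out.println(handle(10_000, workers)); // prints 10000
    }
}
```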
22. Troubleshooting framework
1. Discovery.
2. Problem reproduction.
3. Isolate the variables that relate directly to the problem.
4. Analyze your findings to determine the cause of the problem.
23. We have fixed a lot of things that we believed were the most problematic parts. But they weren't.
35. Logging: Sync vs Async

Sync                  Async
 98.85% <=  1 ms       98.47% <=  2 ms
 99.95% <=  7 ms       99.95% <= 10 ms
 99.98% <= 13 ms       99.98% <= 16 ms
 99.99% <= 15 ms       99.99% <= 17 ms
100.00% <= 18 ms      100.00% <= 18 ms
1658 rps              769.05 rps
42. Nagle's algorithm
• The "small packet problem"
• TCP packets have a 40-byte header (20 bytes for TCP, 20 bytes for IPv4)
• Combines a number of small outgoing messages and sends them all at once
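For latency-sensitive request/response traffic, Java exposes the standard TCP_NODELAY socket option to opt out of Nagle's algorithm; a minimal sketch (the class name is made up):

```java
import java.io.IOException;
import java.net.Socket;

// Minimal sketch: disable Nagle's algorithm on a socket so small writes are
// sent immediately instead of being coalesced while waiting for an ACK.
public class NoDelayDemo {
    static boolean withNoDelay() throws IOException {
        try (Socket socket = new Socket()) {   // not yet connected
            socket.setTcpNoDelay(true);        // TCP_NODELAY: disable Nagle
            return socket.getTcpNoDelay();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(withNoDelay()); // prints true
    }
}
```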
43. • Pauses of ~100 ms every couple of hours
• During connection creation
• Doesn't reproduce on a local setup
54. -XX:+PrintHeapAtGC

Heap after GC invocations=43363 (full 3):
 par new generation   total 59008K, used 1335K
  eden space 52480K,  0% used
  from space  6528K, 20% used
  to   space  6528K,  0% used
 concurrent mark-sweep generation total 2031616K, used 1830227K
55. -XX:+PrintTenuringDistribution

Desired survivor size 3342336 bytes, new threshold 2 (max 2)
- age 1: 878568 bytes, 878568 total
- age 2:   1616 bytes, 880184 total
: 53829K->1380K(59008K), 0.0083140 secs]
1884058K->1831609K(2090624K), 0.0084006 secs]
56. A large number of wrappers
Significant allocation pressure
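One common source of wrapper churn is autoboxing; this hypothetical sketch contrasts a boxed accumulator, which allocates a fresh `Long` on every iteration, with a primitive one that allocates nothing:

```java
// Hypothetical sketch of allocation pressure from wrappers: the boxed loop
// creates a new Long object per iteration; the primitive loop creates none.
public class BoxingPressure {
    static long boxedSum(int n) {
        Long sum = 0L;                 // boxed: each += unboxes, adds, re-boxes
        for (int i = 0; i < n; i++) {
            sum += i;                  // allocates a new Long every iteration
        }
        return sum;
    }

    static long primitiveSum(int n) {
        long sum = 0L;                 // primitive: no allocation at all
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(boxedSum(1_000));     // prints 499500
        System.out.println(primitiveSum(1_000)); // prints 499500
    }
}
```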
59. Note: the CMS collector on the young generation uses the same algorithm as the parallel collector.
Java GC documentation at oracle.com
* http://www.oracle.com/webfolder/technetwork/tutorials/obe/java/gc01/index.html
60. Too many live objects during young gen GC
• Minimize survivors
• Watch the tenuring threshold; you might need to tune it to tenure long-lived objects faster
• Reduce NewSize
• Reduce survivor spaces
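The knobs above map to standard HotSpot flags; a sketch with placeholder values (these are not the settings from the talk):

```shell
# Placeholder values, not the talk's actual settings: shrink the young
# generation and the survivor spaces, and lower the tenuring threshold so
# long-lived objects are promoted out of the young generation sooner.
java -XX:+UseConcMarkSweepGC \
     -XX:NewSize=64m -XX:MaxNewSize=64m \
     -XX:SurvivorRatio=16 \
     -XX:MaxTenuringThreshold=2 \
     -XX:+PrintTenuringDistribution \
     -jar app.jar
```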