Lightning talk covering various aspects of software system performance: latency, cache-conscious data structures, garbage collection, troubleshooting methods such as workload characterization and the USE (Utilization, Saturation and Errors) method, quick diagnostic tools, flame graphs and PerfView.
4. Big Picture: latency
Diagram: Core 3 and Core 4, each with its own L1 and L2 cache, sharing one L3 cache; below them Memory, Disk (SSD) and Network.
Operation                             Latency    Scaled (1 ns = 1 s)
L2 cache reference                    7 ns       7 s
Branch mispredict                     5 ns       5 s
L1 cache reference                    0.5 ns     0.5 s
Main memory reference                 100 ns     1 min 40 s
Compress 1K bytes                     3 us       50 min
Read 1 MB sequentially from memory    250 us     2.9 days
SSD random read                       150 us     1.7 days
Read 1 MB sequentially from SSD       1 ms       11.6 days
Send 2K bytes over 1 Gbps network     20 us      5.5 hours
Round trip within same datacenter     500 us     5.8 days
Packet CA->Netherlands->CA            150 ms     4.8 years
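The human-scale figures above appear to stretch every latency so that 1 ns of real time becomes 1 s. A minimal sketch of that arithmetic follows; the function names `humanize` and `scaled` are made up for this illustration, and rounding may differ slightly from the slide.

```python
def humanize(seconds):
    """Format a duration in seconds using the largest unit that fits."""
    units = [
        ("years", 365 * 24 * 3600),
        ("days", 24 * 3600),
        ("hours", 3600),
        ("min", 60),
        ("s", 1),
    ]
    for name, size in units:
        if seconds >= size:
            return f"{seconds / size:.1f} {name}"
    # Durations shorter than one second are reported in seconds.
    return f"{seconds:.1f} s"


def scaled(latency_ns):
    """Scale a latency so that 1 ns becomes 1 s, then format it."""
    return humanize(float(latency_ns))


# Latencies in nanoseconds, taken from the table above.
LATENCIES_NS = {
    "Main memory reference": 100,
    "Compress 1K bytes": 3_000,
    "Round trip within same datacenter": 500_000,
    "Packet CA->Netherlands->CA": 150_000_000,
}

for operation, ns in LATENCIES_NS.items():
    print(f"{operation}: {scaled(ns)}")
```

Seen this way, a main memory reference costs minutes while a cross-continent round trip costs years, which is why keeping hot data in cache matters so much.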
5. Cache-conscious data structures
• Array
• Sparse array
• Queues built from integer arrays
• Ring buffer
• Zero allocation hashed-wheel timer
• Open addressing hash maps
• ....
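As an illustration of one item in the list above, here is a minimal sketch of a ring buffer built on a preallocated integer array. The class name `IntRingBuffer` and its methods are hypothetical, not taken from the talk. Python is used only to show the indexing scheme; the cache benefit itself comes from the contiguous, fixed-size storage that a language such as Java or C would give you.

```python
from array import array


class IntRingBuffer:
    """Fixed-capacity FIFO queue backed by one contiguous integer array."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        # Allocate once up front; push and pop never allocate afterwards.
        self._buf = array("q", [0]) * capacity
        self._capacity = capacity
        self._head = 0  # index of the oldest element
        self._size = 0  # number of stored elements

    def push(self, value):
        """Append a value; return False if the buffer is full."""
        if self._size == self._capacity:
            return False
        tail = (self._head + self._size) % self._capacity
        self._buf[tail] = value
        self._size += 1
        return True

    def pop(self):
        """Remove and return the oldest value, or None if empty."""
        if self._size == 0:
            return None
        value = self._buf[self._head]
        self._head = (self._head + 1) % self._capacity
        self._size -= 1
        return value
```

Because elements sit next to each other in memory and indices only wrap around, consecutive reads and writes tend to hit the same cache lines instead of chasing pointers.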
6. GC Algorithms
• Serial collector (single-threaded and stop-the-world)
• -XX:+UseParallelGC to use multiple threads for young generation collection
• -XX:+UseParallelOldGC to use multiple threads for old generation collection
• Concurrent Mark and Sweep (-XX:+UseConcMarkSweepGC)
    • Acts on the old generation and does most of its analysis without stopping the world
    • Consumes more CPU
    • Young generation is collected with a parallel algorithm called ParNew
• G1 (Garbage First), for multiprocessor machines with large memory
    • Partitions the heap into a set of equal-sized heap regions
    • Runs concurrently
    • Predictable pause times
7. Where is the time spent?
Diagram of the full stack, with the following components:
Your application, VM
System library, system call
Virtual file system, file system, volume manager, block device interface
Sockets, TCP/UDP, IP, Ethernet
Scheduler, virtual memory, device driver
CPU, CPU interconnect, memory bus, DRAM
I/O bridge, I/O bus, I/O controller, expander interconnect, disk, swap
Network controller, port
8. Where is the time spent?
Understand a perf issue
Collect facts
Hypothesis
Test
When did it start happening?
What are the symptoms?
What recently changed?
9. Workload characterization method
1. Who is causing the load? PID, UID, IP, ...
2. Why is the load being called? Code path, stack trace, ...
3. What is the load? IOPS, tps, r/w, ...
4. How is the load changing over time?
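The four questions above can be answered by grouping the same observations along different dimensions. The sketch below assumes a hypothetical list of samples shaped as (second, PID, code path, operation); the data and the `characterize` function are invented for illustration.

```python
from collections import Counter

# Hypothetical samples: (second, pid, code_path, operation)
SAMPLES = [
    (0, 101, "OrderService.save", "write"),
    (0, 101, "OrderService.save", "write"),
    (0, 202, "ReportJob.scan", "read"),
    (1, 202, "ReportJob.scan", "read"),
    (1, 202, "ReportJob.scan", "read"),
    (1, 202, "ReportJob.scan", "read"),
]


def characterize(samples):
    """Count the samples along the who / why / what / over-time dimensions."""
    return {
        "who": Counter(pid for _, pid, _, _ in samples),
        "why": Counter(path for _, _, path, _ in samples),
        "what": Counter(op for _, _, _, op in samples),
        "over_time": Counter(second for second, _, _, _ in samples),
    }


report = characterize(SAMPLES)
for question, counts in report.items():
    print(question, counts.most_common())
```

In this made-up data set, PID 202 causes most of the load, it comes from the report scan code path, and it is read-dominated.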
10. The Utilization, Saturation and Errors (USE) Method
For every resource, check utilization, saturation and errors.
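The method above amounts to a loop over resources with three checks each. A minimal sketch follows, assuming the metrics have already been collected; the `ResourceStats` type, the `use_check` function and the 80% utilization threshold are assumptions made for this example, not part of the talk.

```python
from dataclasses import dataclass


@dataclass
class ResourceStats:
    name: str
    utilization: float  # fraction of time busy, 0.0 to 1.0
    saturation: int     # queued work, e.g. run-queue length
    errors: int         # error count over the observed interval


def use_check(resources, util_threshold=0.8):
    """Return (resource, metric) pairs that deserve a closer look."""
    findings = []
    for r in resources:
        if r.utilization >= util_threshold:
            findings.append((r.name, "utilization"))
        if r.saturation > 0:
            findings.append((r.name, "saturation"))
        if r.errors > 0:
            findings.append((r.name, "errors"))
    return findings


# Hypothetical measurements for three resources.
stats = [
    ResourceStats("cpu", utilization=0.95, saturation=4, errors=0),
    ResourceStats("disk", utilization=0.30, saturation=0, errors=0),
    ResourceStats("network", utilization=0.10, saturation=0, errors=12),
]
print(use_check(stats))
```

Walking every resource in the same order keeps the investigation systematic and avoids fixating on the first suspicious metric.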