The fastest NoSQL database!
Talking about Go Performance
Try it while I blab!
github.com/aerospike/aerospike-server
github.com/aerospike/aerospike-client-go
Who am I ? 
Brian Bulkowski
brian@bulkowski.org
brian@aerospike.com
@bbulkow
TRS-80, PC, Apple II, Vax 11/70, Wang 
First product: lightpen university teaching kiosk 
Palo Alto High School ( ‘85 ) 
Liberate / NetComputer through the boom 
$10B market cap in 1999; I was employee #32
2003-2007 “time off” ( startups ) 
Citrusleaf / Aerospike history 
42-year-old first-time CEO (me)
2008 Prototype 
2010 First sales “get the band back together” 
2011+ 3 rounds of funding (Draper, ALP, NEA, CNTP) 
70 employees, 2 offices
Does Brian know performance?
Undergrad project: image converter
Single-pass arbitrary scale and rotate with Nyquist filters
Novell
Fastest AppleTalk server + router available
Starlight Networks
150 Mb/sec video server on a P133
Liberate
HTML technology for embedded systems
Aggregate Knowledge
Real-time recommendations: 2x faster in the first week
Aerospike
10x faster than existing NoSQL, 100x faster than RDBMSs
Internet Technology Stack
[Diagram: millions of consumers and billions of devices reach the app servers; the app servers write real-time context to, and read recent content from, an in-memory NoSQL store, alongside an insights warehouse.]
Profile store: cookies, email, deviceID, IP address, location, segments, clicks, likes, tweets, search terms...
Real-time analytics: best sellers, top scores, trending tweets
Batch analytics: discover patterns, segment data (location patterns, audience affinity)
Who uses Aerospike? 
theTradeDesk 
… to name a few!
Aerospike is High Performance
[Chart: Balanced and Read-Heavy workload results on a 0 to 1,700,000 scale, comparing Aerospike 3 (in-memory), Aerospike 3 (persistent), Aerospike 2, Cassandra, MongoDB, Couchbase 1.8, and Couchbase 2.0; bar values not recoverable.]
Easy Clients ( better than JSON )
Python
Go
Also, analytics 
http://www.aerospike.com/community/labs/
If it is so good, why haven't I heard of it? 
Established in 2009 (newer than most) 
Used in Advertising – ad exchanges, data exchanges, 
targeting, real-time bidding, real-time attribution. 
Open Sourced in June 2014
When should I use Aerospike? 
Redis, but with scale & flash 
Cassandra, but fast 
User data, session data, behavior, fraud… 
API billing ~ retail actions ~ recommendations 
Up and running in 10 minutes!
( Vagrant, EC2 …)
Why does Aerospike care about Go? 
It’s cool ! 
Promises performance with expressiveness
( as an old C guy, Go is aimed at me ) 
Our customers are diving in, deploying 
What about (other versions of other languages)… 
( sure, they’re cool too! ) 
Go!
Let’s talk about…. 
Some old microbenchmarks 
Profilers, and how to run them
War story: optimizing our Go client 
( sure, we know Go isn’t JUST about performance )
Old Microbenchmark 
On Nov 22, 2009, I posted to golang-nuts
Old Microbenchmark
Results in seconds (Nov 2009):
1.1  python (CPython 2.6.2, the distro release with no tweaks)
4.6  go (current hg release)
4.2  ruby 1.8 (distro release)
1.1  ruby 1.9 (distro release)
Pike said:
"I suspect the great majority of the time in your benchmark is due to Go's current rudimentary garbage collector. Tests like this generate a lot of garbage that is collected slowly. From experiments I've done, a better implementation can make a huge difference. Profiling this test shows at least 50% of the time is in the allocator and collector, as opposed to about 5% printing the string and less than 15% in the map code. A better allocator and collector would make a dramatic change.
The short answer: the Go runtime is new and completely untuned. The libraries need work too."
Microbenchmark
"T1"
for i := 0; i < 1000000; i++ {
	x = (2 * x) + x + 1
}
Python: 1.96 s (big integers only)
Go: 1.04 ms with native integers (2.17 s with big.Int)
Java: 5 ms with native integers (2.15 s with BigNum)
Good news: Go is right in the hunt, and easier to code
Amazon m3.xlarge (4 cores, E5-2670 @ 2.5 GHz)
Python 2.6.9
Go 1.3.3
Java 1.7.0_71
Amazon Linux (3.16)
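The native and big-number T1 timings differ by three orders of magnitude largely because the loops do different work: x = 3x + 1 overflows a 64-bit integer after about 40 iterations and then wraps silently, while Python promotes to arbitrary-precision integers automatically. A minimal Go sketch of both variants, assuming int64 for the native case (the slide does not show how x is declared); the function names are illustrative:

```go
package main

import (
	"fmt"
	"math/big"
	"time"
)

// t1Native runs the T1 recurrence on int64. Go signed overflow wraps
// silently, so after roughly 40 iterations x no longer equals 3x+1
// in the mathematical sense.
func t1Native(n int) int64 {
	var x int64
	for i := 0; i < n; i++ {
		x = (2 * x) + x + 1
	}
	return x
}

// t1Big runs the same recurrence with math/big, so x keeps growing
// (about 1.58 bits per iteration) and each step costs more than the last.
func t1Big(n int) *big.Int {
	x := new(big.Int)
	three := big.NewInt(3)
	one := big.NewInt(1)
	for i := 0; i < n; i++ {
		x.Mul(x, three) // 3x, equivalent to (2 * x) + x
		x.Add(x, one)   // + 1
	}
	return x
}

func main() {
	const n = 1000000   // iterations used on the slide
	const nBig = 100000 // fewer for big.Int: total cost grows roughly with n*n

	start := time.Now()
	v := t1Native(n)
	fmt.Println("int64:  ", time.Since(start), v)

	start = time.Now()
	b := t1Big(nBig)
	fmt.Println("big.Int:", time.Since(start), b.BitLen(), "bits")
}
```

Both functions compute (3^n - 1) / 2 until int64 overflows, which makes small cases easy to check by hand.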
Microbenchmarks
T5 – the 2009 benchmark
Python: 12.5 sec
Go: 12.56 sec
Java: 2.56 sec
Good news: not slower than Python!
Bad news: holy crap, compared to Java
Microbenchmarks – the old code 
T5 – the 2009 benchmark (slower CPU) 
for x := 0; x < 1000000; x++ {
	a := make(map[int]string)
	for a1 := 0; a1 < 50; a1++ {
		a[a1] = strconv.Itoa(a1)
	}
}
12.56 seconds 
Microbenchmarks – tune the map 
T5 – the 2009 benchmark 
for x := 0; x < 1000000; x++ {
	a := make(map[int]string, 50)
	for a1 := 0; a1 < 50; a1++ {
		a[a1] = strconv.Itoa(a1)
	}
}
7.80 seconds 
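The effect of the size hint can be measured directly. A minimal sketch, assuming a current Go toolchain rather than the deck's Go 1.3.3, that wraps one outer T5 iteration in testing.Benchmark so it runs from main without a _test.go file and reports allocations per operation. The names fill and sink are illustrative, and absolute numbers will differ from the slides because the map implementation has changed since 2014:

```go
package main

import (
	"fmt"
	"strconv"
	"testing"
)

// fill builds one 50-entry map, as in a single outer iteration of T5.
// hint == 0 mirrors the original slide; hint == 50 mirrors the tuned one.
func fill(hint int) map[int]string {
	a := make(map[int]string, hint) // a hint of 0 behaves like make(map[int]string)
	for a1 := 0; a1 < 50; a1++ {
		a[a1] = strconv.Itoa(a1)
	}
	return a
}

// sink is package-level so the compiler cannot discard the maps.
var sink map[int]string

func main() {
	for _, hint := range []int{0, 50} {
		res := testing.Benchmark(func(b *testing.B) {
			b.ReportAllocs()
			for i := 0; i < b.N; i++ {
				sink = fill(hint)
			}
		})
		fmt.Printf("hint=%2d  %s  %s\n", hint, res.String(), res.MemString())
	}
}
```

Reporting allocations alongside time makes it visible whether a speedup comes from fewer map growth steps rather than from anything else.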
Microbenchmarks – remove the Itoa 
T5 – the 2009 benchmark 
for x := 0; x < 1000000; x++ {
	a := make(map[int]string, 50)
	for a1 := 0; a1 < 50; a1++ {
		a[a1] = "123456"
	}
}
5.45 seconds 
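Replacing strconv.Itoa with the constant "123456" removes the per-entry string allocation, but it also changes what the benchmark stores. If the goal is to keep 50 distinct values while still avoiding that allocation, one option (an assumption, not something the slides measured) is to format the strings once, outside the hot loop. The names precomputed and t5Precomputed are hypothetical:

```go
package main

import (
	"fmt"
	"strconv"
)

// precomputed holds the decimal strings "0".."49", built once at startup.
var precomputed = func() [50]string {
	var s [50]string
	for i := range s {
		s[i] = strconv.Itoa(i)
	}
	return s
}()

// t5Precomputed runs the T5 loop with a presized map per outer iteration,
// reusing the precomputed strings instead of calling Itoa every time.
// Go strings are immutable, so sharing them across maps is safe.
func t5Precomputed(iterations int) map[int]string {
	var a map[int]string
	for x := 0; x < iterations; x++ {
		a = make(map[int]string, 50)
		for a1 := 0; a1 < 50; a1++ {
			a[a1] = precomputed[a1]
		}
	}
	return a
}

func main() {
	a := t5Precomputed(1000000)
	fmt.Println(len(a), a[42])
}
```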
Microbenchmarks – singleton Map 
T5 – the 2009 benchmark 
a := make(map[int]string, 50)
for x := 0; x < 1000000; x++ {
	// a := make(map[int]string, 50)
	for a1 := 0; a1 < 50; a1++ {
		a[a1] = "123456"
	}
}
2.03 seconds! Finally better than Java!
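The singleton-map version allocates one map instead of a million, but it stays correct only because every outer iteration writes the same 50 keys. If later rounds could write different keys, the map would need emptying between uses. A sketch, assuming Go 1.11 or later, where the compiler recognizes the range-and-delete loop below as a single map clear (Go 1.21 also added the built-in clear). The helper names are hypothetical:

```go
package main

import "fmt"

// resetMap empties m while keeping its allocated storage, so the next
// round of inserts does not have to grow the map again. Since Go 1.11
// the compiler turns this exact loop shape into one runtime map clear.
func resetMap(m map[int]string) {
	for k := range m {
		delete(m, k)
	}
}

// t5Reuse runs the T5 loop with one map allocated outside the hot loop.
func t5Reuse(iterations int) map[int]string {
	a := make(map[int]string, 50)
	for x := 0; x < iterations; x++ {
		resetMap(a) // needed only if key sets can differ between rounds
		for a1 := 0; a1 < 50; a1++ {
			a[a1] = "123456"
		}
	}
	return a
}

func main() {
	a := t5Reuse(1000000)
	fmt.Println(len(a))
}
```

Clearing costs work proportional to the map's size on every round, so this version should land somewhere between the fresh-map and singleton timings.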
Microbenchmarks – Java 
T5 – the 2009 benchmark 
for (int x = 0; x < 1000000; x++) {
    HashMap<Integer, String> a = new HashMap<Integer, String>();
    for (int a1 = 0; a1 < 50; a1++) {
        a.put(a1, Integer.toString(a1));
    }
}
2.56 seconds 
Any ideas? 
( I haven’t figured it out yet )
Next microbenchmarks!
Float, String
Go channels vs Java futures
… couldn't code the Java part in time!
Simple TCP echo, but with transactions
Log processing
Ruby 2.1, Go 1.4…
Your votes?
Profilers 
pprof is pretty great!
Import it in all your main packages; it does not seem to hurt:
import _ "net/http/pprof" // registers the /debug/pprof/ handlers on http.DefaultServeMux
Add the HTTP listener ( only on flag ) 
// launch http pprof listener if in profile mode
// (a nil handler means http.DefaultServeMux, where net/http/pprof registered itself)
if *profileMode {
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()
}
Profilers 
Take a 30-second snapshot
go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30
At the pprof prompt, run 'top 10' (columns: flat samples, flat %, running sum %, cumulative samples, cumulative %)
(pprof) top 10 
Total: 3852 samples 
1187 30.8% 30.8% 1254 32.6% syscall.Syscall 
304 7.9% 38.7% 304 7.9% ExternalCode 
172 4.5% 43.2% 175 4.5% github.com/aerospike/aerospike-client-go/pkg/ripemd160._Block
137 3.6% 46.7% 233 6.0% runtime.mallocgc 
98 2.5% 49.3% 98 2.5% runtime.futex 
79 2.1% 51.3% 86 2.2% runtime.MSpan_Sweep 
77 2.0% 53.3% 77 2.0% scanblock 
68 1.8% 55.1% 68 1.8% runtime.xchg 
46 1.2% 56.3% 46 1.2% runtime.epollwait
Profilers
(pprof) web
[Call-graph image from the web command not recoverable]
Profilers 
Good old oprofile, let's not forget it
(especially if you can get kernel symbols, which is hard)
sudo yum -y install oprofile 
Start capturing 
sudo opcontrol --reset 
sudo opcontrol --no-vmlinux 
sudo opcontrol --start
Run your program 
sudo opcontrol --dump 
sudo opcontrol --shutdown 
Dump your result 
sudo opreport -l --demangle=smart --debug-info 
Cheat Sheet http://www.bonsai.com/wiki/howtos/tuning/oprofile/
Profilers 
opreport 
samples % linenr info image name app name symbol name 
28106 56.5877 (no location information) no-vmlinux no-vmlinux /no-vmlinux 
6216 12.5151 rand.go:76 benchmark benchmark math/rand.(*Rand).Int31n 
3940 7.9327 rng.go:232 benchmark benchmark math/rand.(*rngSource).Int63 
1987 4.0006 benchmark.go:255 benchmark benchmark main.randString 
1584 3.1892 rand.go:43 benchmark benchmark math/rand.(*Rand).Int63 
1465 2.9496 rand.go:93 benchmark benchmark math/rand.(*Rand).Intn 
1421 2.8610 rand.go:49 benchmark benchmark math/rand.(*Rand).Int31 
354 0.7127 ripemd160block.go:45 benchmark benchmark github.com/aerospike/aerospike-client-go/pkg/ripemd160._Block
349 0.7027 mgc0.c:720 benchmark benchmark scanblock 
307 0.6181 malloc.goc:40 benchmark benchmark runtime.mallocgc 
205 0.4127 mgc0.c:1783 benchmark benchmark runtime.MSpan_Sweep 
138 0.2778 memmove_amd64.s:33 benchmark benchmark runtime.memmove 
131 0.2638 asm_amd64.s:600 benchmark benchmark runtime.xchg
Tuning the Aerospike Client 
What does the client do?
Maintain the DHT state
Keep a connection pool
Make requests to the right servers
Box / unbox to wire protocol…
SIMPLE
Tuning the Aerospike Client 
Attempt 1: run pprof
The usual dance of making life easy for the garbage collector (just like Java)
pprof worked!
The hot objects showed up
Cache easily with sized (buffered) channels!
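Caching with a sized channel usually means using a buffered channel as a free list. A minimal sketch of that pattern, assuming byte buffers as the cached objects; BufferPool and its methods are hypothetical and this is not the Aerospike client's code. The select with a default case keeps Get and Put non-blocking: an empty pool allocates, and a full pool lets the extra object go to the garbage collector.

```go
package main

import "fmt"

// BufferPool is a fixed-capacity free list built on a buffered channel.
// Channels are safe for concurrent use, so the pool is too.
type BufferPool struct {
	free chan []byte
	size int
}

// NewBufferPool caches at most capacity buffers of the given size.
func NewBufferPool(capacity, size int) *BufferPool {
	return &BufferPool{free: make(chan []byte, capacity), size: size}
}

// Get returns a cached buffer if one is available, otherwise a new one.
func (p *BufferPool) Get() []byte {
	select {
	case b := <-p.free:
		return b
	default:
		return make([]byte, p.size)
	}
}

// Put returns b to the pool; if the pool is already full, b is dropped.
func (p *BufferPool) Put(b []byte) {
	if cap(b) < p.size {
		return // too small to reuse
	}
	select {
	case p.free <- b[:p.size]:
	default:
	}
}

func main() {
	pool := NewBufferPool(2, 1024)
	b := pool.Get()
	copy(b, "hello")
	pool.Put(b)
	fmt.Println(len(pool.Get()))
}
```

sync.Pool (added in Go 1.3) is the standard-library alternative; its items may be released at any time without notice, whereas a buffered channel retains a fixed, predictable number of objects.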
Tuning the Aerospike Client 
Attempt 2: oprofile
oprofile found rand() taking time
Optimization gave nothing
… not sure why not …
Currently happy with throughput
Tuning the Aerospike Client 
Latency problem at customer site!
User validating a server install with a quick Go client
"17 ms average latency @ 20K TPS": terrible!
Server measured at 0.4 ms @ 40K TPS
-- ping OK
-- it's the client
Where's the latency source? GC? Green threads? Network?
-- Profile shows low GC load
-- Hard to measure thread latency
EC2 m3.xlarge ($0.05/hr), 4 cores, E5-2670 @ 2.5 GHz
Bare metal vs virtual
CentOS 6 vs latest kernel
Intel SSDs vs RAM
Tuning the Aerospike Client
[Latency charts: Go client vs Java client; images not recoverable]
What happened? 
• Not sure what happened at deployment
(yet; we suspect an old kernel)
• A week lost by developers testing on MacOS laptops
(MacOS is showing bad latency)
• C code is running slower; we think it's the random fill of the buffer
• Lesson: just switch to Linux 3.12-ish kernels
• Lesson: fewer lines of code: ~11k in Go vs ~17k in Java
• Lesson: for network / IO, these languages are THE SAME!
Golang Performance : microbenchmarks, profilers, and a war story

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdf
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a series
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort ServiceHot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their Engineering
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
 

Golang Performance: microbenchmarks, profilers, and a war story

  • 1. The fastest NoSQL database! Talking about Go Performance. Try it while I blab! github.com/aerospike/aerospike-server, github.com/aerospike/aerospike-client-go
  • 2. Who am I? Brian Bulkowski (brian@bulkowski.org, brian@aerospike.com, @bbulkow). TRS-80, PC, Apple II, Vax 11/70, Wang. First product: lightpen university teaching kiosk. Palo Alto High School ('85). Liberate / NetComputer through the boom: 10B market cap in 1999, employee 32. 2003-2007 "time off" (startups). Citrusleaf / Aerospike history: 42-year-old first-time CEO (me); 2008 prototype; 2010 first sales ("get the band back together"); 2011+ 3 rounds of funding (Draper, ALP, NEA, CNTP); 70 employees, 2 offices.
  • 3. Does Brian know performance? Brian Bulkowski (brian@bulkowski.org, brian@aerospike.com, @bbulkow). Undergrad project: image converter, single-pass arbitrary scale and rotate with Nyquist filters. Novell: fastest AppleTalk server + router available. Starlight Networks: 150 Mb/sec video server on a P133. Liberate: HTML technology for embedded systems. Aggregate Knowledge: real-time recommendations, 2x faster in the first week. Aerospike: 10x faster than existing NoSQL, 100x faster than RDBMSs.
  • 4. Internet Technology Stack [Diagram: millions of consumers, billions of devices; app servers; data. Write context: in-memory NoSQL profile store (write real-time context, read recent content) holding cookies, email, deviceID, IP address, location, segments, clicks, likes, tweets, search terms... Real-time analytics: best sellers, top scores, trending tweets. Insights warehouse, batch analytics: discover patterns, segment data (location patterns, audience affinity).]
  • 5. Who uses Aerospike? theTradeDesk … to name a few!
  • 6. Aerospike is High Performance [Chart: Balanced and Read-Heavy workloads, y-axis from 0 to 1,700,000; series: Aerospike 3 (in-memory), Aerospike 3 (persistent), Aerospike 2, Cassandra, MongoDB, Couchbase 1.8, Couchbase 2.0]
  • 7. Easy Clients (better than JSON): Python! Go!
  • 9. If it is so good, why haven't I heard of it? Established in 2009 (newer than most). Used in advertising: ad exchanges, data exchanges, targeting, real-time bidding, real-time attribution. Open sourced in June 2014.
  • 10. When should I use Aerospike? Redis, but with scale & flash. Cassandra, but fast. User data, session data, behavior, fraud… API billing, retail actions, recommendations. Up and running in 10 minutes! (Vagrant, EC2 …)
  • 11. Why does Aerospike care about Go? It's cool! It promises performance with expressiveness (as an old C guy, Go is aimed at me). Our customers are diving in and deploying. What about (other versions of other languages)… (sure, they're cool too!) Go!
  • 12. Let's talk about… some old microbenchmarks; profilers, and how to run them; a war story: optimizing our Go client (sure, we know Go isn't JUST about performance).
  • 13. Old Microbenchmark: on Nov 22, 2009, I posted to golang-nuts.
  • 14. Old Microbenchmark, seconds (Nov 2009): 1.1 python (CPython 2.6.2, the distro release with no tweaks); 4.6 go (current hg release); 4.2 ruby 1.8 (distro release); 1.1 ruby 1.9 (distro release). Pike said: "I suspect the great majority of the time in your benchmark is due to Go's current rudimentary garbage collector. Tests like this generate a lot of garbage that is collected slowly. From experiments I've done, a better implementation can make a huge difference. Profiling this test shows at least 50% of the time is in the allocator and collector, as opposed to about 5% printing the string and less than 15% in the map code. A better allocator and collector would make a dramatic change. The short answer: the Go runtime is new and completely untuned. The libraries need work too."
  • 15. Microbenchmark “T1”: for i := 0; i < 1000000; i++ { x = (2 * x) + x + 1 }. Results: Python 1.96 s (big integers only); Go 1.04 ms (2.17 s with big.Int); Java 5 ms (2.15 s with BigNum). Good news: Go is right in the hunt, and easier to code. Test setup: Amazon m3.xlarge (4-core E5 @ 2.5 GHz), Python 2.6.9, Go 1.3.3, Java 1.7.0_71, Amazon Linux (3.16).
  • 16. Microbenchmarks: T5, the 2009 benchmark. Python 12.5 s; Go 12.56 s; Java 2.56 s. Good news: not slower than Python! Bad news: holy crap, compared to Java.
  • 17. Microbenchmarks, the old code: T5, the 2009 benchmark (slower CPU): for x := 0; x < 1000000; x++ { a := make(map[int]string); for a1 := 0; a1 < 50; a1++ { a[a1] = strconv.Itoa(a1) } }. Time: 12.56 seconds.
  • 18. Microbenchmarks, tune the map: T5, the 2009 benchmark with a size hint: for x := 0; x < 1000000; x++ { a := make(map[int]string, 50); for a1 := 0; a1 < 50; a1++ { a[a1] = strconv.Itoa(a1) } }. Time: 7.80 seconds.
  • 19. Microbenchmarks, remove the Itoa: T5, the 2009 benchmark: for x := 0; x < 1000000; x++ { a := make(map[int]string, 50); for a1 := 0; a1 < 50; a1++ { a[a1] = "123456" } }. Time: 5.45 seconds.
  • 20. Microbenchmarks, singleton map: T5, the 2009 benchmark with the map hoisted out of the loop: a := make(map[int]string, 50); for x := 0; x < 1000000; x++ { /* a := make(map[int]string, 50) */ for a1 := 0; a1 < 50; a1++ { a[a1] = "123456" } }. Time: 2.03 seconds. Finally better than Java!
  • 21. Microbenchmarks, Java: T5, the 2009 benchmark: for (int x = 0; x < 1000000; x++) { HashMap<Integer, String> a = new HashMap<Integer, String>(); for (int a1 = 0; a1 < 50; a1++) { a.put(a1, Integer.toString(a1)); } }. Time: 2.56 seconds.
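The four Go variants from slides 17 to 20, side by side in one program, as a sketch for reproducing the progression. The function names are hypothetical, and timing with time.Since mirrors the slides rather than a go test benchmark harness; each function returns the last map it built so the work cannot be discarded. Absolute times on a current Go release will likely differ a lot from the Go 1.3.3 numbers above.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// freshMapItoa mirrors slide 17: a new, unsized map on every outer
// iteration, filled with strings from strconv.Itoa.
func freshMapItoa(iters int) map[int]string {
	var a map[int]string
	for x := 0; x < iters; x++ {
		a = make(map[int]string)
		for a1 := 0; a1 < 50; a1++ {
			a[a1] = strconv.Itoa(a1)
		}
	}
	return a
}

// presizedMapItoa mirrors slide 18: the same loop, but make() gets a
// size hint of 50 so the map does not have to grow while it is filled.
func presizedMapItoa(iters int) map[int]string {
	var a map[int]string
	for x := 0; x < iters; x++ {
		a = make(map[int]string, 50)
		for a1 := 0; a1 < 50; a1++ {
			a[a1] = strconv.Itoa(a1)
		}
	}
	return a
}

// presizedMapConst mirrors slide 19: the size hint plus a constant
// string value, so no string is built per insert.
func presizedMapConst(iters int) map[int]string {
	var a map[int]string
	for x := 0; x < iters; x++ {
		a = make(map[int]string, 50)
		for a1 := 0; a1 < 50; a1++ {
			a[a1] = "123456"
		}
	}
	return a
}

// reusedMapConst mirrors slide 20: one map hoisted out of the loop.
// After the first pass every insert overwrites an existing key.
func reusedMapConst(iters int) map[int]string {
	a := make(map[int]string, 50)
	for x := 0; x < iters; x++ {
		for a1 := 0; a1 < 50; a1++ {
			a[a1] = "123456"
		}
	}
	return a
}

func main() {
	variants := []struct {
		name string
		fn   func(int) map[int]string
	}{
		{"fresh map + Itoa", freshMapItoa},
		{"presized map + Itoa", presizedMapItoa},
		{"presized map + const", presizedMapConst},
		{"reused map + const", reusedMapConst},
	}
	for _, v := range variants {
		start := time.Now()
		m := v.fn(1000000)
		fmt.Printf("%-22s %v (len %d)\n", v.name, time.Since(start), len(m))
	}
}
```

Note what the last step changes: once the map is hoisted out of the loop, every insert after the first pass overwrites an existing key with a constant string, so the singleton variant no longer allocates anything per iteration. It is faster largely because it stops exercising the allocator and collector, which is where Pike's 2009 reply located most of the cost.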
  • 22. Any ideas? (I haven’t figured it out yet.)
  • 23. Next microbenchmarks! Float, String; Go channels vs Java futures (couldn't code the Java part in time!); simple TCP echo, but with transactions; log processing; Ruby 2.1, Go 1.4… Your votes?
  • 24. Profilers: pprof is pretty great! Import it in all your mains; it does not seem to hurt: import _ "net/http/pprof". Add the HTTP listener, only when a flag is set (launch the http pprof listener if in profile mode): if *profileMode { go func() { log.Println(http.ListenAndServe("localhost:6060", nil)) }() }
  • 25. Profilers: take a 30-second snapshot with go tool pprof http://localhost:6060/debug/pprof/profile?seconds=xx, then type 'top 10' at the pprof prompt:
      (pprof) top 10
      Total: 3852 samples
      1187  30.8%  30.8%  1254  32.6%  syscall.Syscall
       304   7.9%  38.7%   304   7.9%  ExternalCode
       172   4.5%  43.2%   175   4.5%  github.com/aerospike/aerospike-client-go/pkg/ripemd160._Block
       137   3.6%  46.7%   233   6.0%  runtime.mallocgc
        98   2.5%  49.3%    98   2.5%  runtime.futex
        79   2.1%  51.3%    86   2.2%  runtime.MSpan_Sweep
        77   2.0%  53.3%    77   2.0%  scanblock
        68   1.8%  55.1%    68   1.8%  runtime.xchg
        46   1.2%  56.3%    46   1.2%  runtime.epollwait
  • 27.
  • 28. Profilers: good old oprofile, let's not forget it (especially if you can get kernel symbols, which is hard).
      Install: sudo yum -y install oprofile
      Start capturing: sudo opcontrol --reset; sudo opcontrol --no-vmlinux; sudo opcontrol --start
      Run your program, then: sudo opcontrol --dump; sudo opcontrol --shutdown
      Dump your result: sudo opreport -l --demangle=smart --debug-info
      Cheat sheet: http://www.bonsai.com/wiki/howtos/tuning/oprofile/
  • 29. Profilers: opreport output
      samples  %        linenr info                image name  app name    symbol name
      28106    56.5877  (no location information)  no-vmlinux  no-vmlinux  /no-vmlinux
      6216     12.5151  rand.go:76                 benchmark   benchmark   math/rand.(*Rand).Int31n
      3940      7.9327  rng.go:232                 benchmark   benchmark   math/rand.(*rngSource).Int63
      1987      4.0006  benchmark.go:255           benchmark   benchmark   main.randString
      1584      3.1892  rand.go:43                 benchmark   benchmark   math/rand.(*Rand).Int63
      1465      2.9496  rand.go:93                 benchmark   benchmark   math/rand.(*Rand).Intn
      1421      2.8610  rand.go:49                 benchmark   benchmark   math/rand.(*Rand).Int31
      354       0.7127  ripemd160block.go:45       benchmark   benchmark   github.com/aerospike/aerospike-client-go/pkg/ripemd160._Block
      349       0.7027  mgc0.c:720                 benchmark   benchmark   scanblock
      307       0.6181  malloc.goc:40              benchmark   benchmark   runtime.mallocgc
      205       0.4127  mgc0.c:1783                benchmark   benchmark   runtime.MSpan_Sweep
      138       0.2778  memmove_amd64.s:33         benchmark   benchmark   runtime.memmove
      131       0.2638  asm_amd64.s:600            benchmark   benchmark   runtime.xchg
  • 30. Tuning the Aerospike Client. What does the client do? Maintain the DHT state; keep a connection pool; make requests to the right servers; box / unbox to the wire protocol… SIMPLE
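To make "make requests to the right servers" concrete, here is a minimal sketch of client-side routing over a partitioned key space: hash the key, map the digest to a partition, and look up the node that owns that partition. Everything here is an assumption for illustration, not the Aerospike client's code: SHA-1 stands in for the RIPEMD-160 digest that shows up in the client's profiles (RIPEMD-160 is not in Go's standard library), and the 4,096-partition table, round-robin ownership, and node addresses are hypothetical.

```go
package main

import (
	"crypto/sha1"
	"encoding/binary"
	"fmt"
)

// partitionOf hashes a key and maps the digest onto one of
// partitionCount partitions. The bit layout is illustrative only.
func partitionOf(key string, partitionCount int) int {
	digest := sha1.Sum([]byte(key))
	return int(binary.LittleEndian.Uint32(digest[:4]) % uint32(partitionCount))
}

// partitionMap is a hypothetical client-side view of the cluster:
// for each partition ID, the address of the node that owns it.
type partitionMap struct {
	owners []string
}

// nodeFor returns the node that should receive a request for key.
func (pm *partitionMap) nodeFor(key string) string {
	return pm.owners[partitionOf(key, len(pm.owners))]
}

func main() {
	// Hypothetical 3-node cluster with partitions assigned round-robin.
	nodes := []string{"10.0.0.1:3000", "10.0.0.2:3000", "10.0.0.3:3000"}
	pm := &partitionMap{owners: make([]string, 4096)}
	for p := range pm.owners {
		pm.owners[p] = nodes[p%len(nodes)]
	}
	for _, k := range []string{"user:1", "user:2", "user:3"} {
		fmt.Println(k, "->", pm.nodeFor(k))
	}
}
```

Because the digest is deterministic, every client computes the same partition for a key; in this sketch, "maintain the DHT state" corresponds to keeping the owners table current as nodes join and leave.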
  • 31. Tuning the Aerospike Client, attempt 1: run pprof. The usual dance of making life easy for the garbage collector (just like Java). pprof worked: the hot objects showed up! Cache easily with sized channels!
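One reading of "cache easily with sized channels": a buffered channel used as a bounded free list, so hot objects are reused instead of being handed to the garbage collector. This is a minimal sketch with hypothetical names (bufferPool, newBufferPool), not the Aerospike client's code. The non-blocking select on both sides means Get never waits for a buffer and Put never waits for room.

```go
package main

import "fmt"

// bufferPool is a bounded free list built on a buffered ("sized") channel.
type bufferPool struct {
	free chan []byte
	size int
}

func newBufferPool(capacity, bufSize int) *bufferPool {
	return &bufferPool{free: make(chan []byte, capacity), size: bufSize}
}

// Get returns a pooled buffer if one is available, else allocates a new one.
func (p *bufferPool) Get() []byte {
	select {
	case b := <-p.free:
		return b
	default:
		return make([]byte, p.size)
	}
}

// Put hands a buffer back. If the pool is already full, the buffer is
// dropped and left to the garbage collector, so Put never blocks.
func (p *bufferPool) Put(b []byte) {
	if cap(b) < p.size {
		return // too small to reuse; don't pool it
	}
	select {
	case p.free <- b[:p.size]:
	default:
	}
}

func main() {
	pool := newBufferPool(2, 1024)
	b := pool.Get()
	b[0] = 42
	pool.Put(b)
	again := pool.Get()
	fmt.Println(again[0], len(again)) // 42 1024: the same buffer came back
}
```

The standard library's sync.Pool serves a similar purpose, but any item stored in it may be removed at any time without notice; the channel version holds up to its capacity until items are taken out, which keeps its footprint predictable.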
  • 32. Tuning the Aerospike Client, attempt 2: oprofile. oprofile found rand() taking time. Optimization gave nothing (… not sure why not …). Currently happy with throughput.
  • 33. Tuning the Aerospike Client: latency problem at a customer site! A user validating a server install with a quick Go client saw "17 ms average latency @ 20K TPS": terrible! The server measured 0.4 ms @ 40K TPS and ping was OK, so it's the client. Where's the latency source? GC? Green threads? Network? The profile shows low GC load; thread latency is hard to measure. Test matrix: EC2 m3.xlarge ($0.05/hr), 4-core E5-2670 @ 2.5 GHz; bare metal vs virtual; CentOS 6 vs latest kernel; Intel SSDs vs RAM.
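The numbers on this slide can be cross-checked with Little's law, L = λW: the average number of requests in flight equals throughput times average latency. This assumes steady state, and assumes the server's latency at 20K TPS is no worse than the 0.4 ms it showed at 40K TPS:

```latex
\begin{aligned}
L &= \lambda W \\
L_{\text{client}} &= 20\,000\ \mathrm{s^{-1}} \times 0.017\ \mathrm{s} = 340 \\
L_{\text{server}} &\approx 20\,000\ \mathrm{s^{-1}} \times 0.0004\ \mathrm{s} = 8
\end{aligned}
```

So on average roughly 330 of the 340 in-flight requests were waiting outside the server, on the client side of the connection, which is consistent with the slide's conclusion that the client was the source.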
  • 34. Tuning the Aerospike Client: [Charts: Go, Java]
  • 35. What happened? • Not sure what happened at deployment (yet; we suspect an old kernel). • A week lost by developers using MacOS laptops (MacOS is showing bad latency). • C code is running slower; we think it's the random fill of the buffer. • Lesson: just switch to Linux 3.12-ish kernels. • Lesson: fewer lines, ~11k Go vs 17k Java. • Lesson: for network / IO, these languages are THE SAME!