Freeing the Whale
How to Fail at Scale
oliver gould

cto, buoyant
QConSF, November 9, 2016
Watch the video with slide synchronization on InfoQ.com!
https://www.infoq.com/presentations/twitter-finagle
Presented at QCon San Francisco
www.qconsf.com
2010
A FAILWHALE ODYSSEY
Twitter, 2010
10^7 users
10^7 tweets/day
10^2 engineers
10^1 ops eng
10^1 services
10^1 deploys/week
10^2 hosts
0 datacenters
10^1 user-facing outages/week
https://blog.twitter.com/2010/measuring-tweets
objective
reliability
flexibility
solution
platform
SOA + devops

i.e. “microservices”
Resilience is an imperative: our software runs on the truly dismal computers we call datacenters. Besides being heinously complex… they are unreliable and prone to operator error.
Marius Eriksen
@marius

RPC Redux
software you didn’t write
hardware you can’t touch
network you can’t trace
break in new and surprising ways
and your customers shouldn’t notice
freeing the whale
photo: Johanan Ottensooser
mesos.apache.org
UC Berkeley, 2010
Twitter, 2011
Apache, 2012
Abstracts compute resources
Promise: don’t worry about the hosts
aurora.apache.org
Twitter, 2011
Apache, 2013
Schedules processes on Mesos
Promise: no more puppet, monit, etc
[Diagram: timelines, users, and notifications services (x800, x300, x1000) scheduled by Aurora (or Marathon, or …) on Mesos across a pool of hosts]
[Diagram: the same stack with one host fewer 🔥]
service discovery
[Diagram: timelines, users, zookeeper]
create ephemeral
/svc/users/node_012345
{"host": "host-abc", "port": 4321}
watch /svc/users/*
GetUser(olix0r)
uh oh.
client caches results
zookeeper serves empty results?!
service discovery is advisory
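A minimal sketch of what "service discovery is advisory" could look like on the client side, assuming the client keeps its last non-empty view of a service and ignores empty updates. The class and method names (`AdvisoryResolver`, `on_update`, `resolve`) are hypothetical and are not Finagle or ZooKeeper APIs.

```python
from typing import Dict, List, Tuple

Address = Tuple[str, int]


class AdvisoryResolver:
    """Keeps the last non-empty address set seen for each service path."""

    def __init__(self) -> None:
        self._last_good: Dict[str, List[Address]] = {}

    def on_update(self, path: str, addresses: List[Address]) -> None:
        # Treat an empty update as suspect: the discovery system may be
        # failing, rather than every instance having vanished at once.
        if addresses:
            self._last_good[path] = list(addresses)

    def resolve(self, path: str) -> List[Address]:
        # Serve the cached view. Dead hosts are still detected by failed
        # requests, so discovery is a hint and not the only source of truth.
        return list(self._last_good.get(path, []))
```

With this shape, an empty result from the discovery system leaves the previously cached `("host-abc", 4321)` entry in place, so `GetUser(olix0r)` can still be attempted.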
github.com/twitter/finagle
RPC library (JVM)
asynchronous
built on Netty
scala
functional
strongly typed
first commit: Oct 2010
datacenter
[1] physical, [2] link, [3] network, [4] transport: aws, azure, digitalocean, gce, …; canal, weave, …; kubernetes, mesos, swarm, …
rpc
[5] session: http/2, mux, …
[6] presentation: json, protobuf, thrift, …
[7] application: business languages, libraries
“It’s slow” is the hardest problem you’ll ever debug.
Jeff Hodges
@jmhodges

Notes on Distributed Systems for Young Bloods
observability
counters (e.g. client/users/failures)
histograms (e.g. client/users/latency/p99)
tracing
timeouts & retries
[Diagram: timelines, users, web, db]
timeout=400ms, retries=3
timeout=400ms, retries=2
timeout=200ms, retries=3
800ms!
600ms!
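The 800ms and 600ms figures appear to follow from multiplying each hop's timeout by its retry count. The sketch below reproduces that arithmetic, assuming `retries` means total attempts and that each hop is called by the one listed before it; both are assumptions, since the slide does not say which setting belongs to which hop.

```python
from typing import List, Tuple

# (timeout_ms, attempts) per hop, in the order shown on the slide.
HOPS: List[Tuple[int, int]] = [(400, 3), (400, 2), (200, 3)]


def worst_case_ms(timeout_ms: int, attempts: int) -> int:
    """Upper bound on time spent at one hop: every attempt runs to its timeout."""
    return timeout_ms * attempts


def exceeds_caller(hops: List[Tuple[int, int]]) -> List[bool]:
    """For each hop after the first, is its worst case longer than the
    per-attempt timeout of the hop that calls it?"""
    return [
        worst_case_ms(timeout, attempts) > hops[i][0]
        for i, (timeout, attempts) in enumerate(hops[1:])
    ]
```

Under these assumptions both downstream hops can keep working (800ms, 600ms) well after their caller has given up at 400ms, which is the mismatch that independently configured timeouts and retries tend to produce.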
deadlines
[Diagram: timelines, users, web, db]
timeout=400ms
77ms elapsed
deadline=323ms
113ms elapsed
deadline=210ms
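The deadline values on this slide are the remaining budget after each hop's elapsed time is subtracted: 400 - 77 = 323, then 323 - 113 = 210. A minimal sketch of that propagation follows; the `Deadline` type and `after` method are hypothetical names for illustration, not Finagle's API.

```python
from dataclasses import dataclass


class DeadlineExceeded(Exception):
    """Raised when no budget is left to pass downstream."""


@dataclass(frozen=True)
class Deadline:
    remaining_ms: int

    def after(self, elapsed_ms: int) -> "Deadline":
        """Budget to hand to the next hop after spending elapsed_ms here."""
        left = self.remaining_ms - elapsed_ms
        if left <= 0:
            raise DeadlineExceeded(f"budget exhausted by {-left}ms")
        return Deadline(left)
```

Usage: `Deadline(400).after(77)` carries 323ms to the next hop, and `.after(113)` on that carries 210ms further down. A hop that receives an exhausted deadline can fail immediately instead of doing work nobody is waiting for.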
retries
typical: retries=3
worst-case: 300% more load!!!
budgets
better: retryBudget=20%
worst-case: 20% more load
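A rough sketch of the difference between a fixed retry count and a retry budget, assuming the budget is a simple credit counter where each original request earns a percentage of one retry. This is a simplification for illustration and is not Finagle's `RetryBudget` implementation; `extra_load` is a hypothetical helper that simulates the worst case where every attempt fails.

```python
from typing import Optional


class RetryBudget:
    """Allows retries only up to a fixed percentage of requests seen."""

    COST = 100  # credits consumed by one retry

    def __init__(self, percent: int) -> None:
        self.percent = percent
        self._credits = 0

    def on_request(self) -> None:
        # Each original request earns `percent` credits.
        self._credits += self.percent

    def try_retry(self) -> bool:
        if self._credits >= self.COST:
            self._credits -= self.COST
            return True
        return False


def extra_load(requests: int, max_retries: int,
               budget: Optional[RetryBudget] = None) -> int:
    """Count retries issued when every attempt fails (the worst case)."""
    retries = 0
    for _ in range(requests):
        if budget is not None:
            budget.on_request()
        for _ in range(max_retries):
            if budget is not None and not budget.try_retry():
                break
            retries += 1
    return retries
```

For 100 failing requests, `extra_load(100, 3)` issues 300 retries (300% more load), while `extra_load(100, 3, RetryBudget(20))` issues 20 (20% more load).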
load shedding via cancellation
[Diagram: timelines, users, web, db]
timeout!
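One way to picture load shedding via cancellation is with cooperative cancellation in `asyncio`: when the caller times out, the pending work it was waiting on is cancelled all the way down. The service functions below are stand-ins named after the boxes in the diagram, and the call order (timelines calls users, users calls db) is an assumption.

```python
import asyncio
from typing import List, Tuple


async def db(log: List[str]) -> str:
    try:
        await asyncio.sleep(10)  # stands in for a slow query
        return "row"
    except asyncio.CancelledError:
        log.append("db cancelled")  # work is abandoned, freeing capacity
        raise


async def users(log: List[str]) -> str:
    try:
        return await db(log)
    except asyncio.CancelledError:
        log.append("users cancelled")
        raise


async def timelines(log: List[str]) -> str:
    try:
        # On timeout, wait_for cancels the pending call chain.
        return await asyncio.wait_for(users(log), timeout=0.05)
    except asyncio.TimeoutError:
        return "timeout!"


def run_demo() -> Tuple[str, List[str]]:
    log: List[str] = []
    result = asyncio.run(timelines(log))
    return result, log
```

Calling `run_demo()` returns `"timeout!"` together with a log showing that both downstream calls were cancelled rather than left running.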
backpressure
[Diagram: timelines, users, web, db]
1000 requests, 100 requests, 1000 requests
1000 failed, 💀, 1000 failed
100 ok, 100 ok, 100 ok + 900 failed/redirected/etc
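A minimal sketch of the "100 ok + 900 failed/redirected/etc" outcome, assuming backpressure is applied as a cap on in-flight requests and that excess requests are rejected immediately. `ConcurrencyLimiter`, `Overloaded`, and `offer` are hypothetical names; the limit of 100 is taken from the slide.

```python
from typing import Tuple


class Overloaded(Exception):
    """Signals the caller to back off, fail fast, or go elsewhere."""


class ConcurrencyLimiter:
    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.in_flight = 0

    def acquire(self) -> None:
        if self.in_flight >= self.limit:
            raise Overloaded(f"{self.in_flight} requests already in flight")
        self.in_flight += 1

    def release(self) -> None:
        # Call when a request completes so capacity becomes available again.
        self.in_flight -= 1


def offer(limiter: ConcurrencyLimiter, n: int) -> Tuple[int, int]:
    """Offer n simultaneous requests; return (accepted, rejected)."""
    accepted = rejected = 0
    for _ in range(n):
        try:
            limiter.acquire()
            accepted += 1
        except Overloaded:
            rejected += 1
    return accepted, rejected
```

`offer(ConcurrencyLimiter(100), 1000)` accepts 100 and rejects 900, so the constrained service keeps serving what it can instead of collapsing under the full 1000.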
lb algorithms:
• round-robin
• fewest connections
• queue depth
• exponentially-weighted moving average (ewma)
• aperture
request-level load balancing
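As an illustration of the ewma option in the list above, here is a deliberately simplified balancer that tracks an exponentially-weighted moving average of observed latency per endpoint and sends each request to the lowest one. The class name, the `alpha` smoothing factor, and the choice to start untested endpoints at 0 are assumptions for this sketch; Finagle's actual balancers are more involved.

```python
from typing import Dict, Iterable


class EwmaBalancer:
    def __init__(self, endpoints: Iterable[str], alpha: float = 0.5) -> None:
        self.alpha = alpha
        # Start every endpoint at 0 so untested endpoints get tried first.
        self.ewma: Dict[str, float] = {e: 0.0 for e in endpoints}

    def observe(self, endpoint: str, latency_ms: float) -> None:
        # New average = alpha * latest sample + (1 - alpha) * old average.
        prev = self.ewma[endpoint]
        self.ewma[endpoint] = self.alpha * latency_ms + (1 - self.alpha) * prev

    def pick(self) -> str:
        # Route the next request to the endpoint with the lowest average.
        return min(self.ewma, key=self.ewma.get)
```

Because the decision is made per request rather than per connection, a replica that suddenly slows down loses traffic as soon as its average rises.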
So just rewrite everything in Finagle!?
linkerd
github.com/buoyantio/linkerd
service mesh proxy
built on finagle & netty
suuuuper pluggable
http, thrift, …
etcd, consul, kubernetes, marathon, zookeeper, …
…
Linkers and Loaders, John R. Levine, Academic Press
linker for the datacenter
logical naming
applications refer to logical names
requests are bound to concrete names
delegations express routing
/s/users
/#/io.l5d.zk/prod/users
/#/io.l5d.zk/staging/users
/s => /#/io.l5d.zk/prod
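A simplified sketch of how a delegation such as `/s => /#/io.l5d.zk/prod` could bind the logical name `/s/users` to a concrete name. It assumes plain prefix rewriting with later entries taking precedence, and it omits the fallback and alternation behavior of real dtabs; `parse_dtab` and `resolve` are hypothetical helpers, not linkerd APIs.

```python
from typing import List, Tuple

Rule = Tuple[List[str], List[str]]


def _split(path: str) -> List[str]:
    return [seg for seg in path.split("/") if seg]


def parse_dtab(text: str) -> List[Rule]:
    """Parse 'prefix => destination' entries separated by ';'."""
    rules: List[Rule] = []
    for entry in text.split(";"):
        if entry.strip():
            prefix, dest = entry.split("=>")
            rules.append((_split(prefix.strip()), _split(dest.strip())))
    return rules


def resolve(path: str, rules: List[Rule], max_steps: int = 16) -> str:
    segs = _split(path)
    for _ in range(max_steps):
        if segs and segs[0] == "#":
            break  # concrete name reached
        # Later entries take precedence, so scan from the end.
        for prefix, dest in reversed(rules):
            if segs[: len(prefix)] == prefix:
                segs = dest + segs[len(prefix):]
                break
        else:
            break  # no rule matched; the name stays unbound
    return "/" + "/".join(segs)
```

With the base delegation, `resolve("/s/users", parse_dtab("/s => /#/io.l5d.zk/prod"))` yields `/#/io.l5d.zk/prod/users`. Appending an override entry `/s => /#/io.l5d.zk/staging` sends the same logical name to `/#/io.l5d.zk/staging/users` without the application changing the name it asks for.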
per-request routing: staging
GET / HTTP/1.1
Host: mysite.com
l5d-dtab: /s/B => /s/B2
per-request routing: debug proxy
GET / HTTP/1.1
Host: mysite.com
l5d-dtab: /s/E => /s/P/s/E
linkerd service mesh
transport security
service discovery
circuit breaking
backpressure
deadlines
retries
tracing
metrics
keep-alive
multiplexing
load balancing
per-request routing
service-level objectives
[Diagram: Service A, Service B, and Service C instances, each with a linkerd]
demo: gob’s microservice
[Diagram: web and wordgen with l5d]
[Diagram: web, wordgen, and gen-v2 with l5d]
[Diagram: web, wordgen, and gen-v2 with l5d and namerd]
github.com/buoyantio/linkerd-examples
linkerd roadmap
• Battle test HTTP/2
• TLS client certs
• Deadlines
• Dark Traffic
• All configurable everything
more at linkerd.io
slack: slack.linkerd.io
email: ver@buoyant.io
twitter:
• @olix0r
• @linkerd
thanks!