What Etsy Learned
Building a Distributed
Tracing System
Paul Wright
Core Platform, Etsy
@wrighty
What is Etsy?
Etsy is the marketplace we
make together
(It’s a busy marketplace)
Over 1 Million Sellers
$1.35 Billion GMS* in 2013
*Gross Merchandise Sales
Spoilers
Why did we need it?
How did we build it?
What did we learn?
First, some context
Aversion to Services*
* With exceptions
(search, payments, photos)
Single Threaded
Short Lived
Shared Nothing
This raises a problem:
Sequential Execution
Application complexity
dictates lower bound of
response time
True for both
website and API
(API powers native
mobile apps)
How to balance this tension
with constraints of the
stack?
Single Threaded
Short Lived
Shared Nothing
The somewhat tangential
organisational problem:
Native mobile apps always
play catch up to web
Feature development:
web-first,
mobile-sometime-maybe
Increasingly disconnected
from reality
Need an approach to
unify platforms
The answer:
Keep doing what we
understand
HTTP calls, but
asynchronously
Build a lightweight API
framework to support this
approach
(Borrow API design ideas
from Netflix)
New API offers
two types of endpoint:
Component and Bespoke
Component:
Generic resources for all
clients
Bespoke:
Custom endpoints for
specific client
Both can make concurrent
HTTP calls to other
Component endpoints
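A minimal sketch of the fan-out idea, written in Python rather than the PHP stack the talk describes. The endpoint paths and latencies are hypothetical, and `time.sleep` stands in for a real HTTP call. It illustrates why concurrent calls help: elapsed time approaches the slowest call instead of the sum of all calls.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical component endpoints with simulated latencies in seconds.
# Real code would issue HTTP requests here instead of sleeping.
COMPONENT_LATENCY = {
    "/activity-feed": 0.05,
    "/new-favorites": 0.03,
    "/our-picks": 0.04,
}


def call_component(path):
    """Stand-in for an HTTP call to a component endpoint."""
    time.sleep(COMPONENT_LATENCY[path])
    return {"path": path, "status": 200}


def bespoke_homepage(paths):
    """Fan out to every component concurrently and gather results in order."""
    with ThreadPoolExecutor(max_workers=len(paths)) as pool:
        return list(pool.map(call_component, paths))


def timed(fn, *args):
    """Return (result, elapsed_seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


paths = list(COMPONENT_LATENCY)
serial_results, serial_elapsed = timed(lambda: [call_component(p) for p in paths])
parallel_results, parallel_elapsed = timed(bespoke_homepage, paths)
print(f"sequential: {serial_elapsed:.3f}s, concurrent: {parallel_elapsed:.3f}s")
```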
Advocate for API-First
mindset across
organisation
New programming model
[Diagram, built up over four slides: a Bespoke Homepage endpoint on the API, with calls labelled Activity Feed, New Favorites, Our Picks For You, Sidebar, Blog Post, Featured Shop and twelve Story Cards, plus a Render Homepage step. Regions are labelled API and Web.]
May have accidentally built
a distributed RPC system
Single Threaded
Short Lived
Shared Nothing
Many Single Threaded
Many Short Lived
Still Shared Nothing?
Existing introspection tools
built for a simpler time
Need a way to visualise
execution tree
Why did we need it?
How did we build it?
What did we learn?
First step: Research
Short answer: read Dapper
http://research.google.com/pubs/pub36356.html
Slightly longer answer:
Read Dapper, study Zipkin,
start prototyping
Build vs. Buy
(or Adopt)
In this instance, we decided
to build
Distributed Tracing
Overview
All implementations
operate on a trace
A trace is a tree of execution
Each tree node is a span
A span is a series of
observations about a single
service call
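The three definitions above can be sketched as a small data structure. This is an illustration only: the field names are assumptions, and the parent/child relationships in the example are invented for the sketch (the span names are borrowed from the homepage diagram).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """One node in the trace tree: observations about a single service call."""
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None marks the root span
    name: str
    annotations: list = field(default_factory=list)  # (timestamp_ms, label) pairs
    children: list = field(default_factory=list)


def build_tree(spans):
    """Link a flat list of spans into a tree and return the root."""
    by_id = {s.span_id: s for s in spans}
    root = None
    for s in spans:
        if s.parent_id is None:
            root = s
        else:
            by_id[s.parent_id].children.append(s)
    return root


def depth_first(span, depth=0):
    """Yield (depth, name) for every span, parents before children."""
    yield depth, span.name
    for child in span.children:
        yield from depth_first(child, depth + 1)


# Spans usually arrive as a flat, unordered list; the tree is rebuilt afterwards.
flat = [
    Span("t1", "c", "b", "Blog Post"),
    Span("t1", "a", None, "Bespoke Homepage"),
    Span("t1", "d", "a", "Activity Feed"),
    Span("t1", "b", "a", "Sidebar"),
]
root = build_tree(flat)
for depth, name in depth_first(root):
    print("  " * depth + name)
```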
[Diagram: the homepage call tree (Bespoke Homepage, Activity Feed, New Favorites, Our Picks For You, Sidebar, Blog Post, Featured Shop, twelve Story Cards, Render Homepage; regions API and Web) redrawn as a timeline running from 0ms to 330ms.]
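A rough sketch of how a trace could be drawn as a timeline in plain text. Only the 0ms to 330ms axis comes from the slide; the individual span timings below are hypothetical, and `render_waterfall` is an illustrative helper, not part of CrossStitch.

```python
def render_waterfall(spans, total_ms, width=33):
    """Draw one row per span: a bar positioned by its start and end time."""
    rows = []
    for name, start_ms, end_ms in spans:
        left = start_ms * width // total_ms
        right = max(left + 1, end_ms * width // total_ms)  # at least one cell wide
        bar = " " * left + "#" * (right - left)
        rows.append(f"{name:<18}|{bar:<{width}}|")
    return "\n".join(rows)


# Hypothetical (name, start_ms, end_ms) values for illustration.
example_spans = [
    ("Bespoke Homepage", 0, 300),
    ("Activity Feed", 10, 120),
    ("Sidebar", 10, 90),
    ("Render Homepage", 300, 330),
]
chart = render_waterfall(example_spans, total_ms=330)
print(chart)
```

Gaps between bars are as informative as the bars themselves: they mark time the trace cannot account for.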
What did we build?
Instrumentation
Data Collection
Data Retrieval
Visualisation
Request tree construction
Clients pass down tree
location data
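One common way to pass tree location down is to send the trace ID and the caller's span ID with every outgoing call. The sketch below assumes HTTP headers are the carrier; the header names and the `TraceContext` type are hypothetical and not taken from the talk.

```python
import secrets
from typing import NamedTuple, Optional

TRACE_HEADER = "X-Trace-Id"          # hypothetical header name
PARENT_HEADER = "X-Parent-Span-Id"   # hypothetical header name


class TraceContext(NamedTuple):
    trace_id: str
    span_id: str
    parent_id: Optional[str]


def new_id():
    """Random 64-bit identifier as hex."""
    return secrets.token_hex(8)


def start_trace():
    """Root of the tree: a fresh trace with no parent."""
    return TraceContext(trace_id=new_id(), span_id=new_id(), parent_id=None)


def headers_for_child_call(ctx):
    """Client side: tell the callee which trace and which parent it belongs to."""
    return {TRACE_HEADER: ctx.trace_id, PARENT_HEADER: ctx.span_id}


def context_from_headers(headers):
    """Server side: join the caller's trace, or start a new one if none was sent."""
    if TRACE_HEADER not in headers:
        return start_trace()
    return TraceContext(
        trace_id=headers[TRACE_HEADER],
        span_id=new_id(),
        parent_id=headers.get(PARENT_HEADER),
    )


root = start_trace()
child = context_from_headers(headers_for_child_call(root))
grandchild = context_from_headers(headers_for_child_call(child))
print(grandchild.trace_id == root.trace_id, grandchild.parent_id == child.span_id)
```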
All non-HTTP service calls
are instrumented
only from client’s POV
Instrumentation
Data Collection
Data Retrieval
Visualisation
Client Send
Server Receive
Server Respond
Client Receive
Annotate server execution
for high level context
Server execution resulting
in low-level service calls is
handled by child spans
Terminating annotations
trigger span logging
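A minimal sketch of a span that writes itself out when a terminating annotation arrives. The short labels follow Zipkin's `cs`/`sr`/`ss`/`cr` convention; treating `ss` and `cr` as the terminating annotations, the JSON layout, and the `sink` callable are all assumptions made for illustration.

```python
import json

# Zipkin-style shorthand for the four annotations named above.
CLIENT_SEND, SERVER_RECEIVE, SERVER_RESPOND, CLIENT_RECEIVE = "cs", "sr", "ss", "cr"
TERMINATING = {SERVER_RESPOND, CLIENT_RECEIVE}  # assumption: these end a span


class Span:
    def __init__(self, name, sink):
        self.name = name
        self.annotations = []
        self._sink = sink  # any callable that accepts one log line

    def annotate(self, label, timestamp_ms):
        """Record an observation; a terminating label flushes the span to the log."""
        self.annotations.append({"label": label, "ts": timestamp_ms})
        if label in TERMINATING:
            self._sink(json.dumps({"name": self.name, "annotations": self.annotations}))


lines = []  # stands in for a log file
server_side = Span("Sidebar", lines.append)
server_side.annotate(SERVER_RECEIVE, 10)
server_side.annotate("cache miss", 12)  # free-form, high-level context
logged_before_terminating = len(lines)
server_side.annotate(SERVER_RESPOND, 40)
print(lines[0])
```

Logging once, at the end, keeps each span to a single write no matter how many annotations it collects.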
Instrumentation
Data Collection
Data Retrieval
Visualisation
[Diagram, repeated across seven slides: Web nodes, each running several httpd / PHP processes and a cross_stitch_spans.log file, alongside logstash, elasticsearch and an Ajax API. One slide adds the labels json, msgpack, gzip and base64.]
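The diagram names json, msgpack, gzip and base64 without saying how they are combined. As one plausible reading, and purely an assumption, the sketch below serialises a span as JSON, compresses it, and base64-encodes the result so that each span fits on a single log line. msgpack is left out because it is not in the Python standard library.

```python
import base64
import gzip
import json


def encode_span(span):
    """Serialise a span dict to one newline-free ASCII log line."""
    raw = json.dumps(span, separators=(",", ":")).encode("utf-8")
    return base64.b64encode(gzip.compress(raw)).decode("ascii")


def decode_span(line):
    """Reverse encode_span: base64 decode, decompress, parse JSON."""
    return json.loads(gzip.decompress(base64.b64decode(line)))


# Hypothetical span record for illustration.
span = {"trace_id": "t1", "span_id": "b", "parent_id": "a", "name": "Sidebar"}
line = encode_span(span)
print(line)
print(decode_span(line) == span)
```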
Instrumentation
Data Collection
Data Retrieval
Visualisation
Why did we need it?
How did we build it?
What did we learn?
There are a number of moving
parts, but this is not
rocket science
Most of the code is specific
to your environment
(Hence I’m not announcing
any open source release)
But that code is simple to
write, and can be done
piecemeal
New internal tools are no
different from new
products
There’s an adoption curve,
it’s your job to accelerate it
First version was
end-to-end complete
Get cross-discipline input
Treat it like product
development
Observing both client and
server POVs is powerful
Network latency is easier to
observe
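A small worked example of why both points of view help. With the four timestamps Client Send, Server Receive, Server Respond and Client Receive, time spent outside the server falls out as a difference of two durations, each measured on a single machine's clock, so the two clocks do not need to agree. The function name and the numbers are illustrative.

```python
def network_overhead_ms(cs, sr, ss, cr):
    """Time the client waited that the server did not spend handling the call."""
    client_elapsed = cr - cs   # measured entirely on the client's clock
    server_elapsed = ss - sr   # measured entirely on the server's clock
    return client_elapsed - server_elapsed


# The two clocks are deliberately far apart to show that skew cancels out.
overhead = network_overhead_ms(cs=1000, sr=5000, ss=5030, cr=1045)
print(overhead)  # 45 ms on the client minus 30 ms on the server = 15
```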
Areas of few observations
are interesting
Shared nothing isn’t free
But caching is cheap
There will be WTFs
You won’t always
solve them
Doing the simplest thing
possible works
Hacky logging runs for 0.1%
of all requests
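A sketch of a 0.1% sampling decision. The talk gives only the rate; choosing at random, once per request at the entry point, is an assumption here, as are the names `SAMPLE_RATE` and `should_trace`.

```python
import random

SAMPLE_RATE = 0.001  # 0.1% of requests, the figure quoted on the slide


def should_trace(rng=random):
    """Decide once, where the request enters the system, whether to trace it."""
    return rng.random() < SAMPLE_RATE


# Simulate 100,000 requests with a seeded generator so runs are repeatable.
rng = random.Random(7)
traced = sum(should_trace(rng) for _ in range(100_000))
print(f"traced {traced} of 100000 requests")
```

Making the decision once and passing it down with the rest of the trace data keeps a trace complete: every call in a sampled request is recorded, and none in an unsampled one.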
Future directions
BIG DATA
Hybrid API client
Instrumenting other
services from server POV
Sample more requests -
maybe 100%?
Why did we need it?
How did we build it?
What did we learn?
Thanks!
@wrighty

CrossStitch: What Etsy Learned Building a Distributed Tracing System (from Surge Conference 2014)