SlideShare a Scribd company logo
1 of 43
Download to read offline
Everything You Wanted to
Know About Distributed Tracing.
Strange Loop Conference
September 2019
Who Am I
● Hungai Amuhinda
● Twitter: @Hungai
● Nairobi, Kenya
● Work: Ajua, Infra Team
● Website: hungaikev.in
How to use this talk
● Please don’t just follow it blindly: There are often times when you will need
to do things differently.
● Software is all about trade-offs: Very few decisions are about right and
wrong.
● Try things for yourself: Why not make Friday afternoons a time to play and
experiment?
Software Evolution.
● Monolith
● On Prem
● Single Language
● Single Stack
● Virtual Machines
Software Evolution.
● Microservices
● Containers
● Multi Cloud/ Hybrid
● Polyglot
● Containers
● Serverless/ Cloud Functions
New Architectures/ New Challenges.
● Observability
● Deployment / Packaging
● Configuration Management
● Debugging
● Secrets Management
Meet: Distributed Tracing
“Distributed Tracing, also called distributed request tracing, is a method used to
profile and monitor applications, especially those built using a microservices
architecture. Distributed tracing helps pinpoint where failures occur and what
causes poor performance.”
Distributed Tracing - Terminology
Trace - a trace is a tree of spans that
follows the course of a request or
system from its source to its ultimate
destination.
Each trace is a narrative that tells the
requests story as it travels through the
system.
Distributed Tracing - Terminology
Span - are logical units of work in a
distributed system. They all have a
name, a start time, and a duration.
Each Span captures important data
points specific to the current process
handling the request.
Distributed Tracing - Terminology
Context Propagation:
Incoming
Request
Distributed Tracing - Terminology
Context Propagation:
Incoming
Request
trace-id = 123
parent-d = nil
span-id = 1
Distributed Tracing - Terminology
Context Propagation:
Incoming
Request
trace-id = 123
parent-d = nil
span-id = 1
Outbound
Request
trace-id = 123
parent-d = 1
span-id = 2
Distributed Tracing - Terminology
Tags & Logs: both annotate the span with some contextual information.
● Tags typically apply to the whole span, while logs represent some events that
happened during the span execution.
● A log always has a timestamp that falls within the span's start-end time interval.
● The tracing system does not explicitly track causality between logged events the
way it keeps track of causality relationships between spans, because it can be
inferred from the timestamps.
What questions can tracing help us answer?
Distributed Tracing:
● What services did a request pass through?
Distributed Tracing:
● What services did a request pass through?
● What occured in each service for a given request?
Distributed Tracing:
● What services did a request pass through?
● What occured in each service for a given request?
● Where did the error happen?
Distributed Tracing:
● What services did a request pass through?
● What occured in each service for a given request?
● Where did the error happen?
● Where are the bottlenecks?
Distributed Tracing:
● What services did a request pass through?
● What occured in each service for a given request?
● Where did the error happen?
● Where are the bottlenecks?
● What is the critical path for a request?
Distributed Tracing:
● What services did a request pass through?
● What occured in each service for a given request?
● Where did the error happen?
● Where are the bottlenecks?
● What is the critical path for a request?
● Who should I page?
If tracing is so good why isn’t everyone using it?
If tracing is so good why isn’t everyone using it?
● Not much education or not many publicized case studies on
the benefits.
If tracing is so good why isn’t everyone using it?
● Not much education or not many publicized case studies on
the benefits.
● Vendor Lock in is unacceptable: Instrumentation must be
decoupled from vendors
If tracing is so good why isn’t everyone using it?
● Not much education or not many publicized case studies on
the benefits.
● Vendor Lock in is unacceptable: Instrumentation must be
decoupled from vendors .
● Inconsistent APIs: Tracing semantics must not be language
dependent.
If tracing is so good why isn’t everyone using it?
● Not much education or not many publicized case studies on
the benefits.
● Vendor Lock in is unacceptable: Instrumentation must be
decoupled from vendors .
● Inconsistent APIs: Tracing semantics must not be language
dependent.
● Handoff woes: Tracing libs in Project X do not handoff to
tracing libs in Project Y.
Meet OpenTelemetry
OpenTelemetry
Open Telemetry is made up of an integrated set of APIs
and libraries as well as a collection mechanism via a agent
and collector. These components are used to generate,
collect, and describe telemetry about distributed
systems.
Problems OpenTelemetry solves:
● Vendor neutrality for tracing, monitoring and
logging
● Context Propagation.
OpenTelemetry (opentelemetry.io) Is:
● Single set of APIs for tracing and metrics collection.
● Standardized Context Propagation.
● Exporters for sending data to backend of choice.
● Collector for smart traces & metrics aggregation.
● Integrations with popular web, RPC and storage
frameworks.
OpenTelemetry (opentelemetry.io) Is:
Next major version of the OpenTracing and OpenCensus projects.
+ =
OpenTelemetry Roadmap:
+
Announcement: https://medium.com/opentracing/merging-opentracing-and-opencensus-f0fe9c7ca6f0
Roadmap: https://medium.com/opentracing/a-roadmap-to-convergence-b074e5815289
Tracing with OpenTelemetry - The Options
Agentless Using an Agent
OpenTelemetry: How to get Involved
Github: https://github.com/open-telemetry
Gitter: https://gitter.im/open-telemetry
Languages:
● .NET SDK
● GoLang SDK
● Java SDK
● JavaScript SDK
● Python SDK
● Ruby SIG
● Erlang/Elixir SDK
EXAMPLE
Create a Tracer
Create a Tracer
Tracers
Instrument
Instrument
Introducing Jaeger:
● Open source distributed tracing platform.
● Inspired by Google Dapper and OpenZipkin
● Created by Uber in 2015 and donated to CNCF in 2017.
● Compliant with both OpenTracing and OpenCensus.
● Supports multiple storage options (Cassandra, ElasticSearch, In-Memory)
● Compatible with Apache Kafka for backpressure management.
DEMO
obitech/micro-obs
Conclusion
● Tracing is crucial for understanding complex, microservices applications.
● Distributed tracing provides a base view of the system that can
drastically shorten feedback loops and the number of people involved
incidents.
● Tracing provides much more context, allowing an on call responder to
better understand the system and get further on their own before
involving more people.
Thank You
Twitter: @Hungai
Email: hungaikevin@gmail.com
Everything You wanted to Know About Distributed Tracing

More Related Content

What's hot

Observability For You and Me with OpenTelemetry
Observability For You and Me with OpenTelemetryObservability For You and Me with OpenTelemetry
Observability For You and Me with OpenTelemetry
Eric D. Schabell
 

What's hot (20)

Observability in Java: Getting Started with OpenTelemetry
Observability in Java: Getting Started with OpenTelemetryObservability in Java: Getting Started with OpenTelemetry
Observability in Java: Getting Started with OpenTelemetry
 
Adopting Open Telemetry as Distributed Tracer on your Microservices at Kubern...
Adopting Open Telemetry as Distributed Tracer on your Microservices at Kubern...Adopting Open Telemetry as Distributed Tracer on your Microservices at Kubern...
Adopting Open Telemetry as Distributed Tracer on your Microservices at Kubern...
 
Meetup OpenTelemetry Intro
Meetup OpenTelemetry IntroMeetup OpenTelemetry Intro
Meetup OpenTelemetry Intro
 
Distributed tracing 101
Distributed tracing 101Distributed tracing 101
Distributed tracing 101
 
Distributed Tracing for Kafka with OpenTelemetry with Daniel Kim | Kafka Summ...
Distributed Tracing for Kafka with OpenTelemetry with Daniel Kim | Kafka Summ...Distributed Tracing for Kafka with OpenTelemetry with Daniel Kim | Kafka Summ...
Distributed Tracing for Kafka with OpenTelemetry with Daniel Kim | Kafka Summ...
 
Monitoring Kubernetes with Prometheus
Monitoring Kubernetes with PrometheusMonitoring Kubernetes with Prometheus
Monitoring Kubernetes with Prometheus
 
Understand your system like never before with OpenTelemetry, Grafana, and Pro...
Understand your system like never before with OpenTelemetry, Grafana, and Pro...Understand your system like never before with OpenTelemetry, Grafana, and Pro...
Understand your system like never before with OpenTelemetry, Grafana, and Pro...
 
OpenTelemetry Introduction
OpenTelemetry Introduction OpenTelemetry Introduction
OpenTelemetry Introduction
 
Exploring the power of OpenTelemetry on Kubernetes
Exploring the power of OpenTelemetry on KubernetesExploring the power of OpenTelemetry on Kubernetes
Exploring the power of OpenTelemetry on Kubernetes
 
Observability
ObservabilityObservability
Observability
 
Observability
ObservabilityObservability
Observability
 
Observability For You and Me with OpenTelemetry
Observability For You and Me with OpenTelemetryObservability For You and Me with OpenTelemetry
Observability For You and Me with OpenTelemetry
 
Juraci Paixão Kröhling - All you need to know about OpenTelemetry
Juraci Paixão Kröhling - All you need to know about OpenTelemetryJuraci Paixão Kröhling - All you need to know about OpenTelemetry
Juraci Paixão Kröhling - All you need to know about OpenTelemetry
 
Intro to open source observability with grafana, prometheus, loki, and tempo(...
Intro to open source observability with grafana, prometheus, loki, and tempo(...Intro to open source observability with grafana, prometheus, loki, and tempo(...
Intro to open source observability with grafana, prometheus, loki, and tempo(...
 
Distributed Tracing with Jaeger
Distributed Tracing with JaegerDistributed Tracing with Jaeger
Distributed Tracing with Jaeger
 
MeetUp Monitoring with Prometheus and Grafana (September 2018)
MeetUp Monitoring with Prometheus and Grafana (September 2018)MeetUp Monitoring with Prometheus and Grafana (September 2018)
MeetUp Monitoring with Prometheus and Grafana (September 2018)
 
Platform engineering 101
Platform engineering 101Platform engineering 101
Platform engineering 101
 
Distributed tracing using open tracing & jaeger 2
Distributed tracing using open tracing & jaeger 2Distributed tracing using open tracing & jaeger 2
Distributed tracing using open tracing & jaeger 2
 
Kubernetes for Beginners: An Introductory Guide
Kubernetes for Beginners: An Introductory GuideKubernetes for Beginners: An Introductory Guide
Kubernetes for Beginners: An Introductory Guide
 
KCD-OpenTelemetry.pdf
KCD-OpenTelemetry.pdfKCD-OpenTelemetry.pdf
KCD-OpenTelemetry.pdf
 

Similar to Everything You wanted to Know About Distributed Tracing

Blockade.io : One Click Browser Defense
Blockade.io : One Click Browser DefenseBlockade.io : One Click Browser Defense
Blockade.io : One Click Browser Defense
RiskIQ, Inc.
 
Agile Gurugram 2023 | Observability for Modern Applications. How does it help...
Agile Gurugram 2023 | Observability for Modern Applications. How does it help...Agile Gurugram 2023 | Observability for Modern Applications. How does it help...
Agile Gurugram 2023 | Observability for Modern Applications. How does it help...
AgileNetwork
 
Machine Learning for (DF)IR with Velociraptor: From Setting Expectations to a...
Machine Learning for (DF)IR with Velociraptor: From Setting Expectations to a...Machine Learning for (DF)IR with Velociraptor: From Setting Expectations to a...
Machine Learning for (DF)IR with Velociraptor: From Setting Expectations to a...
Chris Hammerschmidt
 

Similar to Everything You wanted to Know About Distributed Tracing (20)

Introduction to the open rights group censorship monitoring project
Introduction to the open rights group censorship monitoring projectIntroduction to the open rights group censorship monitoring project
Introduction to the open rights group censorship monitoring project
 
Observability in highly distributed systems
Observability in highly distributed systemsObservability in highly distributed systems
Observability in highly distributed systems
 
Blockade.io : One Click Browser Defense
Blockade.io : One Click Browser DefenseBlockade.io : One Click Browser Defense
Blockade.io : One Click Browser Defense
 
Agile Gurugram 2023 | Observability for Modern Applications. How does it help...
Agile Gurugram 2023 | Observability for Modern Applications. How does it help...Agile Gurugram 2023 | Observability for Modern Applications. How does it help...
Agile Gurugram 2023 | Observability for Modern Applications. How does it help...
 
Go Observability (in practice)
Go Observability (in practice)Go Observability (in practice)
Go Observability (in practice)
 
#OSSPARIS19 - How to improve database observability - CHARLES JUDITH, Criteo
#OSSPARIS19 - How to improve database observability - CHARLES JUDITH, Criteo#OSSPARIS19 - How to improve database observability - CHARLES JUDITH, Criteo
#OSSPARIS19 - How to improve database observability - CHARLES JUDITH, Criteo
 
Distributed tracing
Distributed tracingDistributed tracing
Distributed tracing
 
The Final Frontier, Automating Dynamic Security Testing
The Final Frontier, Automating Dynamic Security TestingThe Final Frontier, Automating Dynamic Security Testing
The Final Frontier, Automating Dynamic Security Testing
 
Tenants for Going at DevSecOps Speed - LASCON 2023
Tenants for Going at DevSecOps Speed - LASCON 2023Tenants for Going at DevSecOps Speed - LASCON 2023
Tenants for Going at DevSecOps Speed - LASCON 2023
 
DevOps State of the Union 2015
DevOps State of the Union 2015DevOps State of the Union 2015
DevOps State of the Union 2015
 
Observability for Application Developers (1)-1.pptx
Observability for Application Developers (1)-1.pptxObservability for Application Developers (1)-1.pptx
Observability for Application Developers (1)-1.pptx
 
Your Testing Is Flawed: Introducing A New Open Source Tool For Accurate Kuber...
Your Testing Is Flawed: Introducing A New Open Source Tool For Accurate Kuber...Your Testing Is Flawed: Introducing A New Open Source Tool For Accurate Kuber...
Your Testing Is Flawed: Introducing A New Open Source Tool For Accurate Kuber...
 
The working architecture of NodeJS applications, Виктор Турский
The working architecture of NodeJS applications, Виктор ТурскийThe working architecture of NodeJS applications, Виктор Турский
The working architecture of NodeJS applications, Виктор Турский
 
The working architecture of node js applications open tech week javascript ...
The working architecture of node js applications   open tech week javascript ...The working architecture of node js applications   open tech week javascript ...
The working architecture of node js applications open tech week javascript ...
 
BSIT3CD_Continuation of Cyber incident response (1).pdf
BSIT3CD_Continuation of Cyber incident response (1).pdfBSIT3CD_Continuation of Cyber incident response (1).pdf
BSIT3CD_Continuation of Cyber incident response (1).pdf
 
Orchestration, Automation and Virtualisation (OAV) in GÉANT
Orchestration, Automation and Virtualisation (OAV) in GÉANTOrchestration, Automation and Virtualisation (OAV) in GÉANT
Orchestration, Automation and Virtualisation (OAV) in GÉANT
 
Bringing it all together
Bringing it all togetherBringing it all together
Bringing it all together
 
LF Energy Webinar: Introduction to TROLIE
LF Energy Webinar: Introduction to TROLIELF Energy Webinar: Introduction to TROLIE
LF Energy Webinar: Introduction to TROLIE
 
Cynthia Wu: Satisfaction Not Guaranteed
Cynthia Wu: Satisfaction Not GuaranteedCynthia Wu: Satisfaction Not Guaranteed
Cynthia Wu: Satisfaction Not Guaranteed
 
Machine Learning for (DF)IR with Velociraptor: From Setting Expectations to a...
Machine Learning for (DF)IR with Velociraptor: From Setting Expectations to a...Machine Learning for (DF)IR with Velociraptor: From Setting Expectations to a...
Machine Learning for (DF)IR with Velociraptor: From Setting Expectations to a...
 

Recently uploaded

+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
Health
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
anilsa9823
 

Recently uploaded (20)

Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 

Everything You wanted to Know About Distributed Tracing

  • 1. Everything You Wanted to Know About Distributed Tracing. Strange Loop Conference September 2019
  • 2. Who Am I ● Hungai Amuhinda ● Twitter: @Hungai ● Nairobi, Kenya ● Work: Ajua, Infra Team ● Website: hungaikev.in
  • 3. How to use this talk ● Please don’t just follow it blindly: There are often times when you will need to do things differently. ● Software is all about trade-offs: Very few decisions are about right and wrong. ● Try things for yourself: Why not make Friday afternoons a time to play and experiment?
  • 4. Software Evolution. ● Monolith ● On Prem ● Single Language ● Single Stack ● Virtual Machines
  • 5. Software Evolution. ● Microservices ● Containers ● Multi Cloud/ Hybrid ● Polyglot ● Containers ● Serverless/ Cloud Functions
  • 6. New Architectures/ New Challenges. ● Observability ● Deployment / Packaging ● Configuration Management ● Debugging ● Secrets Management
  • 7. Meet: Distributed Tracing “Distributed Tracing, also called distributed request tracing, is a method used to profile and monitor applications, especially those built using a microservices architecture. Distributed tracing helps pinpoint where failures occur and what causes poor performance.”
  • 8. Distributed Tracing - Terminology Trace - a trace is a tree of spans that follows the course of a request or system from its source to its ultimate destination. Each trace is a narrative that tells the requests story as it travels through the system.
  • 9. Distributed Tracing - Terminology Span - are logical units of work in a distributed system. They all have a name, a start time, and a duration. Each Span captures important data points specific to the current process handling the request.
  • 10. Distributed Tracing - Terminology Context Propagation: Incoming Request
  • 11. Distributed Tracing - Terminology Context Propagation: Incoming Request trace-id = 123 parent-d = nil span-id = 1
  • 12. Distributed Tracing - Terminology Context Propagation: Incoming Request trace-id = 123 parent-d = nil span-id = 1 Outbound Request trace-id = 123 parent-d = 1 span-id = 2
  • 13. Distributed Tracing - Terminology Tags & Logs: both annotate the span with some contextual information. ● Tags typically apply to the whole span, while logs represent some events that happened during the span execution. ● A log always has a timestamp that falls within the span's start-end time interval. ● The tracing system does not explicitly track causality between logged events the way it keeps track of causality relationships between spans, because it can be inferred from the timestamps.
  • 14. What questions can tracing help us answer?
  • 15. Distributed Tracing: ● What services did a request pass through?
  • 16. Distributed Tracing: ● What services did a request pass through? ● What occured in each service for a given request?
  • 17. Distributed Tracing: ● What services did a request pass through? ● What occured in each service for a given request? ● Where did the error happen?
  • 18. Distributed Tracing: ● What services did a request pass through? ● What occured in each service for a given request? ● Where did the error happen? ● Where are the bottlenecks?
  • 19. Distributed Tracing: ● What services did a request pass through? ● What occured in each service for a given request? ● Where did the error happen? ● Where are the bottlenecks? ● What is the critical path for a request?
  • 20. Distributed Tracing: ● What services did a request pass through? ● What occured in each service for a given request? ● Where did the error happen? ● Where are the bottlenecks? ● What is the critical path for a request? ● Who should I page?
  • 21. If tracing is so good why isn’t everyone using it?
  • 22. If tracing is so good why isn’t everyone using it? ● Not much education or not many publicized case studies on the benefits.
  • 23. If tracing is so good why isn’t everyone using it? ● Not much education or not many publicized case studies on the benefits. ● Vendor Lock in is unacceptable: Instrumentation must be decoupled from vendors
  • 24. If tracing is so good why isn’t everyone using it? ● Not much education or not many publicized case studies on the benefits. ● Vendor Lock in is unacceptable: Instrumentation must be decoupled from vendors . ● Inconsistent APIs: Tracing semantics must not be language dependent.
  • 25. If tracing is so good why isn’t everyone using it? ● Not much education or not many publicized case studies on the benefits. ● Vendor Lock in is unacceptable: Instrumentation must be decoupled from vendors . ● Inconsistent APIs: Tracing semantics must not be language dependent. ● Handoff woes: Tracing libs in Project X do not handoff to tracing libs in Project Y.
  • 27. OpenTelemetry Open Telemetry is made up of an integrated set of APIs and libraries as well as a collection mechanism via a agent and collector. These components are used to generate, collect, and describe telemetry about distributed systems. Problems OpenTelemetry solves: ● Vendor neutrality for tracing, monitoring and logging ● Context Propagation.
  • 28. OpenTelemetry (opentelemetry.io) Is: ● Single set of APIs for tracing and metrics collection. ● Standardized Context Propagation. ● Exporters for sending data to backend of choice. ● Collector for smart traces & metrics aggregation. ● Integrations with popular web, RPC and storage frameworks.
  • 29. OpenTelemetry (opentelemetry.io) Is: Next major version of the OpenTracing and OpenCensus projects. + =
  • 31. Tracing with OpenTelemetry - The Options Agentless Using an Agent
  • 32. OpenTelemetry: How to get Involved Github: https://github.com/open-telemetry Gitter: https://gitter.im/open-telemetry Languages: ● .NET SDK ● GoLang SDK ● Java SDK ● JavaScript SDK ● Python SDK ● Ruby SIG ● Erlang/Elixir SDK
  • 39. Introducing Jaeger: ● Open source distributed tracing platform. ● Inspired by Google Dapper and OpenZipkin ● Created by Uber in 2015 and donated to CNCF in 2017. ● Compliant with both OpenTracing and OpenCensus. ● Supports multiple storage options (Cassandra, ElasticSearch, In-Memory) ● Compatible with Apache Kafka for backpressure management.
  • 41. Conclusion ● Tracing is crucial for understanding complex, microservices applications. ● Distributed tracing provides a base view of the system that can drastically shorten feedback loops and the number of people involved incidents. ● Tracing provides much more context, allowing an on call responder to better understand the system and get further on their own before involving more people.
  • 42. Thank You Twitter: @Hungai Email: hungaikevin@gmail.com