SlideShare a Scribd company logo
1 of 33
Download to read offline
bertjan@openvalue.eu
Debugging distributed systems
Bert Jan Schrijver
@bjschrijver
Debugging distributed systems: the good parts
bertjan@openvalue.eu
Bert Jan Schrijver
@bjschrijver
Networking 101
How the internet works
Why?
Bert Jan Schrijver
L e t ’ s m e e t
@bjschrijver
Why are distributed
systems difficult?
Networking 101
What?
Why?
✅
Demo
War stories
Conclusion
W h a t ‘ s n e x t ?
Outline
A structured approach
@bjschrijver
What is a distributed system?
A distributed system is a system whose
components are located on different
networked computers, which
communicate and coordinate their actions
by passing messages to one another.
• Concurrency of components


• Lack of a global clock


• Independent failure of components


• Distributed systems are harder to reason
about
Characteristics of distributed systems
Source:	http://www.nasa.gov/images/content/218652main_STOCC_FS_img_lg.jpg
Working with distributed systems is
fundamentally different from writing
software on a single computer - and the
main difference is that there are lots of new
and exciting ways for things to go wrong.


- Martin Kleppmann
“
”
Photo:	Dave	Lehl
What could possibly go wrong?
“ ”
Photo:	Dave	Lehl
OSI & TCP/IP
Source:	https://www.guru99.com/difference-tcp-ip-vs-osi-model.html
.. in your browser’s address bar and press Enter
What happens when you type google.com…
Source:	https://github.com/alex/what-happens-when
13
Source:	https://7216-presscdn-0-76-pagely.netdna-ssl.com/wp-content/uploads/2011/12/confused-man-single-good-men.jpg
Where do I start?
Step 1: Observe & document
• What do you know about the problem?


• Inspect logging, errors, metrics, tracing


• Draw the path from source to target - what’s
in between? Focus on details!


• Document what you know


• Can we reproduce in a test?


• By injecting errors, for example
Tools


Whiteboard,
documentation, logging,
metrics, tracing
(opentracing.io), tests,
Jepsen
Step 1: Observe & document
Step 2: Create minimal reproducer
• Goal: maximise the amount of debugging
cycles


• Focus on short development iterations /
feedback loops


• Get close to the action!
Tools


IDE, Shell scripts,


SSH tunnels, Curl
Step 3: Debug client side
• Focus on eliminating anything that could be
wrong on the client side


• Are we connecting to the right host?


• Do we send the right message?


• Do we receive a response?


• Not much different from local


debugging
Tools


IDE, debugger,
logging
Step 4: Check DNS & routing
• DNS:


• Make sure you know what IP address the
hostname should resolve to


• Verify that this actually happens


at the client


• Routing:


• Verify you can reach the


target machine
Tools


host, nslookup,
dig, whois, ping,
traceroute,
nslookup.io,
dnschecker.org
Step 5: Check connection
• Can we connect to the port?


• If not, do we get a REJECT or a DROP?


• Does the connection open and stay open?


• Are we talking TLS?


• What is the connection speed


between us?
Tools


telnet, nc, curl,
iperf
Step 6: Inspect traffic / messages
• Do we send the right request?


• Do we receive the right response?


• How do we know?


• How do we handle TLS?


• Are there any load balancers


or proxies in between?
Tools


curl, wireshark,
tcpdump, network
tab in browser,
mitm/tls proxy
Step 7: Debug server side
• Inspect the remote host


• Can we attach a remote debugger?


• See https://youtube.com/OpenValue


• Profiling


• Strace Tools


SSH tunnels,
remote debugger,
profiler, strace
Step 8: Wrap up & post mortem
• Document the issue:


• Timeline


• What did we see?


• Why did it happen?


• What was the impact?


• How did we find out?


• What did we do to mitigate and fix?


• What should we do to prevent


repetition?
Tools


Whiteboard,
documentation
If you really want a reliable system, you
have to understand what its failure modes
are. You have to actually have witnessed
it misbehaving.


- Jason Cahoon
“
”
Distributed systems war stories
The time where it worked half of the time…
The one at a school…
The one with breaking news…
Summary: a structured approach
to debugging distributed systems
@bjschrijver
Check DNS & routing
Check connection
Debug client side
Create minimal reproducer
Debug server side
Observe & document
Wrap up & post mortem
Inspect traffic / messages
Source:	https://cdn2.vox-cdn.com/thumbor/J9OqPYS7FgI9fjGhnF7AFh8foVY=/148x0:1768x1080/1280x854/cdn0.vox-cdn.com/uploads/chorus_image/image/46147742/cute-success-kid-1920x1080.0.0.jpg
THAT’S IT.


NOW GO KICK SOME ASS!
Questions?


@bjschrijver
Thanks for your time.
Got feedback? Tweet it!
All pictures belong


to their respective


authors
@bjschrijver

More Related Content

What's hot

JUG Bonn June 2021 - The DevOps disaster
JUG Bonn June 2021 - The DevOps disasterJUG Bonn June 2021 - The DevOps disaster
JUG Bonn June 2021 - The DevOps disasterBert Jan Schrijver
 
Principles of Continuous Delivery and DevOps
Principles of Continuous Delivery and DevOpsPrinciples of Continuous Delivery and DevOps
Principles of Continuous Delivery and DevOpsBert Jan Schrijver
 
Devoxx Belgium 2019 - Better software, faster: Principles of Continuous Deliv...
Devoxx Belgium 2019 - Better software, faster: Principles of Continuous Deliv...Devoxx Belgium 2019 - Better software, faster: Principles of Continuous Deliv...
Devoxx Belgium 2019 - Better software, faster: Principles of Continuous Deliv...Bert Jan Schrijver
 
Cyberland 2020 - Better software, faster: Principles of Continuous Delivery a...
Cyberland 2020 - Better software, faster: Principles of Continuous Delivery a...Cyberland 2020 - Better software, faster: Principles of Continuous Delivery a...
Cyberland 2020 - Better software, faster: Principles of Continuous Delivery a...Bert Jan Schrijver
 
OpenValue Vienna meetup september 2020 - Better software, faster: Principles ...
OpenValue Vienna meetup september 2020 - Better software, faster: Principles ...OpenValue Vienna meetup september 2020 - Better software, faster: Principles ...
OpenValue Vienna meetup september 2020 - Better software, faster: Principles ...Bert Jan Schrijver
 
Continuous Delivery: better software, faster.
Continuous Delivery: better software, faster.Continuous Delivery: better software, faster.
Continuous Delivery: better software, faster.Bert Jan Schrijver
 
Den Bosch Java User Group April 2020 - Better software, faster - Principles o...
Den Bosch Java User Group April 2020 - Better software, faster - Principles o...Den Bosch Java User Group April 2020 - Better software, faster - Principles o...
Den Bosch Java User Group April 2020 - Better software, faster - Principles o...Bert Jan Schrijver
 
JavaZone 2019 - Better software, faster: Principles of Continuous Delivery an...
JavaZone 2019 - Better software, faster: Principles of Continuous Delivery an...JavaZone 2019 - Better software, faster: Principles of Continuous Delivery an...
JavaZone 2019 - Better software, faster: Principles of Continuous Delivery an...Bert Jan Schrijver
 
OpenValue meetup June 2019 - Better, software faster: Principles of Continuou...
OpenValue meetup June 2019 - Better, software faster: Principles of Continuou...OpenValue meetup June 2019 - Better, software faster: Principles of Continuou...
OpenValue meetup June 2019 - Better, software faster: Principles of Continuou...Bert Jan Schrijver
 
DevoxxUK 2019 - Better software, faster.
DevoxxUK 2019 - Better software, faster.DevoxxUK 2019 - Better software, faster.
DevoxxUK 2019 - Better software, faster.Bert Jan Schrijver
 
CodeOne 2018 - Better software, faster: principles of Continuous Delivery and...
CodeOne 2018 - Better software, faster: principles of Continuous Delivery and...CodeOne 2018 - Better software, faster: principles of Continuous Delivery and...
CodeOne 2018 - Better software, faster: principles of Continuous Delivery and...Bert Jan Schrijver
 
AmsterdamJUG September 2019 - Better software, faster: Principles of Continuo...
AmsterdamJUG September 2019 - Better software, faster: Principles of Continuo...AmsterdamJUG September 2019 - Better software, faster: Principles of Continuo...
AmsterdamJUG September 2019 - Better software, faster: Principles of Continuo...Bert Jan Schrijver
 
DevOps(1) : What's DevOps - (MOSG)
DevOps(1) : What's DevOps - (MOSG)DevOps(1) : What's DevOps - (MOSG)
DevOps(1) : What's DevOps - (MOSG)Soshi Nemoto
 
Introduction to devops 2016
Introduction to devops 2016Introduction to devops 2016
Introduction to devops 2016gjdevos
 
Continuous Testing in DevOps
Continuous Testing in DevOpsContinuous Testing in DevOps
Continuous Testing in DevOpsTechWell
 
The Four Keys - Measuring DevOps Success
The Four Keys - Measuring DevOps SuccessThe Four Keys - Measuring DevOps Success
The Four Keys - Measuring DevOps SuccessDina Graves Portman
 
Devops, the future is here, it's just not evenly distributed yet.
Devops, the future is here, it's just not evenly distributed yet.Devops, the future is here, it's just not evenly distributed yet.
Devops, the future is here, it's just not evenly distributed yet.Kris Buytaert
 
Security & DevOps- Ways To Make Sure Your Apps & Infrastructure Are Secure
Security & DevOps- Ways To Make Sure Your Apps & Infrastructure Are SecureSecurity & DevOps- Ways To Make Sure Your Apps & Infrastructure Are Secure
Security & DevOps- Ways To Make Sure Your Apps & Infrastructure Are SecurePuppet
 
Walk This Way - An Introduction to DevOps
Walk This Way - An Introduction to DevOpsWalk This Way - An Introduction to DevOps
Walk This Way - An Introduction to DevOpsNathen Harvey
 

What's hot (20)

JUG Bonn June 2021 - The DevOps disaster
JUG Bonn June 2021 - The DevOps disasterJUG Bonn June 2021 - The DevOps disaster
JUG Bonn June 2021 - The DevOps disaster
 
Principles of Continuous Delivery and DevOps
Principles of Continuous Delivery and DevOpsPrinciples of Continuous Delivery and DevOps
Principles of Continuous Delivery and DevOps
 
Devoxx Belgium 2019 - Better software, faster: Principles of Continuous Deliv...
Devoxx Belgium 2019 - Better software, faster: Principles of Continuous Deliv...Devoxx Belgium 2019 - Better software, faster: Principles of Continuous Deliv...
Devoxx Belgium 2019 - Better software, faster: Principles of Continuous Deliv...
 
Cyberland 2020 - Better software, faster: Principles of Continuous Delivery a...
Cyberland 2020 - Better software, faster: Principles of Continuous Delivery a...Cyberland 2020 - Better software, faster: Principles of Continuous Delivery a...
Cyberland 2020 - Better software, faster: Principles of Continuous Delivery a...
 
OpenValue Vienna meetup september 2020 - Better software, faster: Principles ...
OpenValue Vienna meetup september 2020 - Better software, faster: Principles ...OpenValue Vienna meetup september 2020 - Better software, faster: Principles ...
OpenValue Vienna meetup september 2020 - Better software, faster: Principles ...
 
Continuous Delivery: better software, faster.
Continuous Delivery: better software, faster.Continuous Delivery: better software, faster.
Continuous Delivery: better software, faster.
 
Den Bosch Java User Group April 2020 - Better software, faster - Principles o...
Den Bosch Java User Group April 2020 - Better software, faster - Principles o...Den Bosch Java User Group April 2020 - Better software, faster - Principles o...
Den Bosch Java User Group April 2020 - Better software, faster - Principles o...
 
JavaZone 2019 - Better software, faster: Principles of Continuous Delivery an...
JavaZone 2019 - Better software, faster: Principles of Continuous Delivery an...JavaZone 2019 - Better software, faster: Principles of Continuous Delivery an...
JavaZone 2019 - Better software, faster: Principles of Continuous Delivery an...
 
OpenValue meetup June 2019 - Better, software faster: Principles of Continuou...
OpenValue meetup June 2019 - Better, software faster: Principles of Continuou...OpenValue meetup June 2019 - Better, software faster: Principles of Continuou...
OpenValue meetup June 2019 - Better, software faster: Principles of Continuou...
 
DevoxxUK 2019 - Better software, faster.
DevoxxUK 2019 - Better software, faster.DevoxxUK 2019 - Better software, faster.
DevoxxUK 2019 - Better software, faster.
 
CodeOne 2018 - Better software, faster: principles of Continuous Delivery and...
CodeOne 2018 - Better software, faster: principles of Continuous Delivery and...CodeOne 2018 - Better software, faster: principles of Continuous Delivery and...
CodeOne 2018 - Better software, faster: principles of Continuous Delivery and...
 
AmsterdamJUG September 2019 - Better software, faster: Principles of Continuo...
AmsterdamJUG September 2019 - Better software, faster: Principles of Continuo...AmsterdamJUG September 2019 - Better software, faster: Principles of Continuo...
AmsterdamJUG September 2019 - Better software, faster: Principles of Continuo...
 
DevOps(1) : What's DevOps - (MOSG)
DevOps(1) : What's DevOps - (MOSG)DevOps(1) : What's DevOps - (MOSG)
DevOps(1) : What's DevOps - (MOSG)
 
Introduction to devops 2016
Introduction to devops 2016Introduction to devops 2016
Introduction to devops 2016
 
Continuous Testing in DevOps
Continuous Testing in DevOpsContinuous Testing in DevOps
Continuous Testing in DevOps
 
The Four Keys - Measuring DevOps Success
The Four Keys - Measuring DevOps SuccessThe Four Keys - Measuring DevOps Success
The Four Keys - Measuring DevOps Success
 
Devops, the future is here, it's just not evenly distributed yet.
Devops, the future is here, it's just not evenly distributed yet.Devops, the future is here, it's just not evenly distributed yet.
Devops, the future is here, it's just not evenly distributed yet.
 
Devops
DevopsDevops
Devops
 
Security & DevOps- Ways To Make Sure Your Apps & Infrastructure Are Secure
Security & DevOps- Ways To Make Sure Your Apps & Infrastructure Are SecureSecurity & DevOps- Ways To Make Sure Your Apps & Infrastructure Are Secure
Security & DevOps- Ways To Make Sure Your Apps & Infrastructure Are Secure
 
Walk This Way - An Introduction to DevOps
Walk This Way - An Introduction to DevOpsWalk This Way - An Introduction to DevOps
Walk This Way - An Introduction to DevOps
 

Similar to Debugging distributed systems

Mastering Microservices 2022 - Debugging distributed systems
Mastering Microservices 2022 - Debugging distributed systemsMastering Microservices 2022 - Debugging distributed systems
Mastering Microservices 2022 - Debugging distributed systemsBert Jan Schrijver
 
Devoxx Belgium 2022 - Debugging distributed systems
Devoxx Belgium 2022 - Debugging distributed systemsDevoxx Belgium 2022 - Debugging distributed systems
Devoxx Belgium 2022 - Debugging distributed systemsBert Jan Schrijver
 
Arnhem JUG March 2023 - Debugging distributed systems
Arnhem JUG March 2023 - Debugging distributed systemsArnhem JUG March 2023 - Debugging distributed systems
Arnhem JUG March 2023 - Debugging distributed systemsBert Jan Schrijver
 
Incident Response Fails
Incident Response FailsIncident Response Fails
Incident Response FailsMichael Gough
 
WTF is Penetration Testing v.2
WTF is Penetration Testing v.2WTF is Penetration Testing v.2
WTF is Penetration Testing v.2Scott Sutherland
 
Pentesting Tips: Beyond Automated Testing
Pentesting Tips: Beyond Automated TestingPentesting Tips: Beyond Automated Testing
Pentesting Tips: Beyond Automated TestingAndrew McNicol
 
Monitoring microservices
Monitoring microservicesMonitoring microservices
Monitoring microservicesWilliam Brander
 
When Security Tools Fail You
When Security Tools Fail YouWhen Security Tools Fail You
When Security Tools Fail YouMichael Gough
 
FUEL_USERS_GROUP
FUEL_USERS_GROUPFUEL_USERS_GROUP
FUEL_USERS_GROUPWill Pearce
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFMichael Gough
 
Owning windows 8 with human interface devices
Owning windows 8 with human interface devicesOwning windows 8 with human interface devices
Owning windows 8 with human interface devicesNikhil Mittal
 
Professional Hacking in 2011
Professional Hacking in 2011Professional Hacking in 2011
Professional Hacking in 2011securityaegis
 
From SLO to GOTY
From SLO to GOTYFrom SLO to GOTY
From SLO to GOTYScyllaDB
 
Blockchain and Hook model of engagement
Blockchain and Hook model of engagement Blockchain and Hook model of engagement
Blockchain and Hook model of engagement Rajeev Soni
 
Jax Devops 2017 Succeeding in the Cloud – the guidebook of Fail
Jax Devops 2017  Succeeding in the Cloud – the guidebook of FailJax Devops 2017  Succeeding in the Cloud – the guidebook of Fail
Jax Devops 2017 Succeeding in the Cloud – the guidebook of FailSteve Poole
 
THOTCON 0x6: Going Kinetic on Electronic Crime Networks
THOTCON 0x6: Going Kinetic on Electronic Crime NetworksTHOTCON 0x6: Going Kinetic on Electronic Crime Networks
THOTCON 0x6: Going Kinetic on Electronic Crime NetworksJohn Bambenek
 
Creating Havoc using Human Interface Device
Creating Havoc using Human Interface DeviceCreating Havoc using Human Interface Device
Creating Havoc using Human Interface DevicePositive Hack Days
 
2023 NCIT: Introduction to Intrusion Detection
2023 NCIT: Introduction to Intrusion Detection2023 NCIT: Introduction to Intrusion Detection
2023 NCIT: Introduction to Intrusion DetectionAPNIC
 

Similar to Debugging distributed systems (20)

Mastering Microservices 2022 - Debugging distributed systems
Mastering Microservices 2022 - Debugging distributed systemsMastering Microservices 2022 - Debugging distributed systems
Mastering Microservices 2022 - Debugging distributed systems
 
Devoxx Belgium 2022 - Debugging distributed systems
Devoxx Belgium 2022 - Debugging distributed systemsDevoxx Belgium 2022 - Debugging distributed systems
Devoxx Belgium 2022 - Debugging distributed systems
 
Arnhem JUG March 2023 - Debugging distributed systems
Arnhem JUG March 2023 - Debugging distributed systemsArnhem JUG March 2023 - Debugging distributed systems
Arnhem JUG March 2023 - Debugging distributed systems
 
Incident Response Fails
Incident Response FailsIncident Response Fails
Incident Response Fails
 
WTF is Penetration Testing v.2
WTF is Penetration Testing v.2WTF is Penetration Testing v.2
WTF is Penetration Testing v.2
 
Pentesting Tips: Beyond Automated Testing
Pentesting Tips: Beyond Automated TestingPentesting Tips: Beyond Automated Testing
Pentesting Tips: Beyond Automated Testing
 
Monitoring microservices
Monitoring microservicesMonitoring microservices
Monitoring microservices
 
When Security Tools Fail You
When Security Tools Fail YouWhen Security Tools Fail You
When Security Tools Fail You
 
Heartbleed
HeartbleedHeartbleed
Heartbleed
 
FUEL_USERS_GROUP
FUEL_USERS_GROUPFUEL_USERS_GROUP
FUEL_USERS_GROUP
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDF
 
Owning windows 8 with human interface devices
Owning windows 8 with human interface devicesOwning windows 8 with human interface devices
Owning windows 8 with human interface devices
 
Professional Hacking in 2011
Professional Hacking in 2011Professional Hacking in 2011
Professional Hacking in 2011
 
From SLO to GOTY
From SLO to GOTYFrom SLO to GOTY
From SLO to GOTY
 
Blockchain and Hook model of engagement
Blockchain and Hook model of engagement Blockchain and Hook model of engagement
Blockchain and Hook model of engagement
 
Rewriting DevOps
Rewriting DevOpsRewriting DevOps
Rewriting DevOps
 
Jax Devops 2017 Succeeding in the Cloud – the guidebook of Fail
Jax Devops 2017  Succeeding in the Cloud – the guidebook of FailJax Devops 2017  Succeeding in the Cloud – the guidebook of Fail
Jax Devops 2017 Succeeding in the Cloud – the guidebook of Fail
 
THOTCON 0x6: Going Kinetic on Electronic Crime Networks
THOTCON 0x6: Going Kinetic on Electronic Crime NetworksTHOTCON 0x6: Going Kinetic on Electronic Crime Networks
THOTCON 0x6: Going Kinetic on Electronic Crime Networks
 
Creating Havoc using Human Interface Device
Creating Havoc using Human Interface DeviceCreating Havoc using Human Interface Device
Creating Havoc using Human Interface Device
 
2023 NCIT: Introduction to Intrusion Detection
2023 NCIT: Introduction to Intrusion Detection2023 NCIT: Introduction to Intrusion Detection
2023 NCIT: Introduction to Intrusion Detection
 

Recently uploaded

TRENDS Enabling and inhibiting dimensions.pptx
TRENDS Enabling and inhibiting dimensions.pptxTRENDS Enabling and inhibiting dimensions.pptx
TRENDS Enabling and inhibiting dimensions.pptxAndrieCagasanAkio
 
Summary ID-IGF 2016 National Dialogue - English (tata kelola internet / int...
Summary  ID-IGF 2016 National Dialogue  - English (tata kelola internet / int...Summary  ID-IGF 2016 National Dialogue  - English (tata kelola internet / int...
Summary ID-IGF 2016 National Dialogue - English (tata kelola internet / int...ICT Watch - Indonesia
 
How to login to Router net ORBI LOGIN...
How to login to Router net ORBI LOGIN...How to login to Router net ORBI LOGIN...
How to login to Router net ORBI LOGIN...rrouter90
 
IP addressing and IPv6, presented by Paul Wilson at IETF 119
IP addressing and IPv6, presented by Paul Wilson at IETF 119IP addressing and IPv6, presented by Paul Wilson at IETF 119
IP addressing and IPv6, presented by Paul Wilson at IETF 119APNIC
 
Company Snapshot Theme for Business by Slidesgo.pptx
Company Snapshot Theme for Business by Slidesgo.pptxCompany Snapshot Theme for Business by Slidesgo.pptx
Company Snapshot Theme for Business by Slidesgo.pptxMario
 
Unidad 4 – Redes de ordenadores (en inglés).pptx
Unidad 4 – Redes de ordenadores (en inglés).pptxUnidad 4 – Redes de ordenadores (en inglés).pptx
Unidad 4 – Redes de ordenadores (en inglés).pptxmibuzondetrabajo
 
Cybersecurity Threats and Cybersecurity Best Practices
Cybersecurity Threats and Cybersecurity Best PracticesCybersecurity Threats and Cybersecurity Best Practices
Cybersecurity Threats and Cybersecurity Best PracticesLumiverse Solutions Pvt Ltd
 
办理澳洲USYD文凭证书学历认证【Q微/1954292140】办理悉尼大学毕业证书真实成绩单GPA修改/办理澳洲大学文凭证书Offer录取通知书/在读证明...
办理澳洲USYD文凭证书学历认证【Q微/1954292140】办理悉尼大学毕业证书真实成绩单GPA修改/办理澳洲大学文凭证书Offer录取通知书/在读证明...办理澳洲USYD文凭证书学历认证【Q微/1954292140】办理悉尼大学毕业证书真实成绩单GPA修改/办理澳洲大学文凭证书Offer录取通知书/在读证明...
办理澳洲USYD文凭证书学历认证【Q微/1954292140】办理悉尼大学毕业证书真实成绩单GPA修改/办理澳洲大学文凭证书Offer录取通知书/在读证明...vmzoxnx5
 
Summary IGF 2013 Bali - English (tata kelola internet / internet governance)
Summary  IGF 2013 Bali - English (tata kelola internet / internet governance)Summary  IGF 2013 Bali - English (tata kelola internet / internet governance)
Summary IGF 2013 Bali - English (tata kelola internet / internet governance)ICT Watch - Indonesia
 

Recently uploaded (9)

TRENDS Enabling and inhibiting dimensions.pptx
TRENDS Enabling and inhibiting dimensions.pptxTRENDS Enabling and inhibiting dimensions.pptx
TRENDS Enabling and inhibiting dimensions.pptx
 
Summary ID-IGF 2016 National Dialogue - English (tata kelola internet / int...
Summary  ID-IGF 2016 National Dialogue  - English (tata kelola internet / int...Summary  ID-IGF 2016 National Dialogue  - English (tata kelola internet / int...
Summary ID-IGF 2016 National Dialogue - English (tata kelola internet / int...
 
How to login to Router net ORBI LOGIN...
How to login to Router net ORBI LOGIN...How to login to Router net ORBI LOGIN...
How to login to Router net ORBI LOGIN...
 
IP addressing and IPv6, presented by Paul Wilson at IETF 119
IP addressing and IPv6, presented by Paul Wilson at IETF 119IP addressing and IPv6, presented by Paul Wilson at IETF 119
IP addressing and IPv6, presented by Paul Wilson at IETF 119
 
Company Snapshot Theme for Business by Slidesgo.pptx
Company Snapshot Theme for Business by Slidesgo.pptxCompany Snapshot Theme for Business by Slidesgo.pptx
Company Snapshot Theme for Business by Slidesgo.pptx
 
Unidad 4 – Redes de ordenadores (en inglés).pptx
Unidad 4 – Redes de ordenadores (en inglés).pptxUnidad 4 – Redes de ordenadores (en inglés).pptx
Unidad 4 – Redes de ordenadores (en inglés).pptx
 
Cybersecurity Threats and Cybersecurity Best Practices
Cybersecurity Threats and Cybersecurity Best PracticesCybersecurity Threats and Cybersecurity Best Practices
Cybersecurity Threats and Cybersecurity Best Practices
 
办理澳洲USYD文凭证书学历认证【Q微/1954292140】办理悉尼大学毕业证书真实成绩单GPA修改/办理澳洲大学文凭证书Offer录取通知书/在读证明...
办理澳洲USYD文凭证书学历认证【Q微/1954292140】办理悉尼大学毕业证书真实成绩单GPA修改/办理澳洲大学文凭证书Offer录取通知书/在读证明...办理澳洲USYD文凭证书学历认证【Q微/1954292140】办理悉尼大学毕业证书真实成绩单GPA修改/办理澳洲大学文凭证书Offer录取通知书/在读证明...
办理澳洲USYD文凭证书学历认证【Q微/1954292140】办理悉尼大学毕业证书真实成绩单GPA修改/办理澳洲大学文凭证书Offer录取通知书/在读证明...
 
Summary IGF 2013 Bali - English (tata kelola internet / internet governance)
Summary  IGF 2013 Bali - English (tata kelola internet / internet governance)Summary  IGF 2013 Bali - English (tata kelola internet / internet governance)
Summary IGF 2013 Bali - English (tata kelola internet / internet governance)
 

Debugging distributed systems

  • 2. Debugging distributed systems: the good parts bertjan@openvalue.eu Bert Jan Schrijver @bjschrijver Networking 101 How the internet works
  • 4. Bert Jan Schrijver L e t ’ s m e e t @bjschrijver
  • 5. Why are distributed systems difficult? Networking 101 What? Why? ✅ Demo War stories Conclusion W h a t ‘ s n e x t ? Outline A structured approach @bjschrijver
  • 6. What is a distributed system?
  • 7. A distributed system is a system whose components are located on different networked computers, which communicate and coordinate their actions by passing messages to one another.
  • 8. • Concurrency of components • Lack of a global clock • Independent failure of components 
 • Distributed systems are harder to reason about Characteristics of distributed systems Source: http://www.nasa.gov/images/content/218652main_STOCC_FS_img_lg.jpg
  • 9. Working with distributed systems is fundamentally different from writing software on a single computer - and the main difference is that there are lots of new and exciting ways for things to go wrong. - Martin Kleppmann “ ” Photo: Dave Lehl
  • 10. What could possibly go wrong? “ ” Photo: Dave Lehl
  • 12. .. in your browser’s address bar and press Enter What happens when you type google.com… Source: https://github.com/alex/what-happens-when
  • 13. 13
  • 15. Step 1: Observe & document • What do you know about the problem? • Inspect logging, errors, metrics, tracing • Draw the path from source to target - what’s in between? Focus on details! • Document what you know • Can we reproduce in a test? • By injecting errors, for example Tools Whiteboard, documentation, logging, metrics, tracing (opentracing.io), tests, Jepsen
  • 16. Step 1: Observe & document
  • 17. Step 2: Create minimal reproducer • Goal: maximise the amount of debugging cycles • Focus on short development iterations / feedback loops • Get close to the action! Tools IDE, Shell scripts, SSH tunnels, Curl
  • 18. Step 3: Debug client side • Focus on eliminating anything that could be wrong on the client side • Are we connecting to the right host? • Do we send the right message? • Do we receive a response? • Not much different from local 
 debugging Tools IDE, debugger, logging
  • 19. Step 4: Check DNS & routing • DNS: • Make sure you know what IP address the hostname should resolve to • Verify that this actually happens 
 at the client • Routing: • Verify you can reach the 
 target machine Tools host, nslookup, dig, whois, ping, traceroute, nslookup.io, dnschecker.org
  • 20. Step 5: Check connection • Can we connect to the port? • If not, do we get a REJECT or a DROP? • Does the connection open and stay open? • Are we talking TLS? • What is the connection speed 
 between us? Tools telnet, nc, curl, iperf
  • 21. Step 6: Inspect traffic / messages • Do we send the right request? • Do we receive the right response? • How do we know? • How do we handle TLS? • Are there any load balancers 
 or proxies in between? Tools curl, wireshark, tcpdump, network tab in browser, mitm/tls proxy
  • 22. Step 7: Debug server side • Inspect the remote host • Can we attach a remote debugger? • See https://youtube.com/OpenValue • Profiling • Strace Tools SSH tunnels, remote debugger, profiler, strace
  • 23. Step 8: Wrap up & post mortem • Document the issue: • Timeline • What did we see? • Why did it happen? • What was the impact? • How did we find out? • What did we do to mitigate and fix? • What should we do to prevent 
 repetition? Tools Whiteboard, documentation
  • 24. If you really want a reliable system, you have to understand what its failure modes are. You have to actually have witnessed it misbehaving. - Jason Cahoon “ ”
  • 26. The time where it worked half of the time…
  • 27. The one at a school…
  • 28.
  • 29. The one with breaking news…
  • 30. Summary: a structured approach to debugging distributed systems @bjschrijver Check DNS & routing Check connection Debug client side Create minimal reproducer Debug server side Observe & document Wrap up & post mortem Inspect traffic / messages
  • 33. Thanks for your time. Got feedback? Tweet it! All pictures belong to their respective authors @bjschrijver