Chaos Debugging for Microservices
1. 1 | Copyright © 2019
CHAOS DEBUGGING
CHRISTIAN POSTA – FIELD CTO, SOLO.IO
2. 2 | Copyright © 2019
CHRISTIAN POSTA
• Field CTO @ solo.io
• Author of a few books
• Contributor to many open-source projects
• Architect, blogger, speaker, mentor, leader
https://bit.ly/istio-in-action
@christianposta
christian@solo.io
https://blog.christianposta.com
https://slideshare.net/ceposta
3. 3 | Copyright © 2019
Move fast, safely
https://puppet.com/resources/whitepaper/state-of-devops-report
5. 5 | Copyright © 2019
Service mesh technologies typically provide:
• Service discovery / Load balancing
• Secure service-to-service communication
• Traffic control / shaping / shifting
• Policy / Intention based access control
• Traffic metric collection
• Service resilience
• API / programmable interface
6. 6 | Copyright © 2019 @christianposta
Service Mesh Interface (SMI)
https://github.com/deislabs/smi-spec https://supergloo.solo.io
https://servicemeshhub.io
7. 7 | Copyright © 2019
THE PROBLEM:
DEBUGGING MICROSERVICES APPLICATIONS IS HARD.
8. 8 | Copyright © 2019
THE PROBLEM
A MONOLITHIC APPLICATION
CONSISTS OF A SINGLE
PROCESS
AN ATTACHED DEBUGGER
ALLOWS VIEWING THE
COMPLETE STATE OF THE
APPLICATION DURING RUNTIME
A MICROSERVICES APPLICATION
CONSISTS OF POTENTIALLY
HUNDREDS OF PROCESSES
IS IT POSSIBLE TO GET A
COMPLETE VIEW OF THE STATE
OF SUCH AN APPLICATION?
13. 13 | Copyright © 2019
SQUASH DEFAULT MODE
Node
Namespace: ns-a Namespace: squash
s-dlv c1
14. 14 | Copyright © 2019
-> ls -l /proc/self/ns
total 0
lrwxrwxrwx 1 idit idit 0 Dec 7 01:14 cgroup -> cgroup:[4026531835]
lrwxrwxrwx 1 idit idit 0 Dec 7 01:14 ipc -> ipc:[4026531839]
lrwxrwxrwx 1 idit idit 0 Dec 7 01:14 mnt -> mnt:[4026531840]
lrwxrwxrwx 1 idit idit 0 Dec 7 01:14 net -> net:[4026532009]
lrwxrwxrwx 1 idit idit 0 Dec 7 01:14 pid -> pid:[4026531836]
lrwxrwxrwx 1 idit idit 0 Dec 7 01:14 pid_for_children -> pid:[4026531836]
lrwxrwxrwx 1 idit idit 0 Dec 7 01:14 user -> user:[4026531837]
lrwxrwxrwx 1 idit idit 0 Dec 7 01:14 uts -> uts:[4026531838]
-> inode of the mnt namespace (a unique identifier for the container's namespace),
obtained via the CRI API call ExecSyncRequest
Node
Namespace: ns-a
s-dlv
CRI
c1
We need to translate the PID of the process (the application that runs in the container) into the host PID namespace
so that the debugger can attach.
Namespace: Squash
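The translation described on this slide can be sketched in Python. This is a minimal illustration, not Squash's implementation: the helper names `parse_ns_link` and `host_pids_in_mnt_namespace` are hypothetical, and the sketch assumes the container's mnt namespace inode has already been obtained from inside the container (for example through the CRI ExecSyncRequest call mentioned above) and that the code runs on the node with read access to the host's /proc.

```python
import os
import re
from typing import Callable, Iterable, List, Tuple

# Namespace symlinks under /proc/<pid>/ns look like "mnt:[4026531840]".
_NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(target: str) -> Tuple[str, int]:
    """Split a namespace link target into (namespace type, inode number)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError(f"not a namespace link: {target!r}")
    return match.group(1), int(match.group(2))


def host_pids_in_mnt_namespace(
    mnt_inode: int,
    proc_root: str = "/proc",
    listdir: Callable[[str], Iterable[str]] = os.listdir,
    readlink: Callable[[str], str] = os.readlink,
) -> List[int]:
    """Return host PIDs whose mnt namespace inode equals mnt_inode.

    listdir and readlink are injectable so the scan can be exercised
    without a real /proc (an assumption made for testability).
    """
    pids: List[int] = []
    for entry in listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "cpuinfo"
        try:
            target = readlink(os.path.join(proc_root, entry, "ns", "mnt"))
            _, inode = parse_ns_link(target)
        except (OSError, ValueError):
            continue  # process exited, access denied, or unexpected format
        if inode == mnt_inode:
            pids.append(int(entry))
    return sorted(pids)
```

A debugger running in the host PID namespace could then attach to one of the returned PIDs. Picking the right process when a container runs several is left out of this sketch.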
15. 15 | Copyright © 2019
SQUASH SECURE MODE
Node
Namespace: ns-a Namespace: squash
s-dlv c1
CRD Intent
squash
16. 16 | Copyright © 2019
DOCS: HTTPS://SQUASH.SOLO.IO
GITHUB: HTTPS://GITHUB.COM/SOLO-IO/SQUASH
COMMUNITY: HTTPS://SLACK.SOLO.IO
18. 18 | Copyright © 2019
DEBUGGING IN PRODUCTION
19. 19 | Copyright © 2019
DB S3
DEBUGGING IN PRODUCTION
CLUSTER
POD 1 POD 2
> ONLY HEADER WILL BE SENT
> SAMPLING
POD 3 POD 4
20. 20 | Copyright © 2019
DB S3
P P P P
DEBUGGING IN PRODUCTION
CLUSTER
POD 1 POD 2 POD 3 POD 4
> ONLY HEADER WILL BE SENT
> SAMPLING
21. 21 | Copyright © 2019
DB S3
P P P P
DEBUGGING IN PRODUCTION
CLUSTER
POD 1 POD 2 POD 3 POD 4
25. 25 | Copyright © 2019
DOCS: COMING REAL SOON …
GITHUB: COMING REAL SOON …
COMMUNITY: HTTPS://SLACK.SOLO.IO
27. 27 | Copyright © 2019
CHAOS ENGINEERING
THINK OF A VACCINE OR A FLU SHOT
INJECT YOURSELF WITH SOMETHING HARMFUL
IN ORDER TO PREVENT A FUTURE ISSUE.
CAREFULLY INJECTING THIS HARM INTO YOUR SYSTEMS
TO TEST THE SYSTEM’S ABILITY TO RESPOND TO IT.
“BREAK THINGS ON PURPOSE” IN ORDER TO LEARN
HOW TO BUILD MORE RESILIENT SYSTEMS.
28. 28 | Copyright © 2019
PROBLEMS WITH CHAOS ENGINEERING TODAY?
1. LANGUAGE SPECIFIC
2. CODE MODIFICATION
29. 29 | Copyright © 2019
NETWORK ABSTRACTION
EAST-WEST TRAFFIC
NORTH-SOUTH TRAFFIC
SERVICE I, SERVICE II, SERVICE III, SERVICE IV, SERVICE V
30. 30 | Copyright © 2019
CONTROL EXPERIMENT
⍄ DEFINE EXPERIMENTS (SET OF: MESSAGE DELAYS, NETWORK FAULTS)
⍄ RUN AT A REGULAR INTERVAL (E.G. EVERY FRIDAY AT 9PM)
⍄ GATHER METRICS – COMPARE AGAINST BASELINE
⍄ STOP EXPERIMENT IF CONDITION REACHED
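The control loop outlined on this slide can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not Glooshot's API: the `Experiment` and `Result` types, the `run_experiment` function, and the idea of expressing the stop condition as a maximum deviation from a baseline metric are all hypothetical. Scheduling (for example "every Friday at 9PM") is left to an external scheduler and is not shown.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Experiment:
    name: str
    faults: List[str]      # labels only, e.g. ["delay", "abort"]; hypothetical
    baseline: float        # metric value observed before injecting faults
    max_deviation: float   # stop once |metric - baseline| exceeds this
    max_samples: int = 10  # upper bound on how long the experiment runs


@dataclass
class Result:
    samples: List[float]
    stopped_early: bool


def run_experiment(exp: Experiment, read_metric: Callable[[], float]) -> Result:
    """Sample a metric while faults are active; stop when the condition is met."""
    samples: List[float] = []
    for _ in range(exp.max_samples):
        value = read_metric()
        samples.append(value)
        # Compare against the baseline and stop as soon as the deviation
        # crosses the threshold, so the experiment does not keep hurting users.
        if abs(value - exp.baseline) > exp.max_deviation:
            return Result(samples, stopped_early=True)
    return Result(samples, stopped_early=False)
```

In practice `read_metric` would query whatever metrics backend the mesh reports to; here it is just a callable so the loop can be run with canned values.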
31. 31 | Copyright © 2019
GLOOSHOT
GLOOSHOT ALLOWS YOU TO PERFORM CHAOS EXPERIMENTS AT THE SERVICE MESH
LEVEL.
DEFINE ERROR CONDITIONS IN TERMS OF FAILURE MODES SUCH AS:
⍄ MESSAGE DELAYS
⍄ NETWORK FAULTS.
RUN EXPERIMENTS UNTIL A STOP CONDITION IS MET.
GLOOSHOT INTERFACES WITH ALL MAJOR SERVICE MESHES THROUGH THE SERVICE MESH
INTERFACE (SMI).
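To make the two failure modes concrete, here is a small Python sketch of how a proxy in the data path might decide, per request, whether to inject a message delay or a network fault. It is an assumption-laden illustration rather than Glooshot or any mesh's actual configuration: `FaultSpec`, `Decision`, and `decide` are hypothetical names, and the percentage-based selection is simply one common way to express such faults.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FaultSpec:
    delay_seconds: float = 0.0          # message delay to inject
    delay_percent: float = 0.0          # share of requests to delay (0-100)
    abort_status: Optional[int] = None  # network fault modeled as an HTTP status
    abort_percent: float = 0.0          # share of requests to abort (0-100)


@dataclass(frozen=True)
class Decision:
    delay_seconds: float
    abort_status: Optional[int]


def decide(spec: FaultSpec, rng: random.Random) -> Decision:
    """Decide which faults, if any, apply to a single request."""
    # rng.random() is in [0, 1), so 100 percent always matches and 0 never does.
    delayed = rng.random() * 100 < spec.delay_percent
    aborted = spec.abort_status is not None and rng.random() * 100 < spec.abort_percent
    return Decision(
        delay_seconds=spec.delay_seconds if delayed else 0.0,
        abort_status=spec.abort_status if aborted else None,
    )
```

Because the decision happens in the proxy rather than in the application, the services themselves need no language-specific library or code change, which is the point of running the experiment at the mesh level.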
33. 33 | Copyright © 2019
DOCS: HTTPS://GLOOSHOT.SOLO.IO
GITHUB: HTTPS://GITHUB.COM/SOLO-IO/GLOOSHOT
COMMUNITY: HTTPS://SLACK.SOLO.IO
34. 34 | Copyright © 2019
WHAT IS THE BEST WAY TO START?
38. 38 | Copyright © 2019
SUMMARY
DEBUG: SQUASH – DEBUGGER FOR MICROSERVICES AND SERVERLESS
CAPTURE: LOOP – RECORD, REPLAY, AND DEBUG ERRONEOUS REQUESTS OUTSIDE PRODUCTION
FIND: GLOOSHOT – CHAOS ENGINEERING FOR SERVICE MESH
SHARE: SM HUB – FIRST INDUSTRY SERVICE MESH HUB TO ACCELERATE ADOPTION, ADVANCE INNOVATION AND FOSTER COLLABORATION
39. 39 | Copyright © 2019
MORE FROM SOLO
GATEWAY: GLOO – AN ENVOY-POWERED API GATEWAY, A K8S INGRESS, A MIGRATION FACILITATOR, AND MORE
GRAPHQL: SQOOP – PERFORM YOUR QUERIES ACROSS ALL YOUR APIS WITHOUT WRITING CODE
MESH: SUPERGLOO – THE SIMPLEST WAY TO INSTALL AND MANAGE ONE OR MORE SERVICE MESH TECHNOLOGIES.
40. 40 | Copyright © 2019
@SOLOIO_INC
HTTPS://GITHUB.COM/SOLO-IO
CHRISTIAN POSTA
@CHRISTIANPOSTA