Chaos Engineering tests a distributed system as a whole by simulating error conditions inside it and observing how the application reacts. By collecting this information and analyzing it correctly, you can write applications that are more resilient to failure. This talk introduces the principles of Chaos Engineering and shows how to run experiments, identify weaknesses in the architecture, and fix them.
Come to this session to learn about tools such as Istio, Chaos Toolkit, and Glooshot for running Chaos Engineering experiments in Kubernetes, and about the strategies you can use to prevent chaos from taking over your system.
26. @alexsotob
Hypothesis
What happens in case of …
Service starts returning error codes
Latency increases to 500 ms
Database is not available
Time travel
Partially deleting Kafka topics
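A hypothesis such as "the service starts returning error codes" or "latency increases to 500 ms" can be written down as a small experiment. The following is only a minimal sketch in Python, not the tooling shown in the talk: `Fault`, `call_service`, `client`, and `run_experiment` are hypothetical names, the downstream service is simulated in-process, and the 20 ms base latency, 300 ms timeout, and retry count are assumed values.

```python
import random
from dataclasses import dataclass


@dataclass
class Fault:
    """Hypothetical fault description used by this sketch."""
    error_rate: float = 0.0  # fraction of calls answered with HTTP 503
    latency_ms: int = 0      # extra delay added to every call


def call_service(fault, rng):
    """Simulated downstream call; returns (status_code, elapsed_ms)."""
    base_latency_ms = 20  # assumed latency of the healthy fake service
    status = 503 if rng.random() < fault.error_rate else 200
    return status, base_latency_ms + fault.latency_ms


def client(fault, rng, timeout_ms=300, retries=2):
    """Client under test: retries on errors, falls back on timeouts."""
    for _ in range(retries + 1):
        status, elapsed = call_service(fault, rng)
        if elapsed > timeout_ms:
            return "fallback"  # degraded but controlled answer
        if status == 200:
            return "ok"
    return "failed"


def run_experiment(fault, calls=1000, seed=42):
    """Return the fraction of calls that ended as 'ok' or 'fallback'."""
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    results = [client(fault, rng) for _ in range(calls)]
    return sum(r != "failed" for r in results) / calls


if __name__ == "__main__":
    for name, fault in [
        ("steady state", Fault()),
        ("20% error codes", Fault(error_rate=0.2)),
        ("latency +500 ms", Fault(latency_ms=500)),
    ]:
        print(f"{name}: handled ratio = {run_experiment(fault):.3f}")
```

The handled ratio is the measurement the hypothesis is checked against: if it drops below what you expected, the experiment has found a weakness worth fixing.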
27. @alexsotob
Run
Canary Release: user → X v1 (90%), X v2 (10%)
Dark Canaries: user → X v1 (*), X v2 ([10.0.X.Y])
Containerise the experiment
Define Expected Behaviour
Make it public within the organisation
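The two routing strategies on this slide can be illustrated with a short Python sketch. It is an illustration of the idea only, not how Istio or any other tool implements it: `route_canary` and `route_dark_canary` are hypothetical helpers, and reading `[10.0.X.Y]` as the network `10.0.0.0/16` is an assumption.

```python
import ipaddress
import random

# Assumed interpretation of the slide's [10.0.X.Y] source-address pattern.
DARK_NETWORK = ipaddress.ip_network("10.0.0.0/16")


def route_canary(rng, canary_weight=0.10):
    """Canary release: send roughly canary_weight of all requests to v2."""
    return "v2" if rng.random() < canary_weight else "v1"


def route_dark_canary(source_ip):
    """Dark canary: only callers from the internal network reach v2."""
    if ipaddress.ip_address(source_ip) in DARK_NETWORK:
        return "v2"
    return "v1"


if __name__ == "__main__":
    rng = random.Random(1)
    sample = [route_canary(rng) for _ in range(10_000)]
    print("share of requests on v2:", sample.count("v2") / len(sample))
    print("internal caller ->", route_dark_canary("10.0.3.7"))
    print("external caller ->", route_dark_canary("192.168.1.5"))
```

With a weighted canary every user has a small chance of hitting the experimental version, while a dark canary limits the blast radius to callers you control, which is often the safer place to run a first experiment.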
46. @alexsotob
Start with the most sympathetic and innovative groups
[Figure: The Technology Adoption Curve (Source: Moore and McKenna, Crossing The Chasm). Axes: market growth against time. Segments: Innovators 2.5%, Early Adopters 13.5%, Early Majority 34%, Late Majority 34%, Laggards 16%. The Chasm lies between Early Adopters and Early Majority.]
47. @alexsotob
To get started
Start small
As close as possible to production
Communicate with everybody
Have a failover plan
Start manually
48. @alexsotob
“This is the circle of sadness. Your job is to make sure that all sadness stays inside of it.”
— Joy
49. @alexsotob
“Oh yes, the past can hurt. But the way I see it, you can either run from it or learn from it.”
— Rafiki
50. @alexsotob
“You've got a friend in me, when you go out to fly, you've got a friend in me”
— Toy Story
@alexsotob
asotobue@redhat.com
http://www.lordofthejars.com/
lordofthejars