Josh Evans – Engineering Leader
November 8, 2016
Mastering Chaos
A Netflix Guide to Microservices
Illness in the Family
Myelin sheath
Autoimmune disorder
Externally triggered
Treatable
Guillain-Barré Syndrome
Breathing is a miraculous act of
bravery
ELB
and so is taking traffic
Introductions
Microservice Basics
Challenges & Solutions
Organization & Architecture
Our Talk Today
1999 – 2009
Engineer & Engineering Manager
Ecommerce (DVD → Streaming)
2009 – 2013
Director of Engineering - Playback Services
2013 – 2016
Director of Operations Engineering
Josh Evans
Taking time off
Spending time with family
Thinking about what’s next
Today
Leader in subscription internet TV service
Hollywood, indie, local
Growing slate of original content
86 million members
~190 countries, 10s of languages
1000s of device types
Microservices on AWS
Introductions
Microservice Basics
Challenges & Solutions
Organization & Architecture
Our Talk Today
Netflix DVD Data Center - 2000
Linux Host
What microservices are not
Apache
Tomcat
Javaweb
STORE
LoadBalancer
BILLING
HTTP
JDBC
DB Link
HTTP/S
Monolithic code base
Monolithic database
Tightly coupled architecture
What is a microservice?
…the microservice architectural style is an
approach to developing a single application as a
suite of small services, each running in its own
process and communicating with lightweight
mechanisms, often an HTTP resource API.
- Martin Fowler
Separation of concerns
Modularity, encapsulation
Scalability
Horizontally scaling
Workload partitioning
Virtualization & elasticity
Automated operations
On demand provisioning
An Evolutionary Response
Organ Systems
Each organ has a purpose
Organs form systems
Systems form an organism
Edge
ELB
Zuul
NCCP
API
Middle Tier & Platform
Product
• Bucket testing
• Subscriber
• Recommendations
Platform
• Routing
• Configuration
• Crypto
Persistence
• Cache
• Database
Microservices are an abstraction
[Diagram: Client Application containing a Client Library (EVCache Client, Service Client), service instances (S), and databases (DB), grouped under the label "Microservice"]
Introductions
Microservice Basics
Challenges & Solutions
Organization & Architecture
Our Talk Today
Dependency
Scale
Variance
Change
Challenges & Solutions
Intra-service requests
Client libraries
Data Persistence
Infrastructure
Use Cases
Intra-service Requests
Crossing the Chasm
Linux Host
Linux Host
Linux Host
Linux Host
Crossing the Chasm
Linux Host
Apache Tomcat
Linux Host
Apache Tomcat
Network latency, congestion, failure
Logical or scaling failure
Service A Service B
Cascading Failure
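One common defense against the cascading failure shown above is a circuit breaker with a fallback, which the recap also lists under Dependency. The following is a minimal sketch, not Netflix's implementation: the class name `CircuitBreaker`, its parameters, and the thresholds are all hypothetical and chosen only for illustration.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Hypothetical breaker: stop calling a failing dependency for a while."""

    def __init__(self, failure_threshold: int = 3, reset_after_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def is_open(self) -> bool:
        if self._opened_at is None:
            return False
        # Once the reset window has passed, allow one trial call through.
        return self._clock() - self._opened_at < self.reset_after_s

    def call(self, primary: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.is_open():
            # Fail fast: do not add load to a dependency that is already down.
            return fallback()
        try:
            result = primary()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            return fallback()
        # A success closes the breaker and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

While the breaker is open, Service A answers from its fallback instead of queuing threads behind Service B, which is what keeps one failure from saturating the caller.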
How do you know if it works?
Inoculation
Device Service B
Service C
Internet
Edge
Zuul
Service A
ELB
FIT
Synthetic transactions
Override by device or account
% of live traffic up to 100%
Fault Injection Testing (FIT)
Device Service B
Service C
Internet
Edge
Zuul
Service A
ELB
FIT
Fault Injection Testing (FIT)
Enforced throughout the call path
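The FIT slides describe three ideas: a request is selected by device, by account, or as a percentage of live traffic; the selection is decided at the edge; and it is enforced at every hop. The sketch below illustrates that flow under stated assumptions. It is not the FIT API: `FaultScenario`, `RequestContext`, `tag_request`, and `call_dependency` are hypothetical names, and hashing the request ID into buckets is just one way to sample a stable percentage of traffic.

```python
import zlib
from dataclasses import dataclass, replace
from typing import Callable, FrozenSet, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class FaultScenario:
    target: str                              # hypothetical injection point, e.g. "service-b"
    percent_of_traffic: float = 0.0          # 0 to 100
    device_ids: FrozenSet[str] = frozenset()
    account_ids: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class RequestContext:
    request_id: str
    device_id: str
    account_id: str
    failed_targets: FrozenSet[str] = frozenset()


class InjectedFault(Exception):
    """Raised in place of a real call when the request is tagged for failure."""


def in_sample(request_id: str, percent: float) -> bool:
    # Stable hash so the same request always lands in the same bucket.
    bucket = zlib.crc32(request_id.encode("utf-8")) % 10_000
    return bucket < percent * 100


def tag_request(ctx: RequestContext, scenario: FaultScenario) -> RequestContext:
    """Decide once, at the edge, whether this request carries the fault."""
    selected = (
        ctx.device_id in scenario.device_ids
        or ctx.account_id in scenario.account_ids
        or in_sample(ctx.request_id, scenario.percent_of_traffic)
    )
    if not selected:
        return ctx
    return replace(ctx, failed_targets=ctx.failed_targets | frozenset({scenario.target}))


def call_dependency(ctx: RequestContext, target: str, do_call: Callable[[], T]) -> T:
    """Every hop checks the context, so the fault is enforced along the call path."""
    if target in ctx.failed_targets:
        raise InjectedFault(target)
    return do_call()
```

Because the decision travels with the request context, untagged traffic is unaffected even while a scenario is active.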
ELB
API
How do we constrain testing scope?
[Diagram: an API Gateway and a Proxy in front of App 1 through App 8, each labeled 99.99]
Combinatorial Math
(99.99%)^10 ≈ 99.9%
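The arithmetic on this slide can be checked directly. This is a small sketch that assumes every request needs all ten services and that their failures are independent. Neither assumption is stated on the slide, and the function names are illustrative.

```python
def compound_availability(per_service: float, count: int) -> float:
    """Availability of a request that needs `count` independent services."""
    return per_service ** count


def downtime_hours_per_year(availability: float) -> float:
    """Expected unavailable hours in a 365-day year."""
    return (1.0 - availability) * 365 * 24


overall = compound_availability(0.9999, 10)
print(f"overall availability: {overall:.2%}")
print(f"expected downtime: {downtime_hours_per_year(overall):.1f} hours/year")
```

Ten services at four nines each leave the whole path at roughly three nines, which is why the talk narrows its attention to the critical microservices.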
Critical Microservices
Client Libraries
• Many clients
• Common business logic
• Common access patterns
Return of the Monolith
Heap consumption
Logical defects
Transitive dependencies
Parasitic Infestation
Simple Logic, Common Patterns
[Diagram: Client Application containing a Client Library (EVCache Client, Service Client), service instances (S), and databases (DB)]
Persistence
In the presence of a network partition, you must choose
between consistency and availability
CAP Theorem
DB
DB
DB
Network B
Network C
Network D
Service
Network A
X
Zone A
Zone B
Zone C
Zone B
Zone C
Client
Zone A
Local Quorum
(Typical)
100ms
Eventual Consistency
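A quorum write with eventual consistency can be sketched as follows, assuming three replicas, one per zone. The write succeeds once a majority acknowledge, and a replica that was offline catches up later from a stored hint. This is an illustration only, not Cassandra's API: `Replica`, `Coordinator`, and `replay_hints` are hypothetical names.

```python
from typing import Dict, List, Tuple


class Replica:
    def __init__(self, zone: str) -> None:
        self.zone = zone
        self.online = True
        self.data: Dict[str, str] = {}

    def write(self, key: str, value: str) -> bool:
        if not self.online:
            return False
        self.data[key] = value
        return True


def quorum_size(replica_count: int) -> int:
    """Smallest majority of the replicas."""
    return replica_count // 2 + 1


class Coordinator:
    def __init__(self, replicas: List[Replica]) -> None:
        self.replicas = replicas
        self.hints: List[Tuple[Replica, str, str]] = []

    def write(self, key: str, value: str) -> bool:
        acks = 0
        for replica in self.replicas:
            if replica.write(key, value):
                acks += 1
            else:
                # Remember the write so the replica can catch up later.
                self.hints.append((replica, key, value))
        # Online replicas keep the value even if the quorum is not met.
        return acks >= quorum_size(len(self.replicas))

    def replay_hints(self) -> None:
        """Retry stored writes; keep the ones that still fail."""
        self.hints = [(r, k, v) for (r, k, v) in self.hints if not r.write(k, v)]
```

With one zone down the client still gets a successful write (availability), and the third replica becomes consistent only after the hints are replayed.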
Infrastructure
December 24th, 2012
US-East-1
Canada
No place to go
US
Latin America
US-East-1 US-West-2 EU-West-1
#NetflixEverywhere Global Architecture
QCon London, 2016
https://www.infoq.com/presentations/netflix-failure-multiple-regions
Dependency
Scale
Variance
Change
Challenges & Solutions
Stateless services
Stateful services
Hybrid services
Use Cases
Stateless Services
Not a cache or a database
Frequently accessed metadata
No instance affinity
Loss of a node is a non-event
What is a stateless service?
Minimum size
Desired capacity
Maximum size
Scale out as needed
S3
AMI retrieved on demand
Compute efficiency
Node failure
Traffic spikes
Performance bugs
Auto Scaling Groups
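The minimum, desired, and maximum sizes on this slide interact in a simple way: the group aims for enough instances to carry the load, clamped between the two bounds. A minimal sketch, assuming a fixed target of requests per second per instance. This is not the AWS API: `GroupLimits` and `desired_capacity` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupLimits:
    min_size: int
    max_size: int


def desired_capacity(limits: GroupLimits, total_rps: float,
                     target_rps_per_instance: float) -> int:
    """Instances needed for the current load, clamped to the group limits."""
    if target_rps_per_instance <= 0:
        raise ValueError("target_rps_per_instance must be positive")
    needed = math.ceil(total_rps / target_rps_per_instance)
    return max(limits.min_size, min(limits.max_size, needed))
```

The minimum keeps redundancy when traffic is low, and the maximum caps cost when a traffic spike or performance bug drives the computed need upward.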
Cluster A Cluster D
Edge Cluster
Cluster B
Cluster C
Surviving Instance Failure
Stateful Services
Databases & caches
Custom apps which hold large amounts of data
Loss of a node is a notable event
What is a stateful service?
Dedicated Shards – An Antipattern
[Diagram: Client Application with a Subscriber Client Library (Cache Client, Service Client), service instances (S), and databases (DB); HA Proxy in front of Squid 1 through Squid n, paired with Set 1 through Set n; one element marked X]
Redundancy is fundamental
EVCache Writes
[Diagram: Zone A, Zone B, Zone C; three Client Applications, each with a Client Library and EVCache Client]
EVCache Reads
[Diagram: Zone A, Zone B, Zone C; one Client Application (Client Library, EVCache Client) in each zone]
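The write and read slides suggest a pattern in which a write is copied to every zone and a read is served from the caller's own zone. The sketch below models that with in-memory dictionaries. It is not the EVCache client API: `ZoneReplicatedCache` is a hypothetical name, and the read fallback to other zones on a local miss is an assumption made for illustration.

```python
from typing import Dict, List, Optional


class ZoneReplicatedCache:
    """Hypothetical model: one in-memory store per availability zone."""

    def __init__(self, zones: List[str]) -> None:
        self._zones: Dict[str, Dict[str, str]] = {zone: {} for zone in zones}

    def set(self, key: str, value: str) -> None:
        # Writes go to every zone so any single zone can be lost.
        for store in self._zones.values():
            store[key] = value

    def get(self, key: str, local_zone: str) -> Optional[str]:
        # Reads prefer the local zone, then try the others (assumed fallback).
        ordered = [local_zone] + [z for z in self._zones if z != local_zone]
        for zone in ordered:
            if key in self._zones[zone]:
                return self._zones[zone][key]
        return None

    def drop_zone(self, zone: str) -> None:
        """Simulate losing every cache node in one zone."""
        self._zones[zone].clear()
```

Losing a whole zone then costs only latency on the affected reads, not data, which is the point of the "Redundancy is fundamental" slide.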
Hybrid Services
Hybrid Microservice
[Diagram: Client Application containing a Client Library (EVCache Client, Service Client), service instances (S), and databases (DB)]
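The speaker notes describe the hybrid flow as: read from cache; on a miss, call the service; the service reads the database, responds, and updates the cache. A minimal sketch of that flow, using plain dictionaries as stand-ins for the cache and database. `BackingService` and `ClientLibrary` are hypothetical names.

```python
from typing import Dict, Optional


class BackingService:
    """Stand-in for the service tier: reads the DB and backfills the cache."""

    def __init__(self, db: Dict[str, str], cache: Dict[str, str]) -> None:
        self._db = db
        self._cache = cache
        self.db_reads = 0

    def fetch(self, key: str) -> Optional[str]:
        self.db_reads += 1
        value = self._db.get(key)
        if value is not None:
            self._cache[key] = value  # service updates the cache
        return value


class ClientLibrary:
    """Stand-in for the client library embedded in the calling application."""

    def __init__(self, cache: Dict[str, str], service: BackingService) -> None:
        self._cache = cache
        self._service = service

    def get(self, key: str) -> Optional[str]:
        cached = self._cache.get(key)
        if cached is not None:
            return cached                # cache hit: no service call
        return self._service.fetch(key)  # cache miss: fall back to the service
```

The design risk the later slides raise follows directly from this shape: if the cache tier disappears, every read becomes a service and database call.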
It’s easy to take EVCache for granted
30 million requests/sec
2 trillion requests per day globally
Hundreds of billions of objects
Tens of thousands of memcached instances
Milliseconds of latency per request
[Diagram: Batch and Member Path callers, service instances (S), and databases (DB)]
Called by many services
Online & offline clients
Called many times / request
800k – 1M RPS
Fallback to service/db
Excessive Load
[Diagram: Batch and Member Path callers, service instances (S), and databases (DB); two elements marked X]
Excessive Load
[Diagram: Batch and Member Path callers, service instances (S), and databases (DB)]
Workload partitioning
Request-level caching
Secure token fallback
Chaos under load
Solutions
Online Offline
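Request-level caching addresses the "called many times / request" problem: within one incoming request, the first lookup is remembered and later lookups reuse it. A minimal sketch, assuming one cache object is created per request and discarded afterwards. `RequestScopedCache` is a hypothetical name.

```python
from typing import Callable, Dict, Generic, Hashable, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class RequestScopedCache(Generic[K, V]):
    """Memoizes lookups for the lifetime of a single request."""

    def __init__(self, loader: Callable[[K], V]) -> None:
        self._loader = loader
        self._values: Dict[K, V] = {}

    def get(self, key: K) -> V:
        # Only the first lookup per key reaches the remote tier.
        if key not in self._values:
            self._values[key] = self._loader(key)
        return self._values[key]
```

Since the object lives only as long as the request, there is no invalidation to manage, and repeated calls stop multiplying load on the shared cache or service.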
Dependency
Scale
Variance
Change
Challenges & Solutions
Operational drift
Polyglot & containers
Use Cases
Operational Drift
(Unintentional Variance)
Over time
Alert thresholds
Timeouts, retries, fallbacks
Throughput (RPS)
Across microservices
Reliability best practices
Operational Drift
Autonomic Nervous
System
You don’t have to think about
digestion or breathing
Incident
Resolution
Review
Remediation
Analysis
Best Practice
Automation
Adoption
Continuous Learning & Automation
Alerts
Apache & Tomcat
Automated canary analysis
Autoscaling
Chaos
Consistent naming
ELB config
Healthcheck
Immutable machine images
Squeeze testing
Staged, red/black deployments
Timeouts, retries, fallbacks
Production Ready
Polyglot & Containers
(Intentional Variance)
The Paved Road
Stash
Nebula/Gradle
BaseAMI/Ubuntu
Jenkins
Spinnaker
Runtime Platform
In the Critical Path
Productivity tooling
Insight & triage capabilities
Base image fragmentation
Node management
Library/platform duplication
Learning curve - production expertise
Cost of Variance
Raise awareness of costs
Constrain centralized support
Prioritize by impact
Seek reusable solutions
Strategic Stance
Dependency
Scale
Variance
Change
Challenges & Solutions
How do we achieve velocity with confidence?
Global Cloud Management & Delivery
Integrated, Automated Practices
Conformity checks
Red/black pipelines
Automated canaries
Staged deployments
Squeeze tests
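An automated canary check can be reduced, for illustration, to comparing one metric between a baseline cluster and a canary cluster running the new build. The sketch below is only an assumption about the general idea, not Netflix's canary analysis, which the slides do not detail. `ClusterStats`, `canary_passes`, and every threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterStats:
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def canary_passes(baseline: ClusterStats, canary: ClusterStats,
                  max_ratio: float = 1.5, min_requests: int = 1000,
                  error_floor: float = 0.001) -> bool:
    """Pass only if the canary saw enough traffic and its error rate is
    no worse than `max_ratio` times the baseline (with a small floor)."""
    if baseline.requests < min_requests or canary.requests < min_requests:
        return False  # not enough data to judge
    allowed = max(baseline.error_rate * max_ratio, error_floor)
    return canary.error_rate <= allowed
```

A failing check would stop the staged rollout before the new build reaches further regions, and a passing one lets the pipeline continue.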
Alerts
Apache & Tomcat
Automated canary analysis
Autoscaling
Chaos
Consistent naming
ELB config
Healthcheck
Immutable machine images
Squeeze testing
Staged, red/black deployments
Timeouts, retries, fallbacks
Production Ready
https://www.youtube.com/watch?v=IkPb15FfuQU
Introductions
Microservice Basics
Challenges & Solutions
Organization & Architecture
Our Talk Today
Customer Device Netflix Data Center - 2009
NCCP
Electronic Delivery - NRDP 1.x
LoadBalancer
Netflix App
Security
Activation
Playback
Platform (NRDP)
UI
Collaborative design
XML payloads
Custom responses
Versioned firmware releases
Long cycles
Simple UI – “Queue Reader”
ED
Netflix API - let a thousand flowers bloom!
Netflix Data Center - 2009
API
Netflix API – from public to private
LoadBalancer
General REST API
JSON schema
HTTP response codes
OAuth security model
Content Metadata
Content
Metadata
Application
Customer Device
Netflix Data Center – 2010
API
Hybrid Architecture
LB
Netflix App
Security
Activation
Playback
Platform (NRDP)
UI
Content
Metadata
NCCP
ED
LB
Distinct
• Services
• Protocols
• Schemas
• Security
Josh: what is the right long-term architecture?
Peter: do you care about the organizational
implications?
Conway’s Law
Organizations which design systems are constrained to
produce designs which are copies of the
communication structures of these organizations.
Any piece of software reflects the organizational
structure that produced it.
Conway’s Law
If you have four teams working on a compiler, you will
end up with a four-pass compiler
NCCP
API
Blade Runner
Outcomes
Productivity & new capabilities
Refactored organization
Lessons
Solutions first, team second
Reconfigure teams to best support your architecture
Outcomes & Lessons
Introductions
Microservice Basics
Challenges & Solutions
Organization & Architecture
Recap
Our Talk Today
Microservice architectures are
complex and organic
Health depends on discipline and
chaos
Dependency
Circuit breakers, fallbacks, chaos
Simple clients
Eventual consistency
Multi-region failover
Scale
Auto-scaling
Redundancy – avoid SPoF
Partitioned workloads
Failure-driven design
Chaos under load
Variance
Engineered operations
Understood cost of variance
Prioritized support by impact
Change
Automated delivery
Integrated practices
Organization & Architecture
Solutions first, team second
netflix.github.io
techblog.netflix.com
Questions?


Editor's Notes

  1. Is anyone familiar with this condition?
  2. Even the simple act of breathing is a complex act requiring many systems to cooperate and posing the potential to inhale dangerous gases or pathogens. Pause – so you’re probably wondering why I’m talking about biology and disease in a talk about microservices?
  3. And just as we human beings thrive in a world filled with threats, so can your microservice architecture. And just as my stepmother Barbara’s own body attacked itself in response to some unknown pathogen, our own services can do the same thing. Poorly tuned timeouts, retries, and fallbacks can wreak havoc and take your entire customer-facing service down. There are big challenges, but every challenge has a solution. But, just as for all of us, it requires discipline to stay fit. You must embrace the chaos and accept that it is impossible for any one individual to fully understand the whole distributed system. This is why we’re here today – to talk about the Netflix microservice journey: how we walk the razor’s edge between discipline and chaos, and how you can benefit from the lessons we’ve learned over the last 7 years.
  4. Even the simple act of breathing is a complex act requiring many systems to cooperate and posing the potential to inhale dangerous gases or pathogens.
  5. Read from cache. On cache miss, call service. Service calls DB & responds. Service updates cache.
  6. External trigger, internal response
  7. As soon as you go out of process and/or off box, you have a distributed system. Combinatorial math on nines of availability. Adrian Cockcroft suggested Netflix in a box as a thought experiment early on, to address connectivity concerns.
  8. If you do not defend against failure at each level, then you have what is essentially a distributed monolith: if any microservice fails, then they all fail. Calls start failing, retries make it worse, thread pools become saturated, and lack of isolation leads to full cascading failure.
  9. This nasty-looking creature comes right out of your favorite horror movie. The good news is that it’s a very tiny creature, not something that would destroy Tokyo. The bad news is that it’s a vampire: a hookworm that attaches itself to the wall of the intestine, puncturing blood vessels and feeding on blood. This can lead to severe anemia, affecting the health of the whole organism. And, just like the hookworm, client libraries can consume resources of your microservice application.
  10. Cache, service, backfill. Request-level caching.
  11. Client writes to any node. Coordinator replicates to nodes. Nodes ack to coordinator. Coordinator acks to client. Write to commit log. Hinted handoff to offline nodes.
  12. On Christmas Eve, 2012, Netflix experienced a region-wide outage due to an accidental ELB configuration change. Many engineers were on call, missing time with their families. They spent much of the night and into the morning trying to mitigate the impact of the outage on our customers, but to no avail. We ultimately had to wait for Amazon to address the root cause. And our members, many of them new to Netflix, were unable to stream. Their responses varied in intensity from…
  13. Early on we had two competing approaches to caching. The Subscriber service team leaned on Squid caches, applying a dedicated shard model. This model proved problematic, involving long outages for members when a shard went down. In addition, lack of proper thread pool isolation meant that the entire Netflix service might become unavailable when one shard became unavailable. I was on a conference call several years ago where it took four hours to recover from such an outage.
  14. Even the simple act of breathing is a complex act requiring many systems to cooperate and posing the potential to inhale dangerous gases or pathogens. Pause – so you’re probably wondering why I’m talking about biology and disease in a talk about microservices?
  15. Now let’s look at scale from the perspective of a complex microservice architecture: one in which there is a caching tier fronting the microservice tier.
  16. In this case the subscriber team relied heavily on the caching tier, taking traffic north of 800k RPS.
  17. In this case the subscriber team relied heavily on the caching tier, taking traffic north of 800k RPS.
  18. There are several solutions that address this anti-pattern…
  19. Story – bricked test environment for 6 hours – global configuration change. Staging necessary for deployments & configuration changes.
  20. Architecture first, organizational structure second. Blameless incident reviews. Commitment to continuous improvement.
  21. Different endpoints, protocols, and security made life difficult for client teams, especially when we wanted to integrate UI and playback functionality.
  22. Architecture first, organizational structure second. Blameless incident reviews. Commitment to continuous improvement.
  23. Even the simple act of breathing is a complex act requiring many systems to cooperate and posing the potential to inhale dangerous gases or pathogens. Pause – so you’re probably wondering why I’m talking about biology and disease in a talk about microservices?