Confidential | For Company Name | Version 1.0
Latency Control & Supervision in
Resilience Design Patterns
Tu Pham - CTO @ Eway
TOC
Terminology
Why It Is So IMPORTANT
Why It Is So HARD
Design Patterns
Anti-Patterns
Q & A
Terminology
Distributed Systems
Networked components that communicate with each other by passing
messages, most often to achieve a common goal.
Resiliency
The capacity of any system to recover from difficulties.
Availability
The probability that a system is operating at time `t`.
Reliability
The degree to which a system / component performs specified functions
under specified conditions for a specified period of time.
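The Availability and Reliability definitions above can be written formally as below. Availability allows for repair (a system that fails and recovers can still be highly available), while reliability asks whether the system ran without any failure over the whole interval. The steady-state formula is the standard textbook approximation, assuming the system alternates between up periods with mean time to failure MTTF and repair periods with mean time to repair MTTR; the 999 h / 1 h numbers are purely illustrative.

```latex
A(t) = \Pr[\text{system is operating at time } t]
\qquad
R(t) = \Pr[\text{no failure during } [0, t]]

A_{\infty} = \lim_{t \to \infty} A(t) = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}},
\qquad \text{e.g.} \quad
\frac{999\,\mathrm{h}}{999\,\mathrm{h} + 1\,\mathrm{h}} = 0.999
```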
Faults
A fault is an incorrect internal state in your
system. Examples:
1. Slowdown of the storage layer
2. Memory leaks in the application
3. Blocked threads
4. Dependency failures
5. Bad data propagating through the system (most
often because there is not enough validation
of input data)
Failure
Failure is the inability of the system to perform
its intended job. It means a loss of uptime and
availability. Faults, if not contained, can
propagate and lead to failures.
Why It Is So IMPORTANT
1. Losing customers and partners to competitors => financial losses for the company
2. Affecting the livelihood of publishers and advertisers
3. Affecting the salary and bonus of OUR TEAM :))
4. Affecting services for customers and colleagues
Why It Is So HARD
Building resiliency into a complex microservices
architecture, where multiple distributed systems
communicate with each other, is difficult.
Some of the things that make it hard are:
1. The network is unreliable
2. Dependencies can always fail
3. User behavior is unpredictable
Patterns
Latency Control
● Complements isolation
● Detection and handling of non-timely
responses
● Avoid cascading temporal failures
● Different approaches and patterns available
Timeout
● Preserve responsiveness
independent of downstream latency
● Measure response time of
downstream calls
● Stop waiting after a pre-determined
timeout
● Take alternate action if the timeout was
reached
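A minimal Python sketch of the timeout pattern, assuming the downstream call is a blocking function wrapped as a zero-argument callable. `call_with_timeout` and its `(value, elapsed, timed_out)` return tuple are illustrative names, not from any library; the sketch uses only the standard `concurrent.futures` module. It measures the response time, stops waiting after `timeout_s`, and takes the alternate action of returning `fallback`.

```python
import concurrent.futures
import time

# Shared worker pool for downstream calls. A timed-out call keeps running
# in its worker thread; the caller simply stops waiting for it.
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=8)


def call_with_timeout(fn, timeout_s, fallback):
    """Call fn() but stop waiting after timeout_s seconds.

    Returns (value, elapsed_seconds, timed_out). On timeout, value is `fallback`.
    Exceptions raised by fn() propagate to the caller unchanged.
    """
    start = time.monotonic()
    future = _pool.submit(fn)
    try:
        value = future.result(timeout=timeout_s)
        timed_out = False
    except concurrent.futures.TimeoutError:
        value = fallback  # alternate action instead of waiting indefinitely
        timed_out = True
    elapsed = time.monotonic() - start  # measured response time of the call
    return value, elapsed, timed_out
```

One design caveat: Python cannot forcibly stop a running thread, so a timed-out call still occupies a worker until it finishes. Setting the downstream client's own timeout (for example a socket or HTTP client timeout) as well keeps those workers from piling up.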
Fail Fast
● “If you know you’re going to fail, you
better fail fast”
● Avoid foreseeable failures
● Usually implemented by adding
checks in front of costly actions
● Enhances probability of not failing
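A sketch of fail fast as cheap checks placed in front of a costly action. The order/inventory model, `place_order`, `expensive_fulfilment`, and `FailFastError` are hypothetical names chosen for illustration; the point is only the ordering: validate and check dependency health first, then do the expensive work.

```python
import time


class FailFastError(Exception):
    """Raised immediately when a cheap pre-check shows the action would fail."""


def expensive_fulfilment(order):
    # Hypothetical stand-in for a slow, resource-heavy operation.
    time.sleep(0.01)
    return f"fulfilled {order['quantity']} x {order['sku']}"


def place_order(order, inventory, dependency_is_healthy):
    # Cheap checks run *before* the costly action, so foreseeable failures
    # surface immediately instead of after time and resources are spent.
    if order.get("quantity", 0) <= 0:
        raise FailFastError("invalid quantity")
    if inventory.get(order.get("sku"), 0) < order["quantity"]:
        raise FailFastError("insufficient stock")
    if not dependency_is_healthy():
        raise FailFastError("downstream dependency unavailable")
    return expensive_fulfilment(order)
```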
Circuit Breaker
● Probably the most frequently cited resilience
pattern
● Extension of the timeout pattern
● Takes downstream unit offline if
calls fail multiple times
● Specific variant of the fail fast
pattern
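A minimal single-threaded circuit breaker sketch in Python, assuming failures surface as exceptions (a wrapped call that enforces its own timeout and raises on expiry is how it extends the timeout pattern). After `failure_threshold` consecutive failures the breaker opens and rejects calls immediately (fail fast); after `reset_timeout_s` it lets one trial call through (half-open). The class, state names, and injectable `clock` are illustrative; production libraries provide richer, thread-safe implementations.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is rejected without trying."""


class CircuitBreaker:
    """Minimal closed / open / half-open breaker (not thread-safe)."""

    def __init__(self, failure_threshold=3, reset_timeout_s=5.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout_s = reset_timeout_s      # time to stay open before a trial call
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn, *args, **kwargs):
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_timeout_s:
                raise CircuitOpenError("downstream unit is offline; failing fast")
            self.state = "half_open"  # allow one trial call through
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self):
        self.failures += 1
        # A failed trial call, or too many consecutive failures, (re)opens the circuit.
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()

    def _on_success(self):
        self.failures = 0
        self.state = "closed"
```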
Fan Out & Quickest Reply
● Send the request to multiple workers
● Use the quickest reply and discard all
other responses
● Reduces the probability of latent
responses
● Trade-off: WASTED resources
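A sketch of fan out & quickest reply using `concurrent.futures`, assuming each replica is a callable that takes the request. `fan_out_quickest` is a hypothetical helper; creating a thread pool per call keeps the example self-contained but is not how a real service would manage threads. `cancel_futures` requires Python 3.9 or later.

```python
import concurrent.futures


def fan_out_quickest(replicas, request, timeout_s=1.0):
    """Send `request` to every replica; return the first successful reply.

    Raises concurrent.futures.TimeoutError if no reply arrives in time and
    RuntimeError if every replica fails.
    """
    if not replicas:
        raise ValueError("need at least one replica")
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=len(replicas))
    futures = [pool.submit(replica, request) for replica in replicas]
    try:
        for future in concurrent.futures.as_completed(futures, timeout=timeout_s):
            if future.exception() is None:
                return future.result()  # quickest successful reply wins
        raise RuntimeError("all replicas failed")
    finally:
        # Don't wait for slower replicas; their replies are simply discarded.
        # Calls already running still finish in the background: that is the waste.
        pool.shutdown(wait=False, cancel_futures=True)
```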
Bounded Queues
● Limit request queue sizes in front of
highly utilized resources
● Avoids latency due to overloaded
resources
● Introduces pushback on the callers
● Another variant of the fail fast
pattern
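A sketch of a bounded request queue built on the standard `queue.Queue(maxsize=...)`. The wrapper class `BoundedWorkQueue` is hypothetical; rejecting with `put_nowait` instead of a blocking `put` is one way to turn a full queue into immediate pushback on the caller.

```python
import queue


class BoundedWorkQueue:
    """Request queue with a hard size limit in front of a busy resource."""

    def __init__(self, max_size):
        self._queue = queue.Queue(maxsize=max_size)

    def submit(self, request):
        """Enqueue request; return False (pushback) if the queue is full."""
        try:
            # put_nowait raises queue.Full instead of blocking, so callers are
            # told right away rather than waiting behind a growing backlog.
            self._queue.put_nowait(request)
            return True
        except queue.Full:
            return False

    def next_request(self, timeout_s=None):
        """Called by the worker that owns the resource; blocks until work arrives."""
        return self._queue.get(timeout=timeout_s)
```

Returning `False` leaves the reaction to the caller: back off and retry later, degrade, or report an error upstream.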
Supervision
● Provides failure handling beyond the means of
a single failure unit
● Detect unit failures
● Provide means for error escalation
● Different approaches and patterns available
Shed Load
● Upstream isolation pattern
● Avoid becoming overloaded due to
too many requests
● Install a gatekeeper in front of the
resource
● Shed requests based on resource
load
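A sketch of a gatekeeper that sheds load, assuming the number of in-flight requests is the load signal (real systems may use CPU, latency, or queue depth instead). `LoadShedder` and its methods are hypothetical names; `on_shed` stands for whatever cheap rejection the service returns.

```python
import threading


class LoadShedder:
    """Gatekeeper that rejects new work once in-flight requests hit a limit."""

    def __init__(self, max_in_flight):
        self.max_in_flight = max_in_flight
        self._in_flight = 0
        self._lock = threading.Lock()

    @property
    def in_flight(self):
        return self._in_flight

    def try_acquire(self):
        with self._lock:
            if self._in_flight >= self.max_in_flight:
                return False  # shed: the resource is already at capacity
            self._in_flight += 1
            return True

    def release(self):
        with self._lock:
            self._in_flight -= 1

    def handle(self, request, handler, on_shed):
        """Run handler(request) if there is capacity, else return on_shed(request)."""
        if not self.try_acquire():
            return on_shed(request)
        try:
            return handler(request)
        finally:
            self.release()
```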
Monitor
● Observe unit behavior and
interactions from the outside
● Automatically respond to detected
failures
● Part of the system – complex failure
handling strategies possible
● Outside the system – more robust
against system level failures
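A sketch of an external monitor, assuming each unit exposes a health-check callable and that the automatic response (restart, alert, failover) is supplied as a callback. `Monitor`, `check_once`, and `run` are hypothetical names; whether such a monitor runs inside or outside the system is the trade-off described above.

```python
import time


class Monitor:
    """Observes units from the outside and reacts to detected failures."""

    def __init__(self, health_checks, on_failure):
        # health_checks: unit name -> zero-argument callable returning True if healthy
        self.health_checks = health_checks
        # on_failure: callback invoked with the unit name, e.g. to restart or alert
        self.on_failure = on_failure

    def check_once(self):
        """Run every health check once; return the names of failed units."""
        failed = []
        for name, check in self.health_checks.items():
            try:
                healthy = bool(check())
            except Exception:
                healthy = False  # an erroring health check counts as a failure
            if not healthy:
                failed.append(name)
                self.on_failure(name)  # automatic response, no human in the loop
        return failed

    def run(self, interval_s, rounds):
        """Poll all units every interval_s seconds for a fixed number of rounds."""
        for _ in range(rounds):
            self.check_once()
            time.sleep(interval_s)
```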
Error Handler
● Units often don’t have enough time
or information to handle errors
● Separate business logic and error
handling
● Business logic just focuses on
getting the task done (quickly)
● Error handler has sufficient time
and information to handle errors
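A sketch of separating business logic from error handling. `process`, `run_tasks`, and `ErrorHandler` are hypothetical; collecting failures in a dead-letter list stands in for whatever the real handler would do with its extra time and information (retry later, alert, compensate).

```python
import logging


class ErrorHandler:
    """Separate component that owns error handling on behalf of business code."""

    def __init__(self):
        self.dead_letters = []  # failed tasks kept for later inspection or replay

    def handle(self, task, error):
        logging.warning("task %r failed: %r", task, error)
        self.dead_letters.append((task, error))


def process(task):
    # Business logic covers only the happy path; it does not decide
    # how to recover from errors.
    return task["amount"] * 2


def run_tasks(tasks, error_handler):
    results = []
    for task in tasks:
        try:
            results.append(process(task))
        except Exception as exc:
            error_handler.handle(task, exc)  # hand off and move on quickly
    return results
```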
Escalation
● Units often don’t have enough time
or information to handle errors
● Escalation peer with more time and
information needed
● Often multi-level hierarchies
● Pure design issue
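A sketch of a multi-level escalation hierarchy, assuming each level declares which errors it can handle through a predicate. `EscalationLevel` is a hypothetical class, and the level names in the usage below are made up for illustration.

```python
class EscalationLevel:
    """One level in a multi-level escalation hierarchy."""

    def __init__(self, name, can_handle, parent=None):
        self.name = name
        self.can_handle = can_handle  # predicate: exception -> bool
        self.parent = parent          # next level up, or None at the top

    def escalate(self, error):
        """Handle error at the lowest capable level, walking up the hierarchy."""
        level = self
        while level is not None:
            if level.can_handle(error):
                return f"{level.name} handled {type(error).__name__}"
            level = level.parent
        raise error  # no level could handle it


# Usage: a worker handles timeouts itself and escalates everything else.
platform = EscalationLevel("platform", lambda e: isinstance(e, RuntimeError))
team = EscalationLevel("team", lambda e: isinstance(e, ValueError), parent=platform)
worker = EscalationLevel("worker", lambda e: isinstance(e, TimeoutError), parent=team)
```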
Other Patterns
Fallback
● Units often don’t have enough time
or information to handle errors
● Instead of aborting the computation
because of a missing response, we
fill in a fallback value.
● Of course, it can be DANGEROUS!!!
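A sketch of the fallback pattern that keeps the danger visible: the helper returns a flag telling the caller a substitute value was used, and an `allow_fallback` predicate limits which errors may be papered over. `with_fallback` is a hypothetical helper, not a library API.

```python
def with_fallback(primary, fallback_value, allow_fallback=lambda exc: True):
    """Return (value, used_fallback).

    Calls primary(); if it raises and allow_fallback(exc) is True, returns
    fallback_value instead of aborting. The used_fallback flag lets callers,
    logs, and metrics see that the value is a substitute, not real data.
    """
    try:
        return primary(), False
    except Exception as exc:
        if not allow_fallback(exc):
            raise  # this error is not safe to hide behind a default
        return fallback_value, True
```

For example, an empty recommendation list is usually a harmless fallback, while a default price or balance usually is not; that decision belongs with the business side, not in the helper.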
Retry
● Units have enough time or
information to handle errors
● Just send the request again and again
until it succeeds or reaches the BOUNDARY
of the retry policy
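A sketch of retry bounded by a policy, assuming the policy is a maximum number of attempts with capped exponential backoff and random jitter (a common choice, not one stated in the slides). `retry` and its parameters are hypothetical; `sleep` is injectable so the behavior can be tested without waiting.

```python
import random
import time


def retry(fn, max_attempts=3, base_delay_s=0.1, max_delay_s=2.0,
          retry_on=(Exception,), sleep=time.sleep):
    """Call fn() until it succeeds or the policy's attempt limit is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # boundary of the policy reached: give up
            # Capped exponential backoff with full jitter, so many clients
            # retrying at once do not hammer the dependency in lockstep.
            delay = min(max_delay_s, base_delay_s * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
```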
Just Don’t
● Infinite delays (waiting without a timeout)
● One config / policy for all situations
● Fallback logic without confirmation from
business departments / upper management
● A laggy / buggy monitoring system
“Just Design Our Systems For Failure”
Q&A

More Related Content

What's hot

Cloud university intel security
Cloud university intel securityCloud university intel security
Cloud university intel securityIngram Micro Cloud
 
RightScale Webinar: Security Monitoring in the Cloud: How RightScale Does It
RightScale Webinar: Security Monitoring in the Cloud: How RightScale Does ItRightScale Webinar: Security Monitoring in the Cloud: How RightScale Does It
RightScale Webinar: Security Monitoring in the Cloud: How RightScale Does ItRightScale
 
Securing Applications in the Cloud
Securing Applications in the CloudSecuring Applications in the Cloud
Securing Applications in the CloudSecurity Innovation
 
The Top Cloud Security Issues
The Top Cloud Security IssuesThe Top Cloud Security Issues
The Top Cloud Security IssuesHTS Hosting
 
Rethinking Security: The Cloud Infrastructure Effect
Rethinking Security: The Cloud Infrastructure EffectRethinking Security: The Cloud Infrastructure Effect
Rethinking Security: The Cloud Infrastructure EffectCloudPassage
 
Security for cloud native workloads
Security for cloud native workloadsSecurity for cloud native workloads
Security for cloud native workloadsRuncy Oommen
 
Assessing System Risk the Smart Way
Assessing System Risk the Smart WayAssessing System Risk the Smart Way
Assessing System Risk the Smart WaySecurity Innovation
 
Cloud Security Demystified
Cloud Security DemystifiedCloud Security Demystified
Cloud Security DemystifiedMichael Torres
 
Cloud security privacy- org
Cloud security  privacy- orgCloud security  privacy- org
Cloud security privacy- orgDharmalingam S
 
Managed Threat Detection & Response for AWS Applications
Managed Threat Detection & Response for AWS ApplicationsManaged Threat Detection & Response for AWS Applications
Managed Threat Detection & Response for AWS ApplicationsAlert Logic
 
Cloud Security Engineering - Tools and Techniques
Cloud Security Engineering - Tools and TechniquesCloud Security Engineering - Tools and Techniques
Cloud Security Engineering - Tools and TechniquesGokul Alex
 
Css sf azure_8-9-17 - 5_ways to_optimize_your_azure_infrastructure_thayer gla...
Css sf azure_8-9-17 - 5_ways to_optimize_your_azure_infrastructure_thayer gla...Css sf azure_8-9-17 - 5_ways to_optimize_your_azure_infrastructure_thayer gla...
Css sf azure_8-9-17 - 5_ways to_optimize_your_azure_infrastructure_thayer gla...Alert Logic
 
Managing Cloud Security Risks in Your Organization
Managing Cloud Security Risks in Your OrganizationManaging Cloud Security Risks in Your Organization
Managing Cloud Security Risks in Your OrganizationCharles Lim
 
#ALSummit: Realities of Security in the Cloud
#ALSummit: Realities of Security in the Cloud#ALSummit: Realities of Security in the Cloud
#ALSummit: Realities of Security in the CloudAlert Logic
 
CSS17: Houston - Azure Shared Security Model Overview
CSS17: Houston - Azure Shared Security Model OverviewCSS17: Houston - Azure Shared Security Model Overview
CSS17: Houston - Azure Shared Security Model OverviewAlert Logic
 
Venom vulnerability Overview and a basic demo
Venom vulnerability Overview and a basic demoVenom vulnerability Overview and a basic demo
Venom vulnerability Overview and a basic demoAkash Mahajan
 
Cloud Security - Kloudlearn
Cloud Security - KloudlearnCloud Security - Kloudlearn
Cloud Security - KloudlearnKloudLearn
 

What's hot (20)

Cloud university intel security
Cloud university intel securityCloud university intel security
Cloud university intel security
 
Security As A Service In Cloud(SECaaS)
Security As A Service In Cloud(SECaaS)Security As A Service In Cloud(SECaaS)
Security As A Service In Cloud(SECaaS)
 
RightScale Webinar: Security Monitoring in the Cloud: How RightScale Does It
RightScale Webinar: Security Monitoring in the Cloud: How RightScale Does ItRightScale Webinar: Security Monitoring in the Cloud: How RightScale Does It
RightScale Webinar: Security Monitoring in the Cloud: How RightScale Does It
 
Securing Applications in the Cloud
Securing Applications in the CloudSecuring Applications in the Cloud
Securing Applications in the Cloud
 
The Top Cloud Security Issues
The Top Cloud Security IssuesThe Top Cloud Security Issues
The Top Cloud Security Issues
 
Rethinking Security: The Cloud Infrastructure Effect
Rethinking Security: The Cloud Infrastructure EffectRethinking Security: The Cloud Infrastructure Effect
Rethinking Security: The Cloud Infrastructure Effect
 
cloud security ppt
cloud security ppt cloud security ppt
cloud security ppt
 
Security for cloud native workloads
Security for cloud native workloadsSecurity for cloud native workloads
Security for cloud native workloads
 
Assessing System Risk the Smart Way
Assessing System Risk the Smart WayAssessing System Risk the Smart Way
Assessing System Risk the Smart Way
 
Cloud security
Cloud securityCloud security
Cloud security
 
Cloud Security Demystified
Cloud Security DemystifiedCloud Security Demystified
Cloud Security Demystified
 
Cloud security privacy- org
Cloud security  privacy- orgCloud security  privacy- org
Cloud security privacy- org
 
Managed Threat Detection & Response for AWS Applications
Managed Threat Detection & Response for AWS ApplicationsManaged Threat Detection & Response for AWS Applications
Managed Threat Detection & Response for AWS Applications
 
Cloud Security Engineering - Tools and Techniques
Cloud Security Engineering - Tools and TechniquesCloud Security Engineering - Tools and Techniques
Cloud Security Engineering - Tools and Techniques
 
Css sf azure_8-9-17 - 5_ways to_optimize_your_azure_infrastructure_thayer gla...
Css sf azure_8-9-17 - 5_ways to_optimize_your_azure_infrastructure_thayer gla...Css sf azure_8-9-17 - 5_ways to_optimize_your_azure_infrastructure_thayer gla...
Css sf azure_8-9-17 - 5_ways to_optimize_your_azure_infrastructure_thayer gla...
 
Managing Cloud Security Risks in Your Organization
Managing Cloud Security Risks in Your OrganizationManaging Cloud Security Risks in Your Organization
Managing Cloud Security Risks in Your Organization
 
#ALSummit: Realities of Security in the Cloud
#ALSummit: Realities of Security in the Cloud#ALSummit: Realities of Security in the Cloud
#ALSummit: Realities of Security in the Cloud
 
CSS17: Houston - Azure Shared Security Model Overview
CSS17: Houston - Azure Shared Security Model OverviewCSS17: Houston - Azure Shared Security Model Overview
CSS17: Houston - Azure Shared Security Model Overview
 
Venom vulnerability Overview and a basic demo
Venom vulnerability Overview and a basic demoVenom vulnerability Overview and a basic demo
Venom vulnerability Overview and a basic demo
 
Cloud Security - Kloudlearn
Cloud Security - KloudlearnCloud Security - Kloudlearn
Cloud Security - Kloudlearn
 

Similar to Bảo mật Dành cho Tên công ty Phiên bản 1.0 - Resilience Design Patterns

Fault tolerance in distributed systems
Fault tolerance in distributed systemsFault tolerance in distributed systems
Fault tolerance in distributed systemssumitjain2013
 
Building Cloud Ready Apps
Building Cloud Ready AppsBuilding Cloud Ready Apps
Building Cloud Ready AppsVMware Tanzu
 
Application and Website Security -- Designer Edition: Using Formal Specificat...
Application and Website Security -- Designer Edition:Using Formal Specificat...Application and Website Security -- Designer Edition:Using Formal Specificat...
Application and Website Security -- Designer Edition: Using Formal Specificat...Daniel Owens
 
Goal Driven Performance Optimization, Peter Zaitsev
Goal Driven Performance Optimization, Peter ZaitsevGoal Driven Performance Optimization, Peter Zaitsev
Goal Driven Performance Optimization, Peter ZaitsevFuenteovejuna
 
Survey Presentation About Application Security
Survey Presentation About Application SecuritySurvey Presentation About Application Security
Survey Presentation About Application SecurityNicholas Davis
 
Why Software Test Performance Matters
Why Software Test Performance MattersWhy Software Test Performance Matters
Why Software Test Performance MattersSolano Labs
 
Fault Tolerance System
Fault Tolerance SystemFault Tolerance System
Fault Tolerance SystemEhsan Ilahi
 
Performance engineering methodologies
Performance engineering  methodologiesPerformance engineering  methodologies
Performance engineering methodologiesManeesh Chaturvedi
 
Fault tolerance review by tsegabrehan zerihun
Fault tolerance review by tsegabrehan zerihunFault tolerance review by tsegabrehan zerihun
Fault tolerance review by tsegabrehan zerihunTsegabrehan Am
 
Dependable Systems -Fault Tolerance Patterns (4/16)
Dependable Systems -Fault Tolerance Patterns (4/16)Dependable Systems -Fault Tolerance Patterns (4/16)
Dependable Systems -Fault Tolerance Patterns (4/16)Peter Tröger
 
High Reliabilty Systems
High Reliabilty SystemsHigh Reliabilty Systems
High Reliabilty SystemsLloydMoore
 
Goal driven performance optimization (Пётр Зайцев)
Goal driven performance optimization (Пётр Зайцев)Goal driven performance optimization (Пётр Зайцев)
Goal driven performance optimization (Пётр Зайцев)Ontico
 
Parallel and Distributed Computing Chapter 12
Parallel and Distributed Computing Chapter 12Parallel and Distributed Computing Chapter 12
Parallel and Distributed Computing Chapter 12AbdullahMunir32
 

Similar to Bảo mật Dành cho Tên công ty Phiên bản 1.0 - Resilience Design Patterns (20)

Fault tolerance in distributed systems
Fault tolerance in distributed systemsFault tolerance in distributed systems
Fault tolerance in distributed systems
 
Distributed DBMS - Unit 9 - Distributed Deadlock & Recovery
Distributed DBMS - Unit 9 - Distributed Deadlock & RecoveryDistributed DBMS - Unit 9 - Distributed Deadlock & Recovery
Distributed DBMS - Unit 9 - Distributed Deadlock & Recovery
 
Building Cloud Ready Apps
Building Cloud Ready AppsBuilding Cloud Ready Apps
Building Cloud Ready Apps
 
Application and Website Security -- Designer Edition: Using Formal Specificat...
Application and Website Security -- Designer Edition:Using Formal Specificat...Application and Website Security -- Designer Edition:Using Formal Specificat...
Application and Website Security -- Designer Edition: Using Formal Specificat...
 
Concurrency
ConcurrencyConcurrency
Concurrency
 
Fault tolerance techniques
Fault tolerance techniquesFault tolerance techniques
Fault tolerance techniques
 
Goal Driven Performance Optimization, Peter Zaitsev
Goal Driven Performance Optimization, Peter ZaitsevGoal Driven Performance Optimization, Peter Zaitsev
Goal Driven Performance Optimization, Peter Zaitsev
 
Survey Presentation About Application Security
Survey Presentation About Application SecuritySurvey Presentation About Application Security
Survey Presentation About Application Security
 
Why Software Test Performance Matters
Why Software Test Performance MattersWhy Software Test Performance Matters
Why Software Test Performance Matters
 
Fault Tolerance System
Fault Tolerance SystemFault Tolerance System
Fault Tolerance System
 
Performance engineering methodologies
Performance engineering  methodologiesPerformance engineering  methodologies
Performance engineering methodologies
 
Fault tolerance review by tsegabrehan zerihun
Fault tolerance review by tsegabrehan zerihunFault tolerance review by tsegabrehan zerihun
Fault tolerance review by tsegabrehan zerihun
 
Dependable Systems -Fault Tolerance Patterns (4/16)
Dependable Systems -Fault Tolerance Patterns (4/16)Dependable Systems -Fault Tolerance Patterns (4/16)
Dependable Systems -Fault Tolerance Patterns (4/16)
 
High Reliabilty Systems
High Reliabilty SystemsHigh Reliabilty Systems
High Reliabilty Systems
 
Software Performance
Software Performance Software Performance
Software Performance
 
Door to perfomance testing
Door to perfomance testingDoor to perfomance testing
Door to perfomance testing
 
Ch20
Ch20Ch20
Ch20
 
Working Effectively with PeopleSoft Support
Working Effectively with PeopleSoft SupportWorking Effectively with PeopleSoft Support
Working Effectively with PeopleSoft Support
 
Goal driven performance optimization (Пётр Зайцев)
Goal driven performance optimization (Пётр Зайцев)Goal driven performance optimization (Пётр Зайцев)
Goal driven performance optimization (Пётр Зайцев)
 
Parallel and Distributed Computing Chapter 12
Parallel and Distributed Computing Chapter 12Parallel and Distributed Computing Chapter 12
Parallel and Distributed Computing Chapter 12
 

More from Tu Pham

Go from idea to app with no coding using AppSheet.pptx
Go from idea to app with no coding using AppSheet.pptxGo from idea to app with no coding using AppSheet.pptx
Go from idea to app with no coding using AppSheet.pptxTu Pham
 
Secure your app against DDOS, API Abuse, Hijacking, and Fraud
 Secure your app against DDOS, API Abuse, Hijacking, and Fraud Secure your app against DDOS, API Abuse, Hijacking, and Fraud
Secure your app against DDOS, API Abuse, Hijacking, and FraudTu Pham
 
Challenges In Implementing SRE
Challenges In Implementing SREChallenges In Implementing SRE
Challenges In Implementing SRETu Pham
 
IT Strategy
IT Strategy IT Strategy
IT Strategy Tu Pham
 
Set up Learn and Development program
Set up Learn and Development programSet up Learn and Development program
Set up Learn and Development programTu Pham
 
Cost Management For IT Project / Product
Cost Management For IT Project / ProductCost Management For IT Project / Product
Cost Management For IT Project / ProductTu Pham
 
Minimum Viable Product 101
Minimum Viable Product 101Minimum Viable Product 101
Minimum Viable Product 101Tu Pham
 
Understand your customers
Understand your customersUnderstand your customers
Understand your customersTu Pham
 
Let's build great products for mid-size companies
Let's build great products for mid-size companiesLet's build great products for mid-size companies
Let's build great products for mid-size companiesTu Pham
 
End To End Business Intelligence On Google Cloud
End To End Business Intelligence On Google CloudEnd To End Business Intelligence On Google Cloud
End To End Business Intelligence On Google CloudTu Pham
 
High Output Tech Management
High Output Tech Management High Output Tech Management
High Output Tech Management Tu Pham
 
Big Data Driven At Eway
Big Data Driven At Eway Big Data Driven At Eway
Big Data Driven At Eway Tu Pham
 
Security On The Cloud
Security On The CloudSecurity On The Cloud
Security On The CloudTu Pham
 
Eway Tech Talk #2 Coding Guidelines
Eway Tech Talk #2 Coding GuidelinesEway Tech Talk #2 Coding Guidelines
Eway Tech Talk #2 Coding GuidelinesTu Pham
 
End To End Machine Learning With Google Cloud
End To End Machine Learning With Google Cloud End To End Machine Learning With Google Cloud
End To End Machine Learning With Google Cloud Tu Pham
 
Eway Tech Talk #0 Knowledge Sharing
Eway Tech Talk #0 Knowledge SharingEway Tech Talk #0 Knowledge Sharing
Eway Tech Talk #0 Knowledge SharingTu Pham
 
Php 5.6 vs Php 7 performance comparison
Php 5.6 vs Php 7 performance comparisonPhp 5.6 vs Php 7 performance comparison
Php 5.6 vs Php 7 performance comparisonTu Pham
 
System Security on Cloud
System Security on CloudSystem Security on Cloud
System Security on CloudTu Pham
 
Big Data at DYNO
Big Data at DYNOBig Data at DYNO
Big Data at DYNOTu Pham
 
Big data on google cloud
Big data on google cloudBig data on google cloud
Big data on google cloudTu Pham
 

More from Tu Pham (20)

Go from idea to app with no coding using AppSheet.pptx
Go from idea to app with no coding using AppSheet.pptxGo from idea to app with no coding using AppSheet.pptx
Go from idea to app with no coding using AppSheet.pptx
 
Secure your app against DDOS, API Abuse, Hijacking, and Fraud
 Secure your app against DDOS, API Abuse, Hijacking, and Fraud Secure your app against DDOS, API Abuse, Hijacking, and Fraud
Secure your app against DDOS, API Abuse, Hijacking, and Fraud
 
Challenges In Implementing SRE
Challenges In Implementing SREChallenges In Implementing SRE
Challenges In Implementing SRE
 
IT Strategy
IT Strategy IT Strategy
IT Strategy
 
Set up Learn and Development program
Set up Learn and Development programSet up Learn and Development program
Set up Learn and Development program
 
Cost Management For IT Project / Product
Cost Management For IT Project / ProductCost Management For IT Project / Product
Cost Management For IT Project / Product
 
Minimum Viable Product 101
Minimum Viable Product 101Minimum Viable Product 101
Minimum Viable Product 101
 
Understand your customers
Understand your customersUnderstand your customers
Understand your customers
 
Let's build great products for mid-size companies
Let's build great products for mid-size companiesLet's build great products for mid-size companies
Let's build great products for mid-size companies
 
End To End Business Intelligence On Google Cloud
End To End Business Intelligence On Google CloudEnd To End Business Intelligence On Google Cloud
End To End Business Intelligence On Google Cloud
 
High Output Tech Management
High Output Tech Management High Output Tech Management
High Output Tech Management
 
Big Data Driven At Eway
Big Data Driven At Eway Big Data Driven At Eway
Big Data Driven At Eway
 
Security On The Cloud
Security On The CloudSecurity On The Cloud
Security On The Cloud
 
Eway Tech Talk #2 Coding Guidelines
Eway Tech Talk #2 Coding GuidelinesEway Tech Talk #2 Coding Guidelines
Eway Tech Talk #2 Coding Guidelines
 
End To End Machine Learning With Google Cloud
End To End Machine Learning With Google Cloud End To End Machine Learning With Google Cloud
End To End Machine Learning With Google Cloud
 
Eway Tech Talk #0 Knowledge Sharing
Eway Tech Talk #0 Knowledge SharingEway Tech Talk #0 Knowledge Sharing
Eway Tech Talk #0 Knowledge Sharing
 
Php 5.6 vs Php 7 performance comparison
Php 5.6 vs Php 7 performance comparisonPhp 5.6 vs Php 7 performance comparison
Php 5.6 vs Php 7 performance comparison
 
System Security on Cloud
System Security on CloudSystem Security on Cloud
System Security on Cloud
 
Big Data at DYNO
Big Data at DYNOBig Data at DYNO
Big Data at DYNO
 
Big data on google cloud
Big data on google cloudBig data on google cloud
Big data on google cloud
 

Recently uploaded

Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 

Recently uploaded (20)

Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
CloudStudio User manual (basic edition):
Confidential, for Company Name, Version 1.0 - Resilience Design Patterns

  • 1. Latency Control & Supervision in Resilience Design Patterns. Tu Pham - CTO @ Eway
  • 2. Table of Contents: Terminology, Why It Is So IMPORTANT, Why It Is So HARD, Design Patterns, Anti-Patterns, Q & A
  • 3. Terminology. Distributed systems: networked components that communicate with each other by passing messages, most often to achieve a common goal. Resiliency: the capacity of a system to recover from difficulties. Availability: the probability that a system is operating at time `t`. Reliability: the degree to which a system or component performs specified functions under specified conditions for a specified period of time.
  • 4. Terminology. Faults: a fault is an incorrect internal state in your system. Examples: 1. A slowing storage layer 2. Memory leaks in the application 3. Blocked threads 4. Dependency failures 5. Bad data propagating through the system (most often because input data is not validated enough). Failure: a failure is the inability of the system to perform its intended job. It means a loss of uptime and availability. Faults that are not contained can propagate and lead to failures.
  • 6. Why It Is So IMPORTANT: 1. Losing customers and partners to competitors => financial losses for the company 2. Affecting the livelihood of publishers and advertisers 3. Affecting the salary and bonus of OUR TEAM :)) 4. Affecting services for customers and colleagues
  • 7. Why It Is So HARD: But building resiliency in a complex microservices architecture, with multiple distributed systems communicating with each other, is difficult.
  • 8. Why It Is So HARD: Some of the things that make it hard are: 1. The network is unreliable 2. Dependencies can always fail 3. User behavior is unpredictable
  • 11. Latency Control ● Complements isolation ● Detection and handling of non-timely responses ● Avoid cascading temporal failures ● Different approaches and patterns available
  • 12. Timeout ● Preserve responsiveness independent of downstream latency ● Measure response time of downstream calls ● Stop waiting after a pre-determined timeout ● Take an alternate action if the timeout is reached
  • 14. Fail Fast ● “If you know you’re going to fail, you better fail fast” ● Avoid foreseeable failures ● Usually implemented by adding checks in front of costly actions ● Enhances probability of not failing
  • 15. Circuit Breaker ● Probably the most often cited resilience pattern ● Extension of the timeout pattern ● Takes a downstream unit offline if calls to it fail multiple times ● Specific variant of the fail fast pattern
  • 19. Fan out & quickest reply ● Send the request to multiple workers ● Use the quickest reply and discard all other responses ● Reduces the probability of latent responses ● The tradeoff is a WASTE of resources
  • 20. Bounded Queues ● Limit request queue sizes in front of highly utilized resources ● Avoids latency due to overloaded resources ● Introduces pushback on the callers ● Another variant of the fail fast pattern
  • 22. Supervision ● Provides failure handling beyond the means of a single failure unit ● Detect unit failures ● Provide means for error escalation ● Different approaches and patterns available
  • 23. Shed Load ● Upstream isolation pattern ● Avoid becoming overloaded due to too many requests ● Install a gatekeeper in front of the resource ● Shed requests based on resource load
  • 24. Monitor ● Observe unit behavior and interactions from the outside ● Automatically respond to detected failures ● Part of the system – complex failure handling strategies possible ● Outside the system – more robust against system level failures
  • 25. Error Handler ● Units often don’t have enough time or information to handle errors ● Separate business logic and error handling ● Business logic just focuses on getting the task done (quickly) ● Error handler has sufficient time and information to handle errors
  • 26. Escalation ● Units often don’t have enough time or information to handle errors ● An escalation peer with more time and information is needed ● Often multi-level hierarchies ● Pure design issue
  • 29. Fallback ● Units often don’t have enough time or information to handle errors ● Instead of aborting the computation because of a missing response, we fill in a fallback value ● Of course, it can be DANGEROUS!!!
  • 30. Retry ● Units have enough time or information to handle errors ● Just send the request again and again until it reaches the BOUNDARY of the retry policy
  • 31. Escalation ● Units often don’t have enough time or information to handle errors ● An escalation peer with more time and information is needed ● Often multi-level hierarchies ● Pure design issue
  • 32. Just Don’t ● Infinite delays ● One config / policy for all situations ● Fallback logic without sign-off from business departments / upper management ● Laggy / buggy monitoring systems
  • 35. References ● https://github.com/Netflix/Hystrix ● https://github.com/alibaba/Sentinel ● https://github.com/resilience4j/resilience4j ● https://github.com/jhalterman/failsafe
  • 36. “Just Design Our Systems For Failure” Q&A
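The Timeout slide (slide 12) boils down to: time the downstream call, stop waiting after a pre-determined budget, and take an alternate action. Below is a minimal Python sketch of those steps. It is not the API of any library in the references; `call_with_timeout`, `slow_downstream` and `fast_downstream` are hypothetical names, and it assumes the downstream call can run on a worker thread and that returning a cached value is an acceptable alternate action.

```python
import concurrent.futures
import time

# One long-lived pool: a `with` block would wait for the slow call on exit,
# which would defeat the purpose of the timeout.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def call_with_timeout(fn, timeout_s, fallback):
    """Call fn(); stop waiting after timeout_s and return fallback instead.

    Returns (value, elapsed_s, timed_out) so callers can also record latency.
    """
    start = time.monotonic()
    future = _POOL.submit(fn)
    try:
        value = future.result(timeout=timeout_s)
        timed_out = False
    except concurrent.futures.TimeoutError:
        # Python threads cannot be killed: the slow call keeps running in the
        # background, but the caller is free to take the alternate action.
        future.cancel()  # only has an effect if fn has not started yet
        value = fallback
        timed_out = True
    return value, time.monotonic() - start, timed_out


def slow_downstream():
    time.sleep(0.5)  # hypothetical downstream that is currently latent
    return "fresh"


def fast_downstream():
    return "fresh"
```

The elapsed time is returned because the slide also asks for measuring downstream response times. The abandoned call still occupies a pool thread until it finishes, which is one reason latency control complements isolation (for example, a separate bounded pool per dependency).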
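Slide 14 describes Fail Fast as cheap checks placed in front of a costly action. The sketch below uses three hypothetical preconditions (a user id, remaining quota, and a health flag for the backend) purely for illustration; real checks depend on what is actually known to make the action fail.

```python
class FailFastError(Exception):
    """Raised before any expensive work when the request is known to be doomed."""


def expensive_report(user_id: str) -> str:
    # Hypothetical stand-in for a costly action (large query, remote calls)
    return f"report for {user_id}"


def generate_report(user_id: str, quota_left: int, backend_healthy: bool) -> str:
    # Cheap checks go first: each one rejects a request that would fail anyway,
    # so no resources are spent on it and the caller learns immediately.
    if not user_id:
        raise FailFastError("missing user id")
    if quota_left <= 0:
        raise FailFastError("quota exhausted")
    if not backend_healthy:  # could be fed by a circuit breaker or health check
        raise FailFastError("report backend is unavailable")
    return expensive_report(user_id)
```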
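Slide 15 describes the Circuit Breaker as a fail-fast extension of the timeout: after repeated failures, the downstream unit is taken offline for a while. The following is a minimal, single-threaded sketch of the common closed / open / half-open state machine. It is not how Hystrix, Sentinel, resilience4j or Failsafe implement it; the threshold, the reset timeout and the injectable `clock` are assumptions chosen to keep the sketch small and testable.

```python
import time


class CircuitOpenError(Exception):
    """Raised without calling downstream while the circuit is open."""


class CircuitBreaker:
    """Minimal closed -> open -> half-open breaker (sketch, not thread-safe)."""

    def __init__(self, failure_threshold=3, reset_timeout_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    @property
    def state(self):
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout_s:
            return "half-open"  # let a trial call through
        return "open"

    def call(self, fn, *args, **kwargs):
        if self.state == "open":
            # Fail fast: do not even try the downstream unit while it is offline.
            raise CircuitOpenError("circuit is open")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self):
        self.failures += 1
        # A failed trial call (half-open) or too many failures (closed) opens the circuit.
        if self.opened_at is not None or self.failures >= self.failure_threshold:
            self.opened_at = self.clock()

    def _on_success(self):
        self.failures = 0
        self.opened_at = None
```

A production breaker would also need locking for concurrent callers and would typically count timeouts raised by the wrapped call as failures, which is what ties it to the timeout pattern.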
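Fan out & quickest reply (slide 19) can be sketched with a thread pool: submit the same request to every replica, return the first successful reply, and ignore the rest. The replicas here are hypothetical callables. As the slide warns, every extra replica costs real resources, and the late replies are still computed even though they are discarded.

```python
import concurrent.futures

_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=8)


def quickest_reply(replicas, request):
    """Send the same request to every replica; return the first successful reply."""
    futures = [_POOL.submit(replica, request) for replica in replicas]
    errors = []
    for future in concurrent.futures.as_completed(futures):
        try:
            result = future.result()
        except Exception as exc:
            errors.append(exc)  # a failed replica is not a reply; keep waiting
            continue
        for other in futures:
            other.cancel()  # best effort: replies already in flight are discarded
        return result
    raise RuntimeError(f"all {len(replicas)} replicas failed: {errors}")
```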
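For Bounded Queues (slide 20), Python's standard `queue.Queue(maxsize=...)` already provides the size limit; the pushback comes from refusing work with `put_nowait` instead of blocking the caller. `BoundedWorkQueue` and `Overloaded` are hypothetical names for this sketch.

```python
import queue


class Overloaded(Exception):
    """Pushback signal: the caller should back off, retry later, or degrade."""


class BoundedWorkQueue:
    def __init__(self, max_pending: int):
        # maxsize caps how much work can pile up in front of the busy resource
        self._q = queue.Queue(maxsize=max_pending)

    def submit(self, job):
        try:
            self._q.put_nowait(job)  # never block the caller
        except queue.Full:
            # Fail fast instead of letting the job wait behind a long backlog
            raise Overloaded(f"{self._q.maxsize} jobs already pending") from None

    def take(self, timeout_s=None):
        # Called by the worker that owns the resource; blocks until a job arrives
        return self._q.get(timeout=timeout_s)
```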
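The gatekeeper from the Shed Load slide (slide 23) can be sketched as a wrapper that counts in-flight requests and rejects new ones above a limit. Using the in-flight count as the load signal is an assumption; CPU usage, latency or queue depth are other possible signals. `LoadShedder` and `RequestShed` are hypothetical names.

```python
import threading


class RequestShed(Exception):
    """Signals that the gatekeeper rejected the request (e.g. map it to HTTP 503)."""


class LoadShedder:
    """Gatekeeper in front of a resource; sheds requests above an in-flight limit."""

    def __init__(self, max_in_flight: int):
        self.max_in_flight = max_in_flight
        self.in_flight = 0
        self.shed_count = 0
        self._lock = threading.Lock()

    def handle(self, fn, *args):
        with self._lock:
            if self.in_flight >= self.max_in_flight:
                self.shed_count += 1
                raise RequestShed(f"{self.in_flight} requests already in flight")
            self.in_flight += 1
        try:
            return fn(*args)
        finally:
            with self._lock:
                self.in_flight -= 1
```

Unlike the bounded queue, nothing waits here: a request is either served now or shed, which protects the resource from the upstream side.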
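The Monitor slide (slide 24) describes observing units from the outside and responding automatically. Here is a minimal heartbeat-based sketch, assuming units report heartbeats and that a hypothetical `restart` callback can bring a unit back; the injectable `clock` exists only to make it testable.

```python
import time


class HeartbeatMonitor:
    """Watches units from the outside and restarts any whose heartbeat goes stale."""

    def __init__(self, stale_after_s, restart, clock=time.monotonic):
        self.stale_after_s = stale_after_s
        self.restart = restart  # hypothetical callback that brings a unit back
        self.clock = clock
        self.last_seen = {}

    def heartbeat(self, unit: str):
        self.last_seen[unit] = self.clock()

    def check(self):
        """Run periodically (e.g. from a timer); returns the units it restarted."""
        now = self.clock()
        restarted = []
        for unit, seen in list(self.last_seen.items()):
            if now - seen > self.stale_after_s:
                self.restart(unit)
                self.last_seen[unit] = now  # fresh grace period after a restart
                restarted.append(unit)
        return restarted
```

Running this check inside the monitored process allows richer strategies; running it in a separate process keeps it working when the monitored process itself is unhealthy, which is the tradeoff the slide points out.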
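Slides 25 and 26 separate business logic from error handling and pass errors up to a peer with more time and information. The sketch below models that as a chain of hypothetical `Supervisor` objects, each with a predicate deciding which errors it can handle; anything a level cannot handle is escalated to its parent, and the top of the hierarchy re-raises. This is an illustration, not a specific framework's API.

```python
class Supervisor:
    """One level of an escalation hierarchy."""

    def __init__(self, name, can_handle, parent=None):
        self.name = name
        self.can_handle = can_handle  # predicate: exception -> bool
        self.parent = parent
        self.log = []

    def report(self, unit, error):
        """Handle the error here if possible, otherwise escalate; returns the handler's name."""
        if self.can_handle(error):
            self.log.append(f"{self.name} handled {type(error).__name__} from {unit}")
            return self.name
        if self.parent is None:
            raise error  # top of the hierarchy: nobody left to escalate to
        return self.parent.report(unit, error)


def run_unit(unit, task, supervisor):
    # The business logic (task) stays free of recovery code; failures are
    # handed to the error handler. Returns None when the task failed.
    try:
        return task()
    except Exception as exc:
        supervisor.report(unit, exc)
        return None
```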
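For Fallback (slide 29), a sketch that substitutes a value when the primary call fails and, because the slide rightly calls this DANGEROUS, reports every substitution through an optional callback so it never happens silently. `with_fallback` is a hypothetical helper; whether a fallback value is acceptable at all is a business decision, as slide 32 stresses.

```python
def with_fallback(primary, fallback_value, on_fallback=None):
    """Return primary(), or fallback_value if it raises."""
    try:
        return primary()
    except Exception as exc:
        if on_fallback is not None:
            on_fallback(exc)  # make every substitution visible (log, metric)
        return fallback_value
```

A safe use is filling non-critical data, such as showing an empty recommendation list when that service is down; filling critical data such as prices or balances with a default is the dangerous case.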
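Retry (slide 30) only works with an explicit boundary. The following minimal sketch assumes a policy of at most `max_attempts` tries, exponential backoff with random jitter between them, and retries only for exception types treated as transient (`ConnectionError` and `TimeoutError` here, which is an assumption). The `sleep` parameter exists only so tests can skip the waits.

```python
import random
import time


def retry(fn, max_attempts=3, base_delay_s=0.1, max_delay_s=2.0,
          retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call fn until it succeeds or the retry policy's boundary is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # boundary reached: give up and let the caller decide
            # Exponential backoff with random jitter, capped at max_delay_s
            delay = min(max_delay_s, base_delay_s * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
```

Retrying only transient errors, and giving up with the original exception, keeps Retry composable with Escalation: the level above decides what happens next.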