© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Closing Loops and Opening Minds: How to
Take Control of Systems, Big and Small
Colm MacCárthaigh
Senior Principal Engineer
AWS
ARC337
“Quality is not an act, it is a habit”
Aristotle, some time around 350 BC
Amazon CloudFront Control Plane
[Figure: four quadrants labeled (-, +), (+, +), (-, -), and (+, -)]
What goes into high quality designs
Diverse creative minds working in a fearless environment
Systematic reviews and mechanisms to share lessons
Use well-worn patterns where possible and focus
invention where it is truly needed
Testing, testing, testing, testing, testing
How we make trade-offs in design
Control Planes vs. Data Planes
Control Planes are often a bigger design
challenge than the data planes that they
support.
Poorly designed Control Planes have the
ability to cause large outages, or worse:
misconfigurations and corruption.
What do Control Planes do in the Cloud?
Manage the life cycle for resources
Provision software
Provision service configuration
Provision user configuration
Control Theory 101
• Independently discovered in several
fields of engineering and science
• Formalized in the early-to-mid
twentieth century
• One of the most under-appreciated
branches of science, incredibly relevant
to distributed systems
Control Theory 101
PID
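As a rough illustration of the PID (proportional, integral, derivative) controller named on this slide, here is a minimal textbook sketch. The class name, the gains, and the toy system being controlled are all assumptions made for illustration; none of them come from the talk.

```python
from dataclasses import dataclass


@dataclass
class PID:
    """Textbook discrete PID controller (hypothetical names and gains)."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def update(self, setpoint: float, measured: float, dt: float) -> float:
        error = setpoint - measured                     # P: how far off we are now
        self.integral += error * dt                     # I: accumulated past error
        derivative = (error - self.prev_error) / dt     # D: how fast the error moves
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(steps: int = 600) -> float:
    """Drive a toy system whose value moves by the controller output each tick."""
    pid = PID(kp=0.5, ki=0.1, kd=0.05)
    value, target, dt = 0.0, 10.0, 0.1
    for _ in range(steps):
        value += pid.update(target, value, dt) * dt
    return value
```

The loop is closed because every correction is computed from a fresh measurement of the system, not from what the controller assumes its last action achieved.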
Pattern 1: Checksum all of the things
watch:
out:
for:
- YAML
this:
file:
can:
be:
-truncated
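A file like the one above can still parse after being cut short, so a parser alone will not catch the damage. One possible way to detect it is to ship a digest with the payload and verify it before applying anything. The sketch below assumes a simple "hex digest, newline, payload" framing; the function names `wrap` and `unwrap` are hypothetical.

```python
import hashlib


def wrap(payload: bytes) -> bytes:
    """Prefix the payload with its SHA-256 hex digest and a newline."""
    digest = hashlib.sha256(payload).hexdigest().encode("ascii")
    return digest + b"\n" + payload


def unwrap(blob: bytes) -> bytes:
    """Return the payload, or raise ValueError if it fails the checksum."""
    digest, sep, payload = blob.partition(b"\n")
    expected = hashlib.sha256(payload).hexdigest().encode("ascii")
    if not sep or digest != expected:
        raise ValueError("checksum mismatch: refusing to apply configuration")
    return payload
```

A consumer would call `unwrap` on every configuration it receives and keep running on its previous configuration if the call raises.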
Pattern 2: Cryptographic Authentication
Encrypt and authenticate everything! Control Planes
are powerful and security-critical systems
Be able to revoke and rotate every credential. But also
watch out for certificate expiries
Prevent human access to production credentials
Never allow a non-production control plane to talk to
the production data plane
Pattern 3: Cells, Shells, and Poison Tasters
We divide up our control planes horizontally into
regions, availability zones and cells
It’s also common to compartmentalize control
planes so that the data plane is insulated from
control plane crashes
Poison tasters: check up front that a change is
safe
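One way to picture a poison taster is as a gate that tries a change on something disposable before any cell sees it. The sketch below is only an interpretation of the slide: the function name, the callable-based interface, and the rule that a crashing taster counts as a rejection are assumptions.

```python
from typing import Callable, Dict, List

Config = Dict[str, str]


def taste_then_deploy(
    change: Config,
    taster: Callable[[Config], bool],
    cells: List[Callable[[Config], None]],
) -> bool:
    """Apply `change` to every cell only if the taster accepts it first."""
    try:
        safe = taster(change)
    except Exception:
        safe = False  # a crashing taster counts as a rejected change
    if not safe:
        return False
    for apply_to_cell in cells:
        apply_to_cell(change)
    return True
```

In this sketch a rejected change never reaches any cell, so the cost of a bad change is limited to the taster.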
Pattern 4: Asynchronous Coupling
Synchronous systems are very strongly coupled
A problem in a synchronous downstream
dependency has immediate impact on the
upstream callers
Retries from upstream callers can all-too-easily
fan-out and amplify problems
Pattern 4: Asynchronous Coupling
Asynchronously coupled systems tend to be more
tolerant
Can make partial progress even when some
components are unavailable
Workflows and queues can be tuned to have
deterministic retry behaviors
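The points above can be sketched with an in-memory queue. This is a simplified illustration, not the design used in the talk: the `drain` function, the attempt counter carried with each task, and the default of three attempts are assumptions.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple


def drain(
    tasks: List[str],
    handler: Callable[[str], None],
    max_attempts: int = 3,
) -> Tuple[List[str], List[str]]:
    """Process tasks from a queue; return (completed, abandoned)."""
    queue: Deque[Tuple[str, int]] = deque((task, 1) for task in tasks)
    completed: List[str] = []
    abandoned: List[str] = []
    while queue:
        task, attempt = queue.popleft()
        try:
            handler(task)
            completed.append(task)
        except Exception:
            if attempt < max_attempts:
                queue.append((task, attempt + 1))  # retry later, at the back
            else:
                abandoned.append(task)  # bounded: never retried again
    return completed, abandoned
```

A failing task goes to the back of the queue, so healthy work keeps making progress, and the retry count per task is fixed in advance instead of being multiplied by every caller.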
Pattern 5: Closed Feedback Loops
Pattern 6: Small pushes and large pulls
Very Frequently Asked Question: Is it better to
push, or to pull?
For example: should data plane hosts accept
connections and be pushed configurations, or
should they connect to the control plane and pull
them?
It’s really the wrong question!
Pattern 6: Small pushes and large pulls
Long lived connections can support pushing
timely updates regardless of the “direction” of
the connection
Better to ask: which fleet is bigger? In general,
small fleets should connect to bigger fleets.
This avoids the problems of small fleets being
overwhelmed with thundering herds and retry
storms
Pattern 7: Avoiding Cold Starts and Cold Caches
Caches are bi-modal systems. Super fast when
they have entries, and slow when they are empty
A thundering herd hitting a cold cache can
prevent it from ever getting warm
Retry storms often need to be moderated by
throttles
Pattern 7: Avoiding Cold Starts and Cold Caches
Work out if you really need a cache at all
Pre-warm caches before accepting requests
Consider serving stale entries when backends are
unavailable
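Two of the suggestions above, pre-warming and serving stale entries, can be sketched in a few lines. The class below is a hypothetical example: the name `StaleOkCache`, the TTL policy, and the injectable clock are assumptions chosen to keep the sketch self-contained.

```python
import time
from typing import Callable, Dict, Iterable, Tuple


class StaleOkCache:
    """Cache that falls back to an expired entry when the backend fails."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def prewarm(self, keys: Iterable[str]) -> None:
        """Load keys before the host starts accepting requests."""
        for key in keys:
            self.get(key)

    def get(self, key: str) -> str:
        entry = self._entries.get(key)
        now = self._clock()
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]  # fresh hit
        try:
            value = self._fetch(key)
        except Exception:
            if entry is not None:
                return entry[0]  # backend down: serve the stale value
            raise
        self._entries[key] = (value, now)
        return value
```

Whether a stale answer is acceptable depends on the data; this sketch prefers an old value over an error only when the backend call fails.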
Pattern 8: Throttles
Throttles and rate-limits are often needed to
moderate problem requestors and to dampen
fluctuating systems
Example: Amazon Elastic Load Balancer and
Amazon Elastic Compute Cloud (Amazon EC2)
Takes careful work to ensure that throttling does
not impact the end customer experience
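A token bucket is one common way to build such a throttle. The talk does not say which algorithm is used, so treat the sketch below as a generic example; the class name, parameters, and injectable clock are assumptions.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow up to `burst` requests at once, refilled at `rate` per second."""

    def __init__(self, rate: float, burst: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rate = rate
        self._burst = burst
        self._clock = clock
        self._tokens = burst
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        # Refill for the time elapsed since the last call, capped at the burst size.
        self._tokens = min(self._burst, self._tokens + (now - self._last) * self._rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

The burst size decides how much sudden load is tolerated and the rate decides the sustained ceiling, which is the dampening effect the slide describes.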
Pattern 9: Deltas
What happens when we do have too much
configuration state to push around?
More efficient to compute deltas and distribute
patches
But how do we actually do that?
Pattern 9: Deltas
Key Value
foo bar
Pattern 9: Deltas
Key Value Version
foo bar 1
Pattern 9: Deltas
Key Value Version
foo bar 1
foo baz 2
Pattern 9: Deltas
Key Value Version
foo bar 1
foo baz 2
foo bar 3
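One reading of the table above is that every write, including a rollback to an earlier value, appends a row with a new version, and a consumer asks only for what changed after the version it already holds. The sketch below follows that reading; the `VersionedStore` class and its method names are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, str, int]  # (key, value, version)


class VersionedStore:
    """Append-only table of (key, value, version) rows."""

    def __init__(self) -> None:
        self._rows: List[Row] = []
        self._version = 0

    def put(self, key: str, value: str) -> int:
        """Record a write under the next version number and return it."""
        self._version += 1
        self._rows.append((key, value, self._version))
        return self._version

    def delta_since(self, version: int) -> Tuple[Dict[str, str], int]:
        """Return the latest value of each key changed after `version`."""
        patch: Dict[str, str] = {}
        for key, value, row_version in self._rows:
            if row_version > version:
                patch[key] = value  # later rows overwrite earlier ones
        return patch, self._version
```

A consumer that last saw version 1 would receive only the keys written at versions 2 and 3, plus the new version number to use on its next request.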
Pattern 10: Modality and Constant-Work
So far, we can build a loosely coupled control
plane, with deltas to minimize work, and throttles
to keep things safe
But what if a LOT of things change at the same
time?
We don’t want to build up backlogs and queues
and introduce lag
Pattern 10: Modality and Constant-Work
Systems that change performance in response to
workload or data patterns can be fragile
Example: Relational databases are great for
flexible business queries, but terrible for stable
control planes. Hidden optimizations and query
plan flips can wreak havoc
Deployments, peak events, power events, all incur
risk because they can be new modes
Pattern 10: Modality and Constant-Work
How dumb would it be to make a really really
simple control plane?
User calls an API that edits a configuration file on
Amazon Simple Storage Service (Amazon S3).
Push that configuration file every 10 seconds …
whether it changed or not!
Very very reliable and robust
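The idea can be sketched as a loop with no "has it changed?" branch. The example below is a simplified stand-in: it uses plain callables in place of Amazon S3 and real hosts, and the function names, the `sleep` hook, and the fixed cycle count are assumptions for illustration.

```python
import time
from typing import Callable, Sequence


def push_cycle(read_config: Callable[[], bytes],
               hosts: Sequence[Callable[[bytes], None]]) -> int:
    """Push the full configuration to every host, changed or not."""
    config = read_config()
    for push in hosts:
        push(config)
    return len(hosts)


def run(cycles: int,
        read_config: Callable[[], bytes],
        hosts: Sequence[Callable[[bytes], None]],
        interval_seconds: float = 10.0,
        sleep: Callable[[float], None] = time.sleep) -> int:
    """Run a fixed number of cycles; return the total number of pushes."""
    total = 0
    for _ in range(cycles):
        total += push_cycle(read_config, hosts)
        sleep(interval_seconds)
    return total
```

Because each cycle does the same amount of work, a quiet day and a day on which everything changes look identical to the system.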
Pattern 10: Modality and Constant-Work
Our network health checks, including Amazon
Route 53 health checks, are a good example
Health Checks are happening all of the time
Results are published to consumers, all of the
time
Zone or Region failure = no difference!
Pattern 10: Modality and Constant-Work
100 nodes requesting a configuration every
second
$1200 / year in request costs
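The figure above can be sanity-checked with simple arithmetic. The slide does not state a request price, so the value used here, $0.0004 per 1,000 requests, is an assumption; with it, 100 nodes at one request per second come to about $1,261 per year, in line with the rounded number on the slide.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000


def yearly_request_cost(nodes: int, requests_per_second: float,
                        usd_per_1000_requests: float) -> float:
    """Total yearly request cost in USD for a fleet polling at a fixed rate."""
    requests = nodes * requests_per_second * SECONDS_PER_YEAR
    return requests / 1000 * usd_per_1000_requests


# Assumed price, not taken from the slide.
cost = yearly_request_cost(nodes=100, requests_per_second=1,
                           usd_per_1000_requests=0.0004)
```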
What did we learn about building stable systems?
Closing loops is critical: measure the progress!
Loose asynchronous coupling helps
Think about the modalities of the system
Our lessons are baked into Amazon API Gateway
and AWS Lambda
Thank you!
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Closing Loops and Opening Minds: How to Take Control of Systems, Big and Small (ARC337) - AWS re:Invent 2018

  • 2. Closing Loops and Opening Minds: How to Take Control of Systems, Big and Small. Colm MacCárthaigh, Senior Principal Engineer, AWS. Session ARC337. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 8. “Quality is not an act, it is a habit” Aristotle, some time around 350BC
  • 9. Amazon CloudFront Control Plane
  • 10. CloudFront Control Plane
  • 11. CloudFront Control Plane (diagram with four quadrants labeled (-, +), (+, +), (-, -), (+, -))
  • 12. What goes into high quality designs
      • Diverse creative minds working in a fearless environment
      • Systematic reviews and mechanisms to share lessons
      • Use well-worn patterns where possible and focus invention where it is truly needed
      • Testing, testing, testing, testing, testing
  • 13. How we make trade-offs in design
  • 14. Control Planes vs. Data Planes
      • Control planes are often a bigger design challenge than the data planes that they support
      • Poorly designed control planes can cause large outages, or worse: misconfigurations and corruption
  • 15. What do Control Planes do in the Cloud?
      • Manage the life cycle for resources
      • Provision software
      • Provision service configuration
      • Provision user configuration
  • 18. Control Theory 101
      • Independently discovered in several fields of engineering and science
      • Formalized in the early-to-mid twentieth century
      • One of the most under-appreciated branches of science, and incredibly relevant to distributed systems
  • 24. Control Theory 101: PID (proportional-integral-derivative) control
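The slide names PID control without showing one, so here is a minimal sketch of a textbook discrete PID controller. The class name, the gains, and the toy "plant" in the usage note are illustrative assumptions, not anything taken from the talk.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PID:
    """Textbook discrete PID controller (illustrative sketch only)."""
    kp: float  # proportional gain: reacts to the current error
    ki: float  # integral gain: reacts to accumulated past error
    kd: float  # derivative gain: reacts to how fast the error changes
    integral: float = 0.0
    prev_error: Optional[float] = None

    def update(self, setpoint: float, measured: float, dt: float) -> float:
        """Return the control output for one time step of length dt."""
        error = setpoint - measured
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0  # no history on the first step
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(steps: int = 2000, dt: float = 0.1) -> float:
    """Drive a toy system (value changes at the rate we command) toward 10.0."""
    controller = PID(kp=1.0, ki=0.1, kd=0.05)  # hypothetical gains
    value = 0.0
    for _ in range(steps):
        value += controller.update(10.0, value, dt) * dt
    return value
```

Calling `simulate()` closes the loop: each step measures the current value, compares it with the desired value, and feeds the correction back in, which is the measure-compare-correct shape the talk refers to when it speaks of closing loops.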
  • 27. Pattern 1: Checksum all of the things
  • 28. Pattern 1: Checksum all the things. Example YAML shown on the slide:

        watch:
          out:
            for:
              - YAML
        this:
          file:
            can:
              be:
                - truncated
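One way to act on this pattern is to ship a digest alongside every configuration payload and refuse to apply anything that does not match. The sketch below assumes a simple "hex digest, newline, payload" framing; the function names and the framing are hypothetical, not the format AWS uses.

```python
import hashlib


def seal(payload: bytes) -> bytes:
    """Prefix the payload with its SHA-256 hex digest on its own line."""
    digest = hashlib.sha256(payload).hexdigest().encode("ascii")
    return digest + b"\n" + payload


def unseal(blob: bytes) -> bytes:
    """Return the payload only if its digest matches; otherwise raise."""
    digest, sep, payload = blob.partition(b"\n")
    expected = hashlib.sha256(payload).hexdigest().encode("ascii")
    if not sep or digest != expected:
        raise ValueError("checksum mismatch: refusing to apply configuration")
    return payload


def is_intact(blob: bytes) -> bool:
    """Convenience wrapper: True when the blob passes verification."""
    try:
        unseal(blob)
        return True
    except ValueError:
        return False
```

A truncated copy of a sealed YAML document can still look like well-formed YAML, but `is_intact` returns `False` for it because the digest no longer matches the shortened payload.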
  • 29. Pattern 2: Cryptographic Authentication
      • Encrypt and authenticate everything! Control planes are powerful, security-critical systems
      • Be able to revoke and rotate every credential, but also watch out for certificate expiries
      • Prevent human access to production credentials
      • Never allow a non-production control plane to talk to the production data plane
  • 30. Pattern 3: Cells, Shells, and Poison Tasters
      • We divide up our control planes horizontally into regions, availability zones, and cells
      • It’s also common to compartmentalize control planes so that the data plane is insulated from control plane crashes
      • Poison tasters: check up front that a change is safe
  • 31. Pattern 4: Asynchronous Coupling
      • Synchronous systems are very strongly coupled
      • A problem in a synchronous downstream dependency has immediate impact on the upstream callers
      • Retries from upstream callers can all too easily fan out and amplify problems
  • 32. Pattern 4: Asynchronous Coupling
      • Asynchronously coupled systems tend to be more tolerant
      • They can make partial progress even when some components are unavailable
      • Workflows and queues can be tuned to have deterministic retry behaviors
  • 33. Pattern 5: Closed Feedback Loops
  • 34. Pattern 6: Small pushes and large pulls
      • Very Frequently Asked Question: is it better to push, or to pull?
      • For example: should data plane hosts accept connections and be pushed configurations, or should they connect to the control plane and pull them?
      • It’s really the wrong question!
  • 35. Pattern 6: Small pushes and large pulls
      • Long-lived connections can support pushing timely updates regardless of the “direction” of the connection
      • Better to ask: which fleet is bigger?
      • In general, small fleets should connect to bigger fleets. This avoids the problem of small fleets being overwhelmed by thundering herds and retry storms
  • 36. Pattern 7: Avoiding Cold Starts and Cold Caches
      • Caches are bi-modal systems: super fast when they have entries, and slow when they are empty
      • A thundering herd hitting a cold cache can prevent it from ever getting warm
      • Retry storms often need to be moderated by throttles
  • 37. Pattern 7: Avoiding Cold Starts and Cold Caches
      • Work out if you really need a cache at all
      • Pre-warm caches before accepting requests
      • Consider serving stale entries when backends are unavailable
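The last two suggestions (pre-warming and serving stale entries) can be sketched as a small cache wrapper. This is an illustration under assumptions: the class name `StaleOkCache`, the `fetch` callback, and the TTL handling are all hypothetical.

```python
import time
from typing import Callable, Dict, Iterable, Tuple


class StaleOkCache:
    """Cache that can be pre-warmed and falls back to stale entries on failure."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # hypothetical backend lookup
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def prewarm(self, keys: Iterable[str]) -> None:
        """Populate entries before the host starts accepting requests."""
        for key in keys:
            self.get(key)

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]          # fresh hit
        try:
            value = self._fetch(key)
        except Exception:
            if entry is not None:
                return entry[0]      # backend unavailable: serve the stale value
            raise                    # nothing cached: surface the failure
        self._entries[key] = (value, now)
        return value
```

Whether stale data is acceptable depends on what is being cached; the sketch only shows the mechanics of the fallback.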
  • 38. Pattern 8: Throttles
      • Throttles and rate limits are often needed to moderate problem requestors and to dampen fluctuating systems
      • Example: Amazon Elastic Load Balancer and Amazon Elastic Compute Cloud (Amazon EC2)
      • It takes careful work to ensure that throttling does not impact the end customer experience
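The slide does not say which throttling algorithm is used. A token bucket is one common choice, sketched here as an assumption; the class name and parameters are hypothetical.

```python
import time
from typing import Callable


class TokenBucket:
    """Token-bucket rate limiter: sustained `rate` per second, bursts up to `burst`."""

    def __init__(self, rate: float, burst: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rate = rate
        self._burst = burst
        self._clock = clock
        self._tokens = burst         # start with a full bucket
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and spend tokens if the request fits; otherwise False."""
        now = self._clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        self._tokens = min(self._burst, self._tokens + (now - self._last) * self._rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

A caller would typically keep one bucket per requestor, so that a single noisy client is moderated without affecting others.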
  • 39. Pattern 9: Deltas
      • What happens when we do have too much configuration state to push around?
      • More efficient to compute deltas and distribute patches
      • But how do we actually do that?
  • 40. Pattern 9: Deltas

        Key   Value
        foo   bar

  • 41. Pattern 9: Deltas

        Key   Value   Version
        foo   bar     1

  • 42. Pattern 9: Deltas

        Key   Value   Version
        foo   bar     1
        foo   baz     2

  • 43. Pattern 9: Deltas

        Key   Value   Version
        foo   bar     1
        foo   baz     2
        foo   bar     3
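The tables suggest that every write gets a monotonically increasing version, so a consumer can ask for "everything after version N". The sketch below is one reading of those slides, not the talk's actual implementation; `VersionedStore` and its methods are hypothetical names.

```python
from typing import Dict, List, Tuple


class VersionedStore:
    """Append-only log of (version, key, value) rows, as in the slide tables."""

    def __init__(self) -> None:
        self.version = 0
        self.log: List[Tuple[int, str, str]] = []

    def put(self, key: str, value: str) -> int:
        """Record a write under the next version number and return that number."""
        self.version += 1
        self.log.append((self.version, key, value))
        return self.version

    def delta_since(self, version: int) -> Dict[str, Tuple[str, int]]:
        """Return the latest (value, version) for each key changed after `version`."""
        latest: Dict[str, Tuple[str, int]] = {}
        for row_version, key, value in self.log:
            if row_version > version:
                latest[key] = (value, row_version)
        return latest
```

With the three rows from the last table, a consumer that last synced at version 1 gets `foo = bar` at version 3: the value looks unchanged, but the version shows that writes happened in between, which a value-only comparison would miss.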
  • 44. Pattern 10: Modality and Constant-Work
      • So far, we can build a loosely coupled control plane, with deltas to minimize work, and throttles to keep things safe
      • But what if a LOT of things change at the same time?
      • We don’t want to build up backlogs and queues and introduce lag
  • 45. Pattern 10: Modality and Constant-Work
      • Systems that change performance in response to workload or data patterns can be fragile
      • Example: relational databases are great for flexible business queries, but terrible for stable control planes. Hidden optimizations and query plan flips can wreak havoc
      • Deployments, peak events, and power events all incur risk because they can be new modes
  • 46. Pattern 10: Modality and Constant-Work
      • How dumb would it be to make a really, really simple control plane?
      • User calls an API that edits a configuration file on Amazon Simple Storage Service (Amazon S3)
      • Push that configuration file every 10 seconds … whether it changed or not!
      • Very, very reliable and robust
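The "push every 10 seconds whether it changed or not" idea can be sketched as a loop that does identical work on every tick. The callbacks `read_config` and `push` are hypothetical stand-ins for reading the file (for example from Amazon S3) and distributing it; the `ticks` and `sleep` parameters exist only so the sketch can be run and tested.

```python
import time
from typing import Callable


def run_constant_work(read_config: Callable[[], bytes],
                      push: Callable[[bytes], None],
                      ticks: int,
                      interval_seconds: float = 10.0,
                      sleep: Callable[[float], None] = time.sleep) -> None:
    """Push the full configuration on every tick, changed or not."""
    for _ in range(ticks):
        # No diffing and no "did it change?" branch: the work per tick is the
        # same on a quiet day as it is when everything changes at once.
        push(read_config())
        sleep(interval_seconds)
```

Because there is no change-dependent code path, a burst of edits cannot create a backlog; the cost is that the loop does the same amount of work even when nothing has changed.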
  • 47. Pattern 10: Modality and Constant-Work
      • Our network health checks, including Amazon Route 53 health checks, are a good example
      • Health checks are happening all of the time
      • Results are being published to consumers, all of the time
      • Zone or Region failure = no difference!
  • 48. Pattern 10: Modality and Constant-Work
      • 100 nodes requesting a configuration every second
      • $1200 / year in request costs
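The slide does not show how the yearly figure is derived. A rough check, assuming the requests are Amazon S3 GETs billed at $0.0004 per 1,000 requests (an assumed rate; the slide states no price), lands in the same range:

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60           # 31,536,000

nodes = 100                                     # from the slide
requests_per_node_per_second = 1                # from the slide
price_per_1000_requests = 0.0004                # ASSUMPTION: not stated on the slide

requests_per_year = nodes * requests_per_node_per_second * SECONDS_PER_YEAR
yearly_cost = requests_per_year / 1000 * price_per_1000_requests

print(f"{requests_per_year:,} requests/year")   # 3,153,600,000 requests/year
print(f"${yearly_cost:,.2f} per year")          # $1,261.44 per year
```

Under that assumed rate the result is about $1,260 per year, consistent with the slide's rounded $1200.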
  • 49. What did we learn about building stable systems?
      • Closing loops is critical; measure the progress!
      • Loose asynchronous coupling helps
      • Think about the modalities of the system
      • Our lessons are baked into Amazon API Gateway and AWS Lambda
  • 50. Thank you!