Production Readiness
Strategies in an
Automated World
Sean Chittenden
Engineering, HashiCorp
@SeanChittenden
https://keybase.io/seanc
Dev to Prod
Background
Software Life Cycle
Idea!
Software Life Cycle
Time
Prod
1) Idea!
R&D
Software Life Cycle
Time
Prod
1) Idea!
2) Production Ready
R&D
Software Life Cycle
Time
Readiness
1) Idea!
2) Production Ready 3) End of Life
2.9) "It’ll be time to wind this service down
when ___ happens and ___ comes online."
R&D
Software Life Cycle
Time
Production
1) Idea!
2) Production Ready
3) End of Life
"Production Supported"
"Oops"
R&D
Software Life Cycle
Time
Production
1) Idea!
2) Production Ready
4) End of Life
"Production Supported"
3) "Oops"
R&D
Software Life Cycle
Time
Production
1) Idea!
N) End of Life
"Production Supported"
Forced to fix code or docs.
R&D
Software Life Cycle
Time
Production
1) Idea!
2) Production Ready
N) End of Life
"Production Supported"
"Dragged feet to produce docs."
[3,M) "Oops"
R&D
N-1) "That’s it, we’ve had enough…"
Software Life Cycle
Time
Production
1) Idea!
2) Production Ready
N) End of Life
"Production Supported"
[3,M) "Oops"
R&D
N-2) "That’s it, we’ve had enough…"
N-1) "Just support it until
the next version is out"
Operations in the "Real World"
Complexity Abounds
The Echo Service: Stateless HTTP Echo
$ go get github.com/hashicorp/http-echo
$ http-echo -text foo
$ curl http://127.0.0.1:5678/
foo
Echo as a Service
Components:
• Echo Service
• Load Balancer
• "Hardware" / OS
• Metrics Agent
• Logs Management
• Reproducible Builds
$ cd $GOPATH/src/github.com/hashicorp/http-echo/
$ git checkout 87ee38c517094993932bd76b37af03980e8c4151
$ go build
Complexity In The Simple Case
Simple Example: The Echo Service
Minimum of 6x dimensions to be concerned about
No downstream services: only request + response
Echo as a Service
Dimensions of Work to
measure:
• CPU
• RAM usage
• Network Usage
• TCP accept/connection rate
• Disk Capacity
• Disk IO (maybe?)
• Stability
• Request volume
• Request Latency
"Can't Escape the Signal, Mal"
The Echo Service: Stateless HTTP Echo
2016/11/18 03:29:58 Server is listening on :5678
2016/11/18 03:30:00 127.0.0.1:5678 127.0.0.1:61932 "GET / HTTP/1.1" 200 4 "curl/7.51.0" 15.94µs
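The request line above already carries two of the dimensions listed earlier (request volume and request latency). A minimal sketch of pulling them out, assuming every request line follows the shape of the single sample shown; the pattern and the field names are inferred from that one line and are not a documented http-echo log format:

```python
import re

# Pattern inferred from the one sample line above; field meanings are assumptions.
LOG_LINE = re.compile(
    r'^(?P<date>\d{4}/\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) '
    r'(?P<local>\S+) (?P<remote>\S+) '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+) '
    r'"(?P<agent>[^"]*)" (?P<duration>[\d.]+)(?P<unit>ns|µs|ms|s)$'
)

# Multipliers that convert the duration suffix into seconds.
UNIT_SECONDS = {"ns": 1e-9, "µs": 1e-6, "ms": 1e-3, "s": 1.0}


def parse_request(line):
    """Return a dict of request fields, or None for lines that are not requests."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "method": fields["method"],
        "path": fields["path"],
        "status": int(fields["status"]),
        "bytes": int(fields["size"]),
        "latency_s": float(fields["duration"]) * UNIT_SECONDS[fields["unit"]],
    }


sample = ('2016/11/18 03:30:00 127.0.0.1:5678 127.0.0.1:61932 '
          '"GET / HTTP/1.1" 200 4 "curl/7.51.0" 15.94µs')
print(parse_request(sample))
```

Lines such as the "Server is listening" startup message do not match and come back as None, so a collector can skip them.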
Echo as a Service
Complexity Factor: ~10
Echo's Operational Concerns
Loss Aversion
• Uptime
• Secrets
• Planned Failure Modes: failure on a probability curve
• Server Uptime (e.g. OS or Hardware)
• Unplanned Failure Modes (e.g. DC or AZ fails)
Entropy and Failure: Best Friends
Echo's Operational Concerns
Loss Aversion
• Uptime
• Secrets
• Planned Failure Modes: failure on a probability curve
• Server Uptime (e.g. OS or Hardware)
• Unplanned Failure Modes (e.g. DC or AZ fails in an earthquake)
• Success Failure Modes
Randall A. Lewis and David H. Reiley. 2013. Down-to-the-minute effects of
super bowl advertising on online search behavior.

http://dx.doi.org/10.1145/2482540.2482600
Echo's Operational Concerns
Loss Aversion
• Uptime
• Secrets
• Planned Failure Modes: failure on a probability curve
• Server Uptime (e.g. OS or Hardware)
• Unplanned Failure Modes (e.g. DC or AZ fails)
• Success Failure Modes
• Known Architectural Limits
• Unknown Architectural Limits
Performance Spelunking
Exciting, but not very fun
Lurking Significant Details
Imagine a more complex service:
• an API server that fans out to ~20 downstream services
• Uses async scatter/gather to fan out requests
• Transient failures become the norm
Stateful Complexity
Database-as-a-Service: PostgreSQL Edition
[Diagram: SQL, WAL Files, Log Files]
PostgreSQL as a Service
Components:
• PostgreSQL
• Connection Pooler (pgbouncer)
• PITR Manager (WAL-E, omnipitr,
pgBackRest)
• Logs Analyzer (pgbadger, pgfouine)
• Metrics Agent
• Failover Manager (Connections, State, Data
Continuity/Self-Healing)
• Schema Versioning
PostgreSQL as a Service
Dimensions of Work to measure:
• CPU
• RAM usage
• Network Usage
• TCP accept/connection rate
• Disk Capacity
• Maybe disk IO (read, write)
• Stability
• Request volume
• Request Latency
• Query performance
• Kernel Lock Contention
• Userland buffer eviction rate
• Cache-miss rate
• Size of blast radius
• ... etc.
PostgreSQL as a Service
Complexity Factor:
~30 x (number of tables x metrics per
table)
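As a rough worked example of the estimate above: the table and metric counts below are hypothetical, and the function simply restates the slide's formula rather than any established measure.

```python
def complexity_factor(base_dimensions, tables, metrics_per_table):
    """Rough count of things to watch: base dimensions x (tables x metrics per table)."""
    return base_dimensions * (tables * metrics_per_table)


# Hypothetical database: 40 tables, 5 metrics tracked per table.
print(complexity_factor(30, 40, 5))  # 6000
```

Compared with the echo service's factor of roughly 10, even a modest schema lands several hundred times higher.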
PostgreSQL as a Service
Database PSA Tangent:
• Don't confuse complexity with value.
• Databases are amazingly useful things
because of their productivity and value
as a network service.
• Databases assume the lion's share of
the complexity burden: centralized
complexity is easier than distributed
complexity.
How do you systematically
address inherent,
necessary complexity?
Checklists
• Identify Problems
• Read - Do Checklists
• Ensure critical steps hit
• Useful in emergencies (plane on fire? Do X, Y, and Z...)
• Do - Confirm Checklists
• Verify muscle memory
• Combats atrophy and fatigue
Building a Modern
Operations Checklist
Who uses checklists?
Astronauts
Surgeons
Pilots
Inspectors
Military
IT/Operations?
Good Checklists
• Have a clear purpose
• Are brief: 10-20 items, fit on a single page
• Focus on what's essential/mandatory
• Enumerate what must be done (and frequently forgotten)
• Don't replace personal judgement or skill
• Enforce discipline
• Provide tools for collaboration and communication
• Establish protocol or enforce a norm
Building a Modern
Operations
Checkli^WAudit
Production Ready
[Diagram: SQL, WAL Files, Log Files]
Organizational Challenges Technical Challenges
Organizational Prerequisites
Standardized Jargon (e.g. SEV1 vs SEV2, client vs consumer)
Policy for Unique Service namespaces (app1 vs appN vs dbN)
# Deny registration access to services prefixed
# "app1-". Discovery of the service is still
# allowed in read mode.
service "app1-" {
  policy = "read"
}
service "app2-" {
  policy = "write"
}
Organizational Prerequisites
Standardized Jargon (e.g. SEV1 vs SEV2, client vs consumer)
Policy for Unique Service namespaces (app1 vs appN vs dbN)
Naming conventions established within a service (app1-api1 vs app1-dbN)
Rules of Engagement outlining how an outage is:
1. Identified
2. Responded to
3. Recovered from
4. Prevented from recurring
5. Prepared for
6. GOTO step #1
Organizational Prerequisites
Standardized Jargon (e.g. SEV1 vs SEV2, client vs consumer)
Policy for Unique Service namespaces (app1 vs appN vs dbN)
Naming conventions established within a service (app1-api1 vs app1-dbN)
Rules of Engagement outlining how outage is handled
Centralized documentation
Establish a culture of systems thinking
Organizational Prerequisites
Establish a culture of systems thinking:
• a system is composed of parts
• a system is greater than the sum of its parts
• all the parts of a system must be related (directly or indirectly),
else there are really two or more distinct systems
• a system is encapsulated (has a boundary)
• a system can be nested inside another system
• a system can overlap with another system
• a system consists of processes that transform inputs into outputs
• a system is autonomous in fulfilling its purpose:
A car is not a system. A car with a driver is a system.
Organizational Prerequisites
Standardized Jargon (e.g. SEV1 vs SEV2, client vs consumer)
Policy for Unique Service namespaces (app1 vs appN vs dbN)
Naming conventions established within a service (app1-api1 vs app1-dbN)
Rules of Engagement outlining how outage is handled
Centralized documentation
Establish a culture of systems thinking
Establish end-to-end ownership
Decouple service names from team names
Why do we care?
• We aren't always going to be working on our code.
• We need to establish a culture of maintenance and the necessary
supporting systems, both organizational and technical.
Audit Reduced to a Checklist
High-level summary of the service?
Stateful or Stateless
List of important consumers
Release Process
On-Call Instructions / Incident Response
Health Defined
Customer Service Endpoint?
Backups
Geographic Redundancy
Audit back to Checklist
High-level summary of the service? => Organizational Concern
Stateful or Stateless => Technical Concern
List of important consumers => Tech and Org Concern
Release Process => Organizational Concern
On-Call Instructions / Incident Response => Organizational Concern
Health Defined => Technical Concern
Customer Service Endpoint? => Organizational Concern
Backups => Organizational Concern
Geographic Redundancy => Organizational Concern
Plan, Doc, Vet, and Decide Starting Here...
Time
Prod
1) Idea!
2) Production Ready
R&D
... ideally before here...
Time
Production
1) Idea!
N) End of Life
"Production Supported"
Forced to fix code or docs.
R&D
... but NO later than here!!!
Time
Production
1) Idea!
N) End of Life
"Production Supported"
Forced to fix code or docs.
R&D
(It's good to refine here when this happens)
Time
Production
1) Idea!
N) End of Life
"Production Supported"
Forced to fix code or docs.
R&D
Value from Checklists
High-level summary of the service? => Faster Training / Fungible Skills
Stateful or Stateless => Universal / Consistent / Standard
List of important consumers => Faster Understanding and Training
Release Process => Faster Resolution / Fungible Skills
On-Call Instructions / Incident Response => Larger Pool / Increased Sympathy
Health Defined => Standardized Resolution
Customer Service Endpoint? => One Source of Truth
Backups => Standard Procedures
Geographic Redundancy => Unplanned Disasters Mitigated
How do you build a checklist?
Summary: Vertical Places to Look
[Diagram: SQL, WAL Files, Log Files]
Organizational Challenges Technical Challenges
Summary: Horizontal Places to Look
Time
Prod
1) Idea!
2) Production Ready
R&D
Questions?
Thank the audience for their time.
Name: Sean Chittenden
Twitter: @SeanChittenden
Recommended Reading
Seed Questions for Checklists
Service Checklist: Overview
Service Overview
• Description and relevance to the business
• Short explanation of how the service fits into the ecosystem
of microservices
• Pointers to more detailed documentation
• Pointers to the current team owners
Stateful or Stateless service
Does the service employ any internal caching?
Dependency management: e.g. embedded libraries that have been
vendor/'ed (not necessary with Go, this is self-evident)
Service Overview
$ head my-service.job
# This declares a job named "service123". There can be exactly one
# job declaration per job file.
job "service123" {
  # Specify this job should run in the region named "us". Regions
  # are defined by the Nomad servers' configuration.
  region = "us"
  # Spread the tasks in this job between us-west-2 and us-east-1.
  datacenters = ["us-west-2", "us-east-1"]
  # Run this job as a "service" type. Each job type has different
  # properties. See the documentation below for more examples.
  type = "service"
Service Checklist: Overview
Service Overview
$ head my-docs.job
# This declares a job named "docs". There can be exactly one
# job declaration per job file.
job "docs" {
  meta {
    owner = "https://github.com/myorg/myproject/blob/master/owners.md"
    docs-url = "https://github.com/myorg/myproject"
    system-summary = "https://github.com/myorg/myproject/blob/master/system-summary.md"
  }
Service Checklist: Overview
Service Overview
• Auditable via the API:

http://nomad.service.consul:4646/v1/job/<ID>
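Because the job is readable over the API, the overview items can be audited mechanically. A minimal sketch, assuming the returned job document exposes the meta block as a dictionary under a "Meta" key; that response shape is an assumption here, and the sample document is hand-written rather than fetched from a real cluster:

```python
# Keys the "docs" job above declares in its meta block.
REQUIRED_META = ("owner", "docs-url", "system-summary")


def missing_meta(job, required=REQUIRED_META):
    """Return the required meta keys that are absent or empty in a job document."""
    meta = job.get("Meta") or {}
    return [key for key in required if not meta.get(key)]


# Hand-written stand-in for the JSON a job endpoint might return.
job = {
    "ID": "docs",
    "Meta": {
        "owner": "https://github.com/myorg/myproject/blob/master/owners.md",
        "docs-url": "https://github.com/myorg/myproject",
    },
}
print(missing_meta(job))  # ['system-summary']
```

Running a check like this across every registered job would turn "does each service have an owner and docs?" into a report instead of a meeting.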
Service Checklist: Overview
List of high-level consumers
• API consumed by other services within the organization
• Public Internet
• Marketing (a/b testing?)
• Customer Service
Service Confidentiality Classification
Sales Information
• Unofficial docs that can be used by sales or marketing.
Authoritative information comes from the team writing the
service. Doesn't need to be final copy, but should include useful
figures about this service.
Service Checklist: Overview
Release Process
On-call - what's the fallback strategy for a small service with a team
of two?
How is the service installed?
How is the service configured?
How is the service's process managed?
• How is it started?
• How is it stopped?
• Is there a graceful shutdown procedure vs a rapid shutdown
procedure?
• Can you send a SIGKILL signal to the process?
Incident Response
Release Process
On-call - what's the fallback strategy for a small service with a team
of two?
How is the service installed?
How is the service configured?
How is the service's process managed?
Is the process management platform-specific?
Is there a table mapping each signal to the effect of the signal?
Process Management
Is Process Management hooked into the monitoring and alerting
framework?
Incident Response
Health
Health of the Service
What is the definition of healthy?
TIP: Use Consul Health Checks for Break/Fix
{
  "service": {
    "name": "redis",
    "tags": ["master"],
    "address": "127.0.0.1",
    "port": 8000,
    "enableTagOverride": false,
    "checks": [
      {
        "script": "/usr/local/bin/check_redis.py",
        "interval": "10s"
      }
    ]
  }
}
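The contents of check_redis.py are not shown in the deck. As a sketch of what such a script's decision logic could look like: Consul reads a script check's exit code as 0 for passing, 1 for warning, and anything else as critical. The latency thresholds and the idea of timing a probe are assumptions made for illustration.

```python
# Consul script-check convention: 0 = passing, 1 = warning, anything else = critical.
PASSING, WARNING, CRITICAL = 0, 1, 2


def exit_code(latency_ms, warn_ms=50.0, crit_ms=250.0):
    """Map a measured probe latency (None when the probe failed) to a check exit code.

    warn_ms and crit_ms are hypothetical thresholds; each service should pick
    its own from its definition of healthy.
    """
    if latency_ms is None or latency_ms >= crit_ms:
        return CRITICAL
    if latency_ms >= warn_ms:
        return WARNING
    return PASSING


# A real check would time a probe against the service and pass the result
# to sys.exit(); the values below are placeholders.
print(exit_code(12.0), exit_code(80.0), exit_code(None))  # 0 1 2
```

Keeping the decision in a pure function makes the definition of healthy testable separately from the probe itself.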
Health of the Service
What is the definition of healthy?
Is there any Seasonality to the definition of healthy?
How do you observe the service?
Is there any automated capacity planning attached to the service?
Health
Customer Service
How does customer service interact with this service?
Does CS have direct access to PII or other sensitive material?
Customer Service
Quality Metrics
What are the important KPIs coming out of this service?
• If you don't measure it, you won't optimize for it.
• If you don't measure it, you can't manage it.
• You can only succeed at what you can measure.
• You can't improve what you don't measure.
Quality Metrics
What are the important KPIs coming out of this service?
Measuring the number of round-trips between Support and
Customers/Users
Measuring the number of round-trips between Support and
Engineering
Measuring the "level of effort" or amount of input a person has
to submit in order to receive support.
Accuracy of information provided by customers?
Measure the "rate of access" to PII information.
Quality Metrics
What are the important KPIs coming out of this service?
Strategy: Centralize and poll for number of tagged issues out of
GitHub.
Organization Prerequisites
Define the gradients in an outage
• SEV1 - Hard outage, complete loss of service or "major impact to
business value/revenue".
• SEV2 - Partial outage or impaired service (SLA violation).
• SEV3 - Integrity of service issue (bugs).
• SEV4 - Non-critical issue that needs to be prioritized 9-5 M-F.
• SEV5 - Janitorial work that needs to happen on a routine schedule.
Define what it means to follow through with an outage.
• What level of follow through is required?
• Postmortems?
• Who patches it and who receives time to actually fix it permanently?
Outage Consequences
Severity  Revenue Impact  User Impact  Systems Impact  Escalation
SEV1
SEV2
SEV3
SEV4
SEV5
Outage Consequences
Define the gradients in an outage
Sketch out the direct and indirect consequences on the system
Tracing
Is there a tracing token sent by upstream? If not, why not?
Is this service at the boundary of HTTP and RPC?
Is there an API library available that will automatically inject the
tracing token into downstream calls?
Can tracing only be used in aggregate or can it be used for
individual problems?
Geographic Redundancy
Is the service geographically redundant or not? If not, why not?
If yes:
Does this happen automatically?
Geographic Redundancy
{
  "Name": "my-query",
  "Session": "adf4238a-882b-9ddc-4a9d-5b6758e4159e",
  "Token": "",
  "Near": "node1",
  "Service": {
    "Service": "redis",
    "Failover": {
      "NearestN": 3,
      "Datacenters": ["dc1", "dc2"]
    },
    "OnlyPassing": false,
    "Tags": ["master", "!experimental"]
  },
  "DNS": {
    "TTL": "10s"
  }
}
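Once a prepared query like the one above is registered, clients look it up by name and let Consul apply the failover rules. A small sketch of the two lookups a client could use; the agent address (Consul's default HTTP port on localhost) and the "consul" DNS domain are assumptions about the local setup:

```python
def query_endpoints(name, http_base="http://127.0.0.1:8500", dns_domain="consul"):
    """Return the HTTP and DNS lookups a client could use for a prepared query."""
    return {
        # Executes the query by name and returns the matching service instances.
        "http": f"{http_base}/v1/query/{name}/execute",
        # The same query resolved through Consul's DNS interface.
        "dns": f"{name}.query.{dns_domain}",
    }


print(query_endpoints("my-query"))
```

Because the client only knows the query name, the failover policy can change without touching any consumer.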
Geographic Redundancy
Is the service geographically redundant or not? If not, why not?
If yes:
Does this happen automatically?
What mechanisms handle this?
Are there any regulatory concerns that come into play?
Is the failover process manual?
Does this happen at human timescale or on a machine
timescale?
Is the geographically redundant path continually tested?
Active-Active
Can this service be active-active?
If not, why not?
If yes, what kind of locking concerns or information sharing
concerns need to be factored in?
Data Classification
Does the service come in contact with any sensitive data?
If yes:
What type of data? (PII, passwords, keys, financial information,
credit cards, ACH, etc.)
What regulatory compliance is applicable to this service?
(Safe Harbor, PCI, SOX?)
Is the data stored, or just passed in transit?
Can any sensitive data end up in log files?
Can sensitive, but necessary data use a proxy token instead?
Can this information leave the organization and go to a third
party?
SPOFs
What SPOFs exist, if any?
What's the timescale for this SPOF?
What's the timescale for transition from leader to follower or
follower to leader?
If stateful, is "split brain" possible?
NOTE: State is a SPOF: failing over state takes time.
Escalation Path
What's the escalation path inside of the organization?
What's the escalation path outside of the organization? Open
Source community or commercial support?
Is there semi-regular training on how to triage and escalate?
Is there a playbook for relevant low-level debugging tools available
for use?
TIP: Use automatic escalations within PagerDuty or OpsGenie.
TIP: Use standardized service techniques to create fungible support
resources.
Quantiles of Health
Can health be defined in terms of quantiles vs binary up/down?
What are the upper and lower bounds for healthy?
What system is authoritative for determining if something is
healthy?
How can an external actor verify if the system is healthy? Is there
a command-line tool or API?
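One way to express health in quantiles rather than up/down is sketched below. It uses a simple nearest-rank quantile; the bounds and the sample latencies are hypothetical, and a production system would more likely read quantiles from its metrics store.

```python
import math


def quantile(samples, q):
    """Nearest-rank quantile: the smallest sample with at least q of the data at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def is_healthy(latencies_ms, bounds):
    """bounds maps a quantile (e.g. 0.99) to an inclusive (lower, upper) range in ms."""
    return all(lo <= quantile(latencies_ms, q) <= hi
               for q, (lo, hi) in bounds.items())


# Hypothetical bounds. The lower bound catches suspiciously fast responses,
# such as a service that is instantly returning errors.
BOUNDS = {0.50: (1.0, 20.0), 0.99: (1.0, 100.0)}
latencies = [5.0] * 98 + [80.0, 400.0]
print(quantile(latencies, 0.99), is_healthy(latencies, BOUNDS))
```

Here a single 400 ms outlier does not flip the service to unhealthy, which a binary check on the slowest request would do.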
Canary
Does the request have a "canary request mode"?
Can this be enabled per customer?
Is the canary mode used in monitoring to validate end-to-end
functionality?
Downstream Services
How does this service respond upstream to failures in its downstream
dependencies?
Is there a metric to indicate timed-out requests?
Is there a feature-flag that enables a circuit-breaker?
How are connectivity problems retried in the system?
Retry the same backend?
Retry a different backend?
Timeout?
Is there a deadline timer passed in?
Is a header added to indicate partial failure of downstream services?
Are response codes standardized?
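The retry and deadline questions above can be made concrete with a small sketch. It retries on a different backend and carries an absolute deadline that could be passed on to downstream calls; the `send` callback, the backend names, and the `RequestFailed` exception are all hypothetical.

```python
import time


class RequestFailed(Exception):
    """Raised when no backend answered before the deadline or all backends failed."""


def call_with_deadline(backends, send, deadline, clock=time.monotonic):
    """Try each backend in turn until one succeeds or the deadline passes.

    send(backend, timeout) is a caller-supplied function that raises
    ConnectionError or TimeoutError on failure. deadline is an absolute
    clock() value, so it can be handed to downstream calls unchanged.
    """
    errors = []
    for backend in backends:
        remaining = deadline - clock()
        if remaining <= 0:
            break  # out of time: do not start another attempt
        try:
            return send(backend, remaining)
        except (ConnectionError, TimeoutError) as exc:
            errors.append((backend, exc))  # move on to a different backend
    raise RequestFailed(f"no backend succeeded; failures: {errors}")


def flaky_send(backend, timeout):
    """Stand-in transport: the first backend refuses connections."""
    if backend == "app1-api1":
        raise ConnectionError("connection refused")
    return f"ok from {backend}"


print(call_with_deadline(["app1-api1", "app1-api2"], flaky_send,
                         time.monotonic() + 0.5))
```

Each attempt receives only the time that remains, so retries cannot push the total latency past what the caller agreed to wait.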
Architectural Limits
What are the expected limits of this system?
How often is "peak-load" defined?
Is there 3x capacity for the service in order to absorb reasonable
burstiness?
Is the band of nominal resource usage defined?
• "At 10K RPS, network utilization should be between
200-300Mbps, using two cores at ~60% utilization, 50MB of
RAM, and doing an average of 5-10 disk IOPS. All values are
+/- 25%."
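A band like the one quoted above can be checked mechanically. The sketch below encodes the quoted figures and assumes the +/- 25% tolerance widens each end of a range (and is applied around a point value such as "~60%"); that reading, the metric names, and the observed numbers are assumptions.

```python
# (low, high) nominal ranges at 10K RPS, taken from the example above.
NOMINAL_AT_10K_RPS = {
    "network_mbps": (200.0, 300.0),
    "cpu_core_pct": (60.0, 60.0),  # "~60%" per core, treated as a point value
    "ram_mb": (50.0, 50.0),
    "disk_iops": (5.0, 10.0),
}
TOLERANCE = 0.25  # "All values are +/- 25%"


def out_of_band(observed, nominal=NOMINAL_AT_10K_RPS, tolerance=TOLERANCE):
    """Return the metrics whose observed value falls outside the widened nominal range."""
    flagged = {}
    for name, (low, high) in nominal.items():
        value = observed[name]
        if not low * (1 - tolerance) <= value <= high * (1 + tolerance):
            flagged[name] = value
    return flagged


# Hypothetical reading taken while the service handles 10K RPS.
observed = {"network_mbps": 410.0, "cpu_core_pct": 58.0,
            "ram_mb": 61.0, "disk_iops": 7.0}
print(out_of_band(observed))  # {'network_mbps': 410.0}
```

Writing the band down as data gives on-call engineers a quick answer to "is this normal?" before an alert threshold is ever crossed.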
Logging
How is logging set up?
What gets logged?
What is the minimum log retention?
How often are logs rotated? By size or by fixed interval?
Are logs shipped off box?
Are they streamed without hitting disk?
Is there any sensitive data in the logs?
Load Shedding
How can you load-shed?
Are there any feature flags that enable circuit breakers that
reduce expensive functionality?
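A minimal sketch of load shedding with two thresholds: past a soft limit the service turns off expensive functionality, and past a hard limit it rejects new work. The class, the limits, and the use of in-flight request count as the load signal are illustrative assumptions.

```python
class LoadShedder:
    """Admit requests up to a hard limit; past a soft limit, switch off flagged features."""

    def __init__(self, soft_limit, hard_limit):
        self.soft_limit = soft_limit  # above this, expensive features are disabled
        self.hard_limit = hard_limit  # at this, new requests are rejected
        self.in_flight = 0

    def admit(self):
        """Return False (the caller should answer 503) when the service is saturated."""
        if self.in_flight >= self.hard_limit:
            return False
        self.in_flight += 1
        return True

    def release(self):
        """Call when a request finishes, whether it succeeded or not."""
        self.in_flight = max(0, self.in_flight - 1)

    def expensive_features_enabled(self):
        """Acts as the feature flag: cheap responses only once load passes the soft limit."""
        return self.in_flight <= self.soft_limit


shedder = LoadShedder(soft_limit=2, hard_limit=3)
print([shedder.admit() for _ in range(4)])   # [True, True, True, False]
print(shedder.expensive_features_enabled())  # False
```

Degrading first and rejecting second keeps the service answering for as long as possible while protecting it from collapse.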
Prepare For the Worst
Assume the service can't come back online, what's the impact?
Backup and Restore
Does this system have a reproducible build?
How often are backups taken?
How often are the restores executed?
What's the recovery point objective?
What's the mean time to recovery?
What's the definition of acceptable data loss in the event of
failure?
Deployment
How is this service tested and deployed?
Is the deployment in prod any different than test?
How can you roll back?
Is the application part of a CI/CD pipeline?
How is production data scrubbed and used in staging/UAT in
order to simulate production-like loads without using production
data?

DevOps in the Amazon Cloud – Learn from the pioneersNetflix suroDevOps in the Amazon Cloud – Learn from the pioneersNetflix suro
DevOps in the Amazon Cloud – Learn from the pioneersNetflix suro
 
Patterns & Practices of Microservices
Patterns & Practices of MicroservicesPatterns & Practices of Microservices
Patterns & Practices of Microservices
 
Oracle Drivers configuration for High Availability
Oracle Drivers configuration for High AvailabilityOracle Drivers configuration for High Availability
Oracle Drivers configuration for High Availability
 
Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...
Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...
Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...
 
Introduction to DevOps
Introduction to DevOpsIntroduction to DevOps
Introduction to DevOps
 
Improving DevOps through Cloud Automation and Management - Real-World Rocket ...
Improving DevOps through Cloud Automation and Management - Real-World Rocket ...Improving DevOps through Cloud Automation and Management - Real-World Rocket ...
Improving DevOps through Cloud Automation and Management - Real-World Rocket ...
 
From Monoliths to Microservices at Realestate.com.au
From Monoliths to Microservices at Realestate.com.auFrom Monoliths to Microservices at Realestate.com.au
From Monoliths to Microservices at Realestate.com.au
 
Nonfunctional Testing: Examine the Other Side of the Coin
Nonfunctional Testing: Examine the Other Side of the CoinNonfunctional Testing: Examine the Other Side of the Coin
Nonfunctional Testing: Examine the Other Side of the Coin
 
Preparing for Enterprise Continuous Delivery - 5 Critical Steps
Preparing for Enterprise Continuous Delivery - 5 Critical StepsPreparing for Enterprise Continuous Delivery - 5 Critical Steps
Preparing for Enterprise Continuous Delivery - 5 Critical Steps
 
VMworld Europe 2014: Virtualizing Databases Doing IT Right – The Sequel
VMworld Europe 2014: Virtualizing Databases Doing IT Right – The SequelVMworld Europe 2014: Virtualizing Databases Doing IT Right – The Sequel
VMworld Europe 2014: Virtualizing Databases Doing IT Right – The Sequel
 
5 Steps on the Way to Continuous Delivery
5 Steps on the Way to Continuous Delivery5 Steps on the Way to Continuous Delivery
5 Steps on the Way to Continuous Delivery
 
שבוע אורקל 2016
שבוע אורקל 2016שבוע אורקל 2016
שבוע אורקל 2016
 
제3회난공불락 오픈소스 인프라세미나 - MySQL Performance
제3회난공불락 오픈소스 인프라세미나 - MySQL Performance제3회난공불락 오픈소스 인프라세미나 - MySQL Performance
제3회난공불락 오픈소스 인프라세미나 - MySQL Performance
 
Oracle Database Performance Tuning Advanced Features and Best Practices for DBAs
Oracle Database Performance Tuning Advanced Features and Best Practices for DBAsOracle Database Performance Tuning Advanced Features and Best Practices for DBAs
Oracle Database Performance Tuning Advanced Features and Best Practices for DBAs
 

More from Sean Chittenden

pg_prefaulter: Scaling WAL Performance
pg_prefaulter: Scaling WAL Performancepg_prefaulter: Scaling WAL Performance
pg_prefaulter: Scaling WAL PerformanceSean Chittenden
 
FreeBSD VPC Introduction
FreeBSD VPC IntroductionFreeBSD VPC Introduction
FreeBSD VPC IntroductionSean Chittenden
 
Life Cycle of Metrics, Alerting, and Performance Monitoring in Microservices
Life Cycle of Metrics, Alerting, and Performance Monitoring in MicroservicesLife Cycle of Metrics, Alerting, and Performance Monitoring in Microservices
Life Cycle of Metrics, Alerting, and Performance Monitoring in MicroservicesSean Chittenden
 
Codified PostgreSQL Schema
Codified PostgreSQL SchemaCodified PostgreSQL Schema
Codified PostgreSQL SchemaSean Chittenden
 
PostgreSQL on ZFS Lightning Talk
PostgreSQL on ZFS Lightning TalkPostgreSQL on ZFS Lightning Talk
PostgreSQL on ZFS Lightning TalkSean Chittenden
 

More from Sean Chittenden (7)

BSDCan '19 Core Update
BSDCan '19 Core UpdateBSDCan '19 Core Update
BSDCan '19 Core Update
 
pg_prefaulter: Scaling WAL Performance
pg_prefaulter: Scaling WAL Performancepg_prefaulter: Scaling WAL Performance
pg_prefaulter: Scaling WAL Performance
 
FreeBSD VPC Introduction
FreeBSD VPC IntroductionFreeBSD VPC Introduction
FreeBSD VPC Introduction
 
Universal Userland
Universal UserlandUniversal Userland
Universal Userland
 
Life Cycle of Metrics, Alerting, and Performance Monitoring in Microservices
Life Cycle of Metrics, Alerting, and Performance Monitoring in MicroservicesLife Cycle of Metrics, Alerting, and Performance Monitoring in Microservices
Life Cycle of Metrics, Alerting, and Performance Monitoring in Microservices
 
Codified PostgreSQL Schema
Codified PostgreSQL SchemaCodified PostgreSQL Schema
Codified PostgreSQL Schema
 
PostgreSQL on ZFS Lightning Talk
PostgreSQL on ZFS Lightning TalkPostgreSQL on ZFS Lightning Talk
PostgreSQL on ZFS Lightning Talk
 

Recently uploaded

Top 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxTop 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxDyna Gilbert
 
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)Dana Luther
 
Elevate Your Business with Our IT Expertise in New Orleans
Elevate Your Business with Our IT Expertise in New OrleansElevate Your Business with Our IT Expertise in New Orleans
Elevate Your Business with Our IT Expertise in New Orleanscorenetworkseo
 
PHP-based rendering of TYPO3 Documentation
PHP-based rendering of TYPO3 DocumentationPHP-based rendering of TYPO3 Documentation
PHP-based rendering of TYPO3 DocumentationLinaWolf1
 
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书zdzoqco
 
Git and Github workshop GDSC MLRITM
Git and Github  workshop GDSC MLRITMGit and Github  workshop GDSC MLRITM
Git and Github workshop GDSC MLRITMgdsc13
 
Film cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasaFilm cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasa494f574xmv
 
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170Sonam Pathan
 
Intellectual property rightsand its types.pptx
Intellectual property rightsand its types.pptxIntellectual property rightsand its types.pptx
Intellectual property rightsand its types.pptxBipin Adhikari
 
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一Fs
 
Contact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New DelhiContact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New Delhimiss dipika
 
Blepharitis inflammation of eyelid symptoms cause everything included along w...
Blepharitis inflammation of eyelid symptoms cause everything included along w...Blepharitis inflammation of eyelid symptoms cause everything included along w...
Blepharitis inflammation of eyelid symptoms cause everything included along w...Excelmac1
 
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一Fs
 
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)Christopher H Felton
 
Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24Paul Calvano
 
SCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is prediSCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is predieusebiomeyer
 
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作ys8omjxb
 

Recently uploaded (20)

Top 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptxTop 10 Interactive Website Design Trends in 2024.pptx
Top 10 Interactive Website Design Trends in 2024.pptx
 
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
 
Model Call Girl in Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in  Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in  Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝
 
Elevate Your Business with Our IT Expertise in New Orleans
Elevate Your Business with Our IT Expertise in New OrleansElevate Your Business with Our IT Expertise in New Orleans
Elevate Your Business with Our IT Expertise in New Orleans
 
PHP-based rendering of TYPO3 Documentation
PHP-based rendering of TYPO3 DocumentationPHP-based rendering of TYPO3 Documentation
PHP-based rendering of TYPO3 Documentation
 
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
办理多伦多大学毕业证成绩单|购买加拿大UTSG文凭证书
 
Hot Sexy call girls in Rk Puram 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in  Rk Puram 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in  Rk Puram 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Rk Puram 🔝 9953056974 🔝 Delhi escort Service
 
Git and Github workshop GDSC MLRITM
Git and Github  workshop GDSC MLRITMGit and Github  workshop GDSC MLRITM
Git and Github workshop GDSC MLRITM
 
Film cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasaFilm cover research (1).pptxsdasdasdasdasdasa
Film cover research (1).pptxsdasdasdasdasdasa
 
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
Call Girls In The Ocean Pearl Retreat Hotel New Delhi 9873777170
 
Intellectual property rightsand its types.pptx
Intellectual property rightsand its types.pptxIntellectual property rightsand its types.pptx
Intellectual property rightsand its types.pptx
 
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
 
Contact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New DelhiContact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New Delhi
 
Blepharitis inflammation of eyelid symptoms cause everything included along w...
Blepharitis inflammation of eyelid symptoms cause everything included along w...Blepharitis inflammation of eyelid symptoms cause everything included along w...
Blepharitis inflammation of eyelid symptoms cause everything included along w...
 
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
 
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
 
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
 
Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24Font Performance - NYC WebPerf Meetup April '24
Font Performance - NYC WebPerf Meetup April '24
 
SCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is prediSCM Symposium PPT Format Customer loyalty is predi
SCM Symposium PPT Format Customer loyalty is predi
 
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
 

Production Readiness Strategies in an Automated World

  • 9. Software Life Cycle Time Prod 1) Idea! 2) Production Ready R&D
  • 13. Software Life Cycle Time Readiness 1) Idea! 2) Production Ready 3) End of Life 2.9) "It’ll be time to wind this service down when ___ happens and ___ comes online." R&D
  • 14. Software Life Cycle Time Production 1) Idea! 2) Production Ready 3) End of Life "Production Supported" "Oops" R&D
  • 15. Software Life Cycle Time Production 1) Idea! 2) Production Ready 4) End of Life "Production Supported" 3) "Oops" R&D
  • 16. Software Life Cycle Time Production 1) Idea! N) End of Life "Production Supported" Forced to fix code or docs. R&D
  • 17. Software Life Cycle Time Production 1) Idea! 2) Production Ready N) End of Life "Production Supported" "Dragged feet to produce docs." [3,M) "Oops" R&D N-1) "That’s it, we’ve had enough…"
  • 18. Software Life Cycle Time Production 1) Idea! 2) Production Ready N) End of Life "Production Supported" [3,M) "Oops" R&D N-2) "That’s it, we’ve had enough…" N-1) "Just support it until the next version is out"
  • 19. Operations in the "Real World"
  • 20. Complexity Abounds The Echo Service: Stateless HTTP Echo $ go get github.com/hashicorp/http-echo $ http-echo -text foo $ curl http://127.0.0.1:5678/ foo
  • 21. Echo as a Service Components: • Echo Service • Load Balancer • "Hardware" / OS • Metrics Agent • Logs Management • Reproducible Builds $ cd $GOPATH/src/github.com/hashicorp/http-echo/ $ git checkout 87ee38c517094993932bd76b37af03980e8c4151 $ go build
  • 22. Complexity In The Simple Case Simple Example: The Echo Service Minimum of 6x dimensions to be concerned about No downstream services: only request + response
  • 23. Echo as a Service Dimensions of Work to measure: • CPU • RAM usage • Network Usage • TCP accept/connection rate • Disk Capacity • Disk IO (maybe?) • Stability • Request volume • Request Latency
  • 24. "Can't Escape the Signal, Mal" The Echo Service: Stateless HTTP Echo 2016/11/18 03:29:58 Server is listening on :5678 2016/11/18 03:30:00 127.0.0.1:5678 127.0.0.1:61932 "GET / HTTP/1.1" 200 4 "curl/7.51.0" 15.94µs
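The access-log line on this slide already carries several of the dimensions listed on slide 23 (request volume, status, latency). A minimal sketch of extracting them, assuming the field order seen in the sample line; the regular expression and the field names are illustrative assumptions, not part of http-echo:

```python
import re

# Field order assumed from the sample line on the slide:
# date time local_addr remote_addr "request" status bytes "user-agent" duration
LOG_RE = re.compile(
    r'^(?P<date>\S+) (?P<time>\S+) (?P<local>\S+) (?P<remote>\S+) '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+) '
    r'"(?P<agent>[^"]*)" (?P<duration>[\d.]+)(?P<unit>ns|µs|ms|s)$'
)

# Multipliers that convert the duration suffix to seconds.
UNIT_SECONDS = {"ns": 1e-9, "µs": 1e-6, "ms": 1e-3, "s": 1.0}


def parse_line(line):
    """Return a dict of fields, or None if the line is not an access-log entry."""
    m = LOG_RE.match(line.strip())
    if m is None:
        return None
    return {
        "remote": m["remote"],
        "request": m["request"],
        "status": int(m["status"]),
        "size": int(m["size"]),
        "latency_s": float(m["duration"]) * UNIT_SECONDS[m["unit"]],
    }


sample = '2016/11/18 03:30:00 127.0.0.1:5678 127.0.0.1:61932 "GET / HTTP/1.1" 200 4 "curl/7.51.0" 15.94µs'
entry = parse_line(sample)
```

Non-request lines such as the startup message return `None`, so a metrics agent could skip them without special cases.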
  • 25. Echo as a Service Complexity Factor: ~10
  • 26. Echo's Operational Concerns Loss Aversion • Uptime • Secrets • Planned Failure Modes: failure on a probability curve • Server Uptime (e.g. OS or Hardware) • Unplanned Failure Modes (e.g. DC or AZ fails)
  • 27. Entropy and Failure: Best Friends
  • 28. Echo's Operational Concerns Loss Aversion • Uptime • Secrets • Planned Failure Modes: failure on a probability curve • Server Uptime (e.g. OS or Hardware) • Unplanned Failure Modes (e.g. DC or AZ fails in an earthquake) • Success Failure Modes Randall A. Lewis and David H. Reiley. 2013. Down-to-the-minute effects of super bowl advertising on online search behavior.
 http://dx.doi.org/10.1145/2482540.2482600
  • 29. Echo's Operational Concerns Loss Aversion • Uptime • Secrets • Planned Failure Modes: failure on a probability curve • Server Uptime (e.g. OS or Hardware) • Unplanned Failure Modes (e.g. DC or AZ fails) • Success Failure Modes • Known Architectural Limits • Unknown Architectural Limits
  • 31. Lurking Significant Details Imagine a more complex service: • an API server that fans out to ~20 downstream services • Uses async scatter/gather to fan out requests • Transient failures become the norm
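Why transient failures become the norm with fan-out can be shown with a little arithmetic. The sketch below assumes downstream calls fail independently and uses a hypothetical per-call failure rate of 0.1%; neither figure comes from the talk:

```python
def fanout_failure_probability(n_downstream, per_call_failure_rate):
    """Probability that at least one of n independent downstream calls fails."""
    return 1.0 - (1.0 - per_call_failure_rate) ** n_downstream


# Hypothetical numbers: 20 downstream calls, each failing 0.1% of the time.
p = fanout_failure_probability(20, 0.001)
print(f"{p:.2%} of requests see at least one downstream failure")
```

Under these assumptions roughly 2% of requests touch a failing dependency, even though every individual dependency is 99.9% reliable.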
  • 33. SQL WAL Files Log Files PostgreSQL as a Service Components: • PostgreSQL • Connection Pooler (pgbouncer) • PITR Manager (WAL-E, omnipitr, pgBackRest) • Logs Analyzer (pgbadger, pgfouine) • Metrics Agent • Failover Manager (Connections, State, Data Continuity/Self-Healing) • Schema Versioning
  • 34. SQL WAL Files Log Files PostgreSQL as a Service Dimensions of Work to measure: • CPU • RAM usage • Network Usage • TCP accept/connection rate • Disk Capacity • Maybe disk IO (read, write) • Stability • Request volume • Request Latency • Query performance • Kernel Lock Contention • Userland buffer eviction rate • Cache-miss rate • Size of blast radius • ... etc.
  • 35. SQL WAL Files Log Files PostgreSQL as a Service Complexity Factor: ~30 x (number of tables x metrics per table)
  • 36. SQL WAL Files Log Files PostgreSQL as a Service Database PSA Tangent: • Don't confuse complexity with value. • Databases are amazingly useful things because of their productivity and value as a network service. • Databases assume the lion's share of the complexity burden: centralized complexity is easier than distributed complexity.
  • 37. How do you systematically address inherent, necessary complexity?
  • 38. Checklists • Identify Problems • Read-Do Checklists • Ensure critical steps are hit • Useful in emergencies (plane on fire? Do X, Y, and Z...) • Do-Confirm Checklists • Verify muscle memory • Combat atrophy and fatigue
  • 41. Good Checklists • Have a clear purpose • Are brief: 10-20 items, fit on a single page • Focus on what's essential/mandatory • Enumerate what must be done (and frequently forgotten) • Don't replace personal judgement or skill • Enforce discipline • Provide tools for collaboration and communication • Establish protocol or enforce a norm
  • 42. Good Checklists • Have a clear purpose • Are brief: 10-20 items, fit on a single page • Focus on what's essential/mandatory • Enumerate what must be done (and frequently forgotten) • Don't replace personal judgement • Enforce discipline • Provide tools for collaboration and communication • Establish protocol or enforce a norm
  • 45. Production Ready SQL WAL Files Log Files Organizational Challenges Technical Challenges
  • 46. Organizational Prerequisites Standardized Jargon (e.g. SEV1 vs SEV2, client vs consumer) Policy for Unique Service namespaces (app1 vs appN vs dbN)
      # Deny registration access to services prefixed
      # "app1-". Discovery of the service is still
      # allowed in read mode.
      service "app1-" { policy = "read" }
      service "app2-" { policy = "write" }
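The namespace policy on this slide can be modelled as a prefix lookup. This is a toy sketch of the idea, not Consul's implementation; it assumes that the longest matching prefix wins and that unmatched names fall back to a hypothetical default of "deny":

```python
# Prefix -> policy, mirroring the two rules on the slide.
RULES = {"app1-": "read", "app2-": "write"}


def policy_for(service_name, rules=RULES, default="deny"):
    """Return the policy of the longest matching prefix, or the default."""
    matches = [prefix for prefix in rules if service_name.startswith(prefix)]
    if not matches:
        return default
    return rules[max(matches, key=len)]


def may_register(service_name):
    """Registering requires write access; read only allows discovery."""
    return policy_for(service_name) == "write"
```

With these rules a token could register `app2-db1` but could only discover `app1-api1`, which is the separation the unique-namespace policy is meant to enforce.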
  • 47. Organizational Prerequisites Standardized Jargon (e.g. SEV1 vs SEV2, client vs consumer) Policy for Unique Service namespaces (app1 vs appN vs dbN) Naming conventions established within a service (app1-api1 vs app1-dbN) Rules of Engagement outlining how an outage is: 1. Identified 2. Responded to 3. Recovered from 4. Prevented 5. Prepared for 6. GOTO step #1
  • 48. Organizational Prerequisites Standardized Jargon (e.g. SEV1 vs SEV2, client vs consumer) Policy for Unique Service namespaces (app1 vs appN vs dbN) Naming conventions established within a service (app1-api1 vs app1-dbN) Rules of Engagement outlining how an outage is handled Centralized documentation Establish a culture of systems thinking
  • 49. Organizational Prerequisites Establish a culture of systems thinking: • a system is composed of parts • a system is greater than the sum of its parts • all the parts of a system must be related (directly or indirectly), else there are really two or more distinct systems • a system is encapsulated (has a boundary) • a system can be nested inside another system • a system can overlap with another system • a system consists of processes that transform inputs into outputs • a system is autonomous in fulfilling its purpose: a car is not a system. A car with a driver is a system.
  • 50. Organizational Prerequisites Standardized Jargon (e.g. SEV1 vs SEV2, client vs consumer) Policy for Unique Service namespaces (app1 vs appN vs dbN) Naming conventions established within a service (app1-api1 vs app1-dbN) Rules of Engagement outlining how an outage is handled Centralized documentation Establish a culture of systems thinking Establish end-to-end ownership Decouple service names from team names
  • 51. Why do we care? • We aren't always going to be working on our code. • We need to establish a culture of maintenance and the necessary supporting systems, both organizational and technical.
  • 52. Audit Reduced to a Checklist High-level summary of the service? Stateful or Stateless List of important consumers Release Process On-Call Instructions / Incident Response Health Defined Customer Service Endpoint? Backups Geographic Redundancy
  • 53. Audit back to Checklist
      High-level summary of the service? => Organizational Concern
      Stateful or Stateless => Technical Concern
      List of important consumers => Tech and Org Concern
      Release Process => Organizational Concern
      On-Call Instructions / Incident Response => Organizational Concern
      Health Defined => Technical Concern
      Customer Service Endpoint? => Organizational Concern
      Backups => Organizational Concern
      Geographic Redundancy => Organizational Concern
  • 54. Plan, Doc, Vet, and Decide Starting Here... Time Prod 1) Idea! 2) Production Ready R&D
  • 55. ... ideally before here... Time Production 1) Idea! N) End of Life "Production Supported" Forced to fix code or docs. R&D
  • 56. ... but NO later than here!!! Time Production 1) Idea! N) End of Life "Production Supported" Forced to fix code or docs. R&D
  • 57. (It's good to refine here when this happens) Time Production 1) Idea! N) End of Life "Production Supported" Forced to fix code or docs. R&D
  • 58. Value from Checklists
      High-level summary of the service? => Faster Training / Fungible Skills
      Stateful or Stateless => Universal / Consistent / Standard
      List of important consumers => Faster Understanding and Training
      Release Process => Faster Resolution / Fungible Skills
      On-Call Instructions / Incident Response => Larger Pool / Increased Sympathy
      Health Defined => Standardized Resolution
      Customer Service Endpoint? => One Source of Truth
      Backups => Standard Procedures
      Geographic Redundancy => Unplanned Disasters Mitigated
  • 59. How do you build a checklist?
  • 60. Summary: Vertical Places to Look SQL WAL Files Log Files Organizational Challenges Technical Challenges
  • 61. Summary: Horizontal Places to Look Time Prod 1) Idea! 2) Production Ready R&D
  • 62. Questions? Thank the audience for their time. Name: Sean Chittenden Twitter: @SeanChittenden
  • 64. Seed Questions for Checklists
  • 65. Service Checklist: Overview Service Overview • Description and relevance to the business • Short explanation of how the service fits into the ecosystem of microservices • Pointers to more detailed documentation • Pointers to the current team owners Stateful or Stateless service? Does the service employ any internal caching? Dependency management: e.g. embedded libraries that have been vendored (not necessary with Go, where this is self-evident)
  • 66. Service Checklist: Overview Service Overview
      $ head my-service.job
      # This declares a job named "service123". There can be exactly one
      # job declaration per job file.
      job "service123" {
        # Specify this job should run in the region named "us". Regions
        # are defined by the Nomad servers' configuration.
        region = "us"
        # Spread the tasks in this job between us-west-2 and us-east-1.
        datacenters = ["us-west-2", "us-east-1"]
        # Run this job as a "service" type. Each job type has different
        # properties. See the documentation below for more examples.
        type = "service"
  • 67. Service Checklist: Overview Service Overview
      $ head my-docs.job
      # This declares a job named "docs". There can be exactly one
      # job declaration per job file.
      job "docs" {
        meta {
          owner = "https://github.com/myorg/myproject/blob/master/owners.md"
          docs-url = "https://github.com/myorg/myproject"
          system-summary = "https://github.com/myorg/myproject/blob/master/system-summary.md"
        }
  • 68. Service Overview • Auditable via the API:
 http://nomad.service.consul:4646/v1/job/<ID> Service Checklist: Overview
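Because the job definition is available over HTTP, the ownership and documentation pointers can be audited automatically. A minimal sketch, assuming the response is a JSON object with a top-level `Meta` map and that the three keys from the "docs" job example are the ones an organization requires; `fetch_job` and `missing_meta` are hypothetical helper names:

```python
import json
import urllib.request

# Meta keys taken from the "docs" job example on the earlier slide.
REQUIRED_META = ("owner", "docs-url", "system-summary")


def fetch_job(job_id, base="http://nomad.service.consul:4646"):
    """Fetch a job definition from the Nomad HTTP API (not called here)."""
    with urllib.request.urlopen(f"{base}/v1/job/{job_id}") as resp:
        return json.load(resp)


def missing_meta(job):
    """Return the required meta keys that are absent or empty in a job."""
    meta = job.get("Meta") or {}
    return [key for key in REQUIRED_META if not meta.get(key)]


# A hand-written stand-in for an API response, for illustration only.
example_job = {
    "ID": "docs",
    "Meta": {"owner": "https://github.com/myorg/myproject/blob/master/owners.md"},
}
print(missing_meta(example_job))
```

Run across every job ID, a check like this turns the "pointers to documentation and owners" checklist items into something a scheduled task can verify.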
  • 69. List of high-level consumers • API consumed by other services within the organization • Public Internet • Marketing (a/b testing?) • Customer Service Service Confidentiality Classification Sales Information • Unofficial docs that can be used by sales or marketing. Authoritative information comes from the team writing the service. Doesn't need to be final copy, but should include useful figures about this service. Service Checklist: Overview
  • 70. Release Process On-call - what's the fallback strategy for a small service with a team of two? How is the service installed? How is the service configured? How is the service's process managed? • How is it started? • How is it stopped? • Is there a graceful shutdown procedure vs a rapid shutdown procedure? • Can you send a SIGKILL signal to the process? Incident Response
  • 71. Release Process On-call - what's the fallback strategy for a small service with a team of two? How is the service installed? How is the service configured? How is the service's process managed? Is the process management platform-specific? Is there a table mapping each signal to the effect of the signal? Process Management Is Process Management hooked into the monitoring and alerting framework? Incident Response
  • 72. Health: Health of the Service What is the definition of healthy? TIP: Use Consul Health Checks for Break/Fix
      {
        "service": {
          "name": "redis",
          "tags": ["master"],
          "address": "127.0.0.1",
          "port": 8000,
          "enableTagOverride": false,
          "checks": [
            {
              "script": "/usr/local/bin/check_redis.py",
              "interval": "10s"
            }
          ]
        }
      }
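The slide does not show the contents of `check_redis.py`. The sketch below is one hypothetical shape for such a script, relying on the Consul script-check convention that exit code 0 means passing, 1 means warning, and any other code means critical. The TCP-connect probe and the 100 ms warning threshold are assumptions:

```python
import socket
import time

PASSING, WARNING, CRITICAL = 0, 1, 2  # Consul script-check exit codes


def classify(latency_s, warn_after_s=0.1):
    """Map a connect latency (None means the connect failed) to an exit code."""
    if latency_s is None:
        return CRITICAL
    return WARNING if latency_s > warn_after_s else PASSING


def connect_latency(host, port, timeout_s=1.0):
    """Time a TCP connect; return None if the service cannot be reached."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return time.monotonic() - start
    except OSError:
        return None


def main():
    """Probe the address and port from the service definition on the slide."""
    return classify(connect_latency("127.0.0.1", 8000))
```

A real check script would finish with `sys.exit(main())` so that Consul sees the exit code. Having a warning state gives the definition of healthy a middle ground between up and down.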
  • 73. Health of the Service What is the definition of healthy? Is there any Seasonality to the definition of healthy? How do you observe the service? Is there any automated capacity planning attached to the service? Health
  • 74. Customer Service How does customer service interact with this service? Does CS have direct access to PII or other sensitive material? Customer Service
  • 75. Quality Metrics What are the important KPIs coming out of this service? • If you don't measure it, you won't optimize for it. • If you don't measure it, you can't manage it. • You can only succeed at what you can measure. • You can't improve what you don't measure.
  • 76. Quality Metrics What are the important KPIs coming out of this service? Measuring the number of round-trips between Support and Customers/Users Measuring the number of round-trips between Support and Engineering Measuring the "level of effort" or amount of input a person has to submit in order to receive support. Accuracy of information provided by customers? Measure the "rate of access" to PII information.
  • 77. Quality Metrics What are the important KPIs coming out of this service? Strategy: Centralize and poll for number of tagged issues out of GitHub.
  • 78. Organization Prerequisites Define the gradients in an outage • SEV1 - Hard outage, complete loss of service or "major impact to business value/revenue". • SEV2 - Partial outage or impaired service (SLA violation). • SEV3 - Integrity of service issue (bugs). • SEV4 - Non-critical issue that needs to be prioritized 9-5 M-F. • SEV5 - Janitorial work that needs to happen on a routine schedule. Define what it means to follow through with an outage. • What level of follow through is required? • Postmortems? • Who patches it and who receives time to actually fix it permanently?
  • 79. Outage Consequences [Table: rows SEV1, SEV2, SEV3, SEV4, SEV5; columns Revenue Impact, User Impact, Systems Impact, Escalation]
  • 80. Outage Consequences Define the gradients in an outage Sketch out the direct and indirect consequences on the system
  • 81. Tracing Is there a tracing token sent by upstream? If not, why not? Is this service at the boundary of HTTP and RPC? Is there an API library available that will automatically inject the tracing token into downstream calls? Can tracing only be used in aggregate or can it be used for individual problems?
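One way an API library can inject the tracing token into downstream calls automatically is to keep it in request-scoped context. A minimal sketch; the header name `X-Trace-Id` and both helper functions are hypothetical, not taken from the talk:

```python
import contextvars
import uuid

# Holds the tracing token for the request currently being handled.
_trace_id = contextvars.ContextVar("trace_id", default=None)

TRACE_HEADER = "X-Trace-Id"  # hypothetical header name


def start_request(incoming_headers):
    """Adopt the upstream token, or mint one if this is the edge of the system."""
    token = incoming_headers.get(TRACE_HEADER) or uuid.uuid4().hex
    _trace_id.set(token)
    return token


def outgoing_headers(extra=None):
    """Headers for a downstream call, with the tracing token injected."""
    headers = dict(extra or {})
    token = _trace_id.get()
    if token is not None:
        headers[TRACE_HEADER] = token
    return headers
```

If every client library builds its headers through a helper like `outgoing_headers`, application code never has to remember to forward the token, which is what makes tracing usable for individual requests and not only in aggregate.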
  • 82. Geographic Redundancy Is the service geographically redundant or not? If not, why not? If yes: Does this happen automatically?
  • 83. Geographic Redundancy
      {
        "Name": "my-query",
        "Session": "adf4238a-882b-9ddc-4a9d-5b6758e4159e",
        "Token": "",
        "Near": "node1",
        "Service": {
          "Service": "redis",
          "Failover": {
            "NearestN": 3,
            "Datacenters": ["dc1", "dc2"]
          },
          "OnlyPassing": false,
          "Tags": ["master", "!experimental"]
        },
        "DNS": {
          "TTL": "10s"
        }
      }
  • 84. Geographic Redundancy Is the service geographically redundant or not? If not, why not? If yes: Does this happen automatically? What mechanisms handle this? Are there any regulatory concerns that come into play? Is the failover process manual? Does this happen at human timescale or on a machine timescale? Is the geographically redundant path continually tested?
  • 85. Active-Active Can this service be active-active? If not, why not? If yes, what kind of locking concerns or information sharing concerns need to be factored in?
  • 86. Data Classification Does the service come in contact with any sensitive data? If yes: What type of data? (PII, passwords, keys, financial information, credit cards, ACH, etc.) What regulatory compliance is applicable to this service? (Safe Harbor, PCI, SOX?) Is the data stored, or just passed in transit? Can any sensitive data end up in log files? Can sensitive but necessary data use a proxy token instead? Can this information leave the organization and go to a third party?
  • 87. SPOFs What SPOFs exist, if any? What's the timescale for this SPOF? What's the timescale for transition from leader to follower or follower to leader? If stateful, is "split brain" possible? NOTE: State is a SPOF: failing over state takes time.
  • 88. Escalation Path What's the escalation path inside of the organization? What's the escalation path outside of the organization? Open Source community or commercial support? Is there semi-regular training on how to triage and escalate? Is there a playbook for relevant low-level debugging tools available for use? TIP: Use automatic escalations within PagerDuty or OpsGenie. TIP: Use standardized service techniques to create fungible support resources.
  • 89. Quantiles of Health Can health be defined in terms of quantiles vs binary up/down? What are the upper and lower bounds for healthy? What system is authoritative for determining if something is healthy? How can an external actor verify if the system is healthy? Is there a command-line tool or API?
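As a sketch of health expressed as a quantile with upper and lower bounds, the function below computes the 99th-percentile latency from a window of samples and checks it against a band. The function name, the choice of p99, and the idea that a suspiciously low value is also unhealthy (for example, errors returned instantly) are assumptions for illustration.

```python
import statistics


def latency_health(samples_ms, lower_ms, upper_ms):
    """Report p99 latency and whether it falls inside the healthy band."""
    # With n=100, quantiles() returns 99 cut points; index 98 is the 99th percentile.
    p99 = statistics.quantiles(samples_ms, n=100, method="inclusive")[98]
    return {"p99_ms": p99, "healthy": lower_ms <= p99 <= upper_ms}
```

Exposing a result like this through an API or command-line tool gives an external actor a way to verify health without relying on a binary up/down probe.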
  • 90. Canary Does the request have a "canary request mode"? Can this be enabled per customer? Is the canary mode used in monitoring to validate end-to-end functionality?
  • 91. Downstream Services How does this service respond upstream to failures in its downstream dependencies? Is there a metric to indicate timed-out requests? Is there a feature-flag that enables a circuit-breaker? How are connectivity problems retried in the system? Retry the same backend? Retry a different backend? Timeout? Is there a deadline timer passed in? Is a header added to indicate partial failure of downstream services? Are response codes standardized?
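The retry questions above (same backend or a different one, and whether a deadline is passed in) can be sketched as follows. This is one possible policy, not a recommendation from the talk: each backend is tried once, in order, and the remaining time budget is handed to every attempt. `call_with_retries`, `DeadlineExceeded`, and the backend callables are hypothetical names.

```python
import time


class DeadlineExceeded(Exception):
    """Raised when the caller's deadline passes before any backend answers."""


def call_with_retries(backends, request, deadline, clock=time.monotonic):
    """Try each backend once, in order, until one succeeds or the deadline passes."""
    last_error = None
    for backend in backends:
        remaining = deadline - clock()
        if remaining <= 0:
            raise DeadlineExceeded("deadline passed before a backend answered")
        try:
            # Pass the remaining budget down so the backend call can time out in step.
            return backend(request, remaining)
        except ConnectionError as err:
            # Connectivity problem: move on to a different backend.
            last_error = err
    raise last_error or ConnectionError("no backends configured")
```

Counting how often the `DeadlineExceeded` and `ConnectionError` branches fire would supply the timed-out-requests metric the slide asks about.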
  • 92. Architectural Limits What are the expected limits of this system? How often is "peak-load" defined? Is there 3x capacity for the service in order to absorb reasonable burstiness? Is the band of nominal resource usage defined? • "At 10K RPS, network utilization should be between 200-300 Mbps, using two cores at ~60% utilization, 50 MB of RAM, and doing an average of 5-10 disk IOPS. All values are +/- 25%."
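The nominal band quoted above can be written down as data and checked mechanically. In this sketch the metric names and the `out_of_band` function are hypothetical, and "+/- 25%" is interpreted as widening each end of the stated range by 25%, which is an assumption about what the slide intends.

```python
TOLERANCE = 0.25  # "All values are +/- 25%"

# Nominal resource usage at 10K RPS, as (low, high) ranges taken from the slide.
NOMINAL = {
    "network_mbps": (200.0, 300.0),
    "core_utilization_pct": (60.0, 60.0),
    "ram_mb": (50.0, 50.0),
    "disk_iops": (5.0, 10.0),
}


def out_of_band(measured):
    """Return the names of metrics that fall outside the tolerated band."""
    violations = []
    for name, (low, high) in NOMINAL.items():
        floor = low * (1 - TOLERANCE)
        ceiling = high * (1 + TOLERANCE)
        if not floor <= measured[name] <= ceiling:
            violations.append(name)
    return violations
```

A non-empty result at the reference load is a signal that the service is operating outside the behavior it was characterized for.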
  • 93. Logging How is logging set up? What gets logged? What is the minimum log retention? How often are logs rotated? By size or by fixed interval? Are logs shipped off box? Are they streamed without hitting disk? Is there any sensitive data in the logs?
  • 94. Load Shedding How can you load-shed? Are there any feature flags that enable circuit breakers that reduce expensive functionality?
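A minimal sketch of the two load-shedding levers mentioned above: rejecting work outright when over capacity, and a feature flag that turns off expensive functionality. The flag names, the request shape, and the `handle` function are assumptions for illustration.

```python
def handle(request, in_flight, flags):
    """Return (status, body), shedding load according to the feature flags."""
    if in_flight >= flags["max_in_flight"]:
        # Over capacity: fail fast instead of queueing more work.
        return 503, "shed: over capacity"
    if flags["shed_expensive"] and request.get("expensive"):
        # Flag tripped: skip the expensive path and serve a cheaper answer.
        return 200, "degraded response"
    return 200, "full response"
```

Keeping the flags in a store that can be changed at runtime lets operators shed load without a deploy.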
  • 95. Prepare For the Worst Assume the service can't come back online, what's the impact?
  • 96. Backup and Restore Does this system have a reproducible build? How often are backups taken? How often are the restores executed? What's the recovery point objective? What's the mean time to recovery? What's the definition of acceptable data loss in the event of failure?
  • 97. Deployment How is this service tested and deployed? Is the deployment in prod any different than test? How can you roll back? Is the application part of a CI/CD pipeline? How is production data scrubbed and used in staging/UAT in order to simulate production-like loads without using production data?