Today it’s all about delivering velocity without compromising on quality, yet organisations find it increasingly difficult to keep up with the challenges of current release management and traditional operations. Developers are increasingly expected to own end-to-end delivery, including operations. A “you build it, you own it” development process requires tools that developers know and understand. So I’d like to introduce “GitOps”: an agile software lifecycle for modern applications.
In this session, I will discuss these industry challenges, including current CI/CD trends and how they’re converging with operations and monitoring. I’ll also illustrate the GitOps model, identify best practices and tools to use, and explain how you can benefit from adopting this methodology, which inherits from best practices going back 10-15 years.
3. Hello
● WTF is GitOps
● Why is Cloud Native relevant
● How does GitOps work and in what ways is it different from $MY_DEVOPS
● Tools
● Recap
4. Meet Qordoba
● SF-based team uses machine learning to create “local” marketing UX for big brands
● Rapid iteration while obeying SOC2 compliance
● Google Cloud – Kubernetes & CI
● Weave Cloud – single continuous delivery & observability pipeline
6. Impact
Over 30 releases per day per team, up from 1-2 per week across all teams
1) Estimated time needed to fix prod software bugs: ~60% less time
2) Estimated time to respond to customer requests: ~43% less time
3) Uptime 99% → 100% (so far…!)
10. New ways of working
cloud led us to devops
cloud native leads us to gitops
“push code, not containers”
“operations by pull request”
12. GitOps follows the Logic of DevOps
• Config is code
• Code must be version controlled
• Config must be version controlled too
• What can be described can be automated
• Describe everything: code, config, monitoring & policy; and then keep it in version control
13. GitOps
• Git as a source of truth for the desired state of the whole system (yes, really, the whole system)
• A control loop compares desired with actual state to pull changes, enforce convergent atomic updates and write back to a log in Git
• Diff alerts
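The control loop on this slide can be sketched in a few lines of Python. This is a minimal illustration, not how any particular GitOps tool is implemented: the `State` model (resource name mapped to a version string) and the `diff` and `reconcile` functions are hypothetical names introduced here for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical model: resource name -> version string (e.g. an image tag).
State = Dict[str, str]


def diff(desired: State, actual: State) -> List[Tuple[str, str, str]]:
    """Return the (action, name, version) steps that would converge actual to desired."""
    steps = []
    for name, version in sorted(desired.items()):
        if name not in actual:
            steps.append(("create", name, version))
        elif actual[name] != version:
            steps.append(("update", name, version))
    for name in sorted(actual.keys() - desired.keys()):
        steps.append(("delete", name, actual[name]))
    return steps


def reconcile(desired: State, actual: State) -> Tuple[State, List[str]]:
    """One pass of the loop: apply the diff, return the new state and a log of what changed."""
    new_state = dict(actual)
    log = []
    for action, name, version in diff(desired, actual):
        if action == "delete":
            del new_state[name]
        else:
            new_state[name] = version
        log.append(f"{action} {name} {version}")
    return new_state, log
```

A non-empty `diff` is what a diff alert would report; after one `reconcile` pass the diff against the same desired state is empty, which is the convergence property the slide refers to.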
14. Atomic updates for declarative stack
Developer experience is just Git push
Best practice for Continuous Delivery with Kubernetes
[Diagram labels: Kubernetes; Current State via Observability Tools; Control & Operations; Desired State in Git; Diff; Observe, Orient, Decide, Act; Release]
15. What this gets us
• Any developer can use GitHub
• Anyone can join the team and ship a new app or make changes easily
• All changes can be triggered, stored, audited and validated in Git
And we didn’t have to do anything very new or clever
16. Kubernetes ❤ GitOps
“The world is envisioned as a repo and not as a kubernetes installation” - Kelsey Hightower
17. Kubernetes is complex, ideally you’d like to…
Make a pull request & just go to a URL to see app change
Avoid kubectl
Have “Bonus points for Metrics… If you give people visibility, they will stop asking for tools like kubectl to do their job, because now they can actually observe what’s happening in the cluster”
18. Who is talking about or doing GitOps?
Weaveworks
Cloudbees
Bitnami
OpenFaaS
Hasura
Ocado
Financial Times
& more!
19. About Weaveworks
● Founded in 2014, backed by Google Ventures & Accel Partners
● Mission: help software teams go faster by providing technologies that support cloud native development
22. About Weaveworks
● Building cloud-native OSS since 2014 (Weave Net, Moby, Kubernetes, Prometheus)
● Founding member of CNCF
● Alexis Richardson (Weaveworks CEO) is chair of the CNCF Technical Oversight Committee
● Weave Cloud has run on Kubernetes since 2015
23. GitOps at Weaveworks
• We use declarative infrastructure, i.e. Kubernetes, Docker, Terraform, … and we “diff all the things”
• Our entire system, including code, config, monitoring rules and dashboards, is described in GitHub with full audit trail
• We roll out major or minor changes as pull requests for any updates, outages and D/R
34. CNCF is building a cloud platform
● Goal of a Cloud Platform for the era of ubiquitous services
→ a bigger deal than the Web
→ open like Linux
→ everyone is on board this time
● Business peeps TL;DR: Cloud Native is Cloud
● Outcome: innovation and new business models to make profit
42. Velocity is a key metric in Continuous Delivery
High-performing teams deploy more frequently and have much faster lead times
They make changes with fewer failures, and recover faster from failures
● 200x more frequent deployments
● 2,555x shorter lead times
● 3x lower change failure rate
● 24x faster recovery from failures
Source: 2016 State of DevOps Report (Puppet Labs)
44. Make me a Velocity
Developers write code
that powers Applications
and integrates Services
deployed to a Cloud Platform that is easy, stable & operable
using best practices for Continuous Delivery at high velocity
45. New Cloud Platform
“Just run my code”
Kubernetes
Infra - Cloud & DCs & Edge
Other CNCF Projects
Local Services & Data
Code >>
Containers >>
46. 1000s of ways to “Just Run My Code”
● Serverless: OpenFaaS, Kubeless, OpenEvents, AWS Lambda…
● PaaS (OpenShift, Cloud Foundry, …), MBaaS, KMaaS, …
● Kubeflow, Istio, Pachyderm and other k8s-native app frameworks
● Declarative app definitions, e.g. compose, ksonnet, ballerina
● Native general frameworks: metaparticle
● Ports: Laravel (PHP!) and other app frameworks to Kube
● Tools: Cert-manager, ChaosIQ, …
● The explosion of higher order systems is caused by the platform
47. Serverless & Kubernetes will converge
● Ubiquity of Kubernetes will pull serverless into the story - from “run my containers” to “run my code”
● Consumption and packaging of services is where serverless and functions add value today, and will be part of the Platform. AWS Lambda is a “clue” not the “answer”.
● Commonly used programming tools will unify Kubernetes, containers, “serverless”, managed services / APIs
● These models will be cloud agnostic
● The “pay per call” serverless business model will just be a feature of the cloud platform management layer (e.g. AWS Fargate)
48. Getting to a Cloud Platform
2017 2018-20 2020+
Core Platform
- Kubernetes & containers
Observability / Operability
- monitoring (prom.)
- logging (fluentd)
- tracing (jaeger, OT)
Routing
- mesh (envoy, linkerd)
- messaging (nats)
Security:
Spiffe, OPA, SAFE
Storage:
- orchestration
- CSI
- other
Interfaces:
- OpenMetrics
- OpenEvents
Developer On Ramp:
CICD, Helm packaging, &c
Marketplace of Services and other Add-ons
“Just run my code” user experiences for 1000s of different use cases
>> Towards Ubiquity
51. New ways of working
cloud led us to devops
cloud native leads to gitops
“push code not containers”
“operations by pull request”
52. Summary
● Cloud Platform powered by CNCF tools, Kubernetes at the core
● Multi Cloud support: Amazon, Azure, OSS
● Explosion of higher order tools and services
● GitOps for high velocity delivery pipeline
54. GitOps in depth
● Why Git
● Examples of what’s in Git (and image repo)
● CI/CD pipeline
● Security, Compliance & Audit
● Observability & Control
● Tools Overview
55. GitOps builds on DevOps with Git as a single source of truth for the desired state of the system
● The entire system state is under version control and described in Git (trunk best)
● Operational changes on production clusters are made by pull request
● Rollback and audit logs are provided via Git
● When disaster strikes, the whole infrastructure can be quickly restored from Git
62. Canonical source of truth
Clear model with strong separation of concerns (safety)
Easy rollbacks and reverts (velocity)
Tapping into existing code review tools and processes
Great compliance tool
Collaboration point between software and humans
68. Destination config
apiVersion: config.istio.io/v1beta1
kind: DestinationPolicy
metadata:
  name: ratings-lb-policy
  namespace: default
spec:
  destination:
    name: reviews
    labels:
      version: v1
  loadBalancing:
    name: ROUND_ROBIN  # other options: RANDOM, LEAST_CONN
  circuitBreaker:
    simpleCb:
      maxConnections: 100
      httpMaxRequests: 1000
      httpMaxRequestsPerConnection: 10
      httpConsecutiveErrors: 7
      sleepWindow: 15m
      httpDetectionInterval: 5m
Limits outgoing connections to “v1” of the reviews service
● 100 connections
● 1000 concurrent requests
● 10 requests per connection
Load-balances in round-robin fashion across all reviews “v1” endpoints
Configures host ejection
● 7 consecutive 5xx errors
● Period of 15 minutes
● Scanned every 5 minutes
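The host ejection behaviour described on this slide can be illustrated with a toy model. This is a simplified sketch and not Istio or Envoy code: the `HostEjector` class is hypothetical, it reuses the thresholds from the config above (7 consecutive 5xx errors, 15 minute sleep window), and it ignores the 5 minute detection interval by evaluating every response immediately.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class HostEjector:
    """Toy outlier detector: eject a host after N consecutive 5xx responses."""

    consecutive_errors: int = 7        # mirrors httpConsecutiveErrors
    sleep_window_s: float = 15 * 60    # mirrors sleepWindow: 15m
    _errors: Dict[str, int] = field(default_factory=dict, init=False)
    _ejected_until: Dict[str, float] = field(default_factory=dict, init=False)

    def record(self, host: str, status: int, now: float) -> None:
        """Record one response from `host` observed at time `now` (seconds)."""
        if 500 <= status <= 599:
            self._errors[host] = self._errors.get(host, 0) + 1
            if self._errors[host] >= self.consecutive_errors:
                # Streak reached the threshold: take the host out of rotation.
                self._ejected_until[host] = now + self.sleep_window_s
                self._errors[host] = 0
        else:
            # Any non-5xx response resets the streak.
            self._errors[host] = 0

    def is_available(self, host: str, now: float) -> bool:
        """A host is available unless it is inside its ejection window."""
        return now >= self._ejected_until.get(host, 0.0)
```

Because the thresholds live in a config file in Git rather than in code, changing them is a pull request like any other change.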
69. Egress config
apiVersion: config.istio.io/v1beta1
kind: EgressRule
metadata:
  name: foo-egress-rule
spec:
  destination:
    service: "*.foo.com"
  ports:
    - port: 80
      protocol: http
    - port: 443
      protocol: https
Provides access to a set of services under the foo.com domain.
The sidecar automatically upgrades the connection to TLS, if desired.
● Must access as HTTP
● Example: http://mail.foo.com:443
70. Routing config
apiVersion: config.istio.io/v1beta1
kind: RouteRule
metadata:
  name: reviews-rating-jason-rule
  namespace: default
spec:
  destination:
    name: ratings
  route:
    - labels:
        version: v1
      weight: 100
  match:
    source:
      name: reviews
      labels:
        version: v2
    request:
      headers:
        cookie:
          regex: "^(.*?;)?(user=jason)(;.*)?"
        uri:
For traffic going to the ratings service, send all of it to “v1” if:
● It is coming from “v2” of the reviews service
● And the URL path starts with /ratings/v2
● And the request contains a cookie with the value “user=jason”
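To make the three match conditions concrete, here is a small Python sketch that evaluates them for a single request. It is an illustration only: `matches_rule` is a hypothetical helper, the `/ratings/v2` prefix is taken from the slide annotation rather than from the (truncated) config, and full-string regex matching is an assumption, since the proxy's regex semantics may differ.

```python
import re

# Cookie pattern copied from the RouteRule above.
COOKIE_RE = re.compile(r"^(.*?;)?(user=jason)(;.*)?")


def matches_rule(source_name: str, source_version: str, cookie: str, path: str) -> bool:
    """Return True if a request satisfies all three conditions of the rule."""
    return (
        source_name == "reviews"
        and source_version == "v2"
        and path.startswith("/ratings/v2")
        and COOKIE_RE.fullmatch(cookie) is not None
    )
```

For example, a request from reviews v2 to `/ratings/v2/items` with the cookie `session=abc;user=jason;theme=dark` matches, while the same request from reviews v1 does not.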
71. Redirect Config and Fault Injection
# HTTP Redirect snippet
spec:
  destination:
    name: ratings
  match:
    request:
      headers:
        uri: /v1/getProductRatings
  redirect:
    uri: /v1/bookRatings
    authority: bookratings.default.svc.cluster.local
---
# Fault injection snippet
spec:
  destination:
    name: reviews
  route:
    - labels:
        version: v1
  httpFault:
    abort:
      percent: 10
      httpStatus: 400
HTTP Redirection
● For all requests to /v1/getProductRatings, return a 302 with a location of /v1/bookRatings and overwrite the host/authority header.
HTTP Fault injection
● For 10% of requests to v1 of the reviews service, fail with a status code of 400
Timeouts, retries, request rewrites, delays configured similarly
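The abort rule above amounts to a per-request random decision. The following Python sketch shows the idea under stated assumptions: `inject_abort` is a hypothetical function, not part of Istio, and the random number generator is passed in so that the behaviour can be reproduced with a seed.

```python
import random
from typing import Optional


def inject_abort(percent: float, http_status: int, rng: random.Random) -> Optional[int]:
    """Return `http_status` for roughly `percent`% of calls, otherwise None (pass through)."""
    if rng.random() * 100 < percent:
        return http_status
    return None


# Usage: simulate 10,000 requests with the values from the snippet above.
rng = random.Random(42)
results = [inject_abort(10, 400, rng) for _ in range(10_000)]
aborted = sum(1 for r in results if r == 400)
```

With these settings `aborted` lands close to 1,000, i.e. about 10% of the simulated requests fail with status 400 and the rest pass through untouched.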
76. GitOps separation of concerns
CI tooling
Scope: test, build, publish artifacts
● Runs outside the production cluster
● Read access to code repo
● Read/Write access to image repo
● Read/Write access to integration env
● “Push” based
CD tooling
Scope: reconciliation between git and the cluster
● Runs inside the production cluster
● Read/Write access to config repo
● Read access to image repo
● Read/Write access to production cluster
● “Pull” based
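The access split on this slide can be written down as a small permission table. This is a sketch for illustration only; the `ACCESS` dictionary and the `can` helper are hypothetical names, and the entries simply restate the bullets above.

```python
# Access granted to each kind of tooling, as listed on the slide.
# "r" = read, "rw" = read/write; anything not listed is no access.
ACCESS = {
    "ci": {"code_repo": "r", "image_repo": "rw", "integration_env": "rw"},
    "cd": {"config_repo": "rw", "image_repo": "r", "production_cluster": "rw"},
}


def can(tool: str, resource: str, mode: str) -> bool:
    """Return True if `tool` holds `mode` ("r" or "w") access to `resource`."""
    if mode not in ("r", "w"):
        raise ValueError("mode must be 'r' or 'w'")
    return mode in ACCESS.get(tool, {}).get(resource, "")
```

The property that matters is visible directly in the table: `can("ci", "production_cluster", "w")` is false, so a compromised CI system has no path to write to production, while the CD tooling inside the cluster can only read images.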
78. GitOps enables security
● The CI tooling can be push based but has no production system access
● The CD tooling is pull based and retains the production credentials inside the cluster
● Developers can’t push directly to the image registry
● Cluster API & credentials are never exposed and never cross the boundary
● Encrypted API keys and data storage credentials can be stored in Git and decrypted at deploy time inside the cluster
81. Write back from Kubernetes to maintain transaction audit log
○ Config is code & everything is config (‘declarative infra’)
○ Code (& config!) must be version controlled
○ Anything that does not record changes in version control is harmful – Git as Audit Log
82. Atomic Updates
○ Groups of changes are hard
○ Partial success / failure → redeploy cluster?
○ Want atomic update-in-place
○ Operators can do this. It’s really hard with CI scripts.
○ Git as Transaction Log
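An all-or-nothing update can be sketched as "stage, validate, then swap". The Python below is a minimal illustration of that idea and not how a Kubernetes operator actually works: `apply_atomically` is a hypothetical function, the cluster is modelled as a plain dictionary, and the validity check (a spec must be a dict with an `image` key) is an assumption made for the example.

```python
import copy
from typing import Dict, List, Tuple


def apply_atomically(cluster: Dict[str, dict], changes: List[Tuple[str, dict]]) -> bool:
    """Apply every change in the group, or none of them.

    Changes are staged on a copy; the live state is only replaced once the
    whole group has been validated, so a bad change cannot leave the cluster
    half-updated.
    """
    staged = copy.deepcopy(cluster)
    for name, spec in changes:
        if not isinstance(spec, dict) or "image" not in spec:
            return False  # abort: the live state has not been touched
        staged[name] = spec
    cluster.clear()
    cluster.update(staged)
    return True
```

A group containing one invalid change returns `False` and leaves `cluster` exactly as it was, which is the "succeed or fail cleanly" behaviour that is hard to get from a sequence of CI script steps.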
84. Typical (not mandatory) Structure of a GitOps repository
● At least 1 repository per application/service
● Config & code in separate repos. Images named via labels.
● Use a separate branch per environment (maps to a Kubernetes namespace, or cluster)
● Push changes such as the image name, health checks, etc. to staging (or feature) branches first.
● Rolling out to production involves a merge (use `git merge -s ours branchname` to skip a set of staging-only changes).
● Use protected branches to enforce code review requirements.
86. Summary: Three core principles of GitOps
Use declarative configuration to define your application and services.
All changes need to go through your Git review process; no one should be using kubectl directly (also: don’t push from CI to prod).
Use an operator in the cluster to drive the observed cluster state to the desired state, as declared by your configuration in Git.
87. Summary: Three technical benefits of GitOps
Cluster updates are a sequence of atomic transactions which succeed or fail cleanly, and are so easy to do that your team velocity will rocket up.
Git provides a transaction log for rollback, audit, and team work.
Config and image repos act as a “firewall” between dev and prod, e.g. so that CI cannot “own production” if hacked.
88. Bitnami Sealed Secrets: Encrypt Kubernetes Secrets
❯ GitOps operational mindset, all k8s applications stored in Git.
❯ Securely automate & share secrets publicly
❯ Asymmetric (public key) cryptography
❯ Encrypt data up to (and inside) K8s cluster
101. Improving UX is PART OF DEPLOYMENT
• End user happiness is all
• Integrate GitOps CD pipeline with tools to observe results of PRs
• Developers have to correlate UX to operational concepts like monitoring, tracing, logs
• Like doctors, we must be able to validate health as well as diagnose problems
102. Every service should have a unified interactive dash
(eg. metrics + events + actions; image is from Lyft)
104. Three GitOps Takeaways
• Git push is a great DX (“push code not containers”): best practice for Kubernetes, Cloud Native & Serverless…
• GitOps is about more than triggering cluster deployment via a PR; it is a full transactional operating model for the whole stack. It is “scale invariant” and it uses a control loop to implement a “joined up” pipeline for delivery and observability
• GitOps is different from CI ops. It is based on a ‘firewall’ between Dev and Ops, it guarantees deployments are correct or fail cleanly, and it integrates with Observability & Control tools
107. Choices
● DIY
● CI ops
● PaaS (Heroku, Cloud Foundry …)
● Dedicated modern CD tools
108. Dedicated tools for app dev and/or CI/CD
Not EITHER / OR
● Spinnaker
● Helm
● Weave Flux / Weave Cloud
● JenkinsX
● Skaffold
● Gitkube
● Harness
109. Spinnaker
● Created by Netflix for Netflix
● Jenkins++ CI/CD tool, with Pipeline Management and Release Management
● Pipelines GUI, nested pipelines, canary as pipeline…
● Designed for VMs – doesn’t “speak Kubernetes” (also: Terraform?)
● Good if your Release model is “Deploy my VMs and start my cluster”
● “CI Ops”, so Not Good if your Release model is atomic updates pulled by operator
● Does not use Git, uses external DB.
● Audit log & desired state not complete
● Generally complicated with lots of moving parts. Operationally burdensome even if run in Kubernetes
110. Helm
● V2 of Kubernetes templating system
● Writes a group of changes as a “chart” – so can be a packaging tool for Kubernetes
● De facto “app API” for Kubernetes – great for getting started
● *** IS NOT A CD TOOL ***
● CI + Helm is a dangerous pattern
● Non-atomic
● Non-deterministic
● Non-compositional
● Tiller
111. Weave Flux
● Created for Kubernetes by Weaveworks, will go to CNCF
● Only does Release Management: pull based CD, policy, staging, audit trail
● Works with any CI but *** does not connect to CI ***
● Watches repos. Updates on label & config change, no need for a “full rebuild”
● Kubernetes native – all Kube objects, also Helm, CRDs – make Helm do GitOps
● Secure (if cluster is)
● Orchestrator forces convergent atomic updates on cluster even for group of changes – succeeds or fails cleanly, no need for full cluster reboot
● COMPLETE record in Git kept in sync. Rollback & roll forward
● Diffs – continually monitors cluster & repo to spot drift
112. Gitkube
● Simple GitOps model for DEV with Kubernetes
● Push to gitkube remote server that lives in your cluster (i.e. runs custom git server inside Kubernetes cluster)
● Runs build for you, instead of CI. Couples continuous build of Docker images & continuous deployment to the cluster. These should be decoupled.
● Pushes container into Kubernetes, but not Kube objects, not Helm, not CRDs
● Not atomic or idempotent
● No built in monitoring, so deployments may not converge
● Does not track changes in Git