Meetup presentation on Feb 27th 2019 at the Dock8s Meetup in Heidelberg/Rhein-Neckar, at the Verivox campus.
The talk touches on all areas involved in the cloud journey of a major product (iDesk2) of the Haufe Group: planning & politics, technology, and doing operations for that product as a DevOps team.
Cloud Journey: Lifting a Major Product to Kubernetes
1. Welcome to the Dock8s Meetup
Robert Werlich
Site Reliability Engineer
robert.werlich@verivox.com
Marlen Blaube
Senior HR Business Partner
marlen.blaube@verivox.com
2. Cloud Journey: Lifting a Major
Product to Kubernetes
Dock8s Meetup Heidelberg
Feb 27th 2019
Martin Danielsson, Haufe Group, Freiburg
@donmartin76 (Twitter, Github)
3. Dock8s Meetup Heidelberg, February 27th 2019
whoami
C:> WINDOWS.EXE: C/C++/C# background, 10+ years
$ docker ps: containers & Kubernetes for ~4 years
wicked.haufe.io maintainer (OSS API Management)
Solution Architect & Developer since 2006
7. Dock8s Meetup Heidelberg, February 27th 2019
Some numbers
100+ active git repos
874k LOC
10-15 developers
200-500 concurrent users
Typically 100 req/s
448 GB RAM, 56 cores
9. Major revenue
Strategic move to containers
Modular architecture
No prior container experience
Hosted with an external hoster (€€€)
Long release cycles
(LOTS of) manual work for releases
Little operations insight
Error tracking very difficult
Non-parity Dev/Test/Prod (cost!)
Legacy web app (Java based)
10. Dock8s Meetup Heidelberg, February 27th 2019
Vision – Goals
Enabling CI/CD
Automatic Provisioning
Full Insight
Minimize Ops
13. Dock8s Meetup Heidelberg, February 27th 2019
Stakeholder Management
Convince them, don't persuade them
Communicate often and clearly
Don't underestimate the tasks at hand
Be transparent
Share successes, but also failures!
14. Dock8s Meetup Heidelberg, February 27th 2019
Team Setup – Vision
100% DevOps Engineers
T-Shaped Engineers
No dedicated manual testers
Automate! You build it, you run it (YBI, YRI). Ops experience?
15. Dock8s Meetup Heidelberg, February 27th 2019
Some HR topics
Release managers?
Operations responsibility?
Quality engineers (testers)?
On-call duty?
18. Dock8s Meetup Heidelberg, February 27th 2019
Steps to DevOps Happiness
Provision → Deploy → CI/CD
Weekly for Production, Daily for Dev/Test
Ship when ready!
19. Dock8s Meetup Heidelberg, February 27th 2019
Wait, uh, what…?
Target: "No-Ops"
No long-running systems
Enable validation of 3rd-party component upgrades
Incremental changes
Practice disaster recovery daily
100% reproducible deployments
On-demand production-identical environments
20. Dock8s Meetup Heidelberg, February 27th 2019
So, it's all… Code & Pipelines
… and pipelines are also code
21. Dock8s Meetup Heidelberg, February 27th 2019
Incremental Backend Development
1. Merge feature to master (after code review, including test suite changes)
2. Build master branch (includes unit testing, first integration tests)
3. Deploy to integration system (Blue/Green with integration tests)
4. Deploy to production (Blue/Green with integration tests)
A minimal sketch of the Blue/Green switch follows below.
22. Dock8s Meetup Heidelberg, February 27th 2019
Incremental Frontend Development
1. Merge feature to master (after code review, including test suite changes)
2. Build master branch (includes unit testing, first integration tests)
3. Deploy to integration system (run e2e integration tests, roll back if failing)
4. Deploy to production (run e2e integration tests, roll back if failing)
24. Dock8s Meetup Heidelberg, February 27th 2019
Full Provisioning
1. Create backup
2. Provision new infrastructure (from backups, same path as disaster recovery!)
3. Deploy components (using deployment pipelines, partly parallelized)
4. Top-level DNS switch (using a DNS traffic manager)
5. Destroy old infrastructure (if tests succeed)
A skeleton of this flow follows below.
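As a rough illustration, here is a Python skeleton of that flow. Every helper is a hypothetical stub standing in for the real infrastructure-as-code templates, deployment pipelines and DNS traffic manager.

```python
# Skeleton of the full-provisioning flow above; all helpers are stubs.

def create_backup() -> str:
    print("1. creating backup of persistent data")
    return "backup-2019-02-27"

def provision_infrastructure(backup_id: str) -> str:
    print(f"2. provisioning new infrastructure from {backup_id} "
          "(same path as disaster recovery)")
    return "stack-green"

def deploy_components(stack: str) -> None:
    print(f"3. deploying components to {stack} via pipelines, partly in parallel")

def tests_pass(stack: str) -> bool:
    print(f"running tests against {stack}")
    return True

def switch_dns(stack: str) -> None:
    print(f"4. pointing the DNS traffic manager at {stack}")

def destroy_old_infrastructure() -> None:
    print("5. destroying the old infrastructure")

if __name__ == "__main__":
    stack = provision_infrastructure(create_backup())
    deploy_components(stack)
    if tests_pass(stack):
        switch_dns(stack)
        destroy_old_infrastructure()
```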
25. Dock8s Meetup Heidelberg, February 27th 2019
Persistence Options
Roll your own persistence:
• Self-managed VMs (incl. NFS)
• Gluster/Ceph FS (cluster)
Persistence "as a service":
• Managed disks (AWS EBS, Azure Managed Disks)
• DBaaS (many options)
• Files as a service (AWS EFS, Azure Files)
26. Dock8s Meetup Heidelberg, February 27th 2019
iDesk2 Deployment Architecture
Resource Group: Kubernetes cluster (k8s master, k8s agents 1…n), NFS VM(s) with disks, Postgres VM(s) with disks
Why self-managed NFS and Postgres VMs?
• Azure Files not fast enough
• Legacy components depend on UNIX rights (Azure Files is SMB)
• Azure Disks only support ReadWriteOnce
• Azure PGaaS was not yet available
• More "bang for your buck"
• PG admin knowledge in the team
28. Dock8s Meetup Heidelberg, February 27th 2019
Some hints…
Assess your persistence needs early on
If possible, use DBaaS (avoid NIH syndrome)
Externalize configuration (see the sketch below)
Shared file storage is not "Cloud Native"
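"Externalize configuration" in practice often means reading settings from the environment, e.g. injected via Kubernetes ConfigMaps and Secrets, instead of baking them into the image. A minimal sketch; the variable names are made up for illustration:

```python
# Externalized configuration sketch: all settings come from the
# environment, so the same image runs in Dev, Test and Prod.
import os

DATABASE_URL = os.environ["DATABASE_URL"]        # required: fail fast if missing
LOG_LEVEL = os.environ.get("LOG_LEVEL", "INFO")  # optional, with a default
FEATURE_FLAGS = [f for f in os.environ.get("FEATURE_FLAGS", "").split(",") if f]
```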
30. Dock8s Meetup Heidelberg, February 27th 2019
Now that we have Kubernetes…?
Self-healing
Robust
Production ready
Battle proven
"Vertrauen ist gut… Kontrolle ist besser!" ("Trust is good… control is better!")
But also: complex, and an additional abstraction layer
31. Dock8s Meetup Heidelberg, February 27th 2019
"Kontrolle" (control) - What do you mean?
Detecting these things is a start...
32. Dock8s Meetup Heidelberg, February 27th 2019
Fail: Lyin' Monitors
End-to-end monitoring reported ALL GOOD…
…while people logging in got 500 errors…
… for an entire weekend.
34. Dock8s Meetup Heidelberg, February 27th 2019
Prometheus
A service exposes a metrics endpoint (e.g. http://A:8080/metrics); Prometheus scrapes it at regular intervals into its time series DB.
Metric sources:
JVM metrics
Node.js metrics
VM exporters (node_exporter)
DB exporters (pg_exporter)
Kubernetes statistics
Custom exporters based on Prometheus client libraries
...
An instrumentation sketch follows below.
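A minimal instrumentation sketch with the official Python client, prometheus_client. The talk's services are JVM and Node.js based, so this is illustrative only; the metric names are made up.

```python
# Expose a /metrics endpoint that Prometheus can scrape.
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

LOGINS = Counter("app_logins_total", "Number of successful logins")
REQUEST_SECONDS = Histogram(
    "app_request_duration_seconds",
    "Request latency in seconds",
    buckets=(0.1, 0.25, 0.5, 1.0, 2.5, 5.0),
)

def handle_request() -> None:
    with REQUEST_SECONDS.time():               # record the handler's duration
        time.sleep(random.uniform(0.05, 0.3))  # stand-in for real work
    LOGINS.inc()

if __name__ == "__main__":
    start_http_server(8080)  # serves http://localhost:8080/metrics
    while True:
        handle_request()
```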
36. Dock8s Meetup Heidelberg, February 27th 2019
Metrics
Origin: white box (measured inside your stack) vs. black box (probed from outside)
Types: counters, gauges, histograms, summaries
Sources: application metrics; network (latencies, errors, timeouts); infrastructure (disk space, CPU, memory, pod status)
46. Dock8s Meetup Heidelberg, February 27th 2019
Service Level Indicators
Percentage of document retrieval requests served within 0.25 and 1s: 95% and 98.5%
Percentage of search requests answered within 1, 3 and 7.5s: 50%, 95% and 98.5%
Percentage of error pages: <1%
These indicators back our Service Level Agreements.
A sketch of computing such an indicator follows below.
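Such an indicator can be derived from a latency histogram. A hedged sketch, assuming a reachable Prometheus server, an instant query via its HTTP API, and a made-up metric name and time window:

```python
# Compute "fraction of document requests served within 1s" from a
# Prometheus histogram. URL, metric name and window are assumptions.
import requests

PROMETHEUS = "http://prometheus:9090/api/v1/query"
QUERY = (
    'sum(rate(document_request_duration_seconds_bucket{le="1.0"}[1h]))'
    " / "
    "sum(rate(document_request_duration_seconds_count[1h]))"
)

def fraction_within_1s() -> float:
    resp = requests.get(PROMETHEUS, params={"query": QUERY}, timeout=10)
    resp.raise_for_status()
    result = resp.json()["data"]["result"]
    return float(result[0]["value"][1])  # instant vector: [timestamp, value]

if __name__ == "__main__":
    print(f"{fraction_within_1s():.2%} served within 1s (target: 98.5%)")
```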
48. Dock8s Meetup Heidelberg, February 27th 2019
Holistic View
Instrument early (and lots)
Deployments are easier
Less fear of change
We are in control! (At least, we hope and think we are.)
49. Dock8s Meetup Heidelberg, February 27th 2019
Fails: Resiliency Issues
VMs are sometimes patched and restarted. Or they just die. So will any service on them.
Networks are unreliable. Connections will fail. Use (libraries for) circuit breakers and retries.
Re-establishing TLS on each call to external services is expensive… and the service will hate you. Use Keep-Alive.
SPOFs will eventually fail. Assess and act.
Learn how to detect problems.
52. Dock8s Meetup Heidelberg, February 27th 2019
Key Performance Indicators
• >70% cost saving
• Release effort down >98% via automation
• Higher release pace (3-5 per year to 15-20 per month)
• Performance measurable
• Faster reaction to issues
• Unlocks cloud technology
53. Dock8s Meetup Heidelberg, February 27th 2019
k8s ops is possible as a team
Requires full automation (including tests)
Team dedication: rethinking ops is challenging
No silver bullet: assess your requirements
Could just as well have been AWS; Azure was investigated first as we didn't know whether we would need to go to Azure Germany (this was 2017).
This has a couple of implications:
You need backups for persistent data inside the cluster
You must be able to automatically restore them
You will also get a certain amount of "non-persisted" time (time where you cannot persist user changes). For Aurora, this is around 90 minutes each Tuesday early morning. Acceptable for us, but it may not be acceptable for other teams.
Instrument your components to expose (possibly) interesting metrics.
Rather instrument more: if you do it from the start, it doesn't hurt much, and adding metrics later is also rather easy.
Monitor and alert on anticipated failures or known previous issues, if for some reason you cannot find or fix the root cause. (With "monitor and alert", I also subsume logging and tracing here.)
Enable insight and visualization - or “debugging” if you will - to see inside your system what might have gone wrong.
“This is what you would call ‘instrumenting’ your code” - exporting metrics from it
You would use a client library (there are client libraries for most programming languages). It takes your application's current state of all tracked metrics, transforms it into a format that Prometheus understands, and exposes it via an HTTP endpoint, which Prometheus scrapes at regular intervals.
There are a number of libraries and servers which help in exporting existing metrics from third-party systems as Prometheus metrics. This is useful for cases where it is not feasible to instrument a given system with Prometheus metrics directly.
What can we do with that data - Two examples: Dashboarding and Alerting
E.g. Grafana can use Prometheus as a data source via the Prometheus Query Language to display time series as a graph, e.g. for dashboarding.
Simultaneously, Prometheus can evaluate certain expressions to see whether alerts have to be triggered. These are then passed on to another component of Prometheus, the Alertmanager, which in turn makes sure the alerts are delivered to wherever they should go. For us, that's both Rocket Chat and e-mail.
One step back, what kind of metrics exist? Let’s look at a couple of categories - first, white box and black box. That’s where the metrics come from - do you measure them inside your stack (white box), or do you probe from the outside - black box.
Hint: You should do both.
Bottom left you see the metric types Prometheus specifically supports: counters (things which only increase), gauges (things which go up and down), histograms (to see a discrete distribution) and summaries (for quantiles). A short example in the Python client follows below.
Bottom right you see the sources of metrics - infrastructure (things like disk space, CPU and memory utilization), network (latencies, errors, timeouts and such) and perhaps the most interesting bit - your own application metrics.
Recall - there is no automatic way of retrieving all of your application specific metrics - this is the instrumentation bit.
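For illustration, the four metric types in the Python client (prometheus_client); the metric names are made up:

```python
from prometheus_client import Counter, Gauge, Histogram, Summary

REQUESTS = Counter("requests_total", "Only ever increases")
ACTIVE_SESSIONS = Gauge("active_sessions", "Goes up and down")
LATENCY = Histogram("latency_seconds", "Distribution via discrete buckets")
PAYLOAD = Summary("payload_bytes", "Count and sum; some clients add quantiles")

REQUESTS.inc()           # count an event
ACTIVE_SESSIONS.set(42)  # set the current value
LATENCY.observe(0.25)    # increments every bucket with le >= 0.25
PAYLOAD.observe(512)     # contributes to count and sum
```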
It was in parts an eye opener to us when we started looking at metrics...
By simply inspecting response times on various endpoints, we could pinpoint issues we weren't really aware of, which helped us deliver an even better experience on our website.
Mind you, all of these things were already in the logs. But who reads logs when you don't REALLY have a problem? Any takers?
Typical "Newsletter Friday": the editors of one of the largest products send out newsletters each Friday, which we immediately see in the login numbers.
So, what's this number?
It's the number of individual time series we collect from our production system. Prometheus can handle lots more, up to millions, but it's still quite a number of things to look at and evaluate.
OK, so, great. We have a bunch of metrics. What do we do with those?
Of course you should alert on infrastructure failure, if the failure entails any need for intervention. If you can automatically recover, there is no need to alert.
Rule of thumb: Alerts should be ACTIONABLE. If there’s an alert - you should have to do something (even if it’s just investigating). If an alert doesn’t require any actions - chances are good you should not alert on it (and just collect statistics).
We have found out that this is dang hard though.
The other thing that is just plain clear is that you must make sure that your application is available, probably by using some black-box type of end-to-end test. If your application isn't available, getting it back up and running must be your top priority (but that's obvious).
Is that enough though?
Let's say we have 99.99% availability, does that mean everything is fine? No. We must find additional metrics to measure how well we are doing.
Actually, we would like to measure user happiness. We are doing that with NPS and the "Kundenbarometer" (customer barometer) survey, but we'd like to have at least an approximation in real time. Well, you can't measure happiness directly, but you can approximate it via the definition of functional and non-functional requirements you know (or at least assume) are important for customer happiness.
Typical things are: Latencies or expected runtimes, and of course that your application does what it’s intended to do.
This takes us back to metrics and calculated metrics, in other words KPIs, or SLIs, Service Level Indicators.
Disclaimer: This is not an exact science, but always a guesstimate. Rule of thumb should at least be: If these indicators are off, the customer will definitely be UNHAPPY.
And in addition to these, we of course also track the availability, where we also have an SLA.
So, as these are the values to which we will be held accountable, we better also alert on these.
We have gathered a more holistic view on our application - we no longer just look at what has to be developed, we also, from the start, look at how the components will behave at runtime, and how we can observe them.
We don't have to think very hard about how and where to run things: we have solved most tricky problems using Kubernetes and the toolset around it, and we just have to re-apply patterns, relying on the fact that most problems aren't so exotic that nobody has solved them yet.
We have a lot less fear of changing things. Since everything is built up as code, everything is easily and fairly quickly reproducible, and we can efficiently test changes up front.
We have gathered a feeling that we are in control. At least, we hope and think we are in control. And that’s a nice feeling.
Restarted VMs:
Redis cluster failed after restart
AppServer could not reconnect to Redis
Pods running only once? SPOF. Expect failures.
Circuit breakers: currently Hystrix, investigating Istio/linkerd.
TLS: the external semantic search service's load balancer clogged up after a couple of hours of traffic without keep-alive. A sketch of retries plus keep-alive follows below.
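A sketch of both fixes in Python, using requests and urllib3: a shared Session pools connections, so TLS is negotiated once and kept alive, and urllib3's Retry adds bounded retries with backoff. The search URL, retry policy and timeouts are placeholders, not the talk's actual setup.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()  # connection pooling => HTTP keep-alive, so the
                              # TLS handshake is not repeated on every call
retries = Retry(
    total=3,                           # bounded retries, no infinite hammering
    backoff_factor=0.5,                # exponential backoff between attempts
    status_forcelist=(502, 503, 504),  # retry only transient server errors
)
session.mount("https://", HTTPAdapter(max_retries=retries))

def semantic_search(query: str) -> dict:
    resp = session.get(
        "https://search.example.com/api/search",  # placeholder URL
        params={"q": query},
        timeout=(3.05, 10),  # connect/read timeouts: fail fast, then retry
    )
    resp.raise_for_status()
    return resp.json()
```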