This talk is from Distributed Data Summit SF 2018 - http://distributeddatasummit.com/2018-sf/sessions#netflix2
Operating C* can involve a lot of required manpower, complex automation, or both. Some of this complexity comes from operational/configuration activity of the underlying kernel and hardware but much of it is operation complexity stemming from C* itself. Some examples of this complexity are restarting the database in a safe way, reliability backing up and restoring snapshots, monitoring the health of the datastore, and even ensuring eventual consistency through repair. As a result of these complexities, C* operators end up with complicated operational setups, which are expensive to build, manage and monitor. As part of this talk, we will share lessons learned in managing such complexity via our Priam sidecar including recent innovations in how our sidecar ensures the highest possible uptime and correctness of Cassandra. We then use this to motivate building in the management sidecar directly as part of C* itself (CASSANDRA-14395).
2. Speakers
Joey Lynch
Senior Software Engineer
Cloud Database Engineering
Netflix
Distributed system addict and
data wrangler
Vinay Chella
Senior Software Engineer
Cloud Database Engineering
Netflix
Cassandra Specialist,
Distributed systems engineer
and creator of NdBench
3. ● Operating C*
● Operating C* with a sidecar
● State of community sidecars
● Lessons learned from Priam (Netflix’s main C* sidecar)
● Goals of C* management sidecar (CASSANDRA-14395)
Agenda
4. ● Bootstrap and data movement
● Configuration (files, jmx)
● Maintenance
● Monitoring/Metrics
● Backup/Restore
● Repair
Operating C*
5. Create a New Cluster
● Seeds
● Token assignment
Operating C*: Bootstrapping
Add/Remove/Replace
● Serial or parallel?
● Streaming?
6. Operating C*: Configuration
● Probably Have to Tune
a. cassandra.yaml
b. topology props
c. JVM options
● May Have to Tune
a. Logging
b. Incremental Backup
c. More JVM options
7. Operating C*: Lifecycle
Rolling Restarts (Upgrades)
● Semi-complex single
node procedure
● One at a time is too slow
● Token range aware
restarts?
What happens when
Cassandra dies?
Ring source @ https://v2.overleaf.com/read/zchtrzskkyjb
8. Operating C*: Maintenance
● All the Power of JMX
● … So many possibilities
a. Many work with jmxterm/jmxsh
b. Many only work with Java code
● What if you want to do it on all
nodes?
9. ● Many Metrics (good!)
● How to Collect Them?
○ JMX … no
○ Agent!
● Which agent ...
Operating C*: Monitoring
10. ● Cassandra ring health
depends on replication
● Strategies
○ Monitor replication of
keyspaces
○ Topology Aware
○ Maintenance Aware
Operating C*: Ring Health
15. Separate solutions for ...
● Bootstrap and data movement
● Maintenance
● Configuration (files, jmx)
● Monitoring/Metrics
● Backup/Restore
● Repair
So ... What is needed to use C*?
19. Sidecar: Bootstrapping
Automatic Seed Management using ASGs/db
Automatic Instance Replacement
Equation+Graph from “Cassandra Availability with Virtual Nodes” by Joey Lynch and Josh Snyder
21. Sidecar: Configuration
● Hierarchy
● Flat namespace that
is merged to provide
Priam config
● Functions for
defaults (e.g. based
on cpu)
prod
cass_nflx
i-08da5d...
28. ● Sidecar monitoring local
node
● Can also look at the ring
○ Gossip state
● Can
○ Ask a few Priams
○ Export info to
streaming system
Sidecar: Ring Health
30. Sidecar: Backup/Restore
(coming soon)
● Similar to cassandra-mirror1
● Keeps track of what files have already been uploaded
○ Unlocks minute level snapshot backups
○ Only uploads files once
● Continuous, point in time, self healing, eventually
consistent
1: https://github.com/hashbrowncipher/cassandra-mirror
31. Sidecar: Repair
(coming soon)
1: https://issues.apache.org/jira/browse/CASSANDRA-14346
● Always on
● Just works
● In Cassandra
itself? 1
39. And that’s not even all of ‘em
… but do they solve our
problems?
40. Puppet/
Chef
Terraform CRON Fabric /
SSH in a
for loop
Reaper Priam Jenkins Sensu/
Nagios
Bootstrap No Yes No Maybe No Yes Maybe No
Maintenance Yes Maybe Maybe Yes No Yes Yes No
Configuration Yes Maybe No Maybe No Yes Maybe No
Monitoring Maybe No Maybe Maybe No Yes Maybe Yes
Backup Maybe No Maybe Maybe Maybe Yes Yes No
Repair No No Sorta Yes Yes Soon Yes No
Easy to Use Yes No* Yes Yes Yes No Maybe Yes
Multi-Cloud Yes Yes Yes Yes Yes No Yes Yes
Does it Scale Yes Yes Yes Sorta Yes* Yes Sorta* Yes
44. ● Netflix OSS specific
● Not easy to use externally
Lessons learned from Priam
45. ● Netflix OSS specific
● Not easy to use externally
● Multi-cloud?
Lessons learned from Priam
46. ● Netflix OSS specific
● Not easy to use externally
● Multi-cloud?
● ~3 different
implementations of
everything
Lessons learned from Priam
47. ● Netflix OSS specific
● Not easy to use externally
● Multi-cloud?
● ~3 different
implementations of
everything
● Version compatibility
Lessons learned from Priam
48. Lesson Learned: Control Plane
Orchestrator
Node1
Node2
Node3
1. Restart!
2. Restart!
3. Restart!
“Push Control”
● Relies on unreliable
communication (ssh < http)
● Hard to make eventually
consistent
● Hard to guarantee safety
● Very easy to implement
49. Lesson Learned: Control Plane
Orchestrator?
Node1
Node2
Node3
Restart C*
older than Y!
Desire State Store
(Cassandra, Config)
Anything to do?
Ah, time to
restart!
Is it safe?
“Pull Control”
● Write desires into state store (or config)
● Nodes coordinate through state store as needed
● Tradeoff: Hard to implement