2. Agenda
Who we are
How much we use Cassandra
How we do it
What we learned
3. Who we are
Cloud Database Engineering
Development – Cassandra and related tools
Architecture – data modeling and sizing
Operations – availability, performance and maintenance
Operations
24x7 on-call support for all Cassandra clusters
Cassandra operations tools
Proactive problem hunting
Routine and non-routine maintenance
4. How much we use Cassandra
30 – production clusters
12 – multi-region clusters
3 – max regions for one cluster
65 – total TB of data across all clusters
472 – Cassandra nodes
72 nodes / 28 TB – largest Cassandra cluster
50k reads / 250k writes per second – max on a single cluster
3* – size of the Operations team
* Open position for an additional engineer
5. I read that Netflix doesn’t have operations
Extension of Amazon’s PaaS
Decentralized Cassandra ops is expensive at scale
Immature product that changes rapidly (and drastically)
Easily apply best practices across all clusters
6. How we configure Cassandra in AWS
Most services get their own Cassandra cluster
Mostly m2.4xlarge instances, but considering others
Cassandra and supporting tools baked into the AMI
Data stored on ephemeral drives
Data durability – all writes to all availability zones
Alternate AZs in a replication set
RF = 3 (a hedged keyspace sketch follows)
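As a rough illustration of this replication layout, the sketch below defines a keyspace whose three replicas land in three availability zones. It is not the production tooling: the keyspace name, datacenter name, and contact point are invented, and it assumes the DataStax Python driver with an EC2-style snitch where each AZ maps to a rack.

```python
# Minimal sketch: with an EC2-style snitch each AZ is a rack, and
# NetworkTopologyStrategy spreads replicas across racks, so RF = 3 puts one
# copy of every write in each of three AZs. Names and the contact point are
# illustrative only.
from cassandra.cluster import Cluster

session = Cluster(["10.0.0.10"]).connect()
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS example_service
    WITH replication = {'class': 'NetworkTopologyStrategy', 'us-east': 3}
""")
```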
7. Minimum cluster configuration
Minimum production cluster configuration – 6 nodes
3 auto-scaling groups
2 instances per auto-scaling group
1 availability zone per auto-scaling group (a hedged provisioning sketch follows)
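The sketch below shows one way this layout could be provisioned with boto3: three auto-scaling groups of two instances, each pinned to a single AZ. The group names, launch configuration, region, and AZs are placeholders, not the actual setup.

```python
# Sketch only: three ASGs x two instances x one AZ each gives the six-node
# minimum cluster described above. Names, region, and the launch
# configuration are placeholders.
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")

for zone in ["us-east-1a", "us-east-1b", "us-east-1c"]:
    autoscaling.create_auto_scaling_group(
        AutoScalingGroupName=f"cass-example-{zone}",
        LaunchConfigurationName="cass-example-lc",  # assumed to already exist
        MinSize=2,
        MaxSize=2,
        DesiredCapacity=2,
        AvailabilityZones=[zone],
    )
```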
9. Tools we use
Administration
Priam
Jenkins
Monitoring and alerting
Cassandra Explorer
Dashboards
Epic
10. Tools we use – Priam
Open-sourced Tomcat webapp running on each instance
Multi-region token management via SimpleDB
Node replacement and ring expansion
Backup and restore
Full nightly snapshot backup to S3
Incremental backup of flushed SSTables to S3 every 30 seconds
Metrics collected via JMX
REST API to most nodetool functions (a hedged usage sketch follows)
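For a feel of what that REST API enables, here is a hedged sketch of querying the node's local Priam webapp over HTTP instead of connecting with SSH or JMX. The port and resource path are assumptions; the actual endpoint names depend on the Priam version deployed.

```python
# Hedged sketch: ask a node's local Priam webapp for its view of the ring
# (roughly `nodetool ring`) rather than running nodetool over SSH. The port
# and resource path are assumptions and vary by Priam version.
import requests

PRIAM = "http://localhost:8080/Priam/REST/v1"  # assumed local Priam base URL

resp = requests.get(f"{PRIAM}/cassadmin/ring", timeout=10)
resp.raise_for_status()
print(resp.text[:500])
```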
11. Tools we use – Cassandra Explorer
• Kiosk mode – no alerting
• High level cluster status (thrift, gossip)
• Warns on a small set of metrics
12. Tools we use – Epic
• Netflix-wide monitoring and alerting tool based on RRD
• Priam proxies all JMX data to Epic
• Very useful for finding specific issues
13. Tools we use – Dashboards
• Next level cluster metrics
  • Throughput
  • Latency
  • Gossip status
  • Maintenance operations
  • Trouble indicators
• Useful for finding anomalies
• Most investigations start here
14. Tools we use – Jenkins
• Scheduling tool for additional monitors and maintenance tasks
• Push-button automation for recurring tasks
• Repairs, upgrades, and other tasks are only performed through Jenkins to preserve a history of actions (a hedged repair-job sketch follows)
• On-call dashboard displays current issues and required maintenance
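The repair bullet above might translate into a job like the following sketch, which Jenkins would schedule and whose build history becomes the audit trail. Node addresses and the keyspace are placeholders, and real automation would verify cluster health before moving on to each next node.

```python
# Sketch of a script a Jenkins repair job might run: repair one node at a time
# so only one replication set is under load at any moment. IPs and keyspace
# are placeholders; this is not the actual Netflix automation.
import subprocess

NODES = ["10.0.0.10", "10.0.0.11", "10.0.0.12"]  # placeholder node addresses
KEYSPACE = "example_service"                      # placeholder keyspace

for node in NODES:
    subprocess.run(["nodetool", "-h", node, "repair", KEYSPACE], check=True)
```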
15. Things we monitor
Cassandra
Throughput
Latency
Compactions
Repairs
Pending threads (see the sketch below)
Dropped operations
Java heap
SSTable counts
Cassandra log files
System
Disk space
Load average
I/O errors
Network errors
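As one concrete example of these checks, the sketch below scrapes pending task counts out of `nodetool tpstats` and flags busy thread pools. The threshold is arbitrary and the parsing is deliberately simplified; it is not the actual monitoring code.

```python
# Hedged sketch of a single check from the list above: warn when any thread
# pool reported by `nodetool tpstats` has too many pending tasks.
import subprocess

PENDING_LIMIT = 100  # example threshold, not a recommendation

out = subprocess.run(["nodetool", "tpstats"], capture_output=True, text=True, check=True)
for line in out.stdout.splitlines():
    parts = line.split()
    # data rows look like: <PoolName> <Active> <Pending> <Completed> ...
    if len(parts) >= 3 and parts[1].isdigit() and parts[2].isdigit():
        pool, pending = parts[0], int(parts[2])
        if pending > PENDING_LIMIT:
            print(f"WARN: {pool} has {pending} pending tasks")
```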
16. Other things we monitor
Compaction predictions (a rough sketch follows below)
Backup failures
Recent restarts
Schema changes
Monitors
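A very rough sketch of what a compaction prediction could look like for size-tiered compaction: bucket the SSTable data files on disk by size and flag any tier that is one file short of the default min_threshold of 4, when a minor compaction would kick in. The data directory and power-of-two bucketing are assumptions, not the actual tooling.

```python
# Rough sketch of "compaction predictions" for size-tiered compaction: group
# SSTable data files into power-of-two size buckets and flag buckets one file
# away from the default min_threshold of 4. Path and bucketing are assumptions.
import math
import os

DATA_DIR = "/var/lib/cassandra/data"  # assumed default data directory
buckets = {}

for root, _dirs, files in os.walk(DATA_DIR):
    for name in files:
        if name.endswith("-Data.db"):
            size = os.path.getsize(os.path.join(root, name))
            buckets.setdefault(int(math.log2(max(size, 1))), []).append(size)

for bucket, sizes in sorted(buckets.items()):
    if len(sizes) == 3:
        print(f"~{2 ** bucket} byte tier: {len(sizes)} SSTables, minor compaction likely soon")
```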
17. What we learned
Having Cassandra developers in house is crucial
Repairs are incredibly expensive
Multi-tenanted clusters are challenging
A down node is better than a slow node
Better to compact on our terms and not Cassandra’s
Sizing and tuning is difficult and often done live
Smaller per-node data size is better
18. Q&A (and Recommended viewing)
The Best of Times
Taft and Bakersfield are real places
South Park
Later season episodes like F-Word and Elementary School Musical
Caillou
My kids love this show; I don’t know why
Until the Light Takes Us
Scary documentary on Norwegian Black Metal
Editor's Notes
Keywords – Agenda
Centralized Cassandra team used as a resource for other teams
Minimum cluster size = 6
Don’t developers do everything? True for most of the services; Cassandra is an exception. Needed a team focused on Cassandra so that services could quickly adopt it.
m2.4xlarge: 68.4 GB of memory, 26 EC2 Compute Units (8 virtual cores with 3.25 EC2 Compute Units each), 1690 GB of instance storage, 64-bit platform, I/O performance: high, API name: m2.4xlarge. Ephemeral drives mean that we have to bootstrap new nodes.
Brief overview on this slide, go into detail on the next one
Things to cover on this slide: how AWS balances between AZs, what happens when an AZ goes away, how Priam alternates nodes around the ring, even in multi-region clusters.
(Vijay should have covered a lot of this.) Refer back to the previous slide. REST is useful for automation: we do not have to connect to nodes directly or use JMX. Priam only supports doubling the ring.
Node, AZ, and cluster level metrics. Time series metrics with extensive history. Can compare multiple metrics on one graph. Also configured to send alerts.
Extension of Epic, using preconfigured dashboards for each cluster. Add additional metrics as we learn which to monitor.
Cluster-level monitoring, or things that we cannot easily derive from JMX or Epic.
Try to anticipate when a large minor compaction is going to happen. Freedom and responsibility has forced us to monitor schema changes. Want to understand every time Cassandra restarts. AWS very infrequently swaps out bad nodes; nodes usually become non-responsive.
… developers in house … Quickly find problems by looking into code. Documentation/tools for troubleshooting are scarce.
… repairs … Affect the entire replication set and cause very high latency in an I/O constrained environment.
… multi-tenant … Hard to track changes being made. Shared resources mean that one service can affect another. Individual usage only grows. Moving a service to a new cluster while it is live is non-trivial.
… smaller per-node data … Instance-level operations (bootstrap, compact, etc.) are faster.