Devops kc

Apache Cassandra
Philip Thompson
Software Engineer
DataStax
©2014 DataStax. Do not distribute without consent.

Who I am
• Philip Thompson
• Software Engineer at DataStax
• Contributor to Apache Cassandra
• A maintainer of CCM, the Cassandra Cluster Manager

Apache Cassandra™
• Apache Cassandra™ is a massively scalable, open source, distributed NoSQL database built for modern, mission-critical online applications.
• Written in Java; its design is a hybrid of Amazon Dynamo and Google Bigtable
• Masterless, with no single point of failure
• Distributed and data centre aware
• Designed for 100% uptime
• Predictable scaling

http://techblog.netflix.com/2012/07/lessons-netflix-learned-from-aws-storm.html

Cluster Architecture

Data Distribution
[Figure: token ring with four positions: 0, 25, 50, 75]
Murmur3_Hash_Function(Partition Key) >> Token
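The hash-to-token mapping can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Cassandra's implementation: MD5 from the Python standard library stands in for Murmur3 (which the standard library does not ship), and the ring is shrunk to 100 positions to match the 0/25/50/75 ring. The names `RING_SIZE` and `token_for` are hypothetical.

```python
import hashlib

RING_SIZE = 100  # assumption: tiny ring matching the 0/25/50/75 positions


def token_for(partition_key: str) -> int:
    """Hash a partition key to a position on the ring.

    MD5 is only a stand-in; Cassandra's default partitioner uses Murmur3.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer and fold it onto the small ring.
    return int.from_bytes(digest[:8], "big") % RING_SIZE


if __name__ == "__main__":
    for key in ("alice", "bob", "carol"):
        print(key, "->", token_for(key))
```

The same key always hashes to the same token, which is what lets any node work out where a row lives without asking a master.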

Cassandra - More than one server
• All nodes participate in a cluster
• Shared nothing
• Add or remove as needed
• More capacity? Add a server
• Each node owns a number of tokens
• Tokens denote a range of keys
• 4 nodes? -> Key range/4
• Each node owns 1/4 of the data
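A small sketch of how token ranges translate into ownership, assuming a ring of 100 positions and four nodes at tokens 0, 25, 50 and 75, where each node owns the range that ends at its token. The node names and helper functions are hypothetical and only illustrate the arithmetic.

```python
import bisect
from collections import Counter

RING_SIZE = 100                      # assumption: tiny ring for illustration
TOKENS = [0, 25, 50, 75]             # sorted node tokens
OWNERS = {0: "node A", 25: "node B", 50: "node C", 75: "node D"}


def owner_of(token: int) -> str:
    """Return the node whose range (previous token, own token] holds `token`."""
    idx = bisect.bisect_left(TOKENS, token % RING_SIZE)
    # Past the last token we wrap around to the first node on the ring.
    return OWNERS[TOKENS[idx % len(TOKENS)]]


def ownership_share() -> dict:
    """Fraction of ring positions owned by each node."""
    counts = Counter(owner_of(t) for t in range(RING_SIZE))
    return {node: n / RING_SIZE for node, n in counts.items()}


if __name__ == "__main__":
    print(owner_of(10))        # falls in (0, 25]
    print(ownership_share())   # each node should own a quarter
```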

Cassandra - Locally Distributed
• Client writes to any node
• Node coordinates with others
• Data replicated in parallel
• Replication factor (RF): How many copies of your data?
• RF = 3 here: each node stores 3/4 of the cluster's total data.
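The 3/4 figure can be checked with a short sketch. It assumes a simple placement rule in which the first replica is the token owner and the remaining RF - 1 replicas are the next nodes clockwise on the ring; the ring size, node names and function names are hypothetical.

```python
import bisect

RING_SIZE = 100                      # assumption: tiny ring for illustration
TOKENS = [0, 25, 50, 75]             # four nodes, evenly spaced
NODES = {0: "node A", 25: "node B", 50: "node C", 75: "node D"}


def replicas_for(token: int, rf: int = 3) -> list:
    """Owner of the token plus the next rf - 1 nodes clockwise (rf <= node count)."""
    start = bisect.bisect_left(TOKENS, token % RING_SIZE)
    return [NODES[TOKENS[(start + i) % len(TOKENS)]] for i in range(rf)]


def stored_fraction(node: str, rf: int = 3) -> float:
    """Fraction of all ring positions for which `node` holds a replica."""
    held = sum(1 for t in range(RING_SIZE) if node in replicas_for(t, rf))
    return held / RING_SIZE


if __name__ == "__main__":
    print(replicas_for(10))
    for name in NODES.values():
        print(name, stored_fraction(name))
```

With four nodes and RF = 3, every node holds a replica for three of the four ranges, which is where the 3/4 comes from.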

Cassandra - Geographically Distributed
• Client writes to its local data center
• Data syncs across the WAN
• Replication factor is set per DC
• Single coordinator

Cassandra - Replication Factor
• Replication factor (RF): How many copies of your data?
• Replication factor is set per keyspace
• Can be altered by the operator
RF = 3
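Because the replication factor belongs to the keyspace, setting or changing it comes down to a CREATE KEYSPACE or ALTER KEYSPACE statement. The sketch below only builds the CQL text so that it runs without a cluster; the keyspace name `demo`, the data center names `dc1` and `dc2`, and the helper `replication_cql` are hypothetical.

```python
def _fmt(value) -> str:
    """Quote strings the way CQL map literals expect; leave numbers bare."""
    return f"'{value}'" if isinstance(value, str) else str(value)


def replication_cql(action: str, keyspace: str, dc_replication: dict) -> str:
    """Build a CREATE or ALTER KEYSPACE statement with per-DC replication."""
    if action not in ("CREATE", "ALTER"):
        raise ValueError("action must be CREATE or ALTER")
    options = {"class": "NetworkTopologyStrategy"}
    options.update(dc_replication)  # e.g. {"dc1": 3, "dc2": 2}
    body = ", ".join(f"'{name}': {_fmt(value)}" for name, value in options.items())
    return f"{action} KEYSPACE {keyspace} WITH replication = {{{body}}};"


if __name__ == "__main__":
    print(replication_cql("CREATE", "demo", {"dc1": 3, "dc2": 2}))
    print(replication_cql("ALTER", "demo", {"dc1": 3, "dc2": 3}))
```

The printed statements can be pasted into cqlsh; the data center names must match what the snitch reports for the cluster.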

Cassandra - Consistency
• Consistency Level (CL)
• Client specifies per read or write
• ALL = All replicas ack
• QUORUM = A majority (more than 50%) of replicas ack
• LOCAL_QUORUM = A majority of replicas in the local DC ack
• ONE = Only one replica acks
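The number of acknowledgements each level needs can be written down directly. A quorum is a majority, that is floor(replicas / 2) + 1. The function names below are hypothetical, and only the four levels listed on the slide are covered.

```python
def quorum(replica_count: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replica_count // 2 + 1


def required_acks(level: str, rf_total: int, rf_local=None) -> int:
    """Acks needed for a consistency level.

    rf_total: sum of the replication factors across all data centers.
    rf_local: replication factor of the coordinator's own data center.
    """
    level = level.upper()
    if level == "ONE":
        return 1
    if level == "ALL":
        return rf_total
    if level == "QUORUM":
        return quorum(rf_total)
    if level == "LOCAL_QUORUM":
        if rf_local is None:
            raise ValueError("LOCAL_QUORUM needs the local DC's replication factor")
        return quorum(rf_local)
    raise ValueError(f"unsupported consistency level: {level}")


if __name__ == "__main__":
    for cl in ("ONE", "QUORUM", "ALL"):
        print(cl, required_acks(cl, rf_total=3))
    print("LOCAL_QUORUM", required_acks("LOCAL_QUORUM", rf_total=6, rf_local=3))
```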

Cassandra - Transparent to the application
• A single node failure shouldn’t cause a request to fail
• Replication Factor + Consistency Level = Success
• This example:
• RF = 3
• CL = QUORUM
• With one node down, 2 of 3 replicas (a majority) still ack, so we are good!
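A minimal check of the example above: given RF, the number of replicas still alive, and the consistency level, does the request succeed? The helper `request_succeeds` is hypothetical and ignores timeouts and multi-DC levels.

```python
def request_succeeds(rf: int, live_replicas: int, level: str = "QUORUM") -> bool:
    """True if enough replicas are alive to satisfy the consistency level."""
    needed = {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}[level]
    return live_replicas >= needed


if __name__ == "__main__":
    # RF = 3, CL = QUORUM: one replica may be down, two may not.
    print(request_succeeds(rf=3, live_replicas=2))
    print(request_succeeds(rf=3, live_replicas=1))
    # The same outage fails at ALL but still succeeds at ONE.
    print(request_succeeds(rf=3, live_replicas=2, level="ALL"))
    print(request_succeeds(rf=3, live_replicas=1, level="ONE"))
```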

Cassandra - Scaling
• Take a cluster of four nodes
• Where does the fifth node go?
• Rebalancing is costly
[Figure: four-node token ring with tokens 0, 25, 50, 75]
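To see why the fifth node is awkward, compare two options on a toy ring of 100 positions with one token per node: drop the new node into the middle of an existing range, or re-space all tokens evenly. The functions below are hypothetical helpers for that arithmetic only.

```python
def even_tokens(node_count: int, ring_size: int = 100) -> list:
    """Evenly spaced tokens for `node_count` nodes."""
    return [i * ring_size // node_count for i in range(node_count)]


def ownership(tokens, ring_size: int = 100) -> dict:
    """Ring positions owned by each token: the range ending at that token."""
    ordered = sorted(tokens)
    return {
        t: ((t - ordered[i - 1]) % ring_size) or ring_size
        for i, t in enumerate(ordered)
    }


def tokens_to_move(old_count: int, new_count: int, ring_size: int = 100) -> list:
    """Existing tokens that must change to keep the ring evenly spaced."""
    new = set(even_tokens(new_count, ring_size))
    return [t for t in even_tokens(old_count, ring_size) if t not in new]


if __name__ == "__main__":
    # Option 1: add a node at token 12. Cheap, but ownership is now uneven.
    print(ownership([0, 12, 25, 50, 75]))
    # Option 2: rebalance to five even ranges. Most existing nodes must move.
    print(tokens_to_move(4, 5))
```

In this toy model, three of the four original nodes have to take a new token, and therefore stream data, to restore an even split.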

Gossip
• Manages cluster state
• Nodes up/down
• Nodes joining/leaving
• Decentralized
• “Heartbeat” every second
• Every node contacts 1-3 other nodes
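A toy simulation of the idea, not of Cassandra's gossip protocol: each round stands for one second, every node bumps its own heartbeat counter, picks 1-3 random peers, and the two sides keep the highest heartbeat seen for every node. All names are hypothetical; the random generator is seeded so runs are repeatable.

```python
import random


def gossip_round(views: dict, rng: random.Random) -> None:
    """One round: every node bumps its heartbeat and syncs with 1-3 peers."""
    nodes = list(views)
    for node in nodes:
        views[node][node] = views[node].get(node, 0) + 1  # own heartbeat
        peers = [n for n in nodes if n != node]
        for peer in rng.sample(peers, k=min(len(peers), rng.randint(1, 3))):
            names = set(views[node]) | set(views[peer])
            merged = {
                name: max(views[node].get(name, 0), views[peer].get(name, 0))
                for name in names
            }
            views[node] = dict(merged)
            views[peer] = dict(merged)


def rounds_until_converged(node_count: int, seed: int = 0, max_rounds: int = 1000) -> int:
    """Rounds needed until every node has heard of every other node."""
    rng = random.Random(seed)
    views = {f"node{i}": {} for i in range(node_count)}
    for round_number in range(1, max_rounds + 1):
        gossip_round(views, rng)
        if all(len(view) == node_count for view in views.values()):
            return round_number
    raise RuntimeError("did not converge within max_rounds")


if __name__ == "__main__":
    print(rounds_until_converged(node_count=16))
```

Even though no node talks to everyone, knowledge of the whole cluster spreads in a handful of rounds, which is what makes the decentralized approach practical.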

Snitch
• Responsible for determining cluster topology
• Datacenter awareness
• Tracks node responsiveness
• Many snitches provided out of the box
• SimpleSnitch
• GossipingPropertyFileSnitch (recommended for production)
• Ec2Snitch and Ec2MultiRegionSnitch
• For use with AWS
• Comparable GCE snitch has just been added
• Custom snitches can be added

Anti-Entropy - Hinted Handoff
• Three-hour window (default) during which hints are stored for a down node
• Hints are replayed when the node is restored
• Stored in the system.hints table on the coordinator
• Cassandra does not copy Dynamo’s “sloppy quorum”
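The hint lifecycle can be sketched with a toy coordinator. It keeps hints in a dictionary rather than in the system.hints table, and takes timestamps as plain seconds; the class and method names are hypothetical.

```python
HINT_WINDOW_SECONDS = 3 * 60 * 60  # the three-hour window from the slide


class ToyCoordinator:
    def __init__(self):
        self.hints = {}       # replica name -> list of pending writes
        self.down_since = {}  # replica name -> time it was seen going down

    def mark_down(self, replica: str, now: float) -> None:
        self.down_since.setdefault(replica, now)

    def write(self, replica: str, mutation: str, now: float) -> str:
        """Return 'delivered', 'hinted', or 'dropped' for one replica."""
        if replica not in self.down_since:
            return "delivered"
        if now - self.down_since[replica] <= HINT_WINDOW_SECONDS:
            self.hints.setdefault(replica, []).append(mutation)
            return "hinted"
        return "dropped"  # window exceeded: only repair can fix this later

    def mark_up(self, replica: str) -> list:
        """Return the hints to replay and clear them once the replica is back."""
        self.down_since.pop(replica, None)
        return self.hints.pop(replica, [])


if __name__ == "__main__":
    c = ToyCoordinator()
    c.mark_down("replica2", now=0)
    print(c.write("replica2", "write-1", now=60))        # inside the window
    print(c.write("replica2", "write-2", now=4 * 3600))  # outside the window
    print(c.mark_up("replica2"))                         # hints to replay
```

Writes that arrive after the window has passed are not hinted at all, which is one reason regular repair is still needed.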

Anti-Entropy - Repair
• nodetool repair
• Uses Merkle trees for data comparison
• Should be run weekly
• Cassandra 2.1 has drastically improved repair times, thanks to incremental repair
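Why Merkle trees help: two replicas can compare a single root hash and descend only into the subtrees that differ, instead of shipping all their data. The sketch below is a generic Merkle comparison with SHA-256, not Cassandra's repair code; each leaf stands in for the data of one token sub-range, and the leaf count is assumed to be a power of two.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(leaves):
    """Return the tree as a list of levels: leaf hashes first, root last."""
    level = [_h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def differing_leaves(tree_a, tree_b):
    """Walk down from the root, descending only where hashes differ."""
    suspects = [0]  # node indices at the current level, starting at the root
    for depth in range(len(tree_a) - 1, 0, -1):
        suspects = [i for i in suspects if tree_a[depth][i] != tree_b[depth][i]]
        suspects = [child for i in suspects for child in (2 * i, 2 * i + 1)]
    return [i for i in suspects if tree_a[0][i] != tree_b[0][i]]


if __name__ == "__main__":
    replica_1 = build_tree([b"range0", b"range1", b"range2", b"range3"])
    replica_2 = build_tree([b"range0", b"range1", b"stale!", b"range3"])
    print(differing_leaves(replica_1, replica_2))  # only this range needs streaming
```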

Node Architecture

Write Path
[Figure: write path. A write goes to the commit log (disk) and the memtable (memory); the memtable is later flushed to an SSTable (disk).]

Write Path
• By default, the commit log is fsynced to disk every 10 s
• This can be configured in cassandra.yaml
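The write path can be sketched as a toy node: append to the commit log for durability, update the in-memory memtable, and flush the memtable to an immutable SSTable once it grows past a threshold. Everything here lives in Python lists and dicts; the class name, the threshold and the flush trigger are assumptions for illustration.

```python
class ToyNode:
    def __init__(self, flush_threshold: int = 3):
        self.commit_log = []   # stands in for the on-disk, append-only log
        self.memtable = {}     # in-memory table, key -> value
        self.sstables = []     # stand-ins for immutable on-disk files
        self.flush_threshold = flush_threshold

    def write(self, key, value) -> None:
        self.commit_log.append((key, value))  # 1. log the write first
        self.memtable[key] = value            # 2. then update the memtable
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        """Write the memtable out as a sorted, immutable SSTable."""
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs the log


if __name__ == "__main__":
    node = ToyNode(flush_threshold=2)
    node.write("b", 1)
    node.write("a", 2)          # reaches the threshold and triggers a flush
    print(node.sstables, node.memtable, node.commit_log)
```

Nothing on this path reads from disk or updates data in place, which is why writes are cheap.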

Read Path
[Figure: read path. A read consults the memtable (memory) and the SSTables (disk).]
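A read has to look in more than one place, because the newest value for a key may sit in the memtable or in any SSTable. The sketch below merges them by write timestamp, newest wins. It leaves out everything else a real read does (bloom filters, caches, tombstones); the function name and the `(timestamp, value)` layout are assumptions.

```python
def read(key, memtable: dict, sstables: list):
    """Return the value with the newest timestamp across memtable and SSTables.

    Every store maps key -> (timestamp, value).
    """
    candidates = [source[key] for source in [memtable, *sstables] if key in source]
    if not candidates:
        return None
    timestamp, value = max(candidates, key=lambda cell: cell[0])
    return value


if __name__ == "__main__":
    memtable = {"a": (5, "new")}
    sstables = [{"a": (1, "old"), "b": (2, "x")}, {"b": (3, "y")}]
    print(read("a", memtable, sstables))
    print(read("b", memtable, sstables))
    print(read("c", memtable, sstables))
```

The more SSTables a key is spread across, the more places a read must check, which is why the SSTable count per read is worth watching.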

Debugging your data model
• Tracing
cqlsh> tracing on;
Now tracing requests.
cqlsh:foo> INSERT INTO test (a, b) VALUES (1, 'example');
Tracing session: 4ad36250-1eb4-11e2-0000-fe8ebeead9f9
activity | timestamp | source | source_elapsed
-------------------------------------+--------------+-----------+----------------
execute_cql3_query | 00:02:37,015 | 127.0.0.1 | 0
Parsing statement | 00:02:37,015 | 127.0.0.1 | 81
Preparing statement | 00:02:37,015 | 127.0.0.1 | 273
Determining replicas for mutation | 00:02:37,015 | 127.0.0.1 | 540
Sending message to /127.0.0.2 | 00:02:37,015 | 127.0.0.1 | 779
Messsage received from /127.0.0.1 | 00:02:37,016 | 127.0.0.2 | 63
Applying mutation | 00:02:37,016 | 127.0.0.2 | 220
Acquiring switchLock | 00:02:37,016 | 127.0.0.2 | 250
Appending to commitlog | 00:02:37,016 | 127.0.0.2 | 277
Adding to memtable | 00:02:37,016 | 127.0.0.2 | 378
Enqueuing response to /127.0.0.1 | 00:02:37,016 | 127.0.0.2 | 710
Sending message to /127.0.0.1 | 00:02:37,016 | 127.0.0.2 | 888
Messsage received from /127.0.0.2 | 00:02:37,017 | 127.0.0.1 | 2334
Processing response from /127.0.0.2 | 00:02:37,017 | 127.0.0.1 | 2550
Request complete | 00:02:37,017 | 127.0.0.1 | 2581

Nodetool
• Command line interface for monitoring Cassandra and performing routine database operations
• Commands for viewing detailed metrics for tables, server metrics, and compaction statistics:
• cfstats: statistics for each table and keyspace
• cfhistograms: statistics about a table, including read/write latency, row size, column count, and number of SSTables
• netstats: statistics about network operations and connections
• tpstats: statistics about the number of active, pending, and completed tasks for each stage of Cassandra operations by thread pool

Cassandra
• Download from source:
• git clone git://git.apache.org/cassandra.git
• Packaged install and tarballs available:
• http://www.datastax.com/documentation/cassandra/2.1/cassandra/install/install_cassandraTOC.html

CCM
• CCM - Cassandra Cluster Manager
• https://github.com/pcmanus/ccm
• Warning: not lightweight
• Example:
• ccm create test -v 2.0.1
• ccm populate -n 3
• ccm start

Clients
• cqlsh
• Bundled with Cassandra
• Drivers
• Java: https://github.com/datastax/java-driver
• Python: https://github.com/datastax/python-driver
• .NET: https://github.com/datastax/csharp-driver
• and more: http://www.datastax.com/download/clientdrivers
• Ruby, C/C++, Node.js

Get Help
• IRC: #cassandra on freenode
• Mailing Lists
• Subscribe at cassandra.apache.org
• Stack Overflow
• DataStax Docs
• http://www.datastax.com/docs
