Introduction to Cassandra 2014

Apache Cassandra is a popular choice for a wide variety of application persistence needs. Many design choices can affect uptime and performance. In this talk we'll look at some of the many things to consider, from a single server to multiple data centers. A basic understanding of Cassandra features, coupled with client driver features, can be a very powerful combination. This talk is an introduction, but it dives deep into the technical details of how Cassandra works.

  1. 1. Introduction to Cassandra. Patrick McFadin, Chief Evangelist/Solution Architect, DataStax (@PatrickMcFadin). ©2013 DataStax Confidential. Do not distribute without consent.
  2. 2. Cassandra - An introduction
  3. 3. Five years of Cassandra [Timeline figure: releases 0.1, 0.3, 0.6, 0.7, 1.0, 1.2, ... 2.0, and DSE, on an axis of years 0 to 5 starting Jul-08]
  4. 4. Why Cassandra? Or any non-relational?
  5. 5. The world has changed • More Connected • More Data
  6. 6. • Less Tolerant. For the record: Can you spell HTTP?
  7. 7. • Traditional solutions… • …may not be a good fit
  8. 8. We still try though [Diagram: client, router, shard 1, shard 2, shard 3, shard 4] All your wildest dreams will come true.
  9. 9. A new plan Let’s apply a little Computer Science
  10. 10. Dynamo Paper (2007) • How do we build a data store that is: • Reliable • Performant • “Always On” • Nothing new and shiny • 24 papers cited. Evolutionary. Real. Computer Science. Also the basis for Riak and Voldemort
  11. 11. BigTable (2006) • Richer data model • 1 key, lots of values • Fast sequential access • 38 papers cited
  12. 12. Cassandra (2008) • Distributed features of Dynamo • Data model and storage from BigTable • Graduated to a top-level Apache project on February 17, 2010
  13. 13. Cassandra - More than one server • All nodes participate in a cluster • Shared nothing • Add or remove as needed • More capacity? Add a server
  14. 14. VLDB benchmark (RWS) [Chart: throughput (ops/sec) for Cassandra, HBase, Redis, MySQL] http://globalbigdataconference.com/bigdata-bootcamp-sca-january-2014.php
  15. 15. Cassandra - Fully Replicated • Client writes local • Data syncs across WAN • Replication Factor per DC
  16. 16. Focus on a single server
  17. 17. Cassandra - Writes on a single node
  18. 18. Client writes to server
  19. 19. Data written to commit log
  20. 20. Data written to memtable
  21. 21. Server acknowledges to client
  22. 22. Memtable flushed to disk
  23. 23. Dealing with duplicates
  24. 24. Compaction
  25. 25. Compacted file written
  26. 26. Clean up old files
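The single-node write path in slides 18 to 26 can be sketched as a toy in-memory model. This is only an illustration under simplifying assumptions: `ToyNode` and all of its method names are hypothetical, the "commit log" and "SSTables" are plain Python containers rather than files, and flushing and compaction are triggered by hand instead of in the background.

```python
import time


class ToyNode:
    """Hypothetical toy model: commit log -> memtable -> flushed files -> compaction."""

    def __init__(self):
        self.commit_log = []  # append-only record of every write (durability)
        self.memtable = {}    # in-memory map: key -> (timestamp, value)
        self.sstables = []    # flushed, immutable tables, oldest first

    def write(self, key, value, timestamp=None):
        ts = time.time() if timestamp is None else timestamp
        self.commit_log.append((key, value, ts))  # data written to commit log
        self.memtable[key] = (ts, value)          # data written to memtable
        return "ack"                              # server acknowledges to client

    def flush(self):
        """Memtable flushed to disk (modeled here as appending a dict)."""
        if self.memtable:
            self.sstables.append(dict(self.memtable))
            self.memtable = {}
            self.commit_log.clear()  # flushed data no longer needs the log

    def compact(self):
        """Merge all flushed tables, keeping the newest value for duplicate keys."""
        if not self.sstables:
            return
        merged = {}
        for table in self.sstables:
            for key, (ts, value) in table.items():
                if key not in merged or ts > merged[key][0]:
                    merged[key] = (ts, value)
        self.sstables = [merged]  # compacted file written, old files cleaned up


node = ToyNode()
node.write("jim", {"age": 36}, timestamp=1)
node.flush()
node.write("jim", {"age": 37}, timestamp=2)  # duplicate key in a second file
node.flush()
node.compact()
```

After `compact()`, the two flushed tables collapse into one that holds only the newer `jim` row, which is the "dealing with duplicates" step on slide 23.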
  27. 27. Cassandra - Writes in the cluster
  28. 28. Fully distributed, no SPOF [Diagram labels: Client, p1, p1, p1, p3, p6]
  29. 29. Partitioning: Primary key determines placement*
      jim    | age: 36 | car: camaro | gender: M
      carol  | age: 37 | car: subaru | gender: F
      johnny | age: 12 |             | gender: M
      suzy   | age: 10 |             | gender: F
  30. 30. Key Hashing
      PK     | MD5 Hash
      jim    | 5e02739678...
      carol  | a9a0198010...
      johnny | f4eb27cea7...
      suzy   | 78b421309e...
      MD5* hash operation yields a 128-bit number for keys of any size.
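As a rough illustration of the key hashing step, the sketch below turns a key of any length into a 128-bit number with MD5. The function name `md5_token`, the UTF-8 encoding of the key, and the big-endian interpretation of the digest are assumptions made for this example; it is not a reproduction of Cassandra's partitioner code.

```python
import hashlib


def md5_token(key: str) -> int:
    """Hash a key of any size to a 128-bit integer (hypothetical helper)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()  # always 16 bytes
    return int.from_bytes(digest, "big")


for name in ("jim", "carol", "johnny", "suzy"):
    # Print each token as 32 hex digits (128 bits).
    print(name, format(md5_token(name), "032x"))
```

The point of the design is that the output size is fixed regardless of key length, so every key maps to a position in the same numeric range.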
  31. 31. The Token Ring Node A Node B Node D Node C
  32. 32. Node | Start           | End
      A    | 0xc000000000..1 | 0x0000000000..0
      B    | 0x0000000000..1 | 0x4000000000..0
      C    | 0x4000000000..1 | 0x8000000000..0
      D    | 0x8000000000..1 | 0xc000000000..0
      Key    | MD5 Hash
      jim    | 5e02739678...
      carol  | a9a0198010...
      johnny | f4eb27cea7...
      suzy   | 78b421309e...
  33. 33-36. (Repeat the token ranges and key hashes of slide 32.)
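Given the token ranges and key hashes on these slides, finding the node that owns a key is a range lookup. The sketch below is one way to express it, assuming each range is inclusive of its end token and that node A's range wraps around the top of the ring. The names `RING`, `owner`, and `token_from_prefix` are hypothetical, and the truncated hashes from the slide are padded with zeros purely so they can be compared; the real low-order digits are not in the source.

```python
from bisect import bisect_left

# End token of each node's range, as 128-bit integers (top byte from the slide).
RING = [
    (0x00 << 120, "A"),  # A: 0xc0..1 through 0x00..0 (wraps around)
    (0x40 << 120, "B"),  # B: 0x00..1 through 0x40..0
    (0x80 << 120, "C"),  # C: 0x40..1 through 0x80..0
    (0xC0 << 120, "D"),  # D: 0x80..1 through 0xc0..0
]
END_TOKENS = [end for end, _ in RING]


def owner(token: int) -> str:
    """Return the node whose range contains the token."""
    idx = bisect_left(END_TOKENS, token)  # first end token >= token
    return RING[idx % len(RING)][1]       # past the last end: wrap to A


def token_from_prefix(hex_prefix: str) -> int:
    """Assumption: pad the slide's truncated hash with zeros to 32 hex digits."""
    return int(hex_prefix.ljust(32, "0"), 16)


SLIDE_HASHES = {
    "jim": "5e02739678",
    "carol": "a9a0198010",
    "johnny": "f4eb27cea7",
    "suzy": "78b421309e",
}
placement = {name: owner(token_from_prefix(h)) for name, h in SLIDE_HASHES.items()}
print(placement)
```

Under these assumptions jim and suzy land on node C, carol on node D, and johnny wraps around to node A.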
  37. 37. Replication [Ring diagram: Node A, Node B, Node C, Node D] carol a9a0198010...
  38. 38. Replication [Ring diagram: Node A, Node B, Node C, Node D] carol a9a0198010...
  39. 39. Replication [Ring diagram: Node A, Node B, Node C, Node D] carol a9a0198010... Replication factor = 3. Consistency? More on that later
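A replication factor of 3 means three nodes hold a copy of carol's row. The transcript does not say which nodes the slide highlights, so the sketch below assumes a simple placement rule: start at the node that owns the token and walk the ring in order until enough replicas are chosen. `RING_ORDER` and `replicas` are hypothetical names, and the ring order A, B, C, D is taken from the token table on slide 32.

```python
from typing import List

RING_ORDER = ["A", "B", "C", "D"]  # nodes in token order around the ring


def replicas(primary: str, replication_factor: int) -> List[str]:
    """Pick the owning node plus the next nodes on the ring (assumed rule)."""
    if not 1 <= replication_factor <= len(RING_ORDER):
        raise ValueError("replication factor must be between 1 and the node count")
    start = RING_ORDER.index(primary)
    return [
        RING_ORDER[(start + step) % len(RING_ORDER)]
        for step in range(replication_factor)
    ]


# carol's hash a9a0198010... falls in node D's range per the token table.
print(replicas("D", 3))
```

With this rule, carol's row is stored on D and then wraps around the ring to A and B.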
  40. 40. Virtual Nodes [Diagram: two rings, labeled "Without vnodes" and "With vnodes"; labels Node A, Node B, Node C, Node D and C’’, A’’, D’, C’, A’, D, C, B’, A, B]
  41. 41. Cassandra - Reads
  42. 42. Coordinated reads
  43. 43. Consistency Level • Set with every read and write • ONE • QUORUM - a majority (more than 50%) of replicas ack • LOCAL_QUORUM - a majority (more than 50%) of replicas ack in local DC • LOCAL_ONE - Read repair only in local DC • TWO • ALL - All replicas ack. Full consistency
  44. 44. QUORUM and availability
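The link between QUORUM and availability comes down to simple arithmetic. The sketch below assumes the usual majority definition of a quorum, floor(RF / 2) + 1; the function names are hypothetical and chosen only for this example.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that can be down while a QUORUM request still succeeds."""
    return replication_factor - quorum(replication_factor)


def overlaps(read_acks: int, write_acks: int, replication_factor: int) -> bool:
    """True when every read set must share at least one replica with every write set."""
    return read_acks + write_acks > replication_factor


for rf in (1, 3, 5):
    print(rf, quorum(rf), tolerated_failures(rf))
```

With a replication factor of 3, a quorum is 2 replicas, so one replica can be down without failing QUORUM reads or writes. Reading and writing at QUORUM also satisfies the overlap condition, since 2 + 2 > 3, while ONE for both does not.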
  45. 45. Rapid Read Protection NONE
  46. 46. Cassandra Query Language - CQL
  47. 47. CQL Tables • List of columns • No sizes? • First item in PRIMARY KEY is the partition key
          CREATE TABLE users (
            username varchar,
            firstname varchar,
            lastname varchar,
            email list<varchar>,
            password varchar,
            created_date timestamp,
            PRIMARY KEY (username)
          );
          INSERT INTO users (username, firstname, lastname,
            email, password, created_date)
          VALUES ('pmcfadin','Patrick','McFadin',
            ['patrick@datastax.com'],'ba27e03fd95e507daf2937c937d499ab',
            '2011-06-20 13:50:00');
  48. 48. CQL Inserts • Insert will always overwrite
          INSERT INTO users (username, firstname, lastname,
            email, password, created_date)
          VALUES ('pmcfadin','Patrick','McFadin',
            ['patrick@datastax.com'],'ba27e03fd95e507daf2937c937d499ab',
            '2011-06-20 13:50:00');
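"Insert will always overwrite" means an INSERT for an existing primary key does not fail; it replaces the values of the columns it names. The sketch below models that behavior with a plain dictionary. The `insert` helper and the in-memory `users` table are hypothetical stand-ins for illustration, not a driver API.

```python
users = {}  # hypothetical in-memory table: primary key -> row (column -> value)


def insert(table: dict, primary_key: str, **columns) -> None:
    """Upsert: create the row if missing, overwrite the named columns otherwise."""
    row = table.setdefault(primary_key, {})
    row.update(columns)


insert(users, "pmcfadin", firstname="Patrick", lastname="McFadin")
# A second insert with the same key succeeds and overwrites the named column.
insert(users, "pmcfadin", firstname="Pat")
print(users["pmcfadin"])
```

The second call leaves `lastname` untouched and replaces only `firstname`, so there is still exactly one row for `pmcfadin`.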
  49. 49. Advanced Data Models
          CREATE TABLE user_activity (
            username varchar,
            interaction_time timeuuid,
            activity_code varchar,
            detail varchar,
            PRIMARY KEY (username, interaction_time)
          ) WITH CLUSTERING ORDER BY (interaction_time DESC);  -- Reverse order based on timestamp
          INSERT INTO user_activity
            (username,interaction_time,activity_code,detail)
          VALUES ('pmcfadin',0D1454E0-D202-11E2-8B8B-0800200C9A66,'100','Normal login')
          USING TTL 2592000;  -- Expire after 30 days
  50. 50. Real time data access
          select * from user_activity limit 5;
           username | interaction_time                     | detail                                   | activity_code
          ----------+--------------------------------------+------------------------------------------+------------------
           pmcfadin | 9ccc9df0-d076-11e2-923e-5d8390e664ec | Entered shopping area: Jewelry           | 301
           pmcfadin | 9c652990-d076-11e2-923e-5d8390e664ec | Created shopping cart: Anniversary gifts | 202
           pmcfadin | 1b5cef90-d076-11e2-923e-5d8390e664ec | Deleted shopping cart: Gadgets I want    | 205
           pmcfadin | 1b0e5a60-d076-11e2-923e-5d8390e664ec | Opened shopping cart: Gadgets I want     | 201
           pmcfadin | 1b0be960-d076-11e2-923e-5d8390e664ec | Normal login                             | 100
      Maybe put a sale item for flowers too?
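The newest-first order of the query result follows from the timeuuid clustering column sorted in descending order. As a small sketch of why that works, the code below sorts the five rows from the slide by the timestamp embedded in each version 1 UUID, using Python's standard `uuid` module. The `INTERACTIONS` list and the `newest_first` helper are hypothetical; the rows are deliberately stored out of order, and sorting on the client is only a stand-in for what the clustering order does on the server.

```python
import uuid

# (interaction_time, detail) pairs from the slide, deliberately shuffled.
INTERACTIONS = [
    ("1b0be960-d076-11e2-923e-5d8390e664ec", "Normal login"),
    ("9c652990-d076-11e2-923e-5d8390e664ec", "Created shopping cart: Anniversary gifts"),
    ("1b5cef90-d076-11e2-923e-5d8390e664ec", "Deleted shopping cart: Gadgets I want"),
    ("9ccc9df0-d076-11e2-923e-5d8390e664ec", "Entered shopping area: Jewelry"),
    ("1b0e5a60-d076-11e2-923e-5d8390e664ec", "Opened shopping cart: Gadgets I want"),
]


def newest_first(rows, limit):
    """Sort by the 60-bit timestamp inside each version 1 UUID, newest first."""
    ordered = sorted(rows, key=lambda row: uuid.UUID(row[0]).time, reverse=True)
    return ordered[:limit]


for interaction_time, detail in newest_first(INTERACTIONS, 5):
    print(interaction_time, detail)
```

Sorted this way, the rows come back in the same order as the query output on the slide, with the most recent interaction first.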