Apache Cassandra 
Fundamentals 
or: 
How I stopped worrying and learned to love the CAP theorem 
Russell Spitzer 
@RussSpitzer 
Software Engineer in Test at DataStax
Who am I? 
• Former Bioinformatics Student 
at UCSF 
• Work on the integration of 
Cassandra (C*) with Hadoop, 
Solr, and Redacted! 
• I spend a lot of time spinning up 
clusters on EC2, GCE, Azure, … 
http://www.datastax.com/dev/blog/testing-cassandra-1000-nodes-at-a-time 
• Developing new ways to make 
sure that C* Scales
Apache Cassandra is a Linearly Scaling 
and Fault Tolerant NoSQL Database 
Linearly Scaling: 
The power of the database 
increases linearly with the 
number of machines 
2x machines = 2x throughput 
http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html 
Fault Tolerant: 
Nodes down != Database Down 
Datacenter down != Database Down
CAP Theorem Limits What 
Distributed Systems can do 
Consistency 
When I ask the same question to any part of the system I should get the same answer 
How many planes do we have? 
Consistent: every node answers 1 
Not Consistent: different nodes answer 1, 4, 2, and 8
Availability 
When I ask a question I will get an answer 
How many planes do we have? 
Available: one node answers 1 while another node sleeps (zzzzz *snort* zzz) 
Not Available: "I have to wait for major snooze to wake up"
Partition Tolerance 
I can ask questions even when the system is having intra-system communication 
problems 
How many planes do we have? 
Tolerant: Team Edward and Team Jacob are not speaking, but the node I ask still answers 1 
Not Tolerant: "I’m not sure without asking those vampire lovers and we aren’t speaking"
Cassandra is an AP System 
which is Eventually Consistent 
Eventually consistent: 
New information will make it to everyone eventually 
How many planes do we have? 
Before the update: every node answers 1 
"I just heard! We actually have 2" 
After the update has spread: every node answers 2
Two knobs control fault tolerance in 
C*: Replication and Consistency Level 
Server Side - Replication: 
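How many copies of each piece of data should exist in the cluster? 
[Diagram: the client contacts the coordinator for this operation in a four-node ring; with RF=3 the nodes hold ranges ABD, ABC, ACD, and BCD] 
SimpleStrategy: RF is the number of replicas in the whole cluster 
NetworkTopologyStrategy: RF is the number of replicas per datacenter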
Client Side - Consistency Level: 
How many replicas should we check before 
acknowledgment? 
[Diagram: the client contacts the coordinator for this operation in the same four-node ring (ABD, ABC, ACD, BCD)] 
CL = One: the coordinator waits for one replica 
CL = Quorum: the coordinator waits for a majority of the replicas (2 of 3 when RF=3)
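The relationship between RF and CL = Quorum can be sketched with a little arithmetic. This is a minimal illustration for a single datacenter, not driver code: the class `QuorumMath` and its method names are made up for this example.

```java
/** Quorum arithmetic for one datacenter. Hypothetical helper, not part of any driver. */
final class QuorumMath {
    /** A quorum is a majority of the replicas: floor(RF / 2) + 1. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /** Number of replicas that can be down while a request needing requiredAcks still succeeds. */
    static int tolerableFailures(int replicationFactor, int requiredAcks) {
        return replicationFactor - requiredAcks;
    }

    /** The replicas read and the replicas written share at least one node when R + W > RF. */
    static boolean readAndWriteSetsOverlap(int readAcks, int writeAcks, int replicationFactor) {
        return readAcks + writeAcks > replicationFactor;
    }

    public static void main(String[] args) {
        int rf = 3;
        int q = quorum(rf);
        System.out.println("RF=" + rf + " quorum=" + q
                + " tolerated failures=" + tolerableFailures(rf, q));
        System.out.println("QUORUM read + QUORUM write overlap: "
                + readAndWriteSetsOverlap(q, q, rf));
        System.out.println("ONE read + ONE write overlap: "
                + readAndWriteSetsOverlap(1, 1, rf));
    }
}
```

With RF=3, reading and writing at Quorum tolerates one replica being down, while reading and writing at One is faster but gives no overlap guarantee.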
Nodes own data whose partition key 
hashes to their token ranges 
[Diagram: four-node ring holding ranges ABD, ABC, ACD, and BCD] 
Every piece of data belongs on 
the node that owns the 
Murmur3(2.0) Hash of its 
partition key + (RF-1) other 
nodes 
Partition Key | Clustering Key | Rest of Data 
ID: ICBM_432 | Time: 30 | Loc: SF, Status: Idle 
Murmur3Hash(ID: ICBM_432) falls in range A
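The "owner + (RF-1) other nodes" rule can be sketched as a walk around a sorted ring. This is a simplified illustration with several assumptions: one token per node (no virtual nodes), a placement that ignores racks and datacenters, and `String.hashCode()` standing in for Murmur3. The class `TokenRing` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy token ring: one token per node, no racks or datacenters. Hypothetical sketch. */
final class TokenRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    void addNode(long token, String node) {
        ring.put(token, node);
    }

    /** Stand-in hash for illustration only; Cassandra uses Murmur3 here. */
    static long token(String partitionKey) {
        return partitionKey.hashCode();
    }

    /** The owner is the first node whose token is >= the key's token; then walk clockwise. */
    List<String> replicasFor(long keyToken, int replicationFactor) {
        List<String> replicas = new ArrayList<>();
        if (ring.isEmpty()) {
            return replicas;
        }
        int wanted = Math.min(replicationFactor, ring.size());
        Long cursor = ring.ceilingKey(keyToken);
        if (cursor == null) {
            cursor = ring.firstKey(); // wrap around the end of the ring
        }
        while (replicas.size() < wanted) {
            replicas.add(ring.get(cursor));
            cursor = ring.higherKey(cursor);
            if (cursor == null) {
                cursor = ring.firstKey();
            }
        }
        return replicas;
    }

    public static void main(String[] args) {
        TokenRing r = new TokenRing();
        r.addNode(0L, "node1");
        r.addNode(100L, "node2");
        r.addNode(200L, "node3");
        r.addNode(300L, "node4");
        System.out.println(r.replicasFor(150L, 3));
        System.out.println(r.replicasFor(token("ICBM_432"), 3));
    }
}
```

A sorted map keeps the lookup for the owning node logarithmic in the number of tokens, which is why the sketch uses `TreeMap` rather than scanning a list.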
Cassandra writes are FAST 
due to log-append storage 
[Diagram: each write (partition key, clustering key, rest of data) is appended to the commit log on disk and added to a memtable in memory; memtables are flushed to disk as SSTables]
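The write path in the diagram can be sketched in a few lines. This is an in-memory toy under stated assumptions: a list stands in for the commit log file, a flush is triggered by a row count rather than by memory pressure, and the whole commit log is discarded on flush. The class `WritePath` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

/** Toy write path: append to a log, update a sorted memtable, flush to immutable tables. */
final class WritePath {
    private final List<String> commitLog = new ArrayList<>();          // stands in for an append-only file
    private TreeMap<String, String> memtable = new TreeMap<>();        // sorted, in memory
    private final List<SortedMap<String, String>> sstables = new ArrayList<>(); // immutable once written
    private final int flushThreshold;                                  // assumption: flush by row count

    WritePath(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    void write(String key, String value) {
        commitLog.add(key + "=" + value); // 1. sequential append for durability
        memtable.put(key, value);         // 2. in-memory update, no disk seek
        if (memtable.size() >= flushThreshold) {
            flush();
        }
    }

    void flush() {
        if (memtable.isEmpty()) {
            return;
        }
        sstables.add(Collections.unmodifiableSortedMap(memtable)); // never modified again
        memtable = new TreeMap<>();
        commitLog.clear(); // flushed writes no longer need replaying after a crash
    }

    int sstableCount() { return sstables.size(); }
    int memtableSize() { return memtable.size(); }
    int commitLogSize() { return commitLog.size(); }

    public static void main(String[] args) {
        WritePath node = new WritePath(2);
        node.write("ICBM_432:30", "SF,Idle");
        node.write("ICBM_432:45", "SF,Idle");
        System.out.println("sstables=" + node.sstableCount() + " memtable=" + node.memtableSize());
    }
}
```

Neither step reads existing data before writing, which is the point the slide makes about log-append storage.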
Deletes in a distributed 
System are Challenging 
We need to keep records of 
deletions in case of network 
partitions 
[Diagram: timeline for Node1 and Node2; Node2 suffers a power outage, and tombstones keep the record of the deletion so it can still reach Node2 afterwards]
Compactions merge and 
unify data in our SSTables 
SSTable 1 + SSTable 2 = SSTable 3 
Since SSTables are immutable, 
this is our chance to 
consolidate rows and remove 
tombstones (after GC grace)
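A compaction pass can be sketched as a timestamp-based merge followed by a tombstone purge. This is a simplified illustration, not Cassandra's implementation: a `null` value marks a tombstone, ties on timestamp keep the first cell seen, and the purge ignores SSTables outside the compaction. The class `Compaction`, its `Cell` type, and the `gcGrace` parameter name are assumptions for this example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy compaction: newest timestamp wins, expired tombstones are dropped. */
final class Compaction {
    /** One versioned cell; a null value marks a tombstone (assumption of this sketch). */
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }

        boolean isTombstone() {
            return value == null;
        }
    }

    static TreeMap<String, Cell> compact(List<Map<String, Cell>> sstables, long now, long gcGrace) {
        TreeMap<String, Cell> merged = new TreeMap<>();
        for (Map<String, Cell> sstable : sstables) {
            for (Map.Entry<String, Cell> entry : sstable.entrySet()) {
                Cell current = merged.get(entry.getKey());
                if (current == null || entry.getValue().timestamp > current.timestamp) {
                    merged.put(entry.getKey(), entry.getValue()); // keep the newest version only
                }
            }
        }
        // A tombstone may be dropped only once it is older than the grace period.
        merged.values().removeIf(cell -> cell.isTombstone() && now - cell.timestamp > gcGrace);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Cell> sstable1 = new HashMap<>();
        sstable1.put("ICBM_432", new Cell("Idle", 10L));
        sstable1.put("ICBM_900", new Cell("Idle", 10L));
        Map<String, Cell> sstable2 = new HashMap<>();
        sstable2.put("ICBM_432", new Cell("Launched", 20L));
        sstable2.put("ICBM_900", new Cell(null, 15L)); // deletion recorded as a tombstone
        TreeMap<String, Cell> sstable3 = compact(Arrays.asList(sstable1, sstable2), 100L, 50L);
        System.out.println("rows after compaction: " + sstable3.keySet());
    }
}
```

Keeping the tombstone until the grace period has passed is what gives a node that missed the delete time to learn about it before the record disappears.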
Layout of Data Allows for Rapid 
Queries Along Clustering Columns 
ID: ICBM_432  | Time: 30, Loc: SF, Status: Idle     | Time: 45, Loc: SF, Status: Idle     | Time: 60, Loc: SF, Status: Idle 
ID: ICBM_900  | Time: 30, Loc: Boston, Status: Idle | Time: 45, Loc: Boston, Status: Idle | Time: 60, Loc: Boston, Status: Idle 
ID: ICBM_9210 | Time: 30, Loc: Tulsa, Status: Idle  | Time: 45, Loc: Tulsa, Status: Idle  | Time: 60, Loc: Tulsa, Status: Idle 
Disclaimer: Not exactly like this (Use sstable2json to see real layout)
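Why this layout makes range queries along the clustering column cheap can be sketched with nested maps. As the disclaimer says, this is not the real on-disk format. The sketch assumes one sorted map of rows per partition, and `ClusteredStore`, `insert`, and `slice` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Toy layout: partition key -> rows kept sorted by clustering key (time). */
final class ClusteredStore {
    private final Map<String, TreeMap<Integer, String>> partitions = new HashMap<>();

    void insert(String id, int time, String restOfData) {
        partitions.computeIfAbsent(id, key -> new TreeMap<>()).put(time, restOfData);
    }

    /** Rows of one partition with fromTime <= time <= toTime, in clustering order. */
    SortedMap<Integer, String> slice(String id, int fromTime, int toTime) {
        TreeMap<Integer, String> rows = partitions.get(id);
        if (rows == null) {
            return new TreeMap<>();
        }
        // Rows are already sorted, so a range is one contiguous run rather than a scan.
        return rows.subMap(fromTime, true, toTime, true);
    }

    public static void main(String[] args) {
        ClusteredStore store = new ClusteredStore();
        store.insert("ICBM_432", 30, "Loc: SF, Status: Idle");
        store.insert("ICBM_432", 45, "Loc: SF, Status: Idle");
        store.insert("ICBM_432", 60, "Loc: SF, Status: Idle");
        store.insert("ICBM_900", 30, "Loc: Boston, Status: Idle");
        System.out.println(store.slice("ICBM_432", 40, 60));
    }
}
```

A query that names the partition and a time range touches one partition and one contiguous slice of it; a query that does not name the partition has no such shortcut.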
CQL allows easy definition 
of Table Structures 
ID: ICBM_432 | Time: 30, Loc: SF, Status: Idle | Time: 45, Loc: SF, Status: Idle | Time: 60, Loc: SF, Status: Idle 
CREATE TABLE icbmlog ( 
name text, 
time timestamp, 
location text, 
status text, 
PRIMARY KEY (name,time) 
);
Reading data is FAST but 
limited by disk IO 
[Diagram: a replica answers the client by merging the row (partition key, clustering key, rest of data) from its memtables in memory and its SSTables on disk; last write wins (LWW) resolves conflicting versions, and read repair updates replicas that returned stale data]
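Last write wins and read repair can be sketched as a comparison of timestamps across replica responses. This is a minimal illustration that compares one value per replica. The class `ReadReconciler`, its `Versioned` type, and the method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy read reconciliation: pick the newest response, list replicas that need repair. */
final class ReadReconciler {
    static final class Versioned {
        final String value;
        final long timestamp;

        Versioned(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** Last write wins: the response with the highest write timestamp, or null if none. */
    static Versioned newest(Map<String, Versioned> responsesByReplica) {
        Versioned best = null;
        for (Versioned candidate : responsesByReplica.values()) {
            if (best == null || candidate.timestamp > best.timestamp) {
                best = candidate;
            }
        }
        return best;
    }

    /** Replicas whose response is older than the winner; read repair would write to these. */
    static List<String> replicasNeedingRepair(Map<String, Versioned> responsesByReplica) {
        Versioned winner = newest(responsesByReplica);
        List<String> stale = new ArrayList<>();
        for (Map.Entry<String, Versioned> entry : responsesByReplica.entrySet()) {
            if (entry.getValue().timestamp < winner.timestamp) {
                stale.add(entry.getKey());
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        Map<String, Versioned> responses = new LinkedHashMap<>();
        responses.put("replica1", new Versioned("1 plane", 10L));
        responses.put("replica2", new Versioned("2 planes", 20L));
        responses.put("replica3", new Versioned("1 plane", 10L));
        System.out.println("answer: " + newest(responses).value);
        System.out.println("repair: " + replicasNeedingRepair(responses));
    }
}
```

The same rule applies inside a single replica when it merges versions of a row from its memtables and SSTables.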
New Clients provide a 
holistic view of the C* cluster 
[Diagram: the client makes initial contact with one node of the four-node ring (ABD, ABC, ACD, BCD)] 
Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
Session Objects Are used 
for Executing Requests 
Session session = cluster.connect(); 
session.execute("DROP KEYSPACE IF EXISTS icbmkey"); 
session.execute("CREATE KEYSPACE icbmkey WITH replication = " 
    + "{'class':'SimpleStrategy','replication_factor':'1'}"); 
For highest throughput use asynchronous methods 
ResultSetFuture executeAsync(Query query) 
Then add a callback or queue the ResultSetFutures 
[Diagram: a queue of ResultSetFutures]
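The "queue the futures" pattern can be sketched without a cluster. In this illustration `CompletableFuture` from the JDK stands in for the driver's `ResultSetFuture`, and `fakeExecuteAsync` is a hypothetical stand-in for `session.executeAsync(...)` that contacts nothing; only the shape of the loop is the point.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Driver-free sketch: issue every request first, then wait on the queued futures. */
final class AsyncQueueSketch {
    /** Hypothetical stand-in for session.executeAsync(...); no cluster is contacted. */
    static CompletableFuture<String> fakeExecuteAsync(String query) {
        return CompletableFuture.supplyAsync(() -> "ok: " + query);
    }

    /** All requests are in flight before the first wait, unlike a serial execute loop. */
    static List<String> runAll(List<String> queries) {
        List<CompletableFuture<String>> inFlight = new ArrayList<>();
        for (String query : queries) {
            inFlight.add(fakeExecuteAsync(query)); // returns immediately
        }
        List<String> results = new ArrayList<>();
        for (CompletableFuture<String> future : inFlight) {
            results.add(future.join()); // blocks only until this one future is done
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> queries = Arrays.asList("INSERT 1", "INSERT 2", "INSERT 3");
        System.out.println(runAll(queries));
    }
}
```

In a real application the queue would usually be bounded so that the number of in-flight requests stays within what the cluster can absorb.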
Token Aware Policies reduce the number of 
requests forwarded between nodes inside the cluster 
[Diagram: the client sends a request for range A directly to a node of the ring (ABD, ABC, ACD, BCD) that owns A]
Prepared statements allow for 
sending less data over the wire 
Query is prepared on all nodes by driver 
Prepared batch statements 
can further improve throughput 
PreparedStatement ps = session.prepare("INSERT INTO messages (user_id, msg_id, title, body) VALUES (?, ?, ?, ?)"); 
BatchStatement batch = new BatchStatement(); 
batch.add(ps.bind(uid, mid1, title1, body1)); 
batch.add(ps.bind(uid, mid2, title2, body2)); 
batch.add(ps.bind(uid, mid3, title3, body3)); 
session.execute(batch);
Avoid 
• Preparing statements more than once 
• Creating batches which are too large 
• Running statements in serial 
• Using consistency-levels above your need 
• Secondary Indexes in your main queries 
• or really at all unless you are doing analytics
Have fun with C* 
Questions?

Cassandra Fundamentals - C* 2.0

  • 1. Apache Cassandra Fundamentals or: How I stopped worrying and learned to love the CAP theorem Russell Spitzer @RussSpitzer Software Engineer in Test at DataStax
  • 2. Who am I? • Former Bioinformatics Student at UCSF • Work on the integration of Cassandra (C*) with Hadoop, Solr, and Redacted! • I spend a lot of time spinning up clusters on EC2, GCE, Azure, … http://www.datastax.com/dev/blog/testing-cassandra-1000-nodes-at-a-time • Developing new ways to make sure that C* scales
  • 3. Apache Cassandra is a Linearly Scaling and Fault Tolerant NoSQL Database Linearly Scaling: The power of the database increases linearly with the number of machines 2x machines = 2x throughput http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html Fault Tolerant: Nodes down != Database Down Datacenter down != Database Down
  • 4. CAP Theorem Limits What Distributed Systems can do Consistency When I ask the same question to any part of the system I should get the same answer How many planes do we have?
  • 5. CAP Theorem Limits What Distributed Systems can do Consistency When I ask the same question to any part of the system I should get the same answer How many planes do we have? Consistent 1 1 1 1 1 1 1
  • 6. CAP Theorem Limits What Distributed Systems can do Consistency When I ask the same question to any part of the system I should get the same answer How many planes do we have? Not Consistent 1 4 1 2 1 8 1
  • 7. CAP Theorem Limits What Distributed Systems can do Availability When I ask a question I will get an answer How many planes do we have? Available 1 zzzzz *snort* zzz
  • 8. CAP Theorem Limits What Distributed Systems can do Availability When I ask a question I will get an answer How many planes do we have? I have to wait for major snooze to wake up zzzzz *snort* zzz Not Available
  • 9. CAP Theorem Limits What Distributed Systems can do Partition Tolerance I can ask questions even when the system is having intra-system communication problems How many planes do we have? Team Edward Team Jacob 1 Tolerant
  • 10. CAP Theorem Limits What Distributed Systems can do Partition Tolerance I can ask questions even when the system is having intra-system communication problems How many planes do we have? Not Tolerant Team Edward Team Jacob I’m not sure without asking those vampire lovers and we aren’t speaking
  • 11. Cassandra is an AP System which is Eventually Consistent Eventually consistent: New information will make it to everyone eventually How many planes do we have? How many planes do we have? I don’t know without asking those vampire lovers and we aren’t speaking 1 1 1 1 1 1 I just heard we actually have 2 2 2 2 2 2 2 2
  • 12. Two knobs control fault tolerance in C*: Replication and Consistency Level Server Side - Replication: How many copies of each piece of data should exist in the cluster? Coordinator for this operation ABD ABC ACD BCD RF=3 Client SimpleStrategy: Replicas NetworkTopologyStrategy: Replicas per Datacenter
  • 13. Two knobs control fault tolerance in C*: Replication and Consistency Level Client Side - Consistency Level: How many replicas should we check before acknowledgment? ABD ABC ACD BCD Client Coordinator for this operation CL = One
  • 14. Two knobs control fault tolerance in C*: Replication and Consistency Level Client Side - Consistency Level: How many replicas should we check before acknowledgment? ABD ABC ACD BCD CL = Quorum Client Coordinator for this operation
  • 15. Nodes own data whose partition key hashes to their token ranges ABD ABC ACD BCD Every piece of data belongs on the node that owns the Murmur3(2.0) Hash of its partition key + (RF-1) other nodes Partition Key Clustering Key Rest of Data ID: ICBM_432 Time: 30 Loc: SF, Status: Idle ID: ICBM_432 Murmur3Hash Murmur3: A
  • 16. Cassandra writes are FAST due to log-append storage Par Clu Re Memory Memtable Memtable Memtable Commit Log Par Clu Re Par Clu Re Par Clu Re Disk Flushed SSTable SSTable
  • 17. Deletes in a Distributed System are Challenging We need to keep records of deletions in case of network partitions Node1 Node2 Power Outage Time Tombstone Tombstone Tombstone
  • 18. Compactions merge and unify data in our SSTables SSTable 1 + SSTable 2 → SSTable 3 Since SSTables are immutable, this is our chance to consolidate rows and remove tombstones (After GC Grace)
  • 19. Layout of Data Allows for Rapid Queries Along Clustering Columns ID: ICBM_432 ID: ICBM_900 ID: ICBM_9210 Time: 30 Loc: SF Status: Idle Time: 45 Loc: SF Status: Idle Time: 60 Loc: SF Status: Idle Time: 30 Loc: Boston Status: Idle Time: 45 Loc: Boston Status: Idle Time: 60 Loc: Boston Status: Idle Time: 30 Loc: Tulsa Status: Idle Time: 45 Loc: Tulsa Status: Idle Time: 60 Loc: Tulsa Status: Idle Disclaimer: Not exactly like this (Use sstable2json to see real layout)
  • 20. CQL allows easy definition of Table Structures ID: ICBM_432 Time: 30 Loc: SF Status: Idle Time: 45 Loc: SF Status: Idle Time: 60 Loc: SF Status: Idle CREATE TABLE icbmlog ( name text, time timestamp, location text, status text, PRIMARY KEY (name,time) );
  • 21. Reading data is FAST but limited by disk IO Memory Memtable Memtable Memtable Commit Log Par Clu Re Par Clu Re Par Clu Re Disk SSTable SSTable Client Par Clu Re LWW Replica Par Clu Re
  • 22. Reading data is FAST but limited by disk IO Memory Memtable Memtable Memtable Commit Log Par Clu Re Par Clu Re Par Clu Re Disk SSTable SSTable Client Par Clu Re LWW Replica Par Clu Re Read Repair
  • 23. New Clients provide a holistic view of the C* cluster Client ABD ABC ACD BCD Initial Contact Cluster.builder().addContactPoint("127.0.0.1").build()
  • 24. Session Objects are used for Executing Requests session = cluster.connect() session.execute("DROP KEYSPACE IF EXISTS icbmkey") session.execute("CREATE KEYSPACE icbmkey with replication = {'class':'SimpleStrategy','replication_factor':'1'}") For highest throughput use asynchronous methods ResultSetFuture executeAsync(Query query) Then add a callback or queue the ResultSetFutures ResultSetFuture ResultSetFuture ResultSetFuture
  • 25. Token Aware Policies reduce the number of intra-network requests made Client ABD ABC ACD BCD A
  • 26. Prepared statements allow for sending less data over the wire Query is prepared on all nodes by driver Prepared batch statements can further improve throughput PreparedStatement ps = session.prepare("INSERT INTO messages (user_id, msg_id, title, body) VALUES (?, ?, ?, ?)"); BatchStatement batch = new BatchStatement(); batch.add(ps.bind(uid, mid1, title1, body1)); batch.add(ps.bind(uid, mid2, title2, body2)); batch.add(ps.bind(uid, mid3, title3, body3)); session.execute(batch);
  • 27. Avoid • Preparing statements more than once • Creating batches which are too large • Running statements in serial • Using consistency levels higher than you need • Secondary Indexes in your main queries • or really at all unless you are doing analytics
  • 28. Have fun with C* Questions?
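
The two knobs from slides 12 to 14 (replication factor and consistency level) can be related with a little arithmetic. The sketch below is a minimal illustration in plain Java, assuming the usual majority definition of QUORUM; the class and method names are hypothetical and are not part of Cassandra or the DataStax driver.

```java
// Hypothetical helper: names are illustrative only, not a Cassandra or driver API.
final class ConsistencySketch {

    // Replicas that must respond for QUORUM: a strict majority of the replicas.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // Replicas that may be down while a request needing `requiredReplicas` can still succeed.
    static int toleratedFailures(int replicationFactor, int requiredReplicas) {
        return replicationFactor - requiredReplicas;
    }

    // A read is guaranteed to reach at least one replica that acknowledged the latest
    // write when the read set and the write set must overlap.
    static boolean readsSeeLatestWrite(int replicationFactor, int writeReplicas, int readReplicas) {
        return writeReplicas + readReplicas > replicationFactor;
    }

    public static void main(String[] args) {
        int rf = 3; // RF=3, as on slide 12
        int q = quorum(rf);
        System.out.println("QUORUM at RF=" + rf + " needs " + q + " replicas");
        System.out.println("CL=ONE tolerates " + toleratedFailures(rf, 1) + " replicas down");
        System.out.println("CL=QUORUM tolerates " + toleratedFailures(rf, q) + " replica down");
        System.out.println("QUORUM writes + QUORUM reads overlap: " + readsSeeLatestWrite(rf, q, q));
        System.out.println("ONE writes + ONE reads overlap: " + readsSeeLatestWrite(rf, 1, 1));
    }
}
```

With RF=3, CL=ONE keeps working with two replicas down but gives no overlap guarantee, while QUORUM reads and writes overlap at the cost of tolerating only one replica down.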
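
Slide 15 describes placement as "the node that owns the hash of the partition key, plus (RF-1) other nodes". A rough sketch of that rule follows, assuming one token per node (no vnodes) and SimpleStrategy-style placement that walks the ring clockwise. The class name, the toy token values, and the choice to pass in a precomputed token instead of a real Murmur3 hash are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical toy ring: not the Cassandra implementation.
final class TokenRingSketch {
    // Maps each node's token to the node name; a node owns the range (previous token, its token].
    private final TreeMap<Long, String> ring = new TreeMap<>();

    void addNode(String name, long token) {
        ring.put(token, name);
    }

    // Returns the primary owner followed by the next (RF-1) nodes clockwise.
    // `keyToken` stands in for the Murmur3 hash of the partition key.
    List<String> replicasFor(long keyToken, int replicationFactor) {
        List<String> replicas = new ArrayList<>();
        if (ring.isEmpty()) {
            return replicas;
        }
        int wanted = Math.min(replicationFactor, ring.size());
        Long start = ring.ceilingKey(keyToken);
        if (start == null) {
            start = ring.firstKey(); // past the largest token: wrap around the ring
        }
        for (String node : ring.tailMap(start, true).values()) {
            if (replicas.size() == wanted) break;
            replicas.add(node);
        }
        for (String node : ring.values()) { // continue from the start of the ring
            if (replicas.size() == wanted) break;
            replicas.add(node);
        }
        return replicas;
    }

    public static void main(String[] args) {
        TokenRingSketch ring = new TokenRingSketch();
        ring.addNode("A", 0);
        ring.addNode("B", 100);
        ring.addNode("C", 200);
        ring.addNode("D", 300);
        System.out.println(ring.replicasFor(150, 3)); // owner C, then D and A
    }
}
```

A token-aware client (slide 25) can apply the same lookup locally and send the request straight to one of the returned replicas instead of an arbitrary coordinator.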
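
Slides 17, 18 and 21 combine three ideas: deletes are recorded as tombstones, conflicting versions are resolved by last write wins (LWW), and compaction may drop tombstones only after GC grace has passed. The sketch below models that with two in-memory maps standing in for SSTables. Everything here is a simplification under stated assumptions: the `Cell` type, the `compact` method, the key-to-single-value layout, and the tie-breaking rule (the second table wins on equal timestamps) are hypothetical and do not reflect Cassandra's real storage format.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of compaction: two maps stand in for immutable SSTables.
final class CompactionSketch {

    static final class Cell {
        final String value;   // null marks a tombstone (a recorded delete)
        final long timestamp; // write time; the newest timestamp wins

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }

        boolean isTombstone() {
            return value == null;
        }
    }

    // Merges two tables into a new one, keeping the newest cell per key (LWW),
    // then drops tombstones older than gcGrace. The inputs are left untouched.
    static Map<String, Cell> compact(Map<String, Cell> first, Map<String, Cell> second,
                                     long now, long gcGrace) {
        Map<String, Cell> merged = new HashMap<>(first);
        for (Map.Entry<String, Cell> e : second.entrySet()) {
            // Assumption: on equal timestamps the cell from `second` wins.
            merged.merge(e.getKey(), e.getValue(),
                    (a, b) -> b.timestamp >= a.timestamp ? b : a);
        }
        merged.values().removeIf(c -> c.isTombstone() && now - c.timestamp > gcGrace);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Cell> sstable1 = new HashMap<>();
        sstable1.put("ICBM_432", new Cell("SF", 10));
        sstable1.put("ICBM_900", new Cell("Boston", 10));

        Map<String, Cell> sstable2 = new HashMap<>();
        sstable2.put("ICBM_432", new Cell("Tulsa", 20)); // newer write
        sstable2.put("ICBM_900", new Cell(null, 15));    // delete recorded as a tombstone

        // Before GC grace expires the tombstone must be kept so other replicas can learn of it.
        System.out.println(compact(sstable1, sstable2, 30, 50).keySet());
        // After GC grace the tombstone and the data it shadows are both gone.
        System.out.println(compact(sstable1, sstable2, 100, 50).keySet());
    }
}
```

Keeping the tombstone until GC grace has elapsed is what lets a node that missed the delete (the power outage on slide 17) catch up instead of resurrecting the old value.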