SlideShare a Scribd company logo
1 of 13
Download to read offline
Cassandra History
● Created at Facebook
● Open-sourced since 2008
● Current version = 3.3
● Column-oriented ☞ distributed table
NoSQL database Cluster Layer
● Amazon DynamoDB paper
● Masterless architecture
Data-store layer
● Google Big Table paper
● Columns/columns family
But what is Cassandra ?
● Fast Distributed Database/Table
● High Availability
● Linear Scalability
● Predictable Performance
● No SPOF
● Multi-DC
● Commodity Hardware
● Easy to manage operationally
How is Cassandra different ?
● No master / slave / replica sets
● No config servers, zookeeper
● Data is partitioned around ring
● Data is replicated to RF=N servers
● All nodes hold data and can answer
queries (both reads & writes)
● Location of data on ring is determined by
partition key
● Cassandra chooses Availability & Partition
Tolerance over Consistency
● Queries have tunable consistency level
● ALL, QUORUM, ONE
● Hinted Handoff to deal with failed nodes
CAP Theorem
Cassandra under the hood
Cassandra Writes
● Writes are written to any node in the cluster
(coordinator)
● Writes are written to commit log, then to
memtable
● Every write includes a timestamp
● Memtable flushed to disk periodically (sstable)
● Deletes are actually a special write case, called
a “tombstone”
● Partition is spread across multiple SSTables
● Same column can be in multiple SSTables
● Merged through compaction, only latest
timestamp is kept
Cassandra Reads
● Any node may be queried, it acts as
coordinator
● On each node, data is pulled from
Memtable + SSTables and merged
● Each SSTable has a BloomFilter
Cassandra is not...
● An In-Memory database
● A Key-Value storage (it has schema)
● A magical unicorn database that farts rainbow
Cassandra Data Modeling
Relational Cassandra
CQL vs SQL
● no Sequences
○ Natural Keys - email etc…
○ Surrogate Keys - UUID
● no Order By - during select
Clustering Key
// Comments for a given video
CREATE TABLE comments_by_video (
videoid uuid,
comment_time timestamp,
userid uuid,
comment text,
PRIMARY KEY (videoid, comment_time)
) WITH CLUSTERING ORDER BY (comment_time DESC);
SELECT * FROM comments_by_video
videoid commendid userid comment
Video1 Dec 13, 11:00 user1 comment3
Video1 Dec 12, 15:00 user2 comment2
Video1 Dec 12, 12:00 user1 comment1
... ... ... ...
more comments
more columns
...Dec 13, 11:00
user1, comment3
Dec 12, 15:00
user2, comment2
Dec 12, 12:00
user1, comment3
Video1 Up to 2 Billion
New Data
Example: KillrVideo.com
CREATE TABLE user_videos (
userid uuid,
added_date timestamp,
videoid uuid,
name text,
priview_image_location text,
PRIMARY KEY (userid, added_date, video_id)
) WITH CLUSTERING ORDER BY (added_date DESC, video_id ASC);
// Index for tag keyword
CREATE TABLE videos_by_tag (
tag text,
videoid uuid,
added_date timestamp,
name text,
priview_image_location text,
tagged_date timestamp
PRIMARY KEY (tag, video_id)
);
// Index for tag keyword
CREATE TABLE tags_by_letter (
first_letter text,
tag text,
PRIMARY KEY (first_letter, tag)
);
Secondary Indexes
Country Continent
Germany Europe
China Asia
France Europe
NODE 1
Germany: Europe
NODE 2
China: Asia
NODE 2
France: Europe
SELECT * FROM … WHERE Continent = ‘Europe’
SELECT * FROM … WHERE Country = ‘France’
● Continent is Secondary
Index.

More Related Content

What's hot

Apache Cassandra Lunch #67: Moving Data from Cassandra to Datastax Astra
Apache Cassandra Lunch #67: Moving Data from Cassandra to Datastax AstraApache Cassandra Lunch #67: Moving Data from Cassandra to Datastax Astra
Apache Cassandra Lunch #67: Moving Data from Cassandra to Datastax AstraAnant Corporation
 
How Scylla Make Adding and Removing Nodes Faster and Safer
How Scylla Make Adding and Removing Nodes Faster and SaferHow Scylla Make Adding and Removing Nodes Faster and Safer
How Scylla Make Adding and Removing Nodes Faster and SaferScyllaDB
 
Seattle Cassandra Meetup - Cassandra 1.2 - Eddie Satterly
Seattle Cassandra Meetup - Cassandra 1.2 - Eddie SatterlySeattle Cassandra Meetup - Cassandra 1.2 - Eddie Satterly
Seattle Cassandra Meetup - Cassandra 1.2 - Eddie Satterlybtoddb
 
Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to Cassandrashimi_k
 
Caching in (DevoxxUK 2013)
Caching in (DevoxxUK 2013)Caching in (DevoxxUK 2013)
Caching in (DevoxxUK 2013)RichardWarburton
 
Cassandra Anti-Patterns
Cassandra Anti-PatternsCassandra Anti-Patterns
Cassandra Anti-PatternsMatthew Dennis
 
strangeloop 2012 apache cassandra anti patterns
strangeloop 2012 apache cassandra anti patternsstrangeloop 2012 apache cassandra anti patterns
strangeloop 2012 apache cassandra anti patternsMatthew Dennis
 
kranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High loadkranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High loadKrivoy Rog IT Community
 
Distributed systems and consistency
Distributed systems and consistencyDistributed systems and consistency
Distributed systems and consistencyseldo
 
Sharding: Past, Present and Future with Krutika Dhananjay
Sharding: Past, Present and Future with Krutika DhananjaySharding: Past, Present and Future with Krutika Dhananjay
Sharding: Past, Present and Future with Krutika DhananjayGluster.org
 

What's hot (14)

Apache Cassandra Lunch #67: Moving Data from Cassandra to Datastax Astra
Apache Cassandra Lunch #67: Moving Data from Cassandra to Datastax AstraApache Cassandra Lunch #67: Moving Data from Cassandra to Datastax Astra
Apache Cassandra Lunch #67: Moving Data from Cassandra to Datastax Astra
 
Barcamp presentation
Barcamp presentationBarcamp presentation
Barcamp presentation
 
How Scylla Make Adding and Removing Nodes Faster and Safer
How Scylla Make Adding and Removing Nodes Faster and SaferHow Scylla Make Adding and Removing Nodes Faster and Safer
How Scylla Make Adding and Removing Nodes Faster and Safer
 
Seattle Cassandra Meetup - Cassandra 1.2 - Eddie Satterly
Seattle Cassandra Meetup - Cassandra 1.2 - Eddie SatterlySeattle Cassandra Meetup - Cassandra 1.2 - Eddie Satterly
Seattle Cassandra Meetup - Cassandra 1.2 - Eddie Satterly
 
Cassandra at BrightTag
Cassandra at BrightTagCassandra at BrightTag
Cassandra at BrightTag
 
Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to Cassandra
 
Caching in (DevoxxUK 2013)
Caching in (DevoxxUK 2013)Caching in (DevoxxUK 2013)
Caching in (DevoxxUK 2013)
 
Cassandra Anti-Patterns
Cassandra Anti-PatternsCassandra Anti-Patterns
Cassandra Anti-Patterns
 
strangeloop 2012 apache cassandra anti patterns
strangeloop 2012 apache cassandra anti patternsstrangeloop 2012 apache cassandra anti patterns
strangeloop 2012 apache cassandra anti patterns
 
kranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High loadkranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High load
 
Intro to cassandra
Intro to cassandraIntro to cassandra
Intro to cassandra
 
Distributed systems and consistency
Distributed systems and consistencyDistributed systems and consistency
Distributed systems and consistency
 
Sharding: Past, Present and Future with Krutika Dhananjay
Sharding: Past, Present and Future with Krutika DhananjaySharding: Past, Present and Future with Krutika Dhananjay
Sharding: Past, Present and Future with Krutika Dhananjay
 
Cassandra On EC2
Cassandra On EC2Cassandra On EC2
Cassandra On EC2
 

Similar to Introduction to Cassandra

An Introduction to Apache Cassandra
An Introduction to Apache CassandraAn Introduction to Apache Cassandra
An Introduction to Apache CassandraSaeid Zebardast
 
Cassandra overview
Cassandra overviewCassandra overview
Cassandra overviewSean Murphy
 
Cassandra: Open Source Bigtable + Dynamo
Cassandra: Open Source Bigtable + DynamoCassandra: Open Source Bigtable + Dynamo
Cassandra: Open Source Bigtable + Dynamojbellis
 
Avoiding Pitfalls for Cassandra.pdf
Avoiding Pitfalls for Cassandra.pdfAvoiding Pitfalls for Cassandra.pdf
Avoiding Pitfalls for Cassandra.pdfCédrick Lunven
 
Use Your MySQL Knowledge to Become an Instant Cassandra Guru
Use Your MySQL Knowledge to Become an Instant Cassandra GuruUse Your MySQL Knowledge to Become an Instant Cassandra Guru
Use Your MySQL Knowledge to Become an Instant Cassandra GuruTim Callaghan
 
Cassandra Talk: Austin JUG
Cassandra Talk: Austin JUGCassandra Talk: Austin JUG
Cassandra Talk: Austin JUGStu Hood
 
Introduction to Apache Cassandra
Introduction to Apache Cassandra Introduction to Apache Cassandra
Introduction to Apache Cassandra Knoldus Inc.
 
High-availability with Galera Cluster for MySQL
High-availability with Galera Cluster for MySQLHigh-availability with Galera Cluster for MySQL
High-availability with Galera Cluster for MySQLFromDual GmbH
 
Introduction to Postrges-XC
Introduction to Postrges-XCIntroduction to Postrges-XC
Introduction to Postrges-XCAshutosh Bapat
 
Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability
Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous AvailabilityRamp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability
Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous AvailabilityPythian
 
Intro to cassandra
Intro to cassandraIntro to cassandra
Intro to cassandraAaron Ploetz
 
ScyllaDB: NoSQL at Ludicrous Speed
ScyllaDB: NoSQL at Ludicrous SpeedScyllaDB: NoSQL at Ludicrous Speed
ScyllaDB: NoSQL at Ludicrous SpeedJ On The Beach
 
MySQL always-up with Galera Cluster
MySQL always-up with Galera ClusterMySQL always-up with Galera Cluster
MySQL always-up with Galera ClusterFromDual GmbH
 

Similar to Introduction to Cassandra (20)

An Introduction to Apache Cassandra
An Introduction to Apache CassandraAn Introduction to Apache Cassandra
An Introduction to Apache Cassandra
 
Cassandra overview
Cassandra overviewCassandra overview
Cassandra overview
 
Cassandra: Open Source Bigtable + Dynamo
Cassandra: Open Source Bigtable + DynamoCassandra: Open Source Bigtable + Dynamo
Cassandra: Open Source Bigtable + Dynamo
 
Cassandra
CassandraCassandra
Cassandra
 
Cassandra
CassandraCassandra
Cassandra
 
Avoiding Pitfalls for Cassandra.pdf
Avoiding Pitfalls for Cassandra.pdfAvoiding Pitfalls for Cassandra.pdf
Avoiding Pitfalls for Cassandra.pdf
 
Cassandra
CassandraCassandra
Cassandra
 
Use Your MySQL Knowledge to Become an Instant Cassandra Guru
Use Your MySQL Knowledge to Become an Instant Cassandra GuruUse Your MySQL Knowledge to Become an Instant Cassandra Guru
Use Your MySQL Knowledge to Become an Instant Cassandra Guru
 
Cassandra Talk: Austin JUG
Cassandra Talk: Austin JUGCassandra Talk: Austin JUG
Cassandra Talk: Austin JUG
 
Introduction to Apache Cassandra
Introduction to Apache Cassandra Introduction to Apache Cassandra
Introduction to Apache Cassandra
 
Running Cassandra in AWS
Running Cassandra in AWSRunning Cassandra in AWS
Running Cassandra in AWS
 
High-availability with Galera Cluster for MySQL
High-availability with Galera Cluster for MySQLHigh-availability with Galera Cluster for MySQL
High-availability with Galera Cluster for MySQL
 
Introduction to Postrges-XC
Introduction to Postrges-XCIntroduction to Postrges-XC
Introduction to Postrges-XC
 
Cassandra NoSQL Tutorial
Cassandra NoSQL TutorialCassandra NoSQL Tutorial
Cassandra NoSQL Tutorial
 
Cassandra training
Cassandra trainingCassandra training
Cassandra training
 
Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability
Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous AvailabilityRamp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability
Ramp-Tutorial for MYSQL Cluster - Scaling with Continuous Availability
 
Intro to cassandra
Intro to cassandraIntro to cassandra
Intro to cassandra
 
Spark Deep Dive
Spark Deep DiveSpark Deep Dive
Spark Deep Dive
 
ScyllaDB: NoSQL at Ludicrous Speed
ScyllaDB: NoSQL at Ludicrous SpeedScyllaDB: NoSQL at Ludicrous Speed
ScyllaDB: NoSQL at Ludicrous Speed
 
MySQL always-up with Galera Cluster
MySQL always-up with Galera ClusterMySQL always-up with Galera Cluster
MySQL always-up with Galera Cluster
 

Recently uploaded

Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 

Recently uploaded (20)

Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 

Introduction to Cassandra

  • 1. Cassandra History ● Created at Facebook ● Open-sourced since 2008 ● Current version = 3.3 ● Column-oriented ☞ distributed table NoSQL database Cluster Layer ● Amazon DynamoDB paper ● Masterless architecture Data-store layer ● Google Big Table paper ● Columns/columns family
  • 2. But what is Cassandra ? ● Fast Distributed Database/Table ● High Availability ● Linear Scalability ● Predictable Performance ● No SPOF ● Multi-DC ● Commodity Hardware ● Easy to manage operationally
  • 3. How is Cassandra different ? ● No master / slave / replica sets ● No config servers, zookeeper ● Data is partitioned around ring ● Data is replicated to RF=N servers ● All nodes hold data and can answer queries (both reads & writes) ● Location of data on ring is determined by partition key
  • 4. ● Cassandra chooses Availability & Partition Tolerance over Consistency ● Queries have tunable consistency level ● ALL, QUORUM, ONE ● Hinted Handoff to deal with failed nodes CAP Theorem
  • 6. Cassandra Writes ● Writes are written to any node in the cluster (coordinator) ● Writes are written to commit log, then to memtable ● Every write includes a timestamp ● Memtable flushed to disk periodically (sstable) ● Deletes are actually a special write case, called a “tombstone” ● Partition is spread across multiple SSTables ● Same column can be in multiple SSTables ● Merged through compaction, only latest timestamp is kept
  • 7. Cassandra Reads ● Any node may be queried, it acts as coordinator ● On each node, data is pulled from Memtable + SSTables and merged ● Each SSTable has a BloomFilter
  • 8. Cassandra is not... ● An In-Memory database ● A Key-Value storage (it has schema) ● A magical unicorn database that farts rainbow
  • 10. CQL vs SQL ● no Sequences ○ Natural Keys - email etc… ○ Surrogate Keys - UUID ● no Order By - during select
  • 11. Clustering Key // Comments for a given video CREATE TABLE comments_by_video ( videoid uuid, comment_time timestamp, userid uuid, comment text, PRIMARY KEY (videoid, comment_time) ) WITH CLUSTERING ORDER BY (comment_time DESC); SELECT * FROM comments_by_video videoid commendid userid comment Video1 Dec 13, 11:00 user1 comment3 Video1 Dec 12, 15:00 user2 comment2 Video1 Dec 12, 12:00 user1 comment1 ... ... ... ... more comments more columns ...Dec 13, 11:00 user1, comment3 Dec 12, 15:00 user2, comment2 Dec 12, 12:00 user1, comment3 Video1 Up to 2 Billion New Data
  • 12. Example: KillrVideo.com CREATE TABLE user_videos ( userid uuid, added_date timestamp, videoid uuid, name text, priview_image_location text, PRIMARY KEY (userid, added_date, video_id) ) WITH CLUSTERING ORDER BY (added_date DESC, video_id ASC); // Index for tag keyword CREATE TABLE videos_by_tag ( tag text, videoid uuid, added_date timestamp, name text, priview_image_location text, tagged_date timestamp PRIMARY KEY (tag, video_id) ); // Index for tag keyword CREATE TABLE tags_by_letter ( first_letter text, tag text, PRIMARY KEY (first_letter, tag) );
  • 13. Secondary Indexes Country Continent Germany Europe China Asia France Europe NODE 1 Germany: Europe NODE 2 China: Asia NODE 2 France: Europe SELECT * FROM … WHERE Continent = ‘Europe’ SELECT * FROM … WHERE Country = ‘France’ ● Continent is Secondary Index.