w/ @
MEET
Background
‣ White paper Blend
• Amazon Dynamo (2007)
• Google Bigtable (2006)
‣ What about Facebook?
‣ Open Source Successes
‣ & cue DataStax
‣ so What’s in a name?
Why Cassandra?
Because the world has changed
Story Time
What we believe:
‣ The network is reliable.
‣ Latency is zero.
‣ Bandwidth is infinite.
‣ The network is secure.
‣ Topology doesn't change.
‣ There is one administrator.
‣ Transport cost is zero.
‣ The network is homogeneous.
http://en.wikipedia.org/wiki/Fallacies_of_distributed_computing
Hey, I’ve got some data!
I’ve got some bigger data!
lol wut?
uh oh…
big data ???
high availability
http://planetcassandra.org/nosql-performance-benchmarks/
Architecture
San Francisco
New York
Stockholm
‣ Masterless - Peer to Peer
‣ High Availability
‣ No SPOFs
‣ Multi-DC
‣ Runs on Commodity Hardware
‣ Linear Scalability: need more throughput? Add more nodes.
‣ Fast
‣ Distributed
‣ Operationally easy to manage
Architected w/ scale in mind
Cassandra Cluster
[Diagram: a cluster spanning Data Center - East (Nodes 1 to 8) and Data Center - West (Nodes 1 to 4), with nodes grouped into Rack 1 and Rack 2. Token range (Murmur3): -2^63 to +2^63 - 1.]
What is a C* Cluster?
A peer to peer set of nodes
‣ Node – one Cassandra instance
‣ Rack – a logical set of nodes
‣ Data Center – a logical set of racks
‣ Cluster – the full set of nodes which map to a single complete token ring
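The idea of nodes mapping to a single token ring can be sketched in a few lines of Python. This is only an illustration under stated assumptions: MD5 from the standard library stands in for Murmur3, the five nodes get evenly spaced tokens, and replicas are chosen by walking the ring clockwise. The names `token_for`, `build_ring`, and `replicas_for` are hypothetical and not part of Cassandra.

```python
import bisect
import hashlib

MIN_TOKEN = -2**63   # lowest token on the ring
RING_SIZE = 2**64    # number of distinct signed 64-bit tokens


def token_for(partition_key: str) -> int:
    """Map a partition key to a signed 64-bit token (MD5 stand-in for Murmur3)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def build_ring(node_names):
    """Give each node an evenly spaced token; returns a list sorted by token."""
    step = RING_SIZE // len(node_names)
    return [(MIN_TOKEN + i * step, name) for i, name in enumerate(node_names)]


def replicas_for(ring, partition_key, replication_factor):
    """Find the first node at or after the key's token, then walk clockwise."""
    tokens = [token for token, _ in ring]
    start = bisect.bisect_left(tokens, token_for(partition_key)) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(replication_factor)]


ring = build_ring(["Node 1", "Node 2", "Node 3", "Node 4", "Node 5"])
replicas = replicas_for(ring, "The Beatles", 3)
print(replicas)
```

Every node can run the same calculation, which is why no master is needed to decide where a key lives.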
Peer to Peer
[Diagram: a five-node ring. Node 1 holds the 1st copy, Node 2 the 2nd copy, and Node 3 the 3rd copy; Nodes 4 and 5 hold no copy.]
Data Centers
[Diagram: rings in DC: USA and DC: EUROPE, with the 1st copy on Node 1 and the 2nd copy on Node 2.]
Write
[Diagram: a parallel write with Consistency Level = QUORUM and Replication Factor = 3. The write goes to Node 1 (1st copy), Node 2 (2nd copy), and Node 3 (3rd copy) in parallel; the acks shown are 5 µs, 12 µs, 500 µs, and 12 µs.]
Tunable Consistency
Read
[Diagram: a parallel read with Consistency Level = QUORUM and Replication Factor = 3 across Node 1 (1st copy), Node 2 (2nd copy), and Node 3 (3rd copy).]
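The QUORUM arithmetic behind the write diagram can be worked through in Python. This is a minimal sketch, assuming that the 5 µs, 12 µs, and 500 µs figures are the three replica acks (the second 12 µs figure is presumably the ack back to the client). The function names are hypothetical.

```python
def quorum(replication_factor: int) -> int:
    """A quorum is a majority of the replicas."""
    return replication_factor // 2 + 1


def time_to_satisfy(ack_latencies_us, required_acks):
    """Latency at which the required number of replica acks has arrived."""
    if required_acks > len(ack_latencies_us):
        raise ValueError("not enough replicas to satisfy the consistency level")
    return sorted(ack_latencies_us)[required_acks - 1]


replica_acks_us = [5, 12, 500]  # assumed per-replica ack times from the diagram
needed = quorum(3)
print(needed, time_to_satisfy(replica_acks_us, needed))  # prints: 2 12
```

With Replication Factor = 3, QUORUM needs two acks, so the slow 500 µs replica does not delay the response.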
Hints
Continuous availability
Data model and storage model
Let’s think about a music database…
CQL
How to create, use and drop keyspaces/schemas?
•  To create a keyspace
CREATE KEYSPACE musicdb
WITH replication = {
  'class': 'SimpleStrategy',
  'replication_factor': 3
};
•  To assign the working default keyspace for a cqlsh session
USE musicdb;
•  To delete a keyspace and all internal data objects
DROP KEYSPACE musicdb;
What is the syntax of the CREATE TABLE statement?
•  The CQL below creates a table in the current keyspace
Primary key declared in separate clause:
CREATE TABLE performer (
  name VARCHAR,
  type VARCHAR,
  country VARCHAR,
  style VARCHAR,
  founded INT,
  born INT,
  died INT,
  PRIMARY KEY (name)
);
Primary key declared inline:
CREATE TABLE performer (
  name VARCHAR PRIMARY KEY,
  type VARCHAR,
  country VARCHAR,
  style VARCHAR,
  founded INT,
  born INT,
  died INT
);
storage
model
What is a CQL table and how is it related to a column family?
•  A CQL table is a column family
•  CQL tables provide two-dimensional views of a column family, which contains
potentially multi-dimensional data, due to composite keys and collections
•  CQL table and column family are largely interchangeable terms
•  Not surprising when you recall tables and relations, columns and attributes, rows
and tuples in relational databases
•  Supported by declarative language Cassandra Query Language
•  Data Definition Language, subset of CQL
•  SQL-like syntax, but with somewhat different semantics
•  Convenient for defining and expressing Cassandra database schemas
What are row, row key, column key, and column value?
•  Row is the smallest unit that stores related data in Cassandra
•  Rows – individual rows constitute a column family
•  Row key – uniquely identifies a row in a column family
•  Row – stores pairs of column keys and column values
•  Column key – uniquely identifies a column value in a row
•  Column value – stores one value or a collection of values
©2014 DataStax Training. Use only with permission. Slide 4
[Diagram: a row identified by its row key, holding column keys (or column names) cola, colb, colc, cold with column values (or cells) va, vb, vc, vd.]
partitions
What are partition, partition key, row, column, and cell?
•  Table with single-row partitions
•  Column family view
performer      | born | country | died | founded | style | type
John Lennon    | 1940 | England | 1980 |         | Rock  | artist
Paul McCartney | 1942 | England |      |         | Rock  | artist
The Beatles    |      | England |      | 1957    | Rock  | band
[Diagram labels: partition key, rows, columns, cells]
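The column family view of this table can be modeled as a map of maps. This is a rough sketch rather than Cassandra's actual storage format: each partition key maps to the cells that exist for it, and a missing cell is simply absent instead of being stored as a null. The helper `cell` is a hypothetical name.

```python
# Partition key -> {column name: cell value}; only cells that exist are stored.
performer = {
    "John Lennon": {"born": 1940, "country": "England", "died": 1980,
                    "style": "Rock", "type": "artist"},
    "Paul McCartney": {"born": 1942, "country": "England",
                       "style": "Rock", "type": "artist"},
    "The Beatles": {"country": "England", "founded": 1957,
                    "style": "Rock", "type": "band"},
}


def cell(table, partition_key, column):
    """Return the cell value, or None when the row has no such cell."""
    return table.get(partition_key, {}).get(column)


print(cell(performer, "The Beatles", "founded"))    # prints: 1957
print(cell(performer, "Paul McCartney", "died"))    # prints: None
```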
data
model
What is a data modeling framework?
•  Defines transitions between
models
•  Query-driven methodology
•  Formal analysis and validation
•  Defines a scientific approach to
data modeling
•  Modeling rules
•  Mapping patterns
•  Schema optimization techniques
[Diagram: Conceptual Model → Logical Model → Physical Model, with Query-Driven Methodology and Analysis & Validation labeling the transitions.]
What is a conceptual data model?
•  Conceptual data model for music data
•  ER diagram (Chen notation)
•  Describes entities, relationships, roles, keys, cardinalities
•  What is possible and what is not in existing or future data
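[ER diagram: Performer (name, country, style) with IsA subtypes Artist (born, died) and Band (founded), disjoint and covering; Album (title, year, genre, format, cover image); Track (number, title); User (id, email, name, preferences); Activity (id, timestamp) with IsA subtypes Rate (rating) and Play, disjoint and not covering. Relationships: Performer releases Album (1:n); Band has member Artist (n:m, with period); Album has Track (1:n); User performs Activity (1:m); Activity involvedIn Track (n:1).]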
ACCESS PATTERNS
Q1: Find performers for a specified style; order by performer (ASC).
Q2: Find information for a specified performer (artist or band).
Q3: Find information for a specified album (title and year).
Q4: Find albums for a specified performer; order by album release year (DESC) and title (ASC).
Q5: Find albums for a specified genre; order by performer (ASC), year (DESC), and title (ASC).
Q6: Find albums and performers for a specified track title; order by performer (ASC), year (DESC), and title (ASC).
Q7: Find tracks for a specified album (title and year); order by track number (ASC).
Q8: Find information for a specified user.
Q9: Find activities for a specified user; order by activity time (DESC).
Q10: Find statistics for a specified track.
Q11: Find user activities for a specified track; order by activity time (DESC).
Q12: Find user activities for a specified activity type.
…
A sample data model
[Diagram: logical model with one table per access pattern; arrows between tables are labeled Q1 to Q12. K = partition key column, C↑ / C↓ = clustering column in ascending / descending order, S = static column, IDX = secondary index.]
Performer: name K, type, country, style, founded, born, died
Performers_by_style: style K, name C↑
Albums_by_performer: performer K, year C↓, title C↑, genre
Albums_by_genre: genre K, performer C↑, year C↓, title C↑
Tracks_by_album: album K, year K, number C↑, performer S, genre S, title
Albums_by_track: track K, performer C↑, year C↓, title C↑
Album: title K, year K, performer, genre, tracks (map)
User: id K, name, email, preferences (set)
Activities_by_user: user K, activity (timeuuid) C↓, type IDX, album_title, album_year, track_title, rating
Activities_by_track: album_title K, album_year K, track_title K, activity (timeuuid) C↓, user, type, rating
Track_stats: album_title K, album_year K, track_title K, num_ratings (counter), sum_ratings (counter), num_plays (counter)
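To see why each table is shaped around one query, here is a small Python sketch of Albums_by_performer serving Q4. It is an in-memory imitation, not how Cassandra stores data: rows are grouped by the partition key (performer) and kept sorted by year descending, then title ascending, so answering Q4 is a single partition lookup. The sample rows and the name `build_albums_by_performer` are illustrative.

```python
from collections import defaultdict

# (performer, year, title, genre): a few illustrative rows
albums = [
    ("The Beatles", 1965, "Rubber Soul", "Rock"),
    ("The Beatles", 1969, "Abbey Road", "Rock"),
    ("The Beatles", 1965, "Help!", "Rock"),
    ("John Lennon", 1971, "Imagine", "Rock"),
]


def build_albums_by_performer(rows):
    """Group rows by partition key and apply the clustering order."""
    partitions = defaultdict(list)
    for performer, year, title, genre in rows:
        partitions[performer].append((year, title, genre))
    for partition_rows in partitions.values():
        # year C↓ (descending), then title C↑ (ascending)
        partition_rows.sort(key=lambda row: (-row[0], row[1]))
    return dict(partitions)


albums_by_performer = build_albums_by_performer(albums)
for year, title, genre in albums_by_performer["The Beatles"]:
    print(year, title, genre)
```

The sort happens once at write time, which mirrors the trade the data model makes: more work and duplication on write in exchange for cheap, pre-ordered reads.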
Writes
How does Cassandra write so fast?
•  Cassandra is a log-structured storage engine
•  Data is sequentially appended, not placed in pre-set locations
RDBMS: seeks and writes values to various pre-set locations
CASSANDRA: continuously appends to a log
What are the key components of the write path?
•  Each node implements four key components to handle its writes
•  Memtables – in-memory tables corresponding to CQL tables, with indexes
•  CommitLog – append-only log, replayed to restore downed node's Memtables
•  SSTables – Memtable snapshots periodically flushed to disk, clearing heap
•  Compaction – periodic process to merge and streamline SSTables
•  When a node receives a write request
1.  The record is appended to the CommitLog, and
2.  The record is appended to the Memtable for this record's target CQL table
3.  Periodically, Memtables are flushed to SSTables, clearing JVM heap and CommitLog
4.  Periodically, Compaction runs to merge and streamline SSTables
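Steps 1 to 3 above can be sketched as a toy Python class. This is a simplified model under stated assumptions: the flush is triggered by a partition count (a real node flushes based on memory use and other settings), the commit log is a plain list that is cleared on flush, and compaction (step 4) is left out. The class `Node` and its attributes are hypothetical names.

```python
class Node:
    """Toy write path: commit log append, Memtable update, flush to SSTable."""

    def __init__(self, flush_threshold=3):
        self.commit_log = []   # append-only record of writes not yet flushed
        self.memtable = {}     # partition key -> {column: (value, timestamp)}
        self.sstables = []     # immutable snapshots, each sorted by partition key
        self.flush_threshold = flush_threshold  # assumed trigger, for illustration

    def write(self, partition_key, columns, timestamp):
        # 1. Append to the commit log for durability.
        self.commit_log.append((partition_key, dict(columns), timestamp))
        # 2. Apply the write to the in-memory partition.
        partition = self.memtable.setdefault(partition_key, {})
        for name, value in columns.items():
            partition[name] = (value, timestamp)
        # 3. Flush once the Memtable is "full".
        if len(self.memtable) >= self.flush_threshold:
            self.flush()
        return "ack"

    def flush(self):
        # Write the current state as a new immutable, sorted SSTable.
        snapshot = tuple(sorted(self.memtable.items(), key=lambda item: item[0]))
        self.sstables.append(snapshot)
        self.memtable = {}         # heap space is reclaimed
        self.commit_log.clear()    # flushed entries are no longer needed


node = Node(flush_threshold=3)
node.write(1, {"first": "Oscar", "last": "Orange", "level": 42}, timestamp=1)
node.write(2, {"first": "Ricky", "last": "Red"}, timestamp=2)
pending = len(node.commit_log)
node.write(3, {"first": "Betty", "last": "Blue", "level": 63}, timestamp=3)
print(pending, len(node.sstables), node.memtable)  # prints: 2 1 {}
```

Nothing in the write path seeks to an existing location on disk, which is the point the RDBMS comparison makes.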
How does the write path flow on a node?
[Diagram: the Client sends Write <3, Betty, Blue, 63> to the Coordinator. In node memory, the Memtable (corresponds to a CQL table) holds partition key1 first:Oscar last:Orange level:42, partition key2 first:Ricky last:Red, and the new partition key3 first:Betty last:Blue level:63. On the node file system sit the append-only CommitLog and the SSTables. Each write request is appended and acknowledged. Periodically, the Memtable's current state is flushed to an SSTable. Periodically, Compaction compacts related SSTables.]
What are Memtables and how are they flushed to disk?
•  Memtables are in-memory representations of a CQL table
•  Each node has a Memtable for each CQL table in the keyspace
•  Each Memtable accrues writes and provides reads for data not yet flushed
•  Updates to Memtables mutate the in-memory partition
•  When a Memtable flushes to disk
1.  Current Memtable data is written to a new immutable SSTable on disk
2.  JVM heap space is reclaimed from the flushed data
3.  Corresponding CommitLog entries are marked as flushed
[Diagram: a Memtable holding partition key1 first:Oscar last:Orange level:42, partition key2 first:Ricky last:Red, and partition key3 first:Betty last:Blue level:63.]
What is an SSTable and what are its characteristics?
•  An SSTable ("sorted string table")
•  is an immutable file of sorted partitions
•  is written to disk through fast, sequential I/O
•  contains the state of a Memtable when flushed
•  The current data state of a CQL table consists of
•  its corresponding Memtable plus
•  all current SSTables flushed from that Memtable
•  SSTables are periodically compacted from many to one
What is an SSTable and what are its characteristics?
•  For each SSTable, two structures are created
•  Partition index – list of its primary keys and row start positions
•  Partition summary – in-memory sample of its partition index (default: 1 of every 128 partition keys)
[Diagram: a Memtable (corresponds to a CQL table) above several SSTables, each with its own Summary and Index.]
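The relationship between the partition index and its sampled summary can be sketched in Python. This is an illustration only: offsets assume hypothetical fixed-size rows, keys are plain strings, and the function names are made up. The lookup uses the small summary to pick a starting point, then scans at most one sampling interval of the full index.

```python
import bisect

SAMPLE_INTERVAL = 128  # the default sampling rate mentioned above


def build_index(sorted_keys, row_size=100):
    """Partition index: every key with its (assumed fixed-size) start offset."""
    return [(key, i * row_size) for i, key in enumerate(sorted_keys)]


def build_summary(index, interval=SAMPLE_INTERVAL):
    """Partition summary: every interval-th key with its position in the index."""
    return [(index[i][0], i) for i in range(0, len(index), interval)]


def lookup(summary, index, key, interval=SAMPLE_INTERVAL):
    """Use the summary to narrow the search, then scan a slice of the index."""
    sampled_keys = [sampled for sampled, _ in summary]
    slot = bisect.bisect_right(sampled_keys, key) - 1
    if slot < 0:
        return None  # key sorts before the first key in this SSTable
    start = summary[slot][1]
    for candidate, offset in index[start:start + interval]:
        if candidate == key:
            return offset
    return None


keys = [f"pk{n:05d}" for n in range(1000)]
index = build_index(keys)
summary = build_summary(index)
print(len(summary), lookup(summary, index, "pk00300"))  # prints: 8 30000
```

Only the 8-entry summary needs to stay in memory here; the 1000-entry index can stay on disk and be read one slice at a time.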
What is compaction?
•  Updates do mutate Memtable partitions, but SSTables are immutable
•  no SSTable seeks/overwrites
•  SSTables just accrue new timestamped updates
•  So, SSTables must be periodically compacted
•  related SSTables are merged
•  most recent version of each column is compiled to one partition in one new SSTable
•  partitions marked for deletion are evicted
•  old SSTables are deleted
Note: Compaction and the Read Path are discussed in further detail later in this course.
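The merge rules listed above can be sketched in Python. This is a deliberately simplified model: a deletion is represented by a `TOMBSTONE` marker per cell and is evicted immediately, whereas a real cluster keeps deletion markers for a grace period before dropping them. The sample partitions and all names are illustrative.

```python
TOMBSTONE = object()  # marker for a deleted cell (illustrative representation)


def compact(sstables):
    """Merge SSTables into one, keeping the newest version of each column."""
    merged = {}
    for sstable in sstables:
        for partition_key, columns in sstable.items():
            target = merged.setdefault(partition_key, {})
            for name, (value, timestamp) in columns.items():
                if name not in target or timestamp > target[name][1]:
                    target[name] = (value, timestamp)
    compacted = {}
    for partition_key in sorted(merged):
        # Drop deleted cells; a partition with nothing left is evicted.
        live = {name: cell for name, cell in merged[partition_key].items()
                if cell[0] is not TOMBSTONE}
        if live:
            compacted[partition_key] = live
    return compacted


older = {
    "pk7": {"first": ("Betty", 541), "last": ("Blue", 541), "level": (63, 541)},
    "pk9": {"first": ("Ricky", 500)},
}
newer = {
    "pk7": {"first": ("Elizabeth", 994)},
    "pk9": {"first": (TOMBSTONE, 700)},
}
new_sstable = compact([older, newer])
print(new_sstable)
```

After the merge, the two input SSTables can be deleted, since the new one holds the latest state of every surviving partition.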
What is sstable2json?
•  bin/sstable2json is a utility which exports an SSTable in JSON
format, for testing and debugging
•  -k include only a specified set of keys, given in HEX format (limit: 500)
•  -x exclude a specified set of keys (limit: 500)
•  -e enumerate keys only
./sstable2json [full_path_to_SSTable_Data_file] | more
READS
How does the read path flow on each node?
[Diagrams: three views of Read <pk7> arriving from the Coordinator at a node, with components labeled Off Heap or On Heap and placed in node memory or on the node file system. The MemTable (e.g., player) holds pk7 level:42 (timestamp 1114). The SSTables (e.g., player) hold pk7 first:Betty, last:Blue, level:63 (all timestamp 541) and pk7 first:Elizabeth (timestamp 994), alongside pk1 and pk2. The merged result is pk7 first:Elizabeth last:Blue level:42.]
•  Row cache hit: the optional Row Cache holds pk1, pk2, and pk7, so the read is answered from the cache.
•  Key cache hit: the Row Cache holds only pk1 and pk2 (miss); the Bloom Filters report which SSTables may hold pk7, and the Key Cache has an entry for pk7.
•  Row and key miss: the Row Cache and the Key Cache both miss; after the Bloom Filters, each candidate SSTable's Partition Summary and Partition Index are used to locate pk7.
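The pk7 example from the read path diagrams can be replayed in Python. This is a minimal sketch that covers only the row cache check and the timestamp merge; Bloom filters, the key cache, and the partition summary and index are left out. The function name `read` and the returned labels are hypothetical.

```python
def read(partition_key, row_cache, memtable, sstables):
    """Answer from the row cache if possible, else merge by newest timestamp."""
    if partition_key in row_cache:
        return row_cache[partition_key], "row cache hit"
    newest = {}
    for source in [memtable, *sstables]:
        for column, (value, timestamp) in source.get(partition_key, {}).items():
            if column not in newest or timestamp > newest[column][1]:
                newest[column] = (value, timestamp)
    row = {column: value for column, (value, _) in newest.items()}
    return row, "merged from MemTable and SSTables"


# Cell values and timestamps as shown in the diagrams.
memtable = {"pk7": {"level": (42, 1114)}}
sstables = [
    {"pk7": {"first": ("Betty", 541), "last": ("Blue", 541), "level": (63, 541)}},
    {"pk7": {"first": ("Elizabeth", 994)}},
]

row, how = read("pk7", {}, memtable, sstables)
print(row, how)

cached_row, cached_how = read("pk7", {"pk7": row}, memtable, sstables)
print(cached_how)  # prints: row cache hit
```

The merge picks first:Elizabeth (994 beats 541), keeps last:Blue (the only version), and takes level:42 from the MemTable (1114 beats 541), matching the result shown for pk7.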
What is a Bloom filter and how does it optimize a read?
•  A probabilistic data structure that tests whether a key may be in an SSTable
•  each SSTable has a Bloom filter on disk, used from off-heap memory
•  false positives are possible, false negatives are not
•  larger tables have a higher possibility of false positives
•  1 GB to 2 GB per billion partitions in an SSTable
•  Eliminates seeking a partition key in any SSTable that does not contain it
[Diagram: a Bloom Filter in front of each of three SSTables; two report a Hit for pk7, and the read proceeds to the Key Cache and Partition Index for those SSTables.]
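A Bloom filter is small enough to sketch in full. The Python below is an illustrative implementation, not Cassandra's: it derives bit positions from SHA-256 with a seed prefix, stores one byte per bit for readability, and uses arbitrary sizes. The class and method names are hypothetical.

```python
import hashlib


class BloomFilter:
    """Answers "definitely not present" or "possibly present" for a key."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive num_hashes bit positions by hashing the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for position in self._positions(key):
            self.bits[position] = 1

    def might_contain(self, key):
        # False means the key was never added; True may be a false positive.
        return all(self.bits[position] for position in self._positions(key))


# One filter per SSTable; only SSTables whose filter says "maybe" are read.
sstable_keys = [["pk1", "pk7"], ["pk2"], ["pk7"]]
filters = []
for keys in sstable_keys:
    bloom = BloomFilter()
    for key in keys:
        bloom.add(key)
    filters.append(bloom)

candidates = [i for i, bloom in enumerate(filters) if bloom.might_contain("pk7")]
print(candidates)
```

Because an added key always sets all of its bits, the filter can never miss an SSTable that really holds the key; it can only occasionally send the read to one that does not.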
How do you execute CQL queries in cqlsh?
Where do I learn more?
For the Dev: http://www.datastax.com/dev
Docs: http://docs.datastax.com/en/index.html
Planet C*: http://planetcassandra.org/
Driver Guide: http://planetcassandra.org/getting-started-with-apache-cassandra-and-java/
My favorite blogs: http://tobert.github.io/
http://patrickmcfadin.com/
http://rustyrazorblade.com/
https://ahappyknockoutmouse.wordpress.com/author/anukeus/
http://thelastpickle.com/blog/
My favorite C* book: http://www.amazon.com/Cassandra-High-Availability-Robbie-Strickland/dp/1783989122
DataStax Academy: https://academy.datastax.com/
Free Training: http://www.datastax.com/what-we-offer/products-services/training
yo
u Cassandra
Dani Traphagen
http://datastax.com
dani.traphagen@datastax.com
@dtrapezoid
More Related Content

What's hot

An Overview of Apache Cassandra
An Overview of Apache CassandraAn Overview of Apache Cassandra
An Overview of Apache CassandraDataStax
 
Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...
Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...
Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...DataStax Academy
 
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...DataStax
 
NOSQL Database: Apache Cassandra
NOSQL Database: Apache CassandraNOSQL Database: Apache Cassandra
NOSQL Database: Apache CassandraFolio3 Software
 
Appache Cassandra
Appache Cassandra  Appache Cassandra
Appache Cassandra nehabsairam
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using KafkaKnoldus Inc.
 
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeSimplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeDatabricks
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & FeaturesDataStax Academy
 
Top 10 Mistakes When Migrating From Oracle to PostgreSQL
Top 10 Mistakes When Migrating From Oracle to PostgreSQLTop 10 Mistakes When Migrating From Oracle to PostgreSQL
Top 10 Mistakes When Migrating From Oracle to PostgreSQLJim Mlodgenski
 
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...DataStax
 
Conhecendo Apache Cassandra @Movile
Conhecendo Apache Cassandra  @MovileConhecendo Apache Cassandra  @Movile
Conhecendo Apache Cassandra @MovileEiti Kimura
 
Cassandra Troubleshooting 3.0
Cassandra Troubleshooting 3.0Cassandra Troubleshooting 3.0
Cassandra Troubleshooting 3.0J.B. Langston
 
Cassandra internals
Cassandra internalsCassandra internals
Cassandra internalsnarsiman
 
ETL With Cassandra Streaming Bulk Loading
ETL With Cassandra Streaming Bulk LoadingETL With Cassandra Streaming Bulk Loading
ETL With Cassandra Streaming Bulk Loadingalex_araujo
 
Deep Dive into Cassandra
Deep Dive into CassandraDeep Dive into Cassandra
Deep Dive into CassandraBrent Theisen
 
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...confluent
 
Scylla Summit 2022: How to Migrate a Counter Table for 68 Billion Records
Scylla Summit 2022: How to Migrate a Counter Table for 68 Billion RecordsScylla Summit 2022: How to Migrate a Counter Table for 68 Billion Records
Scylla Summit 2022: How to Migrate a Counter Table for 68 Billion RecordsScyllaDB
 

What's hot (20)

Cassandra 101
Cassandra 101Cassandra 101
Cassandra 101
 
An Overview of Apache Cassandra
An Overview of Apache CassandraAn Overview of Apache Cassandra
An Overview of Apache Cassandra
 
Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...
Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...
Apache Cassandra and DataStax Enterprise Explained with Peter Halliday at Wil...
 
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...
 
NOSQL Database: Apache Cassandra
NOSQL Database: Apache CassandraNOSQL Database: Apache Cassandra
NOSQL Database: Apache Cassandra
 
Appache Cassandra
Appache Cassandra  Appache Cassandra
Appache Cassandra
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using Kafka
 
Cassandra
CassandraCassandra
Cassandra
 
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeSimplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & Features
 
Top 10 Mistakes When Migrating From Oracle to PostgreSQL
Top 10 Mistakes When Migrating From Oracle to PostgreSQLTop 10 Mistakes When Migrating From Oracle to PostgreSQL
Top 10 Mistakes When Migrating From Oracle to PostgreSQL
 
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...
The Missing Manual for Leveled Compaction Strategy (Wei Deng & Ryan Svihla, D...
 
Conhecendo Apache Cassandra @Movile
Conhecendo Apache Cassandra  @MovileConhecendo Apache Cassandra  @Movile
Conhecendo Apache Cassandra @Movile
 
Cassandra Troubleshooting 3.0
Cassandra Troubleshooting 3.0Cassandra Troubleshooting 3.0
Cassandra Troubleshooting 3.0
 
Cassandra internals
Cassandra internalsCassandra internals
Cassandra internals
 
ETL With Cassandra Streaming Bulk Loading
ETL With Cassandra Streaming Bulk LoadingETL With Cassandra Streaming Bulk Loading
ETL With Cassandra Streaming Bulk Loading
 
Deep Dive into Cassandra
Deep Dive into CassandraDeep Dive into Cassandra
Deep Dive into Cassandra
 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
 
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
Performance Tuning RocksDB for Kafka Streams' State Stores (Dhruba Borthakur,...
 
Scylla Summit 2022: How to Migrate a Counter Table for 68 Billion Records
Scylla Summit 2022: How to Migrate a Counter Table for 68 Billion RecordsScylla Summit 2022: How to Migrate a Counter Table for 68 Billion Records
Scylla Summit 2022: How to Migrate a Counter Table for 68 Billion Records
 

Viewers also liked

Understanding Data Partitioning and Replication in Apache Cassandra
Understanding Data Partitioning and Replication in Apache CassandraUnderstanding Data Partitioning and Replication in Apache Cassandra
Understanding Data Partitioning and Replication in Apache CassandraDataStax
 
Cassandra Summit EU 2013
Cassandra Summit EU 2013Cassandra Summit EU 2013
Cassandra Summit EU 2013jbellis
 
A Deep Dive Into Understanding Apache Cassandra
A Deep Dive Into Understanding Apache CassandraA Deep Dive Into Understanding Apache Cassandra
A Deep Dive Into Understanding Apache CassandraDataStax Academy
 
Cassandra Day London 2015: Introduction to Apache Cassandra and DataStax Ente...
Cassandra Day London 2015: Introduction to Apache Cassandra and DataStax Ente...Cassandra Day London 2015: Introduction to Apache Cassandra and DataStax Ente...
Cassandra Day London 2015: Introduction to Apache Cassandra and DataStax Ente...DataStax Academy
 
Cassandra introduction apache con 2014 budapest
Cassandra introduction apache con 2014 budapestCassandra introduction apache con 2014 budapest
Cassandra introduction apache con 2014 budapestDuyhai Doan
 
Cassandra Summit 2014: CQL Under the Hood
Cassandra Summit 2014: CQL Under the HoodCassandra Summit 2014: CQL Under the Hood
Cassandra Summit 2014: CQL Under the HoodDataStax Academy
 
C* Keys: Partitioning, Clustering, & CrossFit (Adam Hutson, DataScale) | Cass...
C* Keys: Partitioning, Clustering, & CrossFit (Adam Hutson, DataScale) | Cass...C* Keys: Partitioning, Clustering, & CrossFit (Adam Hutson, DataScale) | Cass...
C* Keys: Partitioning, Clustering, & CrossFit (Adam Hutson, DataScale) | Cass...DataStax
 
DataStax: A deep look at the CQL WHERE clause
DataStax: A deep look at the CQL WHERE clauseDataStax: A deep look at the CQL WHERE clause
DataStax: A deep look at the CQL WHERE clauseDataStax Academy
 
Cassandra for the ops dos and donts
Cassandra for the ops   dos and dontsCassandra for the ops   dos and donts
Cassandra for the ops dos and dontsDuyhai Doan
 
24 compliments for guys they’ll never forget
24 compliments for guys they’ll never forget24 compliments for guys they’ll never forget
24 compliments for guys they’ll never forgetwomansvibe
 
How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)DataStax Academy
 
Modern Algorithms and Data Structures - 1. Bloom Filters, Merkle Trees
Modern Algorithms and Data Structures - 1. Bloom Filters, Merkle TreesModern Algorithms and Data Structures - 1. Bloom Filters, Merkle Trees
Modern Algorithms and Data Structures - 1. Bloom Filters, Merkle TreesLorenzo Alberton
 

Viewers also liked (13)

Understanding Data Partitioning and Replication in Apache Cassandra
Understanding Data Partitioning and Replication in Apache CassandraUnderstanding Data Partitioning and Replication in Apache Cassandra
Understanding Data Partitioning and Replication in Apache Cassandra
 
Cassandra Summit EU 2013
Cassandra Summit EU 2013Cassandra Summit EU 2013
Cassandra Summit EU 2013
 
A Deep Dive Into Understanding Apache Cassandra
A Deep Dive Into Understanding Apache CassandraA Deep Dive Into Understanding Apache Cassandra
A Deep Dive Into Understanding Apache Cassandra
 
Cassandra Day London 2015: Introduction to Apache Cassandra and DataStax Ente...
Cassandra Day London 2015: Introduction to Apache Cassandra and DataStax Ente...Cassandra Day London 2015: Introduction to Apache Cassandra and DataStax Ente...
Cassandra Day London 2015: Introduction to Apache Cassandra and DataStax Ente...
 
Cassandra introduction apache con 2014 budapest
Cassandra introduction apache con 2014 budapestCassandra introduction apache con 2014 budapest
Cassandra introduction apache con 2014 budapest
 
Cassandra Summit 2014: CQL Under the Hood
Cassandra Summit 2014: CQL Under the HoodCassandra Summit 2014: CQL Under the Hood
Cassandra Summit 2014: CQL Under the Hood
 
C* Keys: Partitioning, Clustering, & CrossFit (Adam Hutson, DataScale) | Cass...
C* Keys: Partitioning, Clustering, & CrossFit (Adam Hutson, DataScale) | Cass...C* Keys: Partitioning, Clustering, & CrossFit (Adam Hutson, DataScale) | Cass...
C* Keys: Partitioning, Clustering, & CrossFit (Adam Hutson, DataScale) | Cass...
 
Ricki k
Ricki kRicki k
Ricki k
 
DataStax: A deep look at the CQL WHERE clause
DataStax: A deep look at the CQL WHERE clauseDataStax: A deep look at the CQL WHERE clause
DataStax: A deep look at the CQL WHERE clause
 
Cassandra for the ops dos and donts
Cassandra for the ops   dos and dontsCassandra for the ops   dos and donts
Cassandra for the ops dos and donts
 
24 compliments for guys they’ll never forget
24 compliments for guys they’ll never forget24 compliments for guys they’ll never forget
24 compliments for guys they’ll never forget
 
How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)
 
Modern Algorithms and Data Structures - 1. Bloom Filters, Merkle Trees
Modern Algorithms and Data Structures - 1. Bloom Filters, Merkle TreesModern Algorithms and Data Structures - 1. Bloom Filters, Merkle Trees
Modern Algorithms and Data Structures - 1. Bloom Filters, Merkle Trees
 

Similar to Intro to Cassandra

SRV405 Deep Dive on Amazon Redshift
SRV405 Deep Dive on Amazon RedshiftSRV405 Deep Dive on Amazon Redshift
SRV405 Deep Dive on Amazon RedshiftAmazon Web Services
 
NOSQL and Cassandra
NOSQL and CassandraNOSQL and Cassandra
NOSQL and Cassandrarantav
 
Data Warehousing in the Era of Big Data
Data Warehousing in the Era of Big DataData Warehousing in the Era of Big Data
Data Warehousing in the Era of Big DataAmazon Web Services
 
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Amazon Web Services
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftAmazon Web Services
 
Ben Coverston - The Apache Cassandra Project
Ben Coverston - The Apache Cassandra ProjectBen Coverston - The Apache Cassandra Project
Ben Coverston - The Apache Cassandra ProjectMorningstar Tech Talks
 
Cassandra Community Webinar: Back to Basics with CQL3
Cassandra Community Webinar: Back to Basics with CQL3Cassandra Community Webinar: Back to Basics with CQL3
Cassandra Community Webinar: Back to Basics with CQL3DataStax
 
Amazon Redshift Deep Dive - February Online Tech Talks
Amazon Redshift Deep Dive - February Online Tech TalksAmazon Redshift Deep Dive - February Online Tech Talks
Amazon Redshift Deep Dive - February Online Tech TalksAmazon Web Services
 
Spark and cassandra (Hulu Talk)
Spark and cassandra (Hulu Talk)Spark and cassandra (Hulu Talk)
Spark and cassandra (Hulu Talk)Jon Haddad
 
Scaling web applications with cassandra presentation
Scaling web applications with cassandra presentationScaling web applications with cassandra presentation
Scaling web applications with cassandra presentationMurat Çakal
 
The Right Data for the Right Job
The Right Data for the Right JobThe Right Data for the Right Job
The Right Data for the Right JobEmily Curtin
 
AWS SSA Webinar 20 - Getting Started with Data Warehouses on AWS
AWS SSA Webinar 20 - Getting Started with Data Warehouses on AWSAWS SSA Webinar 20 - Getting Started with Data Warehouses on AWS
AWS SSA Webinar 20 - Getting Started with Data Warehouses on AWSCobus Bernard
 
Cassandra introduction mars jug
Cassandra introduction mars jugCassandra introduction mars jug
Cassandra introduction mars jugDuyhai Doan
 
Big data analytics with Spark & Cassandra
Big data analytics with Spark & Cassandra Big data analytics with Spark & Cassandra
Big data analytics with Spark & Cassandra Matthias Niehoff
 
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftData warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftAmazon Web Services
 

Intro to Cassandra

  • 2. Background ‣ White paper blend: Amazon Dynamo (2007) and Google Bigtable (2006) ‣ What about Facebook? ‣ Open source successes ‣ & cue DataStax ‣ So what’s in a name?
  • 6. What we believe: ‣ The network is reliable. ‣ Latency is zero. ‣ Bandwidth is infinite. ‣ The network is secure. ‣ Topology doesn't change. ‣ There is one administrator. ‣ Transport cost is zero. ‣ The network is homogeneous. (Source: http://en.wikipedia.org/wiki/Fallacies_of_distributed_computing)
  • 7. Hey, I’ve got some data!
  • 8. I’ve got some bigger data!
  • 13. Architected w/ scale in mind ‣ Masterless - Peer to Peer ‣ High Availability ‣ No SPOFs ‣ Multi-DC ‣ Runs on Commodity Hardware ‣ Linear Scalability: More throughput? ‣ Fast ‣ Distributed ‣ Easy to manage operationally [Diagram: data centers in San Francisco, New York, and Stockholm]
  • 14. What is a C* cluster? A peer-to-peer set of nodes ‣ Node – one Cassandra instance ‣ Rack – a logical set of nodes ‣ Data Center – a logical set of racks ‣ Cluster – the full set of nodes which map to a single complete token ring [Diagram: a Cassandra cluster made of Data Center - East (Nodes 1-8) and Data Center - West (Nodes 1-4), each divided into Rack 1 and Rack 2; token range (Murmur3) from -2^63 to +2^63]
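
The token ring on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: MD5 stands in for Murmur3 (which is not in the standard library), each node gets a single evenly spaced token, and the names `token_for`, `build_ring`, and `owner` are hypothetical helpers, not Cassandra APIs.

```python
import bisect
import hashlib

MIN_TOKEN = -2**63       # lower end of the signed 64-bit token range
MAX_TOKEN = 2**63 - 1    # upper end of the signed 64-bit token range


def token_for(partition_key: str) -> int:
    """Map a partition key to a signed 64-bit token (MD5 stand-in for Murmur3)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def build_ring(node_names):
    """Assign each node one evenly spaced token; returns sorted (token, node) pairs."""
    step = 2**64 // len(node_names)
    return [(MIN_TOKEN + i * step, name) for i, name in enumerate(node_names)]


def owner(ring, token):
    """Return the first node whose token is >= the given token, wrapping around."""
    tokens = [t for t, _ in ring]
    idx = bisect.bisect_left(tokens, token) % len(ring)
    return ring[idx][1]


ring = build_ring(["node1", "node2", "node3", "node4"])
print(owner(ring, token_for("The Beatles")))
```

Every key hashes to exactly one position on the ring, so any node can work out which peer owns a key without asking a master.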
  • 15. Peer to Peer [Diagram: a five-node ring; Node 1 holds the 1st copy, Node 2 the 2nd copy, Node 3 the 3rd copy]
  • 16. Data Centers [Diagram: two five-node rings, DC: USA and DC: EUROPE, each holding its own copies of the data]
  • 17. Tunable Consistency: parallel write ‣ Write Consistency Level = QUORUM ‣ Replication Factor = 3 [Diagram: the write goes to the three replicas in parallel (Node 1: 1st copy, Node 2: 2nd copy, Node 3: 3rd copy); acks are labeled 5 µs, 12 µs, 500 µs, and 12 µs]
  • 18. Continuous availability: parallel read ‣ Read Consistency Level = QUORUM ‣ Replication Factor = 3 ‣ Hints [Diagram: the read goes to the replicas in parallel (Node 1: 1st copy, Node 2: 2nd copy, Node 3: 3rd copy)]
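
A small Python sketch of the arithmetic behind this slide. It assumes the three replica acks arrive after 5, 12, and 500 µs, and it reads the fourth "12 µs" label as the moment the client is answered; that reading is an interpretation of the diagram, not something the slide states. The function names are hypothetical.

```python
def quorum(replication_factor: int) -> int:
    """A quorum is a majority of the replicas."""
    return replication_factor // 2 + 1


def write_latency_us(ack_times_us, required_acks):
    """Time at which the required number of replica acks has arrived."""
    if required_acks > len(ack_times_us):
        raise ValueError("not enough replicas to satisfy the consistency level")
    return sorted(ack_times_us)[required_acks - 1]


rf = 3
acks = [5, 12, 500]  # replica ack times from the slide, in microseconds

print(quorum(rf))                          # 2
print(write_latency_us(acks, 1))           # 5   (one ack required)
print(write_latency_us(acks, quorum(rf)))  # 12  (QUORUM)
print(write_latency_us(acks, rf))          # 500 (all replicas required)
```

With QUORUM the slow 500 µs replica does not hold up the write, which is the trade-off that tunable consistency exposes.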
  • 20. CQL
  • 21. How to create, use and drop keyspaces/schemas? • To create a keyspace: CREATE KEYSPACE musicdb WITH replication = { 'class': 'SimpleStrategy', 'replication_factor': 3 }; • To assign the working default keyspace for a cqlsh session: USE musicdb; • To delete a keyspace and all internal data objects: DROP KEYSPACE musicdb;
  • 22. What is the syntax of the CREATE TABLE statement? • The CQL below creates a table in the current keyspace • Primary key declared in separate clause: CREATE TABLE performer ( name VARCHAR, type VARCHAR, country VARCHAR, style VARCHAR, founded INT, born INT, died INT, PRIMARY KEY (name) ); • Primary key declared inline: CREATE TABLE performer ( name VARCHAR PRIMARY KEY, type VARCHAR, country VARCHAR, style VARCHAR, founded INT, born INT, died INT );
  • 24. What is a CQL table and how is it related to a column family? • A CQL table is a column family • CQL tables provide two-dimensional views of a column family, which contains potentially multi-dimensional data, due to composite keys and collections • CQL table and column family are largely interchangeable terms • Not surprising when you recall tables and relations, columns and attributes, rows and tuples in relational databases • Supported by a declarative language, the Cassandra Query Language (CQL) • Data Definition Language, a subset of CQL • SQL-like syntax, but with somewhat different semantics • Convenient for defining and expressing Cassandra database schemas
  • 25. What are row, row key, column key, and column value? • Row is the smallest unit that stores related data in Cassandra • Rows – individual rows constitute a column family • Row key – uniquely identifies a row in a column family • Row – stores pairs of column keys and column values • Column key – uniquely identifies a column value in a row • Column value – stores one value or a collection of values [Diagram: a row consisting of a row key, column keys (or column names) cola, colb, colc, cold, and column values (or cells) va, vb, vc, vd] ©2014 DataStax Training. Use only with permission.
  • 26. What are partition, partition key, row, column, and cell? • Table with single-row partitions • Column family view [Table: performer is the partition key; each line is a row (here also one partition), each heading a column, each value a cell]
       performer      | born | country | died | founded | style | type
       John Lennon    | 1940 | England | 1980 |         | Rock  | artist
       Paul McCartney | 1942 | England |      |         | Rock  | artist
       The Beatles    |      | England |      | 1957    | Rock  | band
  • 28. What is a data modeling framework? • Defines transitions between models • Query-driven methodology • Formal analysis and validation • Defines a scientific approach to data modeling • Modeling rules • Mapping patterns • Schema optimization techniques [Diagram: Conceptual Model → Logical Model → Physical Model, with the steps labeled Query-Driven Methodology and Analysis & Validation]
  • 29. What is a conceptual data model? • Conceptual data model for music data • ER diagram (Chen notation) • Describes entities, relationships, roles, keys, cardinalities • What is possible and what is not in existing or future data [ER diagram: entities Performer (with IsA subtypes Artist and Band; disjoint, covering), Album, Track, User, and Activity (with IsA subtypes Rate and Play; disjoint, not covering); relationships: releases (Performer, Album), has member (Band, Artist; attribute period), has (Album, Track), performs (User, Activity), involvedIn (Activity, Track); attributes shown include name, founded, country, style, born, died; title, year, genre, format, cover image; number, title; id, email, name, preferences; id, timestamp, rating]
  • 30. A sample data model
       ACCESS PATTERNS
       Q1: Find performers for a specified style; order by performer (ASC).
       Q2: Find information for a specified performer (artist or band).
       Q3: Find information for a specified album (title and year).
       Q4: Find albums for a specified performer; order by album release year (DESC) and title (ASC).
       Q5: Find albums for a specified genre; order by performer (ASC), year (DESC), and title (ASC).
       Q6: Find albums and performers for a specified track title; order by performer (ASC), year (DESC), and title (ASC).
       Q7: Find tracks for a specified album (title and year); order by track number (ASC).
       Q8: Find information for a specified user.
       Q9: Find activities for a specified user; order by activity time (DESC).
       Q10: Find statistics for a specified track.
       Q11: Find user activities for a specified track; order by activity time (DESC).
       Q12: Find user activities for a specified activity type.
       …
       TABLES (K = partition key column, C↑/C↓ = clustering column in ascending/descending order, S = static column, IDX = secondary index)
       Performer: name K, type, country, style, founded, born, died
       Performers_by_style: style K, name C↑
       Albums_by_performer: performer K, year C↓, title C↑, genre
       Albums_by_genre: genre K, performer C↑, year C↓, title C↑
       Tracks_by_album: album K, year K, number C↑, performer S, genre S, title
       Albums_by_track: track K, performer C↑, year C↓, title C↑
       Album: title K, year K, performer, genre, tracks (map)
       User: id K, name, email, preferences (set)
       Activities_by_user: user K, activity (timeuuid) C↓, type IDX, album_title, album_year, track_title, rating
       Activities_by_track: album_title K, album_year K, track_title K, activity (timeuuid) C↓, user, type, rating
       Track_stats: album_title K, album_year K, track_title K, num_ratings (counter), sum_ratings (counter), num_plays (counter)
       [Diagram: arrows labeled Q1-Q12 connect the access patterns to the tables that serve them]
  • 32. How does Cassandra write so fast? • Cassandra is a log-structured storage engine • Data is sequentially appended, not placed in pre-set locations • RDBMS: seeks and writes values to various pre-set locations • CASSANDRA: continuously appends to a log
  • 33. What are the key components of the write path? • Each node implements four key components to handle its writes • Memtables – in-memory tables corresponding to CQL tables, with indexes • CommitLog – append-only log, replayed to restore a downed node's Memtables • SSTables – Memtable snapshots periodically flushed to disk, clearing heap • Compaction – periodic process to merge and streamline SSTables • When any node receives a write request: 1. The record is appended to the CommitLog, and 2. The record is appended to the Memtable for this record's target CQL table 3. Periodically, Memtables flush to SSTables, clearing JVM heap and CommitLog 4. Periodically, Compaction runs to merge and streamline SSTables
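
Steps 1 to 3 above can be sketched as a toy in-memory model. This is an assumption-laden illustration, not Cassandra's implementation: the `Node` class, its `flush_threshold` parameter, and the flush trigger (number of partitions in the Memtable) are all hypothetical, and compaction (step 4) is left out.

```python
class Node:
    """Toy write path: CommitLog append, Memtable update, periodic flush."""

    def __init__(self, flush_threshold=3):
        self.commit_log = []   # append-only list of (key, columns, timestamp)
        self.memtable = {}     # partition key -> {column: (value, timestamp)}
        self.sstables = []     # immutable, sorted snapshots of past Memtables
        self.flush_threshold = flush_threshold  # hypothetical flush trigger

    def write(self, key, columns, timestamp):
        # 1. Append the record to the CommitLog for durability.
        self.commit_log.append((key, dict(columns), timestamp))
        # 2. Apply the record to the in-memory partition.
        row = self.memtable.setdefault(key, {})
        for column, value in columns.items():
            row[column] = (value, timestamp)
        # 3. Flush once the Memtable holds enough partitions.
        if len(self.memtable) >= self.flush_threshold:
            self.flush()
        return "ack"

    def flush(self):
        # Write the Memtable as an immutable snapshot sorted by partition key,
        # then release the Memtable and the CommitLog entries it covered.
        snapshot = tuple(
            (key, tuple(sorted(row.items())))
            for key, row in sorted(self.memtable.items())
        )
        self.sstables.append(snapshot)
        self.memtable = {}
        self.commit_log = []


node = Node()
node.write(1, {"first": "Oscar", "last": "Orange", "level": 42}, timestamp=10)
node.write(2, {"first": "Ricky", "last": "Red"}, timestamp=20)
node.write(3, {"first": "Betty", "last": "Blue", "level": 63}, timestamp=30)
print(len(node.sstables), len(node.memtable), len(node.commit_log))
```

Nothing in `write` seeks to an existing location on disk; every step is an append or an in-memory update, which is the point of the log-structured design.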
  • 34. How does the write path flow on a node? [Diagram: the Client sends Write <3, Betty, Blue, 63> to the Coordinator. Each write request: the record is appended to the append-only CommitLog (node file system) and to the Memtable (node memory, corresponds to a CQL table), then acknowledged. Periodically: the current Memtable state is flushed to an SSTable. Periodically: Compaction compacts related SSTables. Memtable contents: partition key1 first:Oscar last:Orange level:42; partition key2 first:Ricky last:Red; partition key3 first:Betty last:Blue level:63]
  • 35. What are Memtables and how are they flushed to disk? • Memtables are in-memory representations of a CQL table • Each node has a Memtable for each CQL table in the keyspace • Each Memtable accrues writes and provides reads for data not yet flushed • Updates to Memtables mutate the in-memory partition • When a Memtable flushes to disk: 1. Current Memtable data is written to a new immutable SSTable on disk 2. JVM heap space is reclaimed from the flushed data 3. Corresponding CommitLog entries are marked as flushed [Diagram: Memtable holding partition key1 first:Oscar last:Orange level:42; partition key2 first:Ricky last:Red; partition key3 first:Betty last:Blue level:63]
  • 36. What is an SSTable and what are its characteristics? • An SSTable ("sorted string table") is an immutable file of sorted partitions, written to disk through fast, sequential I/O, that contains the state of a Memtable when flushed • The current data state of a CQL table consists of its corresponding Memtable plus all current SSTables flushed from that Memtable • SSTables are periodically compacted from many to one
  • 37. What is an SSTable and what are its characteristics? • For each SSTable, two structures are created • Partition index – list of its primary keys and row start positions • Partition summary – in-memory sample of its partition index (default: 1 partition key of 128) [Diagram: a Memtable (corresponds to a CQL table) and its SSTables, each SSTable paired with a Summary and an Index]
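
The relationship between the partition index and its sampled summary can be sketched as follows. This is a simplified model under assumptions: keys are kept in plain sorted order (standing in for the order in which an SSTable stores its partitions), byte offsets are invented, and `build_partition_index`, `build_summary`, and `lookup` are hypothetical names.

```python
import bisect


def build_partition_index(sorted_keys, row_size=100):
    """One (partition key, byte offset) entry per partition; offsets are made up."""
    return [(key, i * row_size) for i, key in enumerate(sorted_keys)]


def build_summary(index, sample_every=128):
    """Keep every Nth index entry as (partition key, position in the index)."""
    return [(index[i][0], i) for i in range(0, len(index), sample_every)]


def lookup(index, summary, key, sample_every=128):
    """Use the summary to choose one slice of the index, then scan only that slice."""
    sampled_keys = [k for k, _ in summary]
    slot = bisect.bisect_right(sampled_keys, key) - 1
    if slot < 0:
        return None  # key sorts before the first partition in this SSTable
    start = summary[slot][1]
    for candidate, offset in index[start:start + sample_every]:
        if candidate == key:
            return offset
    return None


keys = [f"pk{i:05d}" for i in range(1000)]
index = build_partition_index(keys)
summary = build_summary(index)
print(len(index), len(summary))
print(lookup(index, summary, "pk00300"))
```

The summary is small enough to stay in memory, and it narrows each lookup to at most `sample_every` index entries instead of the whole index.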
  • 38. What is compaction? • Updates mutate Memtable partitions, but SSTables are immutable • no SSTable seeks/overwrites • SSTables just accrue new timestamped updates • So, SSTables must be periodically compacted • related SSTables are merged • most recent version of each column is compiled to one partition in one new SSTable • partitions marked for deletion are evicted • old SSTables are deleted • Note: Compaction and the Read Path are discussed in further detail later in this course. [Diagram: a Memtable (corresponds to a CQL table) and several SSTables, each with a Summary and an Index, merged by Compaction into one SSTable]
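
A minimal Python sketch of the merge described above: keep the most recent version of each column and evict deleted partitions. It rests on simplifying assumptions that are not on the slide: each SSTable is a dict, a deletion is recorded under a made-up `DELETED` marker column, and deleted data is dropped immediately rather than after any grace period.

```python
DELETED = "__deleted__"  # hypothetical marker column recording a partition deletion


def compact(sstables):
    """Merge SSTables into one: newest value per column wins, deletions are applied."""
    merged = {}
    for sstable in sstables:
        for key, columns in sstable.items():
            row = merged.setdefault(key, {})
            for column, (value, ts) in columns.items():
                if column not in row or ts > row[column][1]:
                    row[column] = (value, ts)

    result = {}
    for key in sorted(merged):
        row = merged[key]
        deleted_at = row.pop(DELETED, (None, -1))[1]
        # Keep only columns written after the deletion (if there was one).
        live = {c: vt for c, vt in row.items() if vt[1] > deleted_at}
        if live:
            result[key] = live
    return result


older = {
    "pk1": {"first": ("Oscar", 100)},
    "pk7": {"first": ("Betty", 541), "last": ("Blue", 541), "level": (63, 541)},
}
newer = {
    "pk1": {DELETED: (True, 200)},
    "pk7": {"first": ("Elizabeth", 994)},
}
print(compact([older, newer]))
```

The inputs are never modified in place; the output is a new structure, mirroring how compaction writes a new SSTable and then deletes the old ones.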
  • 39. What is sstable2json? • bin/sstable2json is a utility which exports an SSTable in JSON format, for testing and debugging • -k include only a set of keys specified in HEX format (limit: 500) • -x exclude a specified set of keys (limit: 500) • -e enumerate keys only • Usage: ./sstable2json [full_path_to_SSTable_Data_file] | more
  • 40. READS
  • 41. How does the read path flow on each node? Row cache hit [Diagram: the Coordinator sends Read <pk7>; the optional Row Cache holds pk7 (first:Elizabeth, last:Blue, level:42) and answers with a hit. Node memory (regions labeled Off Heap and On Heap): MemTable (e.g., player) with pk7 level:42 (timestamp 1114). Node file system: SSTables (e.g., player), one containing pk1 and pk7 (first:Betty, last:Blue, level:63, all timestamp 541), another containing pk2 and pk7 (first:Elizabeth, timestamp 994)]
  • 42. How does the read path flow on each node? Key cache hit [Diagram: same node and data as slide 41; Read <pk7> misses the Row Cache, each SSTable's Bloom Filter is checked, and the Key Cache holds pk7 (hit), so the SSTables can be read directly]
  • 43. How does the read path flow on each node? Row and Key miss [Diagram: same node and data as slide 41; Read <pk7> misses the Row Cache, each SSTable's Bloom Filter is checked, the Key Cache does not hold pk7 (miss), so each candidate SSTable is located through its Partition Summary and then its Partition Index]
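
The merge that the three read-path slides imply can be sketched in Python using the pk7 values from the diagrams. This is a simplified assumption-based model: the Bloom filter, key cache, partition summary, and partition index steps are skipped, and the `read` function is hypothetical.

```python
def read(key, row_cache, memtable, sstables):
    """Return (row, how it was found), resolving each column by newest timestamp."""
    if key in row_cache:
        return row_cache[key], "row cache hit"

    merged = {}
    for source in [memtable] + sstables:
        for column, (value, ts) in source.get(key, {}).items():
            if column not in merged or ts > merged[column][1]:
                merged[column] = (value, ts)

    row = {column: value for column, (value, ts) in merged.items()}
    return row, "merged from memtable and SSTables"


# Values taken from the slides: column -> (value, timestamp).
memtable = {"pk7": {"level": (42, 1114)}}
sstable_a = {"pk7": {"first": ("Betty", 541), "last": ("Blue", 541), "level": (63, 541)}}
sstable_b = {"pk7": {"first": ("Elizabeth", 994)}}

row, how = read("pk7", {}, memtable, [sstable_a, sstable_b])
print(row, how)
```

The merged row (first:Elizabeth, last:Blue, level:42) is the same row the diagrams show sitting in the row cache, which is why a row cache hit can skip the Memtable and SSTables entirely.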
  • 44. What is a Bloom filter and how does it optimize a read? • A probabilistic data structure testing if a key may be in an SSTable • each SSTable has a Bloom filter on disk, used from off-heap memory • false positives are possible, false negatives are not • larger tables have a higher possibility of false positives • 1 GB to 2 GB per billion partitions in an SSTable • Eliminates seeking a partition key in any SSTable without it [Diagram: one Bloom Filter per SSTable, checked before the Key Cache and Partition Index; filters that report a hit lead to the SSTables holding pk7]
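
A minimal Bloom filter in Python, to make the "false positives are possible, false negatives are not" property concrete. The sizes, the use of SHA-256 with a seed prefix to derive several hash positions, and the class itself are illustrative assumptions, not how Cassandra builds its filters.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: k hash positions per key in a fixed-size bit array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes positions by hashing the key with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


sstable_filter = BloomFilter()
for key in ("pk1", "pk7"):
    sstable_filter.add(key)

print(sstable_filter.might_contain("pk7"))  # True: pk7 was added
```

On a read, a node would consult one such filter per SSTable and skip every SSTable whose filter answers False, paying for a disk seek only where the key may actually be.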
  • 45. How do you execute CQL queries in cqlsh?
  • 46. Where do I learn more? • For the Dev: http://www.datastax.com/dev • Docs: http://docs.datastax.com/en/index.html • Planet C*: http://planetcassandra.org/ • Driver Guide: http://planetcassandra.org/getting-started-with-apache-cassandra-and-java/ • My favorite blogs: http://tobert.github.io/ http://patrickmcfadin.com/ http://rustyrazorblade.com/ https://ahappyknockoutmouse.wordpress.com/author/anukeus/ http://thelastpickle.com/blog/ • My favorite C* book: http://www.amazon.com/Cassandra-High-Availability-Robbie-Strickland/dp/1783989122 • DataStax Academy: https://academy.datastax.com/ • Free Training: http://www.datastax.com/what-we-offer/products-services/training