Introduction to NoSQL and Cassandra

Introduction to NoSQL and Apache Cassandra
Patricio Echagüe
patricioe@gmail.com
@patricioe

About me

Present:
RelateIQ (Data Processing and Scalability)
Hector committer
Past:
DataStax (The Cassandra Company)
Cassandra/Hadoop distribution (former Brisk)
Cassandra FS
CQL connection pool
Cassandra contributions

What is “NoSQL”?

Systems able to store and retrieve large
quantities of data with little or no
information about the relationships
among the data items.
Generally they don't have a SQL-like
language for data manipulation, and
their schema is more relaxed than that of
traditional RDBMS systems.
Full ACID is often not guaranteed.

Brewer's CAP theorem

Consistency: all replicas agree on the
same value
Availability: always get an answer from
a replica
Partition Tolerance: the system works
even if replicas can't talk

You can have 2 of these

CAP Classification
[Figure: systems classified along Consistency, Availability, and Partitioning]

Types

- Relational
- Key-Value stores
- Columnar (column-oriented)
- Graph databases
- Document

What's eventual consistency?

It is a promise that eventually, in the
absence of new writes, all replicas that
are responsible for a data item will
agree on the same version

How eventual is eventual?
Write to 1 replica and Read from 1 replica of a total
of 3

How eventual is eventual?
Write to 2 replicas and Read from 2 replicas of a total
of 3

Why is it good?

Because, by contacting fewer
replicas, read and write operations
complete more quickly, lowering
latency.

Cassandra is a distributed, fault-tolerant,
scalable, column-oriented data store
with tunable consistency.

Cassandra has
CAP
But C is tunable

Key Concepts

Multi-Master, Multi-DC

Linearly scalable

Integrated Caching

Performs well with Larger-than-memory Datasets

Tunable consistency

Idempotent (client clock)

Schema Optional

No ACID transactions, No Locking

Generally complements other systems
(Not intended to be one-size-fits-all)

You should always use the right tool for the right job

Data Model

“4-Dimensional Hash Table”

A Keyspace contains a collection of Column Families
(Controls replication)

A Column Family contains Rows

A Row has a key, and each row has columns
(No need to define the columns beforehand)

Each column has a name and a value and a
timestamp
(TTL is optional)
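
The "4-dimensional hash table" idea can be sketched as nested maps. This is only an in-memory illustration: DataModelSketch and Column are hypothetical names, not Cassandra or Hector classes, and TTL and replication are left out.

```java
import java.util.Collections;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory model; these names are not Cassandra or Hector classes.
class DataModelSketch {

    // A column: the name is the map key, the value and timestamp live here.
    static final class Column {
        final String value;
        final long timestamp;

        Column(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // keyspace -> column family -> row key -> column name -> column
    private final Map<String, Map<String, Map<String, SortedMap<String, Column>>>> keyspaces =
            new TreeMap<>();

    // Columns are created on first write; nothing is declared beforehand.
    void insert(String keyspace, String columnFamily, String rowKey,
                String columnName, String value, long timestamp) {
        keyspaces.computeIfAbsent(keyspace, k -> new TreeMap<>())
                 .computeIfAbsent(columnFamily, k -> new TreeMap<>())
                 .computeIfAbsent(rowKey, k -> new TreeMap<>())
                 .put(columnName, new Column(value, timestamp));
    }

    // Returns the columns of one row sorted by column name (empty if unknown).
    SortedMap<String, Column> row(String keyspace, String columnFamily, String rowKey) {
        Map<String, Map<String, SortedMap<String, Column>>> families = keyspaces.get(keyspace);
        if (families == null) {
            return Collections.emptySortedMap();
        }
        Map<String, SortedMap<String, Column>> rows = families.get(columnFamily);
        if (rows == null) {
            return Collections.emptySortedMap();
        }
        SortedMap<String, Column> columns = rows.get(rowKey);
        if (columns == null) {
            return Collections.emptySortedMap();
        }
        return columns;
    }
}
```

Two rows in the same column family can hold entirely different column names, which is the "schema optional" point.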

Data Model – (RDBMS)

Keyspace (Schema)

Column Family(CF) (table)

Row (row)

Column (column*) → may not be present in all
rows

Data Model – Column Family

Static Column Family
- Model my object data

Dynamic Column Family
- Precalculated / Prematerialized query results

Nothing stopping you from mixing them!

Data Model – Static Column Family

Data Model – Dynamic CF

stats for a specific date

Data Model – Dynamic CF

Timeline of tweets by a user
Timeline of tweets by all of the people a user is
following
List of comments sorted by score
List of friends grouped by state
Metrics for a time bucket

...

Let's store “foo”

[Figure: “foo” stored on a single node]

…

But if that node is down?

[Figure: the node holding “foo” is down]

...

Let's store “foo” on 3 nodes.
This is the Replication Factor (N)

[Figure: “foo” replicated on three nodes]

...

Now we need to know what nodes the key was written
to so we can read it later

...

The Initial Token specifies the upper value of the key
range each node is responsible for

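[Figure: ring of five nodes, each labeled with the upper bound of its key range]
#1 <= 'd'
#2 <= 'k' (holds 'e f g h i j k')
#3 <= 'p'
#4 <= 'u'
#5 <= 'z'
Key space: a b c d e f g h i j k l m n ... z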
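
A sketch of how a key could be mapped to its owning node with the token bounds from this ring (#1 <= 'd', #2 <= 'k', #3 <= 'p', #4 <= 'u', #5 <= 'z'). TokenRingSketch is a hypothetical class, and comparing keys by their first letter only is a simplification made for the example.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: each node's token is the inclusive upper bound
// of the key range that the node owns.
class TokenRingSketch {
    private final TreeMap<Character, String> ring = new TreeMap<>();

    void addNode(String node, char token) {
        ring.put(token, node);
    }

    // Finds the first node whose token is >= the key's first letter.
    // Keys beyond the highest token wrap around to the lowest one.
    String ownerOf(String key) {
        char first = Character.toLowerCase(key.charAt(0));
        Map.Entry<Character, String> entry = ring.ceilingEntry(first);
        if (entry == null) {
            entry = ring.firstEntry();
        }
        return entry.getValue();
    }

    // Builds the five-node ring shown in the slides.
    static TokenRingSketch fiveNodeRing() {
        TokenRingSketch sketch = new TokenRingSketch();
        sketch.addNode("#1", 'd');
        sketch.addNode("#2", 'k');
        sketch.addNode("#3", 'p');
        sketch.addNode("#4", 'u');
        sketch.addNode("#5", 'z');
        return sketch;
    }
}
```

With this ring, a key such as "foo" falls in the range owned by node #2.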

...

Gossip is the protocol Cassandra uses to exchange
information among the nodes in the cluster (a.k.a. the Ring)

…


For example, which node owns the key “foo”?

[Figure: the client sends Read 'foo' to the ring; 'foo' falls in the range of node #2 (<= 'k')]

...

A Partitioner is used to transform the key.
“foo1” and “foo2” may end up on different nodes

...


The most commonly used is the Random Partitioner

“foo1” → md5(“foo1”) → “A99A0B....”
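
A minimal sketch of turning a key into an MD5-based token with the JDK. Md5TokenSketch is a hypothetical name, and the way the digest is converted to a number here is an assumption made for illustration; it is not claimed to match Cassandra's exact token computation.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch of hashing a row key into a ring position.
class Md5TokenSketch {

    // Hashes the key with MD5 and reads the 16-byte digest
    // as a non-negative integer.
    static BigInteger token(String key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }
}
```

Because similar keys hash to unrelated tokens, keys like "foo1" and "foo2" spread around the ring instead of clustering on one node.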

...


The most commonly used is the Random Partitioner

[Figure: on the five-node ring, 'foo1' is placed on node #1 and 'foo2' on node #2]

...

A Replica Placement Strategy determines which
nodes contain replicas

...


Simple Strategy places them clockwise around the ring

[Figure: 'foo1' is stored on nodes #1, #2, and #3 of the five-node ring]
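
The clockwise placement can be sketched as a walk around the ring starting at the node that owns the key. SimplePlacementSketch and its parameters are hypothetical names for this example; racks and data centers are ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of clockwise replica placement.
class SimplePlacementSketch {

    // ringInTokenOrder: node names in clockwise (token) order.
    // primaryIndex: position of the node that owns the key's token.
    // Returns the primary plus the next nodes clockwise, up to replicationFactor.
    static List<String> replicas(List<String> ringInTokenOrder,
                                 int primaryIndex,
                                 int replicationFactor) {
        int nodeCount = ringInTokenOrder.size();
        int replicaCount = Math.min(replicationFactor, nodeCount);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < replicaCount; i++) {
            result.add(ringInTokenOrder.get((primaryIndex + i) % nodeCount));
        }
        return result;
    }
}
```

With five nodes and a replication factor of 3, a key owned by #1 is also stored on #2 and #3, and a key owned by #5 wraps around to #1 and #2.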

...


Network Topology Strategy places them in different
DCs
DC1:3 DC2:1

[Figure: two five-node rings; 'foo1' has three replicas in DC1 and one in DC2]

...

Consistency Level determines how many replicas to
contact

...

CL = 1

[Figure: the client talks to node #5; 'foo1' is replicated on nodes #1, #2, and #3]

...

CL = QUORUM

[Figure: the client talks to node #5; 'foo1' is replicated on nodes #1, #2, and #3]

Consistency For Writes
ANY
ONE
TWO
THREE
QUORUM
LOCAL_QUORUM
EACH_QUORUM
ALL

Consistency For Reads
ONE
TWO
THREE
QUORUM
LOCAL_QUORUM
EACH_QUORUM
ALL

Consistency in Math Terms

Cassandra guarantees strong consistency if

(nodes_written + nodes_read) >
replication_factor

R+W>N
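
The R + W > N rule is easy to check in code. The class and method names below are hypothetical; the quorum size is taken as a majority of the replicas.

```java
// Hypothetical helper for reasoning about R + W > N.
class ConsistencyMathSketch {

    // A quorum is a majority of the replicas.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // True when every read is guaranteed to overlap the latest write
    // on at least one replica.
    static boolean isStronglyConsistent(int nodesWritten,
                                        int nodesRead,
                                        int replicationFactor) {
        return nodesWritten + nodesRead > replicationFactor;
    }
}
```

With N = 3, writing to 1 and reading from 1 gives 1 + 1 = 2, which is not greater than 3, so the read may be stale. Writing to 2 and reading from 2 gives 4 > 3, so the read and write sets always share a replica.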

Back to the example...

CL = QUORUM

[Figure: the client talks to node #5; 'foo1' is replicated on nodes #1, #2, and #3]

...

But what if node #3 is down?

...


[Figure: node #3 is down; 'foo1' is written to nodes #1 and #2, and a hint is kept for node #3]

...


The coordinator node stores a hint and replays
that mutation when the down node comes back up.

This is known as Hinted Handoff
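
A toy model of Hinted Handoff from the coordinator's point of view. Everything here (HintedHandoffSketch, markDown, markUp, the string mutations) is hypothetical and kept in memory; it only illustrates the store-then-replay idea, not Cassandra's implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical coordinator that keeps hints for unreachable replicas.
class HintedHandoffSketch {
    // Mutations each replica has applied (stands in for replica storage).
    final Map<String, List<String>> applied = new HashMap<>();
    // Hints held by the coordinator, keyed by the replica that missed them.
    final Map<String, Queue<String>> hints = new HashMap<>();
    private final Set<String> down = new HashSet<>();

    void markDown(String replica) {
        down.add(replica);
    }

    // Sends the mutation to live replicas and stores a hint for dead ones.
    void write(String mutation, List<String> replicas) {
        for (String replica : replicas) {
            if (down.contains(replica)) {
                hints.computeIfAbsent(replica, r -> new ArrayDeque<>()).add(mutation);
            } else {
                applied.computeIfAbsent(replica, r -> new ArrayList<>()).add(mutation);
            }
        }
    }

    // Called when the replica is seen alive again: replay its hints in order.
    void markUp(String replica) {
        down.remove(replica);
        Queue<String> pending = hints.remove(replica);
        if (pending == null) {
            return;
        }
        for (String mutation : pending) {
            applied.computeIfAbsent(replica, r -> new ArrayList<>()).add(mutation);
        }
    }
}
```

If the coordinator itself is lost before markUp runs, the hints go with it, which is why the other repair mechanisms are still needed.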

...

Node #5 will replay the hint to node #3 when it comes
back online

[Figure: node #5 holds the hint and replays 'foo1' to node #3]

...

And if node #5 dies before sending the hints to node
#3?

[Figure: nodes #1 and #2 hold 'foo1'; node #3 is back without it, and the hint on node #5 is lost]

...

If using QUORUM, node #4 will request 'foo' from all
the replicas

[Figure: nodes #1 and #2 return 'foo1'; node #3 returns an empty value '']

...

If the results received do not match, a Read Repair
process is performed in the background

[Figure: nodes #1 and #2 hold 'foo1'; node #3 holds '']

...

The missing or out-of-date value is then pushed to
the stale node, #3 in this case

[Figure: 'foo' != '' is detected and node #3 receives 'foo']
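
Read Repair can be sketched as "pick the newest version among the replies, then overwrite anything older". ReadRepairSketch and Versioned are hypothetical names, and this version repairs synchronously for simplicity, whereas the slides describe a background process.

```java
import java.util.Map;

// Hypothetical sketch of comparing replica replies and repairing stale ones.
class ReadRepairSketch {

    static final class Versioned {
        final String value;
        final long timestamp;

        Versioned(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // replicas: node name -> that node's copy of the key (null if missing).
    // Returns the newest value and pushes it to every missing or stale copy.
    static String readWithRepair(Map<String, Versioned> replicas) {
        Versioned newest = null;
        for (Versioned copy : replicas.values()) {
            if (copy != null && (newest == null || copy.timestamp > newest.timestamp)) {
                newest = copy;
            }
        }
        if (newest == null) {
            return null; // no replica has the key
        }
        for (Map.Entry<String, Versioned> entry : replicas.entrySet()) {
            Versioned copy = entry.getValue();
            if (copy == null || copy.timestamp < newest.timestamp) {
                entry.setValue(newest); // repair the out-of-date replica
            }
        }
        return newest.value;
    }
}
```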

...

The last feature to achieve consistency is the Anti
Entropy Service (AES)

It should be run periodically as part of cluster
maintenance, or after a node has been down

Recap Consistency Features

Read Repair

Anti Entropy Service (AES)

Hinted Handoff

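Scaling

[Figure: ring of five nodes with tokens “e”, “j”, “o”, “t”, “z”]

Scaling

[Figure: a new node with an undecided token “?” joins the ring]

Scaling

[Figure: the new node takes token “g”, between “e” and “j”]

Nodetool move?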

Want 2x performance?!

Add 2x nodes
'No downtime' included!

[Figure: the ring grows from five nodes (“e”, “j”, “o”, “t”, “z”) to ten (“b”, “e”, “g”, “j”, “l”, “o”, “q”, “t”, “v”, “z”)]

With RF = 3 we could lose

[Figure: ten-node ring with three nodes marked X]

With RF = 3 we could lose
?

[Figure: ten-node ring with four nodes marked X]

Vs others

[Figure: ten-node ring with tokens b, e, g, j, l, o, q, t, v, z]

Recap

Replication Factor
Tokens
Gossip
Partitioner
Replica Placement
Consistency
Hinted Handoff
Read Repair
AES
Clustering

Performance

Reads on par with writes

Storage - SSTable

- SSTables are sorted

- Immutable (“Merge on read”)

- Newest timestamp wins
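
"Merge on read" with "newest timestamp wins" can be sketched as follows. Each map stands in for the fragment of one row held by one immutable SSTable. MergeOnReadSketch and Cell are hypothetical names, not Cassandra classes.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of reconciling one row spread over several SSTables.
class MergeOnReadSketch {

    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // sstables: one map per SSTable, column name -> cell for the same row key.
    // Returns the row sorted by column name, keeping the newest cell per column.
    static TreeMap<String, Cell> mergeRow(List<Map<String, Cell>> sstables) {
        TreeMap<String, Cell> merged = new TreeMap<>();
        for (Map<String, Cell> sstable : sstables) {
            for (Map.Entry<String, Cell> entry : sstable.entrySet()) {
                merged.merge(entry.getKey(), entry.getValue(),
                        (current, candidate) ->
                                candidate.timestamp > current.timestamp ? candidate : current);
            }
        }
        return merged;
    }
}
```

The input maps are never modified, mirroring the immutability of SSTables; only the merged view changes as more SSTables are consulted.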

Storage – Compaction

Merges SSTables together into a larger SSTable

Removes Tombstones

Rebuilds primary and secondary indexes


Two types:

- Size-tiered compaction

- Leveled compaction


Size-tiered compaction

Performance not guaranteed
A row may be spread across many SSTables
Waste of space
Good for write-heavy ops
Rows are written once
Compaction may need up to 100% more space than the SSTables occupy


Leveled compaction

Grouped into levels
No overlapping within a level
Each level is ten times as large as the previous one
90% of reads satisfied with 1 SSTable
Twice as much I/O

Recap

SSTable
Memtable
Row Cache
Compaction

SSDs and caching
Before - 48 Cassandra nodes on m2.4xlarge, 36 EVCache nodes on
m2.xlarge
After - 12 Cassandra nodes on hi1.4xlarge

Five general categories

Retrieving
Write/Update/Remove (all the same op!)
Increment counters

Meta Information
Schema Manipulation
CQL Execution

Insertion/Deletion => Mutation

Again: Every mutation is an insert!
- Merge on read
- SSTables are immutable
- Highest timestamp wins
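
A sketch of why insert, update, and delete are the same operation: each one records a cell with a client timestamp, and a delete records a tombstone. MutationSketch is a hypothetical in-memory class; unlike Cassandra, it resolves the winner at write time instead of merging on read, which keeps the example short.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical single-row store where every change is a timestamped write.
class MutationSketch {

    static final class Cell {
        final String value;    // null marks a tombstone (a recorded delete)
        final long timestamp;  // supplied by the client

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    private final Map<String, Cell> columns = new HashMap<>();

    // The single write path: keep the cell with the highest timestamp.
    private void mutate(String column, String valueOrNull, long timestamp) {
        Cell current = columns.get(column);
        if (current == null || timestamp > current.timestamp) {
            columns.put(column, new Cell(valueOrNull, timestamp));
        }
    }

    void insert(String column, String value, long timestamp) {
        mutate(column, value, timestamp);
    }

    void delete(String column, long timestamp) {
        mutate(column, null, timestamp);
    }

    // Returns null for unknown columns and for tombstones.
    String read(String column) {
        Cell cell = columns.get(column);
        return cell == null ? null : cell.value;
    }
}
```

Replaying the same mutation, or receiving an older one late, leaves the result unchanged, which is what makes retries safe.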

CQL

INSERT INTO Hollywood.NerdMovies (user_uuid, fan)
VALUES ('cfd66ccc-d857-4e90-b1e5-df98a3d40cd6', 'johndoe')
USING CONSISTENCY LOCAL_QUORUM AND TTL 86400;

Using a Client

- Hector
http://hector-client.org
- Astyanax
https://github.com/Netflix/astyanax
- Pelops
https://github.com/s7/scale7-pelops

Using a Client → Hector

- Most popular Java client
- In use at very large installations
- A number of tools and utilities built on top
- Very active community
- MIT Licensed

Features

- High Level API
- Failover behavior
- High-performance connection pool
- JMX counters for management
- Discoverability of new nodes
- Automatic retry of downed hosts
- Suspension of nodes after several timeouts
- Load Balancing: Configurable and extensible
- Locking (Beta)

vs JDBC

Hector is operation-oriented

Whereas

JDBC is connection-oriented

API Abstractions

Templates

Mutator

Thrift

ColumnFamilyTemplate

Familiar, type-safe approach
- based on template-method design pattern
- generic: ColumnFamilyTemplate<K,N>
(K is the key type, N the column name type)

ColumnFamilyTemplate template =
    new ThriftColumnFamilyTemplate(keyspaceName,
                                   columnFamilyName,
                                   StringSerializer.get(),
                                   StringSerializer.get());

*** (no generics for clarity)


new ThriftColumnFamilyTemplate(
    keyspaceName,
    columnFamilyName,
    StringSerializer.get(),   // Key Format
    StringSerializer.get());  // Column Name Format

Column Name Format
- Cassandra calls this a “comparator”
- Remember: defines column order in on-disk format


ColumnFamilyResult<String, String> res =   // <Key Format, Column Name Format>
    template.queryColumns("patricioe");

String value = res.getString("email");

Date startDate = res.getDate("DateOfBirth");

Inserting data with ColumnFamilyUpdater

ColumnFamilyUpdater updater = template.createUpdater("pato");

updater.setString("companyName", "Relateiq");
updater.addKey("sabina");
updater.setString("companyName", "Globant");

template.update(updater);

Deleting Data with ColumnFamilyTemplate

template.deleteColumn("zznate", "notNeededStuff");
template.deleteColumn("zznate", "somethingElse");
template.deleteColumn("patricioe", "aDifferentColumnName");
...
template.deleteRow("someuser");

template.executeBatch();

Integrating with existing patterns

Hector Object Mapper -> Apache Gora
https://github.com/hector-client/hector/tree/master/object-mapper

Hector JPA*:
https://github.com/riptano/hector-jpa

Spring IOC

CQL: JDBC Driver and Pool in 1.0!

JdbcTemplate FTW!

Development Resources

Hector Documentation (http://hector-client.org)
Cassandra Unit
https://github.com/jsevellec/cassandra-unit

Cassandra Maven Plugin
http://mojo.codehaus.org/cassandra-maven-plugin/

CCM localhost cassandra cluster
https://github.com/pcmanus/ccm

OpsCenter
http://www.datastax.com/products/opscenter

Cassandra AMIs
https://github.com/riptano/CassandraClusterAMI

Want to contribute?

git clone git@github.com:hector-client/hector.git

Summary

- Take advantage of strengths
- idempotence and asynchronicity are your friends
- If it's not in the API, you are probably doing it wrong
- Seek death is still possible if you model incorrectly
- Try Denormalizing (append-only model ?)

Patricio Echagüe
patricioe@gmail.com
@patricioe

Credits
Nate McCall
Aaron Morton (http://thelastpickle.com)
DataStax (http://www.datastax.com)
http://www.slideshare.net/mikiobraun/cassandra-an-introduction

Additional Resources
DataStax Documentation: http://www.datastax.com/docs

Apache Cassandra project wiki: http://wiki.apache.org/cassandra/

“The Dynamo Paper”
http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf

P. Helland. Building on Quicksand
http://arxiv.org/pdf/0909.1788

P. Helland. Life Beyond Distributed Transactions
http://www.ics.uci.edu/~cs223/papers/cidr07p15.pdf

S. Anand. “Netflix's Transition to High-Availability Storage Systems”
http://media.amazonwebservices.com/Netflix_Transition_to_a_Key_v3.pdf

“The Megastore Paper”
http://research.google.com/pubs/archive/36971.pdf
