Introduction to NoSQL and
    Apache Cassandra
         Patricio Echagüe
         patricioe@gmail.com
               @patricioe
About me

Present:
 RelateIQ (Data Processing and Scalability)
 Hector committer
Past:
 DataStax (The Cassandra Company)
    Cassandra/Hadoop distribution (former Brisk)
    Cassandra FS
    CQL connection pool
    Cassandra contributions
Trends: “NoSQL”
2011
2012
What is “NoSQL” ?

Systems able to store and retrieve large
  quantities of data with little or no
  information about the relationships
  between items.
Generally they don't have a SQL-like
  language for data manipulation, and
  their schema is more relaxed than that of
  traditional RDBMSs.
Full ACID is often not guaranteed.
Brewer's CAP theorem

Consistency: all replicas agree on the
 same value
Availability: always get an answer from
 a replica
Partition Tolerance: the system works
 even if replicas can't talk to each other



        You can have 2 of these
CAP Classification
[Figure: triangle with corners Consistency, Availability, and Partitioning]
Types

-   Relational
-   Key-Value stores
-   Columnar (column-oriented)
-   Graph databases
-   Document
What's eventual consistency?


It is a promise that eventually, in the
  absence of new writes, all replicas that
  are responsible for a data item will
  agree on the same version
How eventual is eventual?
Write to 1 replica and read from 1 replica, out of a total of 3
How eventual is eventual?
Write to 2 replicas and read from 2 replicas, out of a total of 3
Why is it good?


Because, by contacting fewer replicas, read and write operations
complete more quickly, lowering latency.
Cassandra is a distributed, fault-tolerant, scalable,
column-oriented data store with tunable consistency.
Cassandra has
     CAP
But C is tunable
What is Apache Cassandra?
Key Concepts

Multi-Master, Multi-DC

Linearly scalable

Integrated Caching

Performs well with Larger-than-memory Datasets

Tunable consistency

Idempotent (client clock)

Schema Optional

No ACID transactions, No Locking
Generally complements other systems
(Not intended to be one-size-fits-all)


You should always use the right tool for the right job
Speaking Cassandra
Data Model

“4-Dimensional Hash Table”

A Keyspace contains a collection of Column Families
(Controls replication)

A Column Family contains Rows

A Row has a key, and each row has columns
(No need to define the columns beforehand)


Each column has a name and a value and a
  timestamp
(TTL is optional)
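
One way to picture the "4-dimensional hash table" is as four nested sorted maps: keyspace, column family, row key, column name. The sketch below is only an illustration of that shape; the class name `DataModelSketch` and its methods are hypothetical and are not part of Cassandra or of any client library. TTL is left out for brevity.

```java
import java.util.Map;
import java.util.TreeMap;

/** Illustration only: the data model as nested sorted maps (hypothetical class). */
final class DataModelSketch {
    /** A column carries a name, a value, and a write timestamp. */
    static final class Column {
        final String name;
        final String value;
        final long timestamp;

        Column(String name, String value, long timestamp) {
            this.name = name;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // keyspace -> column family -> row key -> column name -> column
    private final Map<String, Map<String, Map<String, Map<String, Column>>>> keyspaces =
            new TreeMap<>();

    /** Stores a column; rows and columns are created on first use, no schema needed. */
    void insert(String keyspace, String columnFamily, String rowKey, Column column) {
        keyspaces.computeIfAbsent(keyspace, k -> new TreeMap<>())
                 .computeIfAbsent(columnFamily, k -> new TreeMap<>())
                 .computeIfAbsent(rowKey, k -> new TreeMap<>())
                 .put(column.name, column);
    }

    /** Returns the column, or null if any of the four coordinates is missing. */
    Column get(String keyspace, String columnFamily, String rowKey, String columnName) {
        Map<String, Map<String, Map<String, Column>>> families = keyspaces.get(keyspace);
        if (families == null) return null;
        Map<String, Map<String, Column>> rows = families.get(columnFamily);
        if (rows == null) return null;
        Map<String, Column> columns = rows.get(rowKey);
        return columns == null ? null : columns.get(columnName);
    }
}
```

Because each row keeps its own map of columns, two rows in the same column family can hold entirely different column names.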
Data Model – (RDBMS)

Keyspace (Schema)

Column Family(CF) (table)

Row (row)

Column (column*) → may not be present in all
 rows
Data Model – Column Family

Static Column Family
- Model my object data

Dynamic Column Family
- Precalculated / Prematerialized query results

Nothing stopping you from mixing them!
Data Model – Static Column Family
Data Model – Dynamic CF




           stats for a specific date
Data Model – Dynamic CF

Timeline of tweets by a user
Timeline of tweets by all of the people a user is
following
List of comments sorted by score
List of friends grouped by state
Metrics for a time bucket
...

Let's store “foo”




                    Foo
…

But if that node is down?




                            Foo
...

Let's store “foo” in 3 nodes.
This is the Replication Factor(N)




                                    Foo
                  Foo

                            Foo
...

Now we need to know what nodes the key was written
 to so we can read it later
...

The Initial Token specifies the upper value of the key
  range each node is responsible for

  #1  <= 'd'
  #2  <= 'k'   (owns 'e f g h i j k')
  #3  <= 'p'
  #4  <= 'u'
  #5  <= 'z'

a b c d e f g h i j k l m n …. z
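
The lookup implied by the ring above can be sketched with a sorted map: find the first token that is greater than or equal to the key, and wrap around to the first node if there is none. This is a minimal sketch that compares raw key strings, as the slide does; `TokenRingSketch` and its methods are hypothetical names, not Cassandra code.

```java
import java.util.Map;
import java.util.TreeMap;

/** Illustration only: maps a key to the node whose token range covers it. */
final class TokenRingSketch {
    // token (upper bound of the node's range) -> node id, sorted by token
    private final TreeMap<String, Integer> ring = new TreeMap<>();

    void addNode(int nodeId, String initialToken) {
        ring.put(initialToken, nodeId);
    }

    /** First token >= key owns the key; past the last token we wrap to the first node. */
    int ownerOf(String key) {
        Map.Entry<String, Integer> owner = ring.ceilingEntry(key);
        if (owner == null) {
            owner = ring.firstEntry();
        }
        return owner.getValue();
    }

    /** Builds the five-node ring shown on the slide. */
    static TokenRingSketch slideExample() {
        TokenRingSketch ring = new TokenRingSketch();
        ring.addNode(1, "d");
        ring.addNode(2, "k");
        ring.addNode(3, "p");
        ring.addNode(4, "u");
        ring.addNode(5, "z");
        return ring;
    }

    public static void main(String[] args) {
        System.out.println(slideExample().ownerOf("foo"));
    }
}
```

With the slide's tokens, "foo" sorts after 'd' and before 'k', so it belongs to node #2.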
...

Gossip is the protocol Cassandra uses to exchange
 information with nodes in the cluster (a.k.a. Ring)

For example, which nodes own the key “foo”
                                   #1
          Read 'foo'
                        #5        <= 'd'
 Client                                       'e f g h I j k '
                       <= 'z'

                                            #2
                                                     'foo'
                                           <= 'k'
                        #4
                       <= 'u'
                                 #3
                                <= 'p'
...

A Partitioner is used to transform the key.
“foo1” and “foo2” may end up in different nodes

The most commonly used is Random Partitioner




         “foo1”  →  md5(“foo1”)  →  “A99A0B....”
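
A rough sketch of that transformation, using only the JDK: hash the key with MD5 and read the digest as a number. Taking the absolute value of the signed digest is an assumption made here to get a non-negative token; `Md5TokenSketch` is a hypothetical name and this is not the partitioner's actual source.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustration only: turns a row key into a numeric token via MD5. */
final class Md5TokenSketch {
    static BigInteger token(String key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(key.getBytes(StandardCharsets.UTF_8));
            // Assumption: interpret the 16 bytes as a signed integer and drop the sign.
            return new BigInteger(digest).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 digest not available", e);
        }
    }

    public static void main(String[] args) {
        // Similar keys produce unrelated tokens, so they can land on different nodes.
        System.out.println(token("foo1").toString(16));
        System.out.println(token("foo2").toString(16));
    }
}
```

Because the hash spreads neighbouring keys across the token space, load is balanced at the cost of losing key order.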
...

A Partitioner is used to transform the key.
“foo1” and “foo2” may end up in different nodes

The most commonly used is Random Partitioner

                               #1     'foo1'
                   #5


                                     #2
        'foo2'
                  #4

                             #3
...

A Replica Placement Strategy determines which
  nodes contain replicas

Simple Strategy places them clockwise around the ring

                                   'foo1'
                             #1
                 #5


                                           'foo1'
                                   #2

                 #4

                           #3     'foo1'
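
The clockwise placement in the diagram can be sketched as follows: start at the node that owns the key and take the next nodes around the ring until the replication factor is met. `SimpleStrategySketch` and its parameters are hypothetical; this ignores racks and data centers, which is what Network Topology Strategy adds.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration only: clockwise replica selection on a ring. */
final class SimpleStrategySketch {
    /**
     * @param ringOrder         node ids listed clockwise around the ring
     * @param primaryIndex      position in ringOrder of the node that owns the key
     * @param replicationFactor number of replicas wanted (N)
     */
    static List<Integer> replicas(List<Integer> ringOrder, int primaryIndex, int replicationFactor) {
        // Never return more replicas than there are nodes.
        int count = Math.min(replicationFactor, ringOrder.size());
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            // The modulo wraps from the last node back to the first.
            result.add(ringOrder.get((primaryIndex + i) % ringOrder.size()));
        }
        return result;
    }
}
```

For the five-node ring with N = 3, a key owned by #1 is stored on #1, #2 and #3, and a key owned by #5 wraps around to #5, #1 and #2.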
...

A Replica Placement Strategy determines which
  nodes contain replicas

Network Topology Strategy places them in different
 DCs
                       DC1:3 DC2:1
                'foo1'
         #1                            #1   'foo1'
 #5                              #5
                       'foo1'
               #2                           #2
 #4                             #4
        #3                            #3
              'foo1'
...

Consistency Level determines how many replicas to
 contact

CL = 1

                              #1    'foo1'
 Client           #
                  5

                                            'foo1'
                                     #2

                  #
                  4
                             #3    'foo1'
...

Consistency Level determines how many replicas to
 contact

CL = QUORUM

                              #1    'foo1'
 Client           #
                  5

                                            'foo1'
                                     #2

                  #
                  4
                             #3    'foo1'
Consistency For Writes
ANY
ONE
TWO
THREE
QUORUM
LOCAL_QUORUM
EACH_QUORUM
ALL
Consistency For Reads
ONE
TWO
THREE
QUORUM
LOCAL_QUORUM
EACH_QUORUM
ALL
Consistency in Math Terms

   Cassandra guarantees strong consistency if

   (nodes_written + nodes_read) > replication_factor

            R + W > N
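
The rule is simple enough to check in code. The sketch below applies it to the two "How eventual is eventual?" cases from earlier in the deck; `ConsistencyMathSketch` and its method names are hypothetical, and the quorum size is taken as a majority of the replicas.

```java
/** Illustration only: the R + W > N rule and the size of a quorum. */
final class ConsistencyMathSketch {
    /** A quorum is a majority of the N replicas. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /** Reads see the latest write when the read and write sets must overlap. */
    static boolean isStronglyConsistent(int nodesRead, int nodesWritten, int replicationFactor) {
        return nodesRead + nodesWritten > replicationFactor;
    }

    public static void main(String[] args) {
        int n = 3;
        System.out.println(isStronglyConsistent(1, 1, n));                 // write 1, read 1
        System.out.println(isStronglyConsistent(quorum(n), quorum(n), n)); // write 2, read 2
    }
}
```

With N = 3, writing and reading one replica each gives 1 + 1 = 2, which is not greater than 3, so a read may miss the write. Writing and reading at QUORUM gives 2 + 2 = 4 > 3, so at least one replica in every read has the latest value.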
Back to the example..

Consistency Level determines how many replicas to
 contact

CL = QUORUM

                              #1    'foo1'
 Client           #
                  5

                                            'foo1'
                                     #2

                  #
                  4
                             #3    'foo1'
...

But what if node #3 is down?



             hint
                               #1   'foo1'
 Client             #
                    5

                                         'foo1'
                                    #2

                    #
                    4
                               #3
...

But what if node #3 is down?

The coordinator node will store a hint and will replay
  that mutation when the down node comes back up.

This is known as Hinted Handoff
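
As a rough sketch of the idea, a coordinator can keep a queue of pending mutations per unreachable replica and drain it when that replica returns. Everything here (`HintedHandoffSketch`, `markDown`, `markUp`, in-memory lists standing in for replicas) is hypothetical and much simpler than the real mechanism, which stores hints durably and learns about node state through gossip.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

/** Illustration only: a coordinator that stores and replays hints. */
final class HintedHandoffSketch {
    private final Set<Integer> downNodes = new HashSet<>();
    // target node -> mutations waiting for it
    private final Map<Integer, Queue<String>> hints = new HashMap<>();
    // node -> mutations it has actually stored
    private final Map<Integer, List<String>> applied = new HashMap<>();

    void markDown(int node) {
        downNodes.add(node);
    }

    /** Sends the mutation to live replicas and keeps a hint for each one that is down. */
    void write(List<Integer> replicas, String mutation) {
        for (int node : replicas) {
            if (downNodes.contains(node)) {
                hints.computeIfAbsent(node, n -> new ArrayDeque<>()).add(mutation);
            } else {
                applied.computeIfAbsent(node, n -> new ArrayList<>()).add(mutation);
            }
        }
    }

    /** Marks the node as live again and replays any hints kept for it. */
    void markUp(int node) {
        downNodes.remove(node);
        Queue<String> pending = hints.remove(node);
        if (pending != null) {
            applied.computeIfAbsent(node, n -> new ArrayList<>()).addAll(pending);
        }
    }

    List<String> storedOn(int node) {
        return applied.getOrDefault(node, new ArrayList<>());
    }
}
```

The hints live only on the coordinator, so if the coordinator is lost before replay, the down replica stays out of date until another mechanism repairs it.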
...

Node #5 will replay the hint to node #3 when it comes
 back online


             hint
                                     'foo1'
                               #1
 Client             #5


                                             'foo1'
                                      #2

                    #4

                             #3     'foo1'
...

And if node #5 dies before sending the hints to node
 #3?


             hint
                               #1     'foo1'
 Client             #5


                                            'foo1'
                                      #2

                    #4

                             #3
...

If using QUORUM, node #4 will request 'foo' from all
   the replicas


              hint
                                 #1     'foo1'
 Client              #5


                                                'foo1'
                                           #2

                     #4

                               #3     ''
...

If the results received do not match, a Read Repair
   process is performed in the background


             hint
                               #1    'foo1'
 Client             #5


                                              'foo1'
                                         #2

                    #4

                             #3     ''
...

And the missing or stale value is pushed to
 the out-of-date node (#3 in this case)


                  hint
                               #1     'foo1'
 Client                  #5


                                            'foo1'
                                      #2

                         #4

          'foo' != ''         #3    'foo'
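
The repair step can be sketched as: compare what each replica returned, keep the version with the newest timestamp, and push it to any replica that is missing it or holds an older one. `ReadRepairSketch` and `Versioned` are hypothetical names, and a plain map stands in for the replicas; in this sketch the repair happens inline rather than in the background.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustration only: pick the newest version and repair stale replicas. */
final class ReadRepairSketch {
    static final class Versioned {
        final String value;
        final long timestamp;

        Versioned(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /**
     * @param replicas node id -> the version it returned (null if it has none);
     *                 stale entries are overwritten in place
     * @return the newest version found, or null if no replica had the value
     */
    static Versioned readWithRepair(Map<Integer, Versioned> replicas) {
        Versioned newest = null;
        for (Versioned v : replicas.values()) {
            if (v != null && (newest == null || v.timestamp > newest.timestamp)) {
                newest = v;
            }
        }
        if (newest != null) {
            for (Map.Entry<Integer, Versioned> entry : replicas.entrySet()) {
                Versioned v = entry.getValue();
                if (v == null || v.timestamp < newest.timestamp) {
                    entry.setValue(newest); // push the newest version to the stale replica
                }
            }
        }
        return newest;
    }

    public static void main(String[] args) {
        Map<Integer, Versioned> replicas = new HashMap<>();
        replicas.put(1, new Versioned("foo", 10L));
        replicas.put(2, new Versioned("foo", 10L));
        replicas.put(3, null); // node #3 missed the write
        System.out.println(readWithRepair(replicas).value);
    }
}
```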
...

The last feature to achieve consistency is the Anti
  Entropy Service (AES)

It should be run periodically as part of cluster
 maintenance, or after a node has been down
Recap Consistency Features

Read Repair

Anti Entropy Service (AES)

Hinted Handoff
scaling

                   “e”
           “z”




                         “j”


          “t”

                 “o”
scaling

                   “e”
                               “?”
           “z”




                         “j”


          “t”

                 “o”
scaling

                     “e”
            “z”
                           “g”



                             “j”


           “t”

                   “o”

 Nodetool move ?
Want 2x performance ?!



Add 2x nodes
'No downtime' included!
Want 2x performance ?!

                   “e”
         “z”




                         “j”


        “t”

                 “o”
Want 2x performance ?!

                     “b”
                             “e”
             “z”
                                     “g”


      “v”
                                         “j”


            “t”
                                   “l”
                   “q”     “o”
With RF= 3 we could lose

                     “b”
                             “e”
             “z”
                                   X “g”


      X
      “v”
                                         “j”


            “t”


                           X
                                   “l”
                   “q”     “o”
With RF= 3 we could lose
                       ?
                     “b”
                             “e”
                                 X
             “z”
                                     X “g”


     X“v”
                                           “j”


            “t”


                           X
                                     “l”
                   “q”     “o”
Vs others




         b       e
     z
                         g

 v
                         j

     t               l
         q   o
Recap

Replication Factor
Tokens
Gossip
Partitioner
Replica Placement
Consistency
Hinted Handoff
Read Repair
AES
Clustering
Performance

Reads on par with writes
Scalability
Internals
Read and Write path
Storage - SSTable

- SSTables are sorted

- Immutable (“Merge on read”)

- Newest timestamp wins
Storage – Compaction
Storage – Compaction

Merges SSTables together into a larger SSTable

Removes Tombstones

Rebuilds primary and secondary indexes
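
A minimal sketch of the merge and tombstone steps, assuming each SSTable is just a map from column name to a timestamped cell and that a null value marks a deletion. `CompactionSketch` and `Cell` are hypothetical names. The sketch drops every tombstone immediately, whereas Cassandra keeps them until a grace period (`gc_grace_seconds`) has passed, and index rebuilding is not modeled.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustration only: merge several SSTables into one, newest timestamp wins. */
final class CompactionSketch {
    static final class Cell {
        final String value; // null marks a tombstone (a deletion)
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }

        boolean isTombstone() {
            return value == null;
        }
    }

    static TreeMap<String, Cell> compact(List<Map<String, Cell>> sstables) {
        TreeMap<String, Cell> merged = new TreeMap<>(); // output stays sorted by column name
        for (Map<String, Cell> sstable : sstables) {
            for (Map.Entry<String, Cell> entry : sstable.entrySet()) {
                Cell current = merged.get(entry.getKey());
                if (current == null || entry.getValue().timestamp > current.timestamp) {
                    merged.put(entry.getKey(), entry.getValue());
                }
            }
        }
        // Simplification: purge all tombstones once the newest version is known.
        merged.values().removeIf(Cell::isTombstone);
        return merged;
    }
}
```

Deleting only after the merge matters: a tombstone has to win over older values of the same column first, otherwise the deleted data would reappear.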
Storage – Compaction

Two types:

- Size-tiered compaction

- Leveled compaction
Storage – Compaction

Size-tiered compaction

Performance not guaranteed
A row may be spread across many SSTables
Wastes space
Good for write-heavy workloads
Rows are written once
Can need up to 100% more space than the SSTables themselves
Storage – Compaction

Leveled compaction


Grouped into levels
No overlapping within a level
Each level is ten times as large
90% of reads satisfied with 1 SSTable
Twice as much I/O
Recap

SSTable
Memtable
Row Cache
Compaction
SSDs and caching
Before: 48 Cassandra nodes on m2.4xlarge, 36 EVCache nodes on m2.xlarge
After: 12 Cassandra nodes on hi1.4xlarge
API Operations
Five general categories

 Retrieving
 Write/Update/Remove (all the same op!)
    Increment counters

 Meta Information
 Schema Manipulation
 CQL Execution
Insertion/Deletion => Mutation


Again: Every mutation is an insert!
- Merge on read
- SSTables are immutable
- Highest timestamp wins
CQL


INSERT INTO Hollywood.NerdMovies (user_uuid, fan)
  VALUES ('cfd66ccc-d857-4e90-b1e5-df98a3d40cd6', 'johndoe')
  USING CONSISTENCY LOCAL_QUORUM AND TTL 86400;
Hadoop
Using a Client


 - Hector
     http://hector-client.org
 - Astyanax
     https://github.com/Netflix/astyanax
 - Pelops
     https://github.com/s7/scale7-pelops
Using a Client → Hector

 - Most popular Java client
 - In use at very large installations
 - A number of tools and utilities built on top
 - Very active community
 - MIT Licensed
Features

 - High Level API
 - Failover behavior
 - High-performance connection pool
 - JMX counters for management
 - Discoverability of new nodes
 - Automatic retry of downed hosts
 - Suspension of nodes after several timeouts
 - Load Balancing: Configurable and extensible
 - Locking (Beta)
Hector's Architecture
vs JDBC

   Hector is operation-oriented


 Whereas


   JDBC is connection-oriented
API Abstractions



                   Templates


                    Mutator


                     Thrift
ColumnFamilyTemplate

   Familiar, type-safe approach
   - based on template-method design pattern
   - generic: ColumnFamilyTemplate<K,N>
     (K is the key type, N the column name type)


ColumnFamilyTemplate template =
     new ThriftColumnFamilyTemplate(keyspaceName,
                                    columnFamilyName,
                                    StringSerializer.get(),
                                    StringSerializer.get());


*** (no generics for clarity)
ColumnFamilyTemplate

new ThriftColumnFamilyTemplate(
    keyspaceName,
    columnFamilyName,
    StringSerializer.get(),    // Key Format
    StringSerializer.get());   // Column Name Format

Column Name Format
- Cassandra calls this a “comparator”
- Remember: defines column order in on-disk format
ColumnFamilyTemplate

// <String, String>: Key Format, Column Name Format
ColumnFamilyResult<String, String> res =
    cft.queryColumns("patricioe");

String value = res.getString("email");

Date startDate = res.getDate("DateOfBirth");
ColumnFamilyTemplate
Inserting data with ColumnFamilyUpdater

ColumnFamilyUpdater updater = template.createUpdater("pato");

updater.setString("companyName", "Relateiq");
updater.addKey("sabina");
updater.setString("companyName", "Globant");

template.update(updater);
ColumnFamilyTemplate
Deleting Data with ColumnFamilyTemplate

template.deleteColumn("zznate", "notNeededStuff");
template.deleteColumn("zznate", "somethingElse");
template.deleteColumn("patricioe", "aDifferentColumnName");
...
template.deleteRow("someuser");

template.executeBatch();
Integrating with existing patterns

Hector Object Mapper -> Apache Gora
https://github.com/hector-client/hector/tree/master/object-mapper


Hector JPA*:
https://github.com/riptano/hector-jpa

Spring IOC

CQL: JDBC Driver and Pool in 1.0!

JdbcTemplate FTW!
Development Resources

 Hector Documentation (http://hector-client.org)
 Cassandra Unit
 https://github.com/jsevellec/cassandra-unit


 Cassandra Maven Plugin
 http://mojo.codehaus.org/cassandra-maven-plugin/


 CCM localhost cassandra cluster
 https://github.com/pcmanus/ccm


 OpsCenter
 http://www.datastax.com/products/opscenter


 Cassandra AMIs
 https://github.com/riptano/CassandraClusterAMI
Want to contribute?




git clone git@github.com:hector-client/hector.git
Summary

-   Take advantage of strengths
-   idempotence and asynchronicity are your friends
-   If it's not in the API, you are probably doing it wrong
-   Seek death is still possible if you model incorrectly
-   Try Denormalizing (append-only model ?)
Patricio Echagüe
patricioe@gmail.com
      @patricioe
Credits
Nate McCall
Aaron Morton (http://thelastpickle.com)
Datastax (http://www.datastax.com)
http://www.slideshare.net/mikiobraun/cassandra-an-introduction
Additional Resources
DataStax Documentation: http://www.datastax.com/docs

Apache Cassandra project wiki: http://wiki.apache.org/cassandra/

“The Dynamo Paper”
http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf

P. Helland. Building on Quicksand
http://arxiv.org/pdf/0909.1788

P. Helland. Life Beyond Distributed Transactions
http://www.ics.uci.edu/~cs223/papers/cidr07p15.pdf

S. Anand. “Netflix's Transition to High-Availability Storage Systems”
http://media.amazonwebservices.com/Netflix_Transition_to_a_Key_v3.pdf

“The Megastore Paper”
http://research.google.com/pubs/archive/36971.pdf

Commit 2024 - Secret Management made easy
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 

Introduction to NoSQL and Cassandra

  • 1. Introduction to NoSQL and Apache Cassandra Patricio Echagüe patricioe@gmail.com @patricioe
  • 2. About me Present: Relateiq (Data Processing and Scalability) Hector committer Past: DataStax (The Cassandra Company) Cassandra/Hadoop distribution (former Brisk) Cassandra FS CQL connection pool Cassandra contributions
  • 6. What is “NoSQL”? Systems able to store and retrieve large quantities of data with little or no information about the relationships between them. Generally they don't have a SQL-like language for data manipulation, and their schema is more relaxed than that of traditional RDBMS systems. Full ACID is often not guaranteed.
  • 7. Brewer's CAP theorem Consistency: all replicas agree on the same value Availability: always get an answer from a replica Partition Tolerance: the system works even if replicas can't talk You can have 2 of these
  • 9. CAP Classification Consistency Availability Partitioning
  • 10. Types - Relational - Key-Value stores - Columnar (column-oriented) - Graph databases - Document
  • 11. What's eventual consistency? It is a promise that eventually, in the absence of new writes, all replicas that are responsible for a data item will agree on the same version
  • 12. How eventual is eventual? Write to 1 replica and Read from 1 replica of a total of 3
  • 13. How eventual is eventual? Write to 2 replicas and Read from 2 replicas of a total of 3
  • 14. Why is it good? Because contacting fewer replicas lets read and write operations complete more quickly, lowering latency.
  • 15. Cassandra is a distributed, fault-tolerant, scalable, column-oriented data store with tunable consistency.
  • 16. Cassandra has CAP But C is tunable
  • 17. What is Apache Cassandra?
  • 18. Key Concepts Multi-Master, Multi-DC Linearly scalable Integrated Caching Performs well with Larger-than-memory Datasets Tunable consistency Idempotent (client clock) Schema Optional No ACID transactions, No Locking
  • 19. Generally complements other systems (not intended to be one-size-fits-all). You should always use the right tool for the right job
  • 21. Data Model “4-Dimensional Hash Table” A Keyspace contains a collection of Column Families (controls replication) A Column Family contains Rows A Row has a key, and each row has columns (no need to define the columns beforehand) Each column has a name, a value, and a timestamp (TTL is optional)
  • 22. Data Model – (RDBMS) Keyspace (Schema) Column Family(CF) (table) Row (row) Column (column*) → may not be present in all rows
  • 23. Data Model – Column Family Static Column Family - Model my object data Dynamic Column Family - Precalculated / Prematerialized query results Nothing stopping you from mixing them!
  • 24. Data Model – Static Column Family
  • 25. Data Model – Dynamic CF stats for a specific date
  • 26. Data Model – Dynamic CF Timeline of tweets by a user Timeline of tweets by all of the people a user is following List of comments sorted by score List of friends grouped by state Metrics for a time bucket
  • 29. … But if that node is down? Foo
  • 30. ... Let's store “foo” in 3 nodes. This is the Replication Factor(N) Foo Foo Foo
  • 31. ... Now we need to know what nodes the key was written to so we can read it later
  • 32. ... The Initial Token specifies the upper value of the key range each node is responsible for. Ring: #1 <= 'd', #2 <= 'k', #3 <= 'p', #4 <= 'u', #5 <= 'z' (for example, node #2 owns keys 'e' through 'k')
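
The token ranges on this slide can be sketched with a sorted map: a key belongs to the node with the smallest token that is greater than or equal to the key, wrapping around to the first node past the last token. This is only an illustration; the class `TokenRing` and its methods are hypothetical names, not Cassandra's API, and it compares raw string keys rather than hashed tokens.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a token ring: each node owns the keys up to and
// including its token (the upper bound of its range).
class TokenRing {
    // Maps each node's token to the node name, kept in sorted order.
    private final TreeMap<String, String> nodesByToken = new TreeMap<>();

    void addNode(String token, String node) {
        nodesByToken.put(token, node);
    }

    // Returns the node whose token is the smallest one >= key,
    // wrapping around to the first node when the key is past the last token.
    String ownerOf(String key) {
        Map.Entry<String, String> entry = nodesByToken.ceilingEntry(key);
        if (entry == null) {
            entry = nodesByToken.firstEntry();
        }
        return entry.getValue();
    }

    // Builds the five-node ring shown on the slide.
    static TokenRing slideRing() {
        TokenRing ring = new TokenRing();
        ring.addNode("d", "#1");
        ring.addNode("k", "#2");
        ring.addNode("p", "#3");
        ring.addNode("u", "#4");
        ring.addNode("z", "#5");
        return ring;
    }
}
```

With this ring, `TokenRing.slideRing().ownerOf("foo")` resolves to node #2, because "foo" sorts after 'd' and before 'k'.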
  • 33. ... Gossip is the protocol Cassandra uses to exchange information with nodes in the cluster (a.k.a. Ring)
  • 34. ... Gossip is the protocol Cassandra uses to exchange information with nodes in the cluster (a.k.a. Ring) For example, which nodes own the key “foo”
  • 35. ... Gossip is the protocol Cassandra uses to exchange information with nodes in the cluster (a.k.a. Ring) For example, which nodes own the key “foo” (Diagram: the client reads 'foo'; ring #1 <= 'd', #2 <= 'k', #3 <= 'p', #4 <= 'u', #5 <= 'z', with 'foo' on node #2, which owns 'e' through 'k')
  • 36. ... A Partitioner is used to transform the key. “foo1” and “foo2” may end up on different nodes
  • 37. ... A Partitioner is used to transform the key. “foo1” and “foo2” may end up on different nodes The most commonly used is the Random Partitioner: “foo1” → md5(“foo1”) → “A99A0B....”
  • 38. ... A Partitioner is used to transform the key. “foo1” and “foo2” may end up on different nodes The most commonly used is the Random Partitioner #1 'foo1' #5 #2 'foo2' #4 #3
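
As a rough sketch of what a hash-based partitioner does, the key can be run through MD5 and the digest read as a non-negative integer token. The class `Md5Partitioner` is a hypothetical name for illustration and only approximates the idea behind the Random Partitioner; it is not Cassandra's implementation.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: turns a row key into a token by hashing it with MD5.
class Md5Partitioner {
    static BigInteger token(String key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(key.getBytes(StandardCharsets.UTF_8));
            // Read the 16-byte digest as a signed integer and take the
            // absolute value so that every token is non-negative.
            return new BigInteger(digest).abs();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this is not expected.
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

Because similar keys such as "foo1" and "foo2" hash to unrelated tokens, they spread across the ring instead of clustering on one node.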
  • 39. ... A Replica Placement Strategy determines which nodes contain replicas
  • 40. ... A Replica Placement Strategy determines which nodes contain replicas Simple Strategy places them clockwise 'foo1' #1 #5 'foo1' #2 #4 #3 'foo1'
  • 41. ... A Replica Placement Strategy determines which nodes contain replicas Network Topology Strategy places them in different DCs (Diagram: two rings, DC1 with 3 replicas of 'foo1' and DC2 with 1)
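
Clockwise placement, as described for Simple Strategy, can be sketched as "the primary node plus the next N-1 nodes on the ring". The class and method names below are hypothetical, and the sketch ignores data centers and racks, which Network Topology Strategy would take into account.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of clockwise replica placement on a single ring.
class ClockwisePlacement {
    // Returns the primary node followed by the next nodes clockwise,
    // up to replicationFactor nodes (never more than the ring holds).
    static List<String> replicas(List<String> ring, int primaryIndex, int replicationFactor) {
        int count = Math.min(replicationFactor, ring.size());
        List<String> result = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            // The modulo wraps around the end of the ring.
            result.add(ring.get((primaryIndex + i) % ring.size()));
        }
        return result;
    }
}
```

For a five-node ring with the primary at #1 and a replication factor of 3, this yields nodes #1, #2, and #3.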
  • 42. ... Consistency Level determines how many replicas to contact
  • 43. ... Consistency Level determines how many replicas to contact CL = 1 #1 'foo1' Client # 5 'foo1' #2 # 4 #3 'foo1'
  • 44. ... Consistency Level determines how many replicas to contact CL = QUORUM #1 'foo1' Client # 5 'foo1' #2 # 4 #3 'foo1'
  • 47. Consistency in math terms: Cassandra guarantees strong consistency if (nodes_written + nodes_read) > replication_factor, that is, R + W > N
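
The R + W > N rule can be checked with a few lines of arithmetic. The sketch below uses hypothetical names (`ConsistencyMath`, `quorum`, `isStronglyConsistent`) and assumes a quorum is a simple majority of the replicas.

```java
// Hypothetical helper illustrating the R + W > N rule.
class ConsistencyMath {
    // A quorum is a majority of the replicas: floor(N / 2) + 1.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // True when the replicas read and the replicas written must overlap
    // in at least one node, i.e. R + W > N.
    static boolean isStronglyConsistent(int readReplicas, int writeReplicas, int replicationFactor) {
        return readReplicas + writeReplicas > replicationFactor;
    }
}
```

With N = 3, writing to 1 replica and reading from 1 gives 1 + 1 = 2, which is not greater than 3, so a read may miss the latest write. Writing to 2 and reading from 2 (QUORUM for both) gives 4 > 3, so every read overlaps the latest write on at least one replica.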
  • 48. Back to the example... Consistency Level determines how many replicas to contact CL = QUORUM #1 'foo1' Client # 5 'foo1' #2 # 4 #3 'foo1'
  • 49. ... But what if node #3 is down?
  • 50. ... But what if node #3 is down? hint #1 'foo1' Client # 5 'foo1' #2 # 4 #3
  • 51. ... But what if node #3 is down? The coordinator node will store a hint and replay that mutation when the down node comes back up. This is known as Hinted Handoff
  • 52. ... Node #5 will replay the hint to node #3 when it comes back online hint 'foo1' #1 Client #5 'foo1' #2 #4 #3 'foo1'
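
The hint-and-replay flow can be sketched as a coordinator that keeps a queue of missed writes per down node. Everything here is a simplified assumption for illustration: the class `HintedHandoffSketch`, its in-memory maps, and the `markDown`/`markUp` methods are hypothetical and do not mirror Cassandra's internals.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of a coordinator performing hinted handoff.
class HintedHandoffSketch {
    private final Set<String> downNodes = new HashSet<>();
    // node -> (key -> value) actually stored on that node
    private final Map<String, Map<String, String>> dataByNode = new HashMap<>();
    // node -> list of {key, value} writes the node missed while down
    private final Map<String, List<String[]>> hintsByNode = new HashMap<>();

    void markDown(String node) {
        downNodes.add(node);
    }

    // Writes to every replica; for a replica that is down, stores a hint instead.
    void write(List<String> replicas, String key, String value) {
        for (String node : replicas) {
            if (downNodes.contains(node)) {
                hintsByNode.computeIfAbsent(node, n -> new ArrayList<>())
                           .add(new String[] {key, value});
            } else {
                dataByNode.computeIfAbsent(node, n -> new HashMap<>()).put(key, value);
            }
        }
    }

    // Marks the node as up again and replays its hints in the order received.
    void markUp(String node) {
        downNodes.remove(node);
        List<String[]> hints = hintsByNode.remove(node);
        if (hints == null) {
            return;
        }
        for (String[] hint : hints) {
            dataByNode.computeIfAbsent(node, n -> new HashMap<>()).put(hint[0], hint[1]);
        }
    }

    // Returns the value stored on the node, or null if the node does not have it.
    String valueOn(String node, String key) {
        Map<String, String> data = dataByNode.get(node);
        return data == null ? null : data.get(key);
    }
}
```

In this model, a write issued while node #3 is down reaches #1 and #2 immediately, and reaches #3 only after `markUp("#3")` replays the stored hint.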
  • 53. ... And if node #5 dies before sending the hints to node #3? hint #1 'foo1' Client #5 'foo1' #2 #4 #3
  • 54. ... If using QUORUM, node #4 will request 'foo' from all the replicas hint #1 'foo1' Client #5 'foo1' #2 #4 #3 ''
  • 55. ... If the results received do not match, a Read Repair process is performed in the background hint #1 'foo1' Client #5 'foo1' #2 #4 #3 ''
  • 56. ... And the missing or stale value is pushed to the out-of-date node (#3 in this case) hint #1 'foo1' Client #5 'foo1' #2 #4 'foo' != '' #3 'foo'
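
The read repair steps above can be sketched as: collect the replica responses, pick the one with the newest timestamp, and find the replicas that returned something older. The types and names (`ReadRepairSketch`, `Versioned`, `staleNodes`) are hypothetical, and a missing value is modeled here, as an assumption, as an empty string with timestamp 0.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of how a coordinator could decide which replicas to repair.
class ReadRepairSketch {
    // A value together with the timestamp of the write that produced it.
    static final class Versioned {
        final String value;
        final long timestamp;

        Versioned(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Picks the response with the highest timestamp (null if there are none).
    static Versioned newest(Map<String, Versioned> responsesByNode) {
        Versioned best = null;
        for (Versioned candidate : responsesByNode.values()) {
            if (best == null || candidate.timestamp > best.timestamp) {
                best = candidate;
            }
        }
        return best;
    }

    // Returns the nodes whose response is older than the newest one;
    // these are the nodes that would receive the repaired value.
    static List<String> staleNodes(Map<String, Versioned> responsesByNode) {
        Versioned best = newest(responsesByNode);
        List<String> stale = new ArrayList<>();
        for (Map.Entry<String, Versioned> entry : responsesByNode.entrySet()) {
            if (entry.getValue().timestamp < best.timestamp) {
                stale.add(entry.getKey());
            }
        }
        return stale;
    }
}
```

If nodes #1 and #2 return the value with the same timestamp and node #3 returns the empty placeholder, only #3 is reported as stale.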
  • 57. ... The last feature to achieve consistency is the Anti Entropy Service (AES). It should run periodically as part of cluster maintenance, or after a node has been down
  • 58. Recap Consistency Features Read Repair Anti Entropy Service (AES) Hinted Handoff
  • 59. scaling “e” “z” “j” “t” “o”
  • 60. scaling “e” “?” “z” “j” “t” “o”
  • 61. scaling “e” “z” “g” “j” “t” “o” Nodetool move ?
  • 62. Want 2x performance ?! Add 2x nodes 'No downtime' included!
  • 63. Want 2x performance ?! “e” “z” “j” “t” “o”
  • 64. Want 2x performance ?! “b” “e” “z” “g” “v” “j” “t” “l” “q” “o”
  • 65. With RF= 3 we could lose “b” “e” “z” X “g” X “v” “j” “t” X “l” “q” “o”
  • 66. With RF= 3 we could lose ? “b” “e” X “z” X “g” X“v” “j” “t” X “l” “q” “o”
  • 67. Vs others b e z g v j t l q o
  • 74. Storage - SSTable - SSTables are sorted - Immutable (“Merge on read”) - Newest timestamp wins
  • 76. Storage – Compaction Merges SSTables together into larger SSTables; removes tombstones; rebuilds primary and secondary indexes
  • 77. Storage – Compaction Two types: - Size-tiered compaction - Leveled compaction
  • 78. Storage – Compaction Size-tiered compaction: performance not guaranteed; a row may be spread across many SSTables; wastes space; good for write-heavy operations; rows are written once; 100% more space than SSTables
  • 79. Storage – Compaction Leveled compaction: SSTables grouped into levels; no overlapping within a level; each level is ten times as large as the previous one; 90% of reads satisfied with 1 SSTable; twice as much I/O
  • 81. SSDs and caching Before - 48 Cassandra on m2.4xlarge. 36 EVcache on m2.xlarge After - 12 Cassandra on hi1.4xlarge
  • 83. Five general categories Retrieving Write/Update/Remove (all the same op!) Increment counters Meta Information Schema Manipulation CQL Execution
  • 84. Insertion/Deletion => Mutation Again: Every mutation is an insert! - Merge on read - SSTables are immutable - Highest timestamp wins
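
"Merge on read" with "highest timestamp wins" can be sketched by treating each SSTable as an immutable map of columns and resolving each column at read time. The names below (`MergeOnRead`, `Cell`) are hypothetical, and modeling a deletion as a tombstone cell is a simplification. Timestamp ties keep the first cell seen in this sketch, which is an arbitrary choice.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of resolving one row across several immutable SSTables.
class MergeOnRead {
    // One stored version of a column; a tombstone marks a deletion.
    static final class Cell {
        final String value;
        final long timestamp;
        final boolean tombstone;

        private Cell(String value, long timestamp, boolean tombstone) {
            this.value = value;
            this.timestamp = timestamp;
            this.tombstone = tombstone;
        }

        static Cell write(String value, long timestamp) {
            return new Cell(value, timestamp, false);
        }

        static Cell delete(long timestamp) {
            return new Cell(null, timestamp, true);
        }
    }

    // Merges the versions of a row found in each SSTable:
    // for every column, the cell with the highest timestamp wins.
    static Map<String, Cell> merge(List<Map<String, Cell>> sstables) {
        Map<String, Cell> merged = new HashMap<>();
        for (Map<String, Cell> table : sstables) {
            for (Map.Entry<String, Cell> entry : table.entrySet()) {
                Cell current = merged.get(entry.getKey());
                if (current == null || entry.getValue().timestamp > current.timestamp) {
                    merged.put(entry.getKey(), entry.getValue());
                }
            }
        }
        return merged;
    }

    // Returns the visible value of a column, or null if it is absent or deleted.
    static String read(List<Map<String, Cell>> sstables, String column) {
        Cell cell = merge(sstables).get(column);
        return (cell == null || cell.tombstone) ? null : cell.value;
    }
}
```

Nothing is updated in place in this model: an update is a newer cell in a later SSTable, and a delete is a newer tombstone that hides the older value until compaction removes both.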
  • 85. CQL INSERT INTO Hollywood.NerdMovies (user_uuid, fan) VALUES ('cfd66ccc-d857-4e90-b1e5-df98a3d40cd6', 'johndoe') USING CONSISTENCY LOCAL_QUORUM AND TTL 86400;
  • 87. Using a Client - Hector http://hector-client.org - Astyanax https://github.com/Netflix/astyanax - Pelops https://github.com/s7/scale7-pelops
  • 88. Using a Client → Hector - Most popular Java client - In use at very large installations - A number of tools and utilities built on top - Very active community - MIT Licensed
  • 89. Features - High Level API - Failover behavior - High-performance connection pool - JMX counters for management - Discoverability of new nodes - Automatic retry of downed hosts - Suspension of nodes after several timeouts - Load Balancing: Configurable and extensible - Locking (Beta)
  • 91. vs JDBC Hector is operation-oriented Whereas JDBC is connection-oriented
  • 92. API Abstractions Templates Mutator Thrift
  • 93. ColumnFamilyTemplate Familiar, type-safe approach - based on template-method design pattern - generic: ColumnFamilyTemplate<K,N> (K is the key type, N the column name type) ColumnFamilyTemplate template = new ThriftColumnFamilyTemplate(keyspaceName, columnFamilyName, StringSerializer.get(), StringSerializer.get()); *** (no generics for clarity)
  • 94. ColumnFamilyTemplate new ThriftColumnFamilyTemplate(keyspaceName, columnFamilyName, StringSerializer.get() /* key format */, StringSerializer.get() /* column name format */); - Cassandra calls the column name format a “comparator” - Remember: it defines column order in the on-disk format
  • 95. ColumnFamilyTemplate ColumnFamilyResult<String, String> res = cft.queryColumns("patricioe"); String value = res.getString("email"); Date startDate = res.getDate("DateOfBirth"); (In ColumnFamilyResult<String, String>, the first type is the key format and the second is the column name format)
  • 96. ColumnFamilyTemplate Inserting data with ColumnFamilyUpdater ColumnFamilyUpdater updater = template.createUpdater("pato"); updater.setString("companyName", "Relateiq"); updater.addKey("sabina"); updater.setString("companyName", "Globant"); template.update(updater);
  • 97. ColumnFamilyTemplate Deleting Data with ColumnFamilyTemplate template.deleteColumn("zznate", "notNeededStuff"); template.deleteColumn("zznate", "somethingElse"); template.deleteColumn("patricioe", "aDifferentColumnName"); ... template.deleteRow("someuser"); template.executeBatch();
  • 98. Integrating with existing patterns Hector Object Mapper -> Apache Gora https://github.com/hector-client/hector/tree/master/object-mapper Hector JPA*: https://github.com/riptano/hector-jpa Spring IOC CQL: JDBC Driver and Pool in 1.0! JdbcTemplate FTW!
  • 99. Development Resources Hector Documentation (http://hector-client.org) Cassandra Unit https://github.com/jsevellec/cassandra-unit Cassandra Maven Plugin http://mojo.codehaus.org/cassandra-maven-plugin/ CCM localhost cassandra cluster https://github.com/pcmanus/ccm OpsCenter http://www.datastax.com/products/opscenter Cassandra AMIs https://github.com/riptano/CassandraClusterAMI
  • 100. Want to contribute? git clone git@github.com:hector-client/hector.git
  • 101. Summary - Take advantage of strengths - idempotence and asynchronicity are your friends - If it's not in the API, you are probably doing it wrong - Seek death is still possible if you model incorrectly - Try Denormalizing (append-only model ?)
  • 103. Credits Nate McCall Aaron Morton (http://thelastpickle.com) Datastax (http://www.datastax.com) http://www.slideshare.net/mikiobraun/cassandra-an-introduction
  • 104. Additional Resources DataStax Documentation: http://www.datastax.com/docs Apache Cassandra project wiki: http://wiki.apache.org/cassandra/ “The Dynamo Paper” http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf P. Helland. Building on Quicksand http://arxiv.org/pdf/0909.1788 P. Helland. Life Beyond Distributed Transactions http://www.ics.uci.edu/~cs223/papers/cidr07p15.pdf S. Anand. “Netflix's Transition to High-Availability Storage Systems” http://media.amazonwebservices.com/Netflix_Transition_to_a_Key_v3.pdf “The Megastore Paper” http://research.google.com/pubs/archive/36971.pdf