Learning Cassandra
Dave Gardner
@davegardnerisme
What I’m going to cover


   • How to NoSQL
   • Cassandra basics (dynamo and
     big table)
   • How to use the data model in
     real life
How to NoSQL

 1.    Find data store that doesn’t use SQL
 2.    Anything
 3.    Cram all the things into it
 4.    Triumphantly blog this success
 5.    Complain a month later when it
       bursts into flames
 http://www.slideshare.net/rbranson/how-do-i-cassandra/4
Choosing NoSQL


  “NoSQL DBs trade off traditional
  features to better support new and
  emerging use cases”

  http://www.slideshare.net/argv0/riak-use-cases-dissecting-the-solutions-to-hard-problems
Choosing Cassandra: Tradeoffs


   More widely used, tested and
   documented software
   MySQL first OS release 1998


   For a relatively immature product
   Cassandra first open-sourced in 2008
Choosing Cassandra: Tradeoffs


   Ad-hoc querying
   SQL join, group by, having, order



   For a rich data model with limited
   ad-hoc querying ability
   Cassandra makes you denormalise
Choosing NoSQL

“they say … I can’t decide between this project and
this project even though they look nothing like each
other. And the fact that you can’t decide indicates that
you don’t actually have a problem that requires
them.”

Benjamin Black – NoSQL Tapes (at 30:15)
http://nosqltapes.com/video/benjamin-black-on-nosql-cloud-computing-and-fast_ip
What do we get in return?


   Proven horizontal scalability

   Cassandra scales reads and writes
   linearly as new nodes are added
Netflix benchmark: linear scaling




  http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
What do we get in return?


   High availability

   Cassandra is fault-resistant with
   tunable consistency levels
What do we get in return?


   Low latency, solid
   performance

   Cassandra has very good write
   performance
Performance benchmark *


  http://blog.cubrid.org/dev-platform/nosql-benchmarking/

  * Add pinch of salt
What do we get in return?


   Operational simplicity

   Homogeneous cluster, no “master”
   node, no SPOF
What do we get in return?


   Rich data model

   Cassandra is more than simple key-
   value – columns, composites,
   counters, secondary indexes
How to NoSQL version 2

 Learn about each solution

 • What tradeoffs are you making?
 • How is it designed?
 • What algorithms does it use?
 http://www.alberton.info/nosql_databases_what_when_why_phpuk2011.html
Amazon Dynamo                      +       Google Big Table

Consistent hashing                                 Columnar
Vector clocks *                               SSTable storage
Gossip protocol                                 Append-only
Hinted handoff                                     Memtable
Read repair                                      Compaction

http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf
http://labs.google.com/papers/bigtable-osdi06.pdf
* not in Cassandra
The dynamo paper
 [Diagram: a ring of six nodes, #1 to #6, with a client outside the ring.
 Tokens are integers from 0 to 2^127.]
The dynamo paper
 [Diagram: the same six-node ring, labelled "Client", "Coordinator" and
 "consistent hashing".]
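
The consistent hashing step on the ring can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the slides: the `Ring` class, the evenly spaced tokens, the use of MD5 to derive a token, and the rule that replicas are the next nodes clockwise are all choices made for the example.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # token space from the slide: integers from 0 to 2^127


def token_for(key: str) -> int:
    """Hash a row key onto the ring (MD5 is an illustrative choice)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


class Ring:
    """Hypothetical helper: nodes spaced evenly around the token ring."""

    def __init__(self, node_names):
        step = RING_SIZE // len(node_names)
        self.tokens = [i * step for i in range(len(node_names))]
        self.nodes = list(node_names)

    def replicas(self, key: str, rf: int = 3):
        """Return the rf nodes holding key, walking clockwise from its token."""
        # The first node whose token is >= the key's token comes first;
        # wrap to the start of the ring if the key hashes past the last token.
        start = bisect.bisect_left(self.tokens, token_for(key)) % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(rf)]


ring = Ring(["node1", "node2", "node3", "node4", "node5", "node6"])
print(ring.replicas("f97be9cc-5255-4578-8813-76701c0945bd"))
```

Any node can do this lookup, which is why the client can talk to whichever node it likes and let that node act as coordinator.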
Consistency levels

 How many replicas must respond to
 declare success?
Consistency levels: read operations

  Level          Description
  ONE            1st response
  QUORUM         N/2 + 1 replicas
  LOCAL_QUORUM   N/2 + 1 replicas in local data centre
  EACH_QUORUM    N/2 + 1 replicas in each data centre
  ALL            All replicas


 http://wiki.apache.org/cassandra/API#Read
Consistency levels: write operations

  Level          Description
  ANY            One node, including hinted handoff
  ONE            One node
  QUORUM         N/2 + 1 replicas
  LOCAL_QUORUM   N/2 + 1 replicas in local data centre
  EACH_QUORUM    N/2 + 1 replicas in each data centre
  ALL            All replicas

 http://wiki.apache.org/cassandra/API#Write
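
The "N/2 + 1" in the tables is integer arithmetic on the replication factor N. A small sketch makes the numbers concrete. The function names are made up for the example, and the overlap rule (reads and writes are guaranteed to share a replica when R + W > N) is the usual rule of thumb rather than something stated on the slides.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for QUORUM: N/2 + 1 with integer division."""
    return replication_factor // 2 + 1


def overlaps(read_replicas: int, write_replicas: int, replication_factor: int) -> bool:
    """True when every read set shares at least one replica with every write set."""
    return read_replicas + write_replicas > replication_factor


rf = 3
print(quorum(rf))                            # 2
print(overlaps(quorum(rf), quorum(rf), rf))  # True: QUORUM reads see QUORUM writes
print(overlaps(1, 1, rf))                    # False: ONE/ONE may return stale data
```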
The dynamo paper
 [Diagram: six-node ring with client and coordinator. RF = 3, CL = One.]
The dynamo paper
 [Diagram: six-node ring with client and coordinator. RF = 3, CL = Quorum.]
The dynamo paper
 [Diagram: six-node ring with client and coordinator. RF = 3, CL = One,
 with one write annotated "+ hint" (hinted handoff).]
The dynamo paper
 [Diagram: six-node ring with client and coordinator. RF = 3, CL = One,
 annotated "Read repair".]
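
Read repair can be illustrated with a deliberately simplified model: read the column from every replica, answer with the newest version, and write that version back to any replica that was behind. The function name and the dict-per-replica layout are assumptions for this sketch. It is not how Cassandra implements the feature internally.

```python
from typing import Dict, List, Optional, Tuple

Cell = Tuple[str, int]  # (value, timestamp)


def read_with_repair(replicas: List[Dict[str, Cell]], column: str) -> Optional[str]:
    """Return the newest value of column and bring stale replicas up to date."""
    answers = [replica.get(column) for replica in replicas]
    known = [cell for cell in answers if cell is not None]
    if not known:
        return None
    newest = max(known, key=lambda cell: cell[1])
    for replica, cell in zip(replicas, answers):
        if cell != newest:
            replica[column] = newest  # the "repair" write
    return newest[0]


# Three replicas (RF = 3): one stale, one current, one that missed the write.
replicas = [{"foo": ("bar", 1000)}, {"foo": ("zing", 1001)}, {}]
print(read_with_repair(replicas, "foo"))  # zing
print(replicas)
```

After the read, all three replicas hold the version with timestamp 1001.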
The big table paper

 •   Sparse "columnar" data model
 •   SSTable disk storage
 •   Append-only commit log
 •   Memtable (buffer and sort)
 •   Immutable SSTable files
 •   Compaction
 http://labs.google.com/papers/bigtable-osdi06.pdf
 http://www.slideshare.net/geminimobile/bigtable-4820829
The big table paper


 [Diagram: a column is a name and a value, plus a timestamp.]
The big table paper

 [Diagram: many columns side by side, each with a name and a value.]

 We can have millions of columns *

 * theoretically up to 2 billion
The big table paper

 [Diagram: a row is a row key followed by its columns, each with a name
 and a value.]
The big table paper

 [Diagram: a column family is a set of rows, each a row key followed by
 its columns.]

 We can have billions of rows
The big table paper

 [Diagram: a write goes to the commit log (on disk) and the memtable (in
 memory). The memtable is flushed to an SSTable on disk on a time/size
 trigger. SSTables are immutable.]
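
The write path can be modelled in memory to show how the pieces fit together. Everything here is a toy built on assumptions: the `ColumnFamilyStore` class name, a flush trigger based on column count rather than time or bytes, and Python lists standing in for files on disk.

```python
class ColumnFamilyStore:
    """Toy model: commit log + memtable + immutable SSTables + compaction."""

    def __init__(self, flush_threshold=3):
        self.commit_log = []  # append-only record of every write
        self.memtable = {}    # column name -> (value, timestamp)
        self.sstables = []    # immutable, sorted tuples; oldest first
        self.flush_threshold = flush_threshold

    def write(self, name, value, timestamp):
        self.commit_log.append((name, value, timestamp))  # log first
        current = self.memtable.get(name)
        if current is None or timestamp >= current[1]:
            self.memtable[name] = (value, timestamp)
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Sort the memtable by column name and freeze it as a new SSTable."""
        if not self.memtable:
            return
        rows = sorted((n, v, t) for n, (v, t) in self.memtable.items())
        self.sstables.append(tuple(rows))
        self.memtable = {}

    def read(self, name):
        """Check the memtable and every SSTable; the newest timestamp wins."""
        found = [self.memtable[name]] if name in self.memtable else []
        for sstable in self.sstables:
            found.extend((v, t) for n, v, t in sstable if n == name)
        return max(found, key=lambda cell: cell[1])[0] if found else None

    def compact(self):
        """Merge all SSTables into one, keeping the newest version of each column."""
        merged = {}
        for sstable in self.sstables:
            for name, value, ts in sstable:
                if name not in merged or ts > merged[name][1]:
                    merged[name] = (value, ts)
        rows = sorted((n, v, t) for n, (v, t) in merged.items())
        self.sstables = [tuple(rows)]


store = ColumnFamilyStore(flush_threshold=2)
store.write("zebra", "foo", 1000)
store.write("badger", "foo", 1001)   # second column triggers a flush
store.write("zebra", "zing", 1002)
store.write("aaa", "x", 1003)        # triggers another flush
print(len(store.sstables))           # 2
store.compact()
print(store.sstables)
```

Before compaction a read may have to consult several SSTables. Afterwards each column lives in exactly one place.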
Data model basics: conflict resolution

 Per-column timestamp-based conflict
 resolution
 {                              {
     column: foo,                   column: foo,
     value: bar,                    value: zing,
     timestamp: 1000                timestamp: 1001
 }                              }

 http://cassandra.apache.org/
Data model basics: conflict resolution

 Per-column timestamp-based conflict
 resolution
 {                              {
     column: foo,                   column: foo,
     value: bar,                    value: zing,
     timestamp: 1000                timestamp: 1001
 }                              }
                                     bigger timestamp wins

 http://cassandra.apache.org/
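
Timestamp-based resolution amounts to a one-line comparison. A minimal sketch, using the two versions of column `foo` from the slide; the `resolve` function is hypothetical, and returning the first argument on a tie is an arbitrary choice made for the example.

```python
def resolve(a: dict, b: dict) -> dict:
    """Return whichever version of a column carries the larger timestamp."""
    if a["column"] != b["column"]:
        raise ValueError("can only reconcile two versions of the same column")
    return a if a["timestamp"] >= b["timestamp"] else b


left = {"column": "foo", "value": "bar", "timestamp": 1000}
right = {"column": "foo", "value": "zing", "timestamp": 1001}
print(resolve(left, right)["value"])  # zing
```

Because resolution is per column, two clients updating different columns of the same row never conflict.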
Data model basics: column ordering

 Columns ordered at time of writing,
 according to Column Family schema
 {                              {
     column: zebra,                 column: badger,
     value: foo,                    value: foo,
     timestamp: 1000                timestamp: 1001
 }                              }

 http://cassandra.apache.org/
Data model basics: column ordering

 Columns ordered at time of writing,
 according to Column Family schema
 {
     badger: foo,
     zebra: foo
 }
 (with AsciiType column schema)


 http://cassandra.apache.org/
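
Ordering at write time can be pictured as inserting each column name into an already sorted list. The `Row` class below is an assumption for illustration, and plain Python string comparison stands in for the AsciiType comparator.

```python
import bisect


class Row:
    """Keeps column names in sorted order as they are written."""

    def __init__(self):
        self.names = []   # always sorted
        self.values = {}

    def insert(self, name, value):
        if name not in self.values:
            bisect.insort(self.names, name)  # find the slot at write time
        self.values[name] = value

    def columns(self):
        return [(name, self.values[name]) for name in self.names]


row = Row()
row.insert("zebra", "foo")    # written first
row.insert("badger", "foo")   # written second, but sorts first
print(row.columns())          # [('badger', 'foo'), ('zebra', 'foo')]
```

Reads pay nothing for the ordering, which is what makes column slices cheap.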
Key point

 Each “query” can be answered from a
 single slice of disk

 (once compaction has finished)
Data modeling – 1000ft introduction

 • Start from your queries and work
   backwards
 • Denormalise in the application
   (store data more than once)


 http://www.slideshare.net/mattdennis/cassandra-data-modeling
 http://blip.tv/datastax/data-modeling-workshop-5496906
Pattern 1: not using the value

 Storing that user X is in bucket Y

 Row key:                  f97be9cc-5255-457…
 Column name:              foo
 Value:                    1 (we don’t really care about this)


 https://github.com/davegardnerisme/we-have-your-kidneys/blob/master/www/add.php#L53-58
Pattern 1: not using the value

 Q: is user X in bucket foo?
 A: single column fetch

 f97be9cc-5255-4578-8813-76701c0945bd
    bar: 1
    foo: 1
 06a6f1b0-fcf2-41d9-8949-fe2d416bde8e
    baz: 1
    zoo: 1
 503778bc-246f-4041-ac5a-fd944176b26d
    aaa: 1
Pattern 1: not using the value

 Q: which buckets is user X in?
 A: column slice fetch

 f97be9cc-5255-4578-8813-76701c0945bd
    bar: 1
    foo: 1
 06a6f1b0-fcf2-41d9-8949-fe2d416bde8e
    baz: 1
    zoo: 1
 503778bc-246f-4041-ac5a-fd944176b26d
    aaa: 1
Pattern 1: not using the value

 We could also use expiring columns to
 automatically delete columns N seconds
 after insertion

 UPDATE users
 USING TTL 3600
 SET 'foo' = 1
 WHERE KEY =
     'f97be9cc-5255-4578-8813-76701c0945bd'
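
The two queries and the expiring-column idea for this pattern can be modelled without a database. This is a sketch under assumptions: the `BucketStore` class and its method names are invented, and time is passed in explicitly as `now` (in seconds) so the behaviour is easy to follow.

```python
class BucketStore:
    """One row per user; column names are bucket names; values are ignored."""

    def __init__(self):
        self.rows = {}  # row key -> {bucket name: expiry time, or None}

    def add(self, user, bucket, now, ttl=None):
        expires_at = None if ttl is None else now + ttl
        self.rows.setdefault(user, {})[bucket] = expires_at

    def _live(self, user, now):
        row = self.rows.get(user, {})
        return {b for b, exp in row.items() if exp is None or exp > now}

    def in_bucket(self, user, bucket, now):
        """Single column fetch: is this one column present?"""
        return bucket in self._live(user, now)

    def buckets_for(self, user, now):
        """Column slice fetch: every column in the row, in name order."""
        return sorted(self._live(user, now))


user = "f97be9cc-5255-4578-8813-76701c0945bd"
store = BucketStore()
store.add(user, "bar", now=0)
store.add(user, "foo", now=0, ttl=3600)   # expires after an hour
print(store.in_bucket(user, "foo", now=10))    # True
print(store.buckets_for(user, now=10))         # ['bar', 'foo']
print(store.buckets_for(user, now=4000))       # ['bar']
```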
Pattern 2: counters

 Real-time analytics to count
 clicks/impressions of ads in hourly
 buckets

 Row key:                  1
 Column name:              2011103015-click
 Value:                    34


 https://github.com/davegardnerisme/we-have-your-kidneys/blob/master/www/adClick.php
Pattern 2: counters

 Increment by 1 using CQL

 UPDATE ads
 SET '2011103015-impression'
     = '2011103015-impression' + 1
 WHERE KEY = '1'
Pattern 2: counters

 Q: how many clicks/impressions for ad 1
 over time range?
 A: column slice fetch, between column X and Y

 1
     2011103015-click: 1
     2011103015-impression: 3434
     2011103016-click: 12
     2011103016-impression: 5411
     2011103017-click: 2
     2011103017-impression: 345
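
Because the column names start with the hour, a time range maps onto a contiguous run of sorted column names. A rough sketch of the increment and the slice; the `AdCounters` class is hypothetical, and `bisect` over a sorted list of names stands in for the column slice.

```python
import bisect
from collections import defaultdict


class AdCounters:
    """One row per ad; column names are '<YYYYMMDDHH>-<event>'."""

    def __init__(self):
        self.rows = defaultdict(dict)  # ad id -> {column name: count}

    def increment(self, ad_id, hour, event, by=1):
        column = f"{hour}-{event}"
        self.rows[ad_id][column] = self.rows[ad_id].get(column, 0) + by

    def slice(self, ad_id, start, finish):
        """Return columns whose names fall between start and finish, inclusive."""
        names = sorted(self.rows[ad_id])
        lo = bisect.bisect_left(names, start)
        hi = bisect.bisect_right(names, finish)
        return [(name, self.rows[ad_id][name]) for name in names[lo:hi]]


counters = AdCounters()
counters.increment("1", "2011103015", "click")
counters.increment("1", "2011103015", "impression", by=3434)
counters.increment("1", "2011103016", "click", by=12)
counters.increment("1", "2011103016", "impression", by=5411)
counters.increment("1", "2011103017", "click", by=2)

# Everything for hours 15 and 16: from "2011103015" up to the last 16:00 column.
print(counters.slice("1", "2011103015", "2011103016-impression"))
```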
Pattern 3: time series

 Store canonical reference of impressions
 and clicks

 Row key:                    20111030
 Column name:                <time UUID>
 Value:                      {json}

 Cassandra can order columns by time


 http://rubyscale.com/2011/basic-time-series-with-cassandra/
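
The layout for this pattern can be sketched with Python's standard `uuid` module. The helper names (`time_uuid`, `store_event`, `events_for_day`) are made up, a dict stands in for the column family, and the UUIDs are built from an explicit timestamp so the example is deterministic. Note that Python compares UUIDs by their raw integer value, which is not time order for version 1 UUIDs, so the sketch sorts on the `.time` field instead.

```python
import json
import uuid
from datetime import datetime, timezone

GREGORIAN_OFFSET = 0x01B21DD213814000  # 100 ns ticks from 1582-10-15 to 1970-01-01
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def time_uuid(moment: datetime, clock_seq: int = 0, node: int = 0) -> uuid.UUID:
    """Build a version 1 (time-based) UUID for a timezone-aware datetime."""
    delta = moment - EPOCH
    micros = (delta.days * 86400 + delta.seconds) * 1_000_000 + delta.microseconds
    ticks = micros * 10 + GREGORIAN_OFFSET
    time_low = ticks & 0xFFFFFFFF
    time_mid = (ticks >> 32) & 0xFFFF
    time_hi_version = ((ticks >> 48) & 0x0FFF) | 0x1000       # version 1
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3F) | 0x80   # RFC 4122 variant
    clock_seq_low = clock_seq & 0xFF
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))


def store_event(rows: dict, moment: datetime, payload: dict) -> None:
    """Row key is the day; column name is a time UUID; value is JSON."""
    row_key = moment.strftime("%Y%m%d")
    rows.setdefault(row_key, {})[time_uuid(moment)] = json.dumps(payload)


def events_for_day(rows: dict, row_key: str) -> list:
    """Return the day's events in time order."""
    row = rows.get(row_key, {})
    return [json.loads(row[name]) for name in sorted(row, key=lambda u: u.time)]


rows = {}
store_event(rows, datetime(2011, 10, 30, 15, 30, tzinfo=timezone.utc), {"type": "click"})
store_event(rows, datetime(2011, 10, 30, 9, 0, tzinfo=timezone.utc), {"type": "impression"})
print(events_for_day(rows, "20111030"))
```

Two events in the same microsecond would collide in this sketch; a real time UUID uses the clock sequence and node fields to keep them distinct.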
Pattern 4: object properties as columns

 Store user properties such as name,
 email, etc.

 Row key:                 f97be9cc-5255-457…
 Column name:             name
 Value:                   Bob Foo-Bar



 http://www.wehaveyourkidneys.com/adPerformance.php?ad=1
Anti-pattern 1: read-before-write

 Instead store as independent columns
 and mutate individually

 (see pattern 4)
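
The problem with read-before-write shows up as soon as two clients update the same object. The sketch below simulates the interleaving by hand with plain dicts; the store layout and the example.com addresses are placeholders invented for the illustration.

```python
import json

# Anti-pattern: the whole user is one serialized value.
blob_store = {"bob": json.dumps({"name": "Bob", "email": "old@example.com"})}

snapshot_a = json.loads(blob_store["bob"])   # client A reads
snapshot_b = json.loads(blob_store["bob"])   # client B reads the same version
snapshot_a["name"] = "Bob Foo-Bar"           # A changes the name
snapshot_b["email"] = "new@example.com"      # B changes the email
blob_store["bob"] = json.dumps(snapshot_a)   # A writes back
blob_store["bob"] = json.dumps(snapshot_b)   # B writes back and undoes A's change

print(json.loads(blob_store["bob"]))   # name is still "Bob"

# Pattern 4: one column per property, each written on its own, no read needed.
column_store = {"bob": {"name": "Bob", "email": "old@example.com"}}
column_store["bob"]["name"] = "Bob Foo-Bar"        # client A
column_store["bob"]["email"] = "new@example.com"   # client B

print(column_store["bob"])   # both changes survive
```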
Anti-pattern 2: super columns

 Friends don’t let friends use super
 columns.




 http://rubyscale.com/2010/beware-the-supercolumn-its-a-trap-for-the-unwary/
Anti-pattern 3: OPP

 The Order Preserving Partitioner
 unbalances your load and makes your
 life harder



 http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/
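
A quick experiment shows why order-preserving placement tends to unbalance load. The setup is an assumption made for the demo: four nodes that each own an equal, contiguous share of the key space, keys that all share a prefix, and MD5 standing in for the random partitioner's hash.

```python
import hashlib
from collections import Counter

NODES = 4


def order_preserving_node(key: str) -> int:
    """Contiguous key ranges: the first byte of the key decides the node."""
    return key.encode("utf-8")[0] * NODES // 256


def random_node(key: str) -> int:
    """Hash the key first, then split the hash space into equal ranges."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") * NODES // 2 ** 128


keys = [f"user{i:04d}" for i in range(1000)]
ordered = Counter(order_preserving_node(k) for k in keys)
hashed = Counter(random_node(k) for k in keys)

print(dict(ordered))            # every key lands on the same node
print(sorted(hashed.items()))   # keys spread across all four nodes
```

Real keys are rarely spread evenly over their alphabet, so with ordered placement the operator ends up rebalancing tokens by hand.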
Recap: Data modeling

 • Think about the queries, work
   backwards
 • Don’t overuse single rows; try to
   spread the load
 • Don’t use super columns
 • Ask on IRC! #cassandra
There’s more: Brisk

 Integrated Hadoop distribution (without
 HDFS installed). Run Hive and Pig queries
 directly against Cassandra

 DataStax offer this functionality in their
 “Enterprise” product

 http://www.datastax.com/products/enterprise
Hive: SQL-like interface to Hadoop

CREATE EXTERNAL TABLE tempUsers
    (userUuid string, segmentId string, value string)
STORED BY
'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
WITH SERDEPROPERTIES (
    "cassandra.columns.mapping" = ":key,:column,:value",
    "cassandra.cf.name" = "users"
    );


SELECT segmentId, count(1) AS total
FROM tempUsers
GROUP BY segmentId
ORDER BY total DESC;
In conclusion


 Cassandra is founded on
 sound design principles
In conclusion


 The data model is incredibly
 powerful
In conclusion


 CQL and a new breed of
 clients are making it easier
 to use
In conclusion


 Hadoop integration means we
 can analyse data directly from
 a Cassandra cluster
In conclusion


 There is a strong community
 and multiple companies
 offering professional support
Thanks
                                          looking for a job?


Learn more about Cassandra
meetup.com/Cassandra-London
Sample ad-targeting project on Github
https://github.com/davegardnerisme/we-have-your-kidneys

Watch videos from Cassandra SF 2011
http://www.datastax.com/events/cassandrasf2011/presentations

 
NoSQL Data Stores: Introduzione alle Basi di Dati Non Relazionali
NoSQL Data Stores: Introduzione alle Basi di Dati Non RelazionaliNoSQL Data Stores: Introduzione alle Basi di Dati Non Relazionali
NoSQL Data Stores: Introduzione alle Basi di Dati Non Relazionali
 
Nzpug welly-cassandra-02-12-2010
Nzpug welly-cassandra-02-12-2010Nzpug welly-cassandra-02-12-2010
Nzpug welly-cassandra-02-12-2010
 
DevoxxFR 2016 - 3 degrees of MoM
DevoxxFR 2016 - 3 degrees of MoMDevoxxFR 2016 - 3 degrees of MoM
DevoxxFR 2016 - 3 degrees of MoM
 
Scaling web applications with cassandra presentation
Scaling web applications with cassandra presentationScaling web applications with cassandra presentation
Scaling web applications with cassandra presentation
 
Cassandra Explained
Cassandra ExplainedCassandra Explained
Cassandra Explained
 
What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010What every developer should know about database scalability, PyCon 2010
What every developer should know about database scalability, PyCon 2010
 
What Every Developer Should Know About Database Scalability
What Every Developer Should Know About Database ScalabilityWhat Every Developer Should Know About Database Scalability
What Every Developer Should Know About Database Scalability
 
Highly available, scalable and secure data with Cassandra and DataStax Enterp...
Highly available, scalable and secure data with Cassandra and DataStax Enterp...Highly available, scalable and secure data with Cassandra and DataStax Enterp...
Highly available, scalable and secure data with Cassandra and DataStax Enterp...
 
QuadIron An open source library for number theoretic transform-based erasure ...
QuadIron An open source library for number theoretic transform-based erasure ...QuadIron An open source library for number theoretic transform-based erasure ...
QuadIron An open source library for number theoretic transform-based erasure ...
 
Cassandra introduction apache con 2014 budapest
Cassandra introduction apache con 2014 budapestCassandra introduction apache con 2014 budapest
Cassandra introduction apache con 2014 budapest
 
Cassandra
CassandraCassandra
Cassandra
 

More from Dave Gardner

Cabs, Cassandra, and Hailo (at Cassandra EU)
Cabs, Cassandra, and Hailo (at Cassandra EU)Cabs, Cassandra, and Hailo (at Cassandra EU)
Cabs, Cassandra, and Hailo (at Cassandra EU)Dave Gardner
 
Cabs, Cassandra, and Hailo
Cabs, Cassandra, and HailoCabs, Cassandra, and Hailo
Cabs, Cassandra, and HailoDave Gardner
 
Planning to Fail #phpne13
Planning to Fail #phpne13Planning to Fail #phpne13
Planning to Fail #phpne13Dave Gardner
 
Planning to Fail #phpuk13
Planning to Fail #phpuk13Planning to Fail #phpuk13
Planning to Fail #phpuk13Dave Gardner
 
Unique ID generation in distributed systems
Unique ID generation in distributed systemsUnique ID generation in distributed systems
Unique ID generation in distributed systemsDave Gardner
 
Intro slides from Cassandra London July 2011
Intro slides from Cassandra London July 2011Intro slides from Cassandra London July 2011
Intro slides from Cassandra London July 2011Dave Gardner
 
2011.07.18 cassandrameetup
2011.07.18 cassandrameetup2011.07.18 cassandrameetup
2011.07.18 cassandrameetupDave Gardner
 
Cassandra + Hadoop = Brisk
Cassandra + Hadoop = BriskCassandra + Hadoop = Brisk
Cassandra + Hadoop = BriskDave Gardner
 
Introduction to Cassandra at London Web Meetup
Introduction to Cassandra at London Web MeetupIntroduction to Cassandra at London Web Meetup
Introduction to Cassandra at London Web MeetupDave Gardner
 
Running Cassandra on Amazon EC2
Running Cassandra on Amazon EC2Running Cassandra on Amazon EC2
Running Cassandra on Amazon EC2Dave Gardner
 

More from Dave Gardner (11)

Cabs, Cassandra, and Hailo (at Cassandra EU)
Cabs, Cassandra, and Hailo (at Cassandra EU)Cabs, Cassandra, and Hailo (at Cassandra EU)
Cabs, Cassandra, and Hailo (at Cassandra EU)
 
Cabs, Cassandra, and Hailo
Cabs, Cassandra, and HailoCabs, Cassandra, and Hailo
Cabs, Cassandra, and Hailo
 
Planning to Fail #phpne13
Planning to Fail #phpne13Planning to Fail #phpne13
Planning to Fail #phpne13
 
Planning to Fail #phpuk13
Planning to Fail #phpuk13Planning to Fail #phpuk13
Planning to Fail #phpuk13
 
Unique ID generation in distributed systems
Unique ID generation in distributed systemsUnique ID generation in distributed systems
Unique ID generation in distributed systems
 
Intro slides from Cassandra London July 2011
Intro slides from Cassandra London July 2011Intro slides from Cassandra London July 2011
Intro slides from Cassandra London July 2011
 
2011.07.18 cassandrameetup
2011.07.18 cassandrameetup2011.07.18 cassandrameetup
2011.07.18 cassandrameetup
 
Cassandra + Hadoop = Brisk
Cassandra + Hadoop = BriskCassandra + Hadoop = Brisk
Cassandra + Hadoop = Brisk
 
Introduction to Cassandra at London Web Meetup
Introduction to Cassandra at London Web MeetupIntroduction to Cassandra at London Web Meetup
Introduction to Cassandra at London Web Meetup
 
Running Cassandra on Amazon EC2
Running Cassandra on Amazon EC2Running Cassandra on Amazon EC2
Running Cassandra on Amazon EC2
 
PHP and Cassandra
PHP and CassandraPHP and Cassandra
PHP and Cassandra
 

Recently uploaded

Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 

Recently uploaded (20)

Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 

Learning Cassandra

  • 2. What I’m going to cover • How to NoSQL • Cassandra basics (dynamo and big table) • How to use the data model in real life
  • 3. How to NoSQL 1. Find data store that doesn’t use SQL 2. Anything 3. Cram all the things into it 4. Triumphantly blog this success 5. Complain a month later when it bursts into flames http://www.slideshare.net/rbranson/how-do-i-cassandra/4
  • 4. Choosing NoSQL “NoSQL DBs trade off traditional features to better support new and emerging use cases” http://www.slideshare.net/argv0/riak-use-cases-dissecting-the-solutions-to-hard-problems
  • 5. Choosing Cassandra: Tradeoffs You trade more widely used, tested and documented software (MySQL: first open-source release 1998) for a relatively immature product (Cassandra: first open-sourced in 2008)
  • 6. Choosing Cassandra: Tradeoffs You trade ad-hoc querying (SQL join, group by, having, order) for a rich data model with limited ad-hoc querying ability (Cassandra makes you denormalise)
  • 7. Choosing NoSQL “they say … I can’t decide between this project and this project even though they look nothing like each other. And the fact that you can’t decide indicates that you don’t actually have a problem that requires them.” Benjamin Black – NoSQL Tapes (at 30:15) http://nosqltapes.com/video/benjamin-black-on-nosql-cloud-computing-and-fast_ip
  • 8. What do we get in return? Proven horizontal scalability Cassandra scales reads and writes linearly as new nodes are added
  • 9. Netflix benchmark: linear scaling http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
  • 10. What do we get in return? High availability Cassandra is fault-resistant with tunable consistency levels
  • 11. What do we get in return? Low latency, solid performance Cassandra has very good write performance
  • 12. Performance benchmark * http://blog.cubrid.org/dev-platform/nosql-benchmarking/ * Add a pinch of salt
  • 13. What do we get in return? Operational simplicity Homogeneous cluster, no “master” node, no SPOF
  • 14. What do we get in return? Rich data model Cassandra is more than simple key-value: columns, composites, counters, secondary indexes
  • 15. How to NoSQL version 2 Learn about each solution • What tradeoffs are you making? • How is it designed? • What algorithms does it use? http://www.alberton.info/nosql_databases_what_when_why_phpuk2011.html
  • 16. Amazon Dynamo + Google Big Table
      Amazon Dynamo: consistent hashing, vector clocks *, gossip protocol, hinted handoff, read repair
      http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf
      Google Big Table: columnar, SSTable storage, append-only, memtable, compaction
      http://labs.google.com/papers/bigtable-osdi06.pdf
      * not in Cassandra
  • 17. The dynamo paper [Diagram: ring of six nodes, #1 to #6, with a client] Tokens are integers from 0 to 2^127
  • 18. The dynamo paper [Diagram: the same ring of six nodes; the client talks to a coordinator node; label: consistent hashing]
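The ring lookup on these two slides can be sketched in a few lines of Python. This is a minimal illustration, not Cassandra's implementation: the MD5-based `token_for`, the `evenly_spaced` helper, the `Ring` class and the node names are all assumptions made for the example, and replicas are simply the owning node plus its clockwise neighbours.

```python
import bisect
import hashlib

TOKEN_SPACE = 2 ** 127  # the slides give tokens as integers from 0 to 2^127


def token_for(key):
    """Hash a row key to a token (MD5 folded into the token space; an assumption)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % TOKEN_SPACE


def evenly_spaced(names):
    """Assign each node a token, spaced evenly around the ring."""
    step = TOKEN_SPACE // len(names)
    return {name: i * step for i, name in enumerate(names)}


class Ring:
    """Hypothetical token ring: nodes sorted by token, keys owned clockwise."""

    def __init__(self, node_tokens):
        self._entries = sorted((token, name) for name, token in node_tokens.items())
        self._tokens = [token for token, _ in self._entries]

    def replicas(self, key, rf=3):
        """Return the nodes holding key: the owner, then its clockwise neighbours."""
        size = len(self._entries)
        # First node whose token is >= the key's token, wrapping past the last node.
        start = bisect.bisect_left(self._tokens, token_for(key)) % size
        return [self._entries[(start + i) % size][1] for i in range(min(rf, size))]


ring = Ring(evenly_spaced(["node1", "node2", "node3", "node4", "node5", "node6"]))
print(ring.replicas("f97be9cc-5255-4578-8813-76701c0945bd", rf=3))
```

Any node can act as coordinator here because every node can compute the same lookup from the same token list; adding a node only moves the keys between it and its neighbour.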
  • 19. Consistency levels How many replicas must respond to declare success?
  • 20. Consistency levels: read operations
      Level         | Description
      ONE           | 1st response
      QUORUM        | N/2 + 1 replicas
      LOCAL_QUORUM  | N/2 + 1 replicas in local data centre
      EACH_QUORUM   | N/2 + 1 replicas in each data centre
      ALL           | All replicas
      http://wiki.apache.org/cassandra/API#Read
  • 21. Consistency levels: write operations
      Level         | Description
      ANY           | One node, including hinted handoff
      ONE           | One node
      QUORUM        | N/2 + 1 replicas
      LOCAL_QUORUM  | N/2 + 1 replicas in local data centre
      EACH_QUORUM   | N/2 + 1 replicas in each data centre
      ALL           | All replicas
      http://wiki.apache.org/cassandra/API#Write
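The arithmetic behind these tables is small enough to write down. The sketch below covers only the single-data-centre levels and assumes N is the replication factor with integer division; the function names are hypothetical. The overlap rule (read replicas + write replicas > N) is the usual Dynamo-style reasoning and is not stated on the slides.

```python
def quorum(replication_factor):
    """N/2 + 1 with integer division, as in the tables above."""
    return replication_factor // 2 + 1


def replicas_required(level, replication_factor):
    """How many replicas must respond for a single-data-centre level."""
    required = {
        "ANY": 1,  # writes only; the one "response" may be a stored hint
        "ONE": 1,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return required[level]


def read_sees_latest_write(read_level, write_level, replication_factor):
    """True when the read set and the write set must share at least one replica."""
    if write_level == "ANY":
        # A hinted write may not have reached any replica yet.
        return False
    needed = replicas_required(read_level, replication_factor)
    written = replicas_required(write_level, replication_factor)
    return needed + written > replication_factor


print(quorum(3))                                       # 2
print(read_sees_latest_write("QUORUM", "QUORUM", 3))   # True
print(read_sees_latest_write("ONE", "ONE", 3))         # False
```

With RF = 3, QUORUM reads and writes overlap on at least one replica, while ONE/ONE trades that guarantee for lower latency.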
  • 22. The dynamo paper [Diagram: six-node ring with client and coordinator; RF = 3, CL = One]
  • 23. The dynamo paper [Diagram: six-node ring with client and coordinator; RF = 3, CL = Quorum]
  • 24. The dynamo paper [Diagram: six-node ring with client and coordinator; RF = 3, CL = One + hint]
  • 25. The dynamo paper [Diagram: six-node ring with client and coordinator; RF = 3, CL = One; label: read repair]
  • 26. The big table paper • Sparse "columnar" data model • SSTable disk storage • Append-only commit log • Memtable (buffer and sort) • Immutable SSTable files • Compaction http://labs.google.com/papers/bigtable-osdi06.pdf http://www.slideshare.net/geminimobile/bigtable-4820829
  • 27. The big table paper [Diagram: a column is a name and a value, plus a timestamp]
  • 28. The big table paper [Diagram: a sequence of columns, each a name and a value] We can have millions of columns * (* theoretically up to 2 billion)
  • 29. The big table paper [Diagram: a row is a row key followed by its columns, each a name and a value]
  • 30. The big table paper [Diagram: a column family is a set of rows, each a row key followed by its columns] We can have billions of rows
  • 31. The big table paper [Diagram: a write goes to the commit log on disk and to the memtable in memory; the memtable is flushed on a time/size trigger to immutable SSTables on disk]
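The write path on the last slide can be mimicked with a toy in-memory store. This is a rough sketch under stated assumptions, not how Cassandra is built: `ColumnFamilyStore`, its methods and the size-only flush trigger are hypothetical, the "disk" structures are plain Python lists and tuples, and the commit log is never trimmed.

```python
class ColumnFamilyStore:
    """Toy write path: commit log append, memtable buffer, immutable SSTables."""

    def __init__(self, flush_threshold=3):
        self.commit_log = []   # append-only record of every write
        self.memtable = {}     # (row key, column) -> (value, timestamp)
        self.sstables = []     # each SSTable is a sorted, immutable tuple
        self.flush_threshold = flush_threshold

    def write(self, row_key, column, value, timestamp):
        self.commit_log.append((row_key, column, value, timestamp))
        current = self.memtable.get((row_key, column))
        if current is None or timestamp >= current[1]:
            self.memtable[(row_key, column)] = (value, timestamp)
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Sort the memtable and freeze it as a new SSTable."""
        if self.memtable:
            self.sstables.append(tuple(sorted(self.memtable.items())))
            self.memtable = {}

    def read(self, row_key, column):
        """Collect every version of the column and return the newest value."""
        versions = []
        if (row_key, column) in self.memtable:
            versions.append(self.memtable[(row_key, column)])
        for sstable in self.sstables:
            versions.extend(cell for key, cell in sstable if key == (row_key, column))
        if not versions:
            return None
        return max(versions, key=lambda cell: cell[1])[0]

    def compact(self):
        """Merge all SSTables into one, keeping only the newest version of each column."""
        merged = {}
        for sstable in self.sstables:
            for key, cell in sstable:
                if key not in merged or cell[1] > merged[key][1]:
                    merged[key] = cell
        self.sstables = [tuple(sorted(merged.items()))] if merged else []


store = ColumnFamilyStore(flush_threshold=2)
store.write("row1", "a", "x", timestamp=1)
store.write("row1", "b", "y", timestamp=2)   # second write triggers a flush
store.write("row1", "a", "z", timestamp=3)   # newer version lives in the memtable
print(store.read("row1", "a"))               # z
```

Nothing on "disk" is ever modified in place: overwrites create new versions, reads reconcile them by timestamp, and compaction is what eventually discards the stale ones.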
  • 32. Data model basics: conflict resolution Per-column timestamp-based conflict resolution { column: foo, value: bar, timestamp: 1000 } { column: foo, value: zing, timestamp: 1001 } http://cassandra.apache.org/
  • 33. Data model basics: conflict resolution Per-column timestamp-based conflict resolution { column: foo, value: bar, timestamp: 1000 } { column: foo, value: zing, timestamp: 1001 } (bigger timestamp wins) http://cassandra.apache.org/
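As a sketch, per-column conflict resolution is a comparison of timestamps. The `Column` tuple and `resolve` function are hypothetical names for this example, and breaking ties by comparing values is a choice made here to keep the result independent of argument order.

```python
from typing import NamedTuple


class Column(NamedTuple):
    name: str
    value: str
    timestamp: int


def resolve(a, b):
    """Return the surviving version of a column: the bigger timestamp wins."""
    if a.name != b.name:
        raise ValueError("can only resolve two versions of the same column")
    # Tie-break on value (an assumption of this sketch) so the result is deterministic.
    return max(a, b, key=lambda column: (column.timestamp, column.value))


old = Column("foo", "bar", 1000)
new = Column("foo", "zing", 1001)
print(resolve(old, new).value)  # zing
```

Because the rule only looks at the two versions, replicas can apply it in any order and still converge on the same value.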
  • 34. Data model basics: column ordering Columns ordered at time of writing, according to Column Family schema { column: zebra, value: foo, timestamp: 1000 } { column: badger, value: foo, timestamp: 1001 } http://cassandra.apache.org/
  • 35. Data model basics: column ordering Columns ordered at time of writing, according to Column Family schema { badger: foo, zebra: foo } with AsciiType column schema http://cassandra.apache.org/
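Ordering at write time can be illustrated with a row that keeps its column names sorted on every insert. `SortedRow` is a hypothetical class for this sketch; plain Python string comparison stands in for the AsciiType comparator, which is an assumption that holds for ASCII names.

```python
import bisect


class SortedRow:
    """A row whose columns are kept in comparator order as they are written."""

    def __init__(self):
        self._names = []    # column names, always sorted
        self._values = {}   # column name -> value

    def insert(self, name, value):
        if name not in self._values:
            bisect.insort(self._names, name)  # place the name in sorted position
        self._values[name] = value

    def columns(self):
        """Columns in stored order; no sorting is needed at read time."""
        return [(name, self._values[name]) for name in self._names]


row = SortedRow()
row.insert("zebra", "foo")
row.insert("badger", "foo")
print(row.columns())  # [('badger', 'foo'), ('zebra', 'foo')]
```

Paying the ordering cost on write is what makes a range of adjacent columns cheap to read later.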
  • 36. Key point Each “query” can be answered from a single slice of disk (once compaction has finished)
  • 37. Data modeling – 1000ft introduction • Start from your queries and work backwards • Denormalise in the application (store data more than once) http://www.slideshare.net/mattdennis/cassandra-data-modeling http://blip.tv/datastax/data-modeling-workshop-5496906
  • 38. Pattern 1: not using the value Storing that user X is in bucket Y Row key: f97be9cc-5255-457… Column name: foo Value: 1 (we don’t really care about this) https://github.com/davegardnerisme/we-have-your-kidneys/blob/master/www/add.php#L53-58
  • 39. Pattern 1: not using the value Q: is user X in bucket foo? A: single column fetch
      f97be9cc-5255-4578-8813-76701c0945bd { bar: 1, foo: 1 }
      06a6f1b0-fcf2-41d9-8949-fe2d416bde8e { baz: 1, zoo: 1 }
      503778bc-246f-4041-ac5a-fd944176b26d { aaa: 1 }
  • 40. Pattern 1: not using the value Q: which buckets is user X in? A: column slice fetch
      f97be9cc-5255-4578-8813-76701c0945bd { bar: 1, foo: 1 }
      06a6f1b0-fcf2-41d9-8949-fe2d416bde8e { baz: 1, zoo: 1 }
      503778bc-246f-4041-ac5a-fd944176b26d { aaa: 1 }
  • 41. Pattern 1: not using the value We could also use expiring columns to automatically delete columns N seconds after insertion: UPDATE users USING TTL 3600 SET 'foo' = 1 WHERE KEY = 'f97be9cc-5255-4578-8813-76701c0945bd'
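Pattern 1, including the expiring-column variant, can be modelled in memory to show the two queries. `BucketStore` and its methods are hypothetical names; the injectable `clock` exists only so that expiry can be demonstrated without waiting, and expired columns are filtered on read rather than physically deleted.

```python
import time


class BucketStore:
    """One row per user; each column name is a bucket, the value is ignored."""

    def __init__(self, clock=time.time):
        self._rows = {}      # user id -> {bucket name: expiry time or None}
        self._clock = clock

    def add(self, user_id, bucket, ttl=None):
        expires_at = None if ttl is None else self._clock() + ttl
        self._rows.setdefault(user_id, {})[bucket] = expires_at

    def _live(self, expires_at):
        return expires_at is None or expires_at > self._clock()

    def is_in_bucket(self, user_id, bucket):
        """Q: is user X in bucket foo? A: single column fetch."""
        row = self._rows.get(user_id, {})
        return bucket in row and self._live(row[bucket])

    def buckets_for(self, user_id):
        """Q: which buckets is user X in? A: column slice over the whole row."""
        row = self._rows.get(user_id, {})
        return sorted(name for name, expires_at in row.items() if self._live(expires_at))


store = BucketStore()
user = "f97be9cc-5255-4578-8813-76701c0945bd"
store.add(user, "bar")
store.add(user, "foo", ttl=3600)   # gone an hour after insertion
print(store.is_in_bucket(user, "foo"), store.buckets_for(user))
```

The existence of the column is the data, which is why the stored value can be a throwaway 1.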
  • 42. Pattern 2: counters Real-time analytics to count clicks/impressions of ads in hourly buckets Row key: 1 Column name: 2011103015-click Value: 34 https://github.com/davegardnerisme/we-have-your-kidneys/blob/master/www/adClick.php
  • 43. Pattern 2: counters Increment by 1 using CQL UPDATE ads SET '2011103015-impression' = '2011103015-impression' + 1 WHERE KEY = '1'
  • 44. Pattern 2: counters Q: how many clicks/impressions for ad 1 over time range? A: column slice fetch, between column X and Y
      Row key: 1
      2011103015-click: 1
      2011103015-impression: 3434
      2011103016-click: 12
      2011103016-impression: 5411
      2011103017-click: 2
      2011103017-impression: 345
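The hourly-bucket layout and the slice query can be sketched in plain Python. `AdCounters`, `column_name` and the in-memory dictionaries are assumptions for illustration; in Cassandra the increment would be a counter update and the slice a single range read over sorted column names.

```python
import bisect
from datetime import datetime


def column_name(when, event):
    """Build an hourly bucket name such as 2011103015-click."""
    return when.strftime("%Y%m%d%H") + "-" + event


class AdCounters:
    """One row per ad; one counter column per hour and event type."""

    def __init__(self):
        self._rows = {}  # ad id -> {column name: count}

    def increment(self, ad_id, when, event, by=1):
        row = self._rows.setdefault(ad_id, {})
        name = column_name(when, event)
        row[name] = row.get(name, 0) + by

    def slice(self, ad_id, start, finish):
        """Return the columns whose names fall between start and finish, inclusive."""
        row = self._rows.get(ad_id, {})
        names = sorted(row)
        lo = bisect.bisect_left(names, start)
        hi = bisect.bisect_right(names, finish)
        return [(name, row[name]) for name in names[lo:hi]]


counters = AdCounters()
counters.increment("1", datetime(2011, 10, 30, 15), "click")
counters.increment("1", datetime(2011, 10, 30, 16), "click", by=12)
counters.increment("1", datetime(2011, 10, 30, 17), "click", by=2)
print(counters.slice("1", "2011103016-click", "2011103017-click"))
```

Putting the hour first in the column name is what makes a time range contiguous; clicks and impressions for the same hour sit next to each other and can be filtered by suffix.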
  • 45. Pattern 3: time series Store canonical reference of impressions and clicks Row key: 20111030 Column name: <time UUID> Value: {json} Cassandra can order columns by time http://rubyscale.com/2011/basic-time-series-with-cassandra/
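A small sketch of the time-series layout: one row per day, one column per event, named by a version 1 (time-based) UUID and read back in time order. `time_uuid`, `EventLog` and the fixed `node` and `clock_seq` values are assumptions made so the example is deterministic; naive datetimes are treated as UTC, and a real client would normally generate the time UUID for you.

```python
import json
import uuid
from datetime import datetime

GREGORIAN_EPOCH = datetime(1582, 10, 15)  # version 1 UUIDs count 100 ns ticks from here


def time_uuid(when, clock_seq=0, node=0):
    """Build a version 1 UUID whose timestamp field encodes the given datetime."""
    delta = when - GREGORIAN_EPOCH
    ticks = (delta.days * 86400 + delta.seconds) * 10**7 + delta.microseconds * 10
    fields = (
        ticks & 0xFFFFFFFF,        # time_low
        (ticks >> 32) & 0xFFFF,    # time_mid
        (ticks >> 48) & 0x0FFF,    # time_hi (version bits are set by the constructor)
        (clock_seq >> 8) & 0x3F,   # clock_seq_hi
        clock_seq & 0xFF,          # clock_seq_low
        node,
    )
    return uuid.UUID(fields=fields, version=1)


class EventLog:
    """Row key: the day (YYYYMMDD). Column name: time UUID. Value: JSON."""

    def __init__(self):
        self._rows = {}

    def record(self, when, payload, node=0):
        row_key = when.strftime("%Y%m%d")
        name = time_uuid(when, node=node)
        self._rows.setdefault(row_key, {})[name] = json.dumps(payload)
        return row_key, name

    def events_for(self, day):
        """Return the day's events ordered by the time embedded in each column name."""
        row = self._rows.get(day, {})
        return [json.loads(row[name]) for name in sorted(row, key=lambda u: u.time)]


log = EventLog()
log.record(datetime(2011, 10, 30, 15, 0, 2), {"type": "click", "ad": 1})
log.record(datetime(2011, 10, 30, 15, 0, 1), {"type": "impression", "ad": 1})
print(log.events_for("20111030"))
```

In this sketch two events from the same node in the same microsecond would collide; varying `node` or `clock_seq` per writer is one way around that.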
  • 46. Pattern 4: object properties as columns Store user properties such as name, email, etc. Row key: f97be9cc-5255-457… Column name: name Value: Bob Foo-Bar http://www.wehaveyourkidneys.com/adPerformance.php?ad=1
  • 47. Anti-pattern 1: read-before-write Instead store as independent columns and mutate individually (see pattern 4)
  • 48. Anti-pattern 2: super columns Friends don’t let friends use super columns. http://rubyscale.com/2010/beware-the-supercolumn-its-a-trap-for-the-unwary/
  • 49. Anti-pattern 3: OPP The Order Preserving Partitioner unbalances your load and makes your life harder http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/
  • 50. Recap: Data modeling • Think about the queries, work backwards • Don’t overuse single rows; try to spread the load • Don’t use super columns • Ask on IRC! #cassandra
  • 51. There’s more: Brisk Integrated Hadoop distribution (without HDFS installed). Run Hive and Pig queries directly against Cassandra. DataStax offer this functionality in their “Enterprise” product http://www.datastax.com/products/enterprise
  • 52. Hive: SQL-like interface to Hadoop
      CREATE EXTERNAL TABLE tempUsers (userUuid string, segmentId string, value string)
      STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
      WITH SERDEPROPERTIES (
        "cassandra.columns.mapping" = ":key,:column,:value",
        "cassandra.cf.name" = "users"
      );
      SELECT segmentId, count(1) AS total FROM tempUsers GROUP BY segmentId ORDER BY total DESC;
  • 53. In conclusion Cassandra is founded on sound design principles
  • 54. In conclusion The data model is incredibly powerful
  • 55. In conclusion CQL and a new breed of clients are making it easier to use
  • 56. In conclusion Hadoop integration means we can analyse data directly from a Cassandra cluster
  • 57. In conclusion There is a strong community and multiple companies offering professional support
  • 58. Thanks (looking for a job?) Learn more about Cassandra: meetup.com/Cassandra-London Sample ad-targeting project on GitHub: https://github.com/davegardnerisme/we-have-your-kidneys Watch videos from Cassandra SF 2011: http://www.datastax.com/events/cassandrasf2011/presentations

Editor's Notes

  1. This is the way that NoSQL is often approached. A light-hearted take on both how people approach NoSQL and, to some extent, the tools themselves
  2. A better approach is to consider NoSQL in terms of tradeoffs
  3. Sums it up
  4. 1st
  5. 2nd
  6. 3rd
  7. 4th
  8. 5th and last
  9. A better approach
  10. Last slide