Cassandra Architecture
Cassandra
  No SPOF (single point of failure)
  Read / Write

  Cassandra - A Decentralized Structured Storage System, LADIS '09
  Avinash Lakshman, Prashant Malik (Facebook)
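Outline: Cassandra Architecture
1. Partitioning
     Consistent Hashing
2. Replication
     Anti-Entropy
3. Consistency
     Quorum Protocol
4. Data access
     Write: Hinted Handoff, Compaction
     Read: Read Repair, Bloom Filters
     Delete: Tombstones
5. Membership / failure detection
     Gossip Protocol
6. Internal architecture
     SEDA
7. Performance evaluation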
Client Side
  Cassandra API
  Client Tools
    Commands: loadbalance, compact, flush, …
      http://lunarium.info/arc/index.php/Cassandra
    GUI (Google Code)
    import/export JSON

  RPC
    Thrift
    Avro (Cassandra 0.7)
References
  Cassandra - A Decentralized Structured Storage System, LADIS '09
    (Cassandra 0.5)

  Apache Cassandra Glossary
    http://io.typepad.com/glossary.html
    Japanese translation: http://mocchira.posterous.com/apache-cassandra-glossarys-japanese-translati

  Cassandra
    http://www.publickey1.jp/blog/10/cassandra.html
    Slideshare
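1. Partitioning
  Three partitioners
     1.  Random Partitioning
           × Range scans over keys are not possible
     2.  Order Preserving Partitioning
     3.  CollatingOrder-PreservingPartitioning
           × Load can become unbalanced across nodes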
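1. Partitioning: Random Partitioning
  Consistent Hashing
       Token: position of a node in the hash space (MD5 hash)
       Nodes are placed on a 0~2^127 hash ring according to their Token
       Range owned by a node: previous node's Token < (data Token) <= its own Token
       The Data Token is computed from the key and located on the ring;
          the node owning that range stores the Data

  Zero-hop DHT
       Every node knows the whole ring, so a request sent to any node is OK

  [Figure: hash ring. Data: 'key', md5('key') => Token in the range of node A. Replication: 2]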
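
The token lookup on the slide above can be sketched as follows. This is a minimal illustration rather than Cassandra's own code: `token_for`, `Ring` and the node names are hypothetical, and folding the MD5 value into the range with a modulo is an assumption made to match the 0~2^127 ring.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # token space 0 ~ 2^127, as on the slide


def token_for(key):
    """Map a key to a token: MD5 of the key, folded into the ring (assumption)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


class Ring:
    """Hypothetical helper: node name -> token, searched like a hash ring."""

    def __init__(self, node_tokens):
        # Keep nodes sorted by token so the owner can be found with bisect.
        self._entries = sorted((tok, name) for name, tok in node_tokens.items())
        self._tokens = [tok for tok, _ in self._entries]

    def owner(self, key):
        """First node whose token is >= the key's token, wrapping around the ring."""
        idx = bisect.bisect_left(self._tokens, token_for(key))
        return self._entries[idx % len(self._entries)][1]


ring = Ring({"A": 0, "B": RING_SIZE // 4, "C": RING_SIZE // 2, "D": 3 * RING_SIZE // 4})
print(ring.owner("key"))
```

Because every node can hold the same sorted token list, any node can answer `owner()` locally, which is the zero-hop property.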
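Consistent Hashing
  Problems with basic Consistent Hashing
     × Random node positions on the ring lead to non-uniform data and load distribution
     × Differences in node performance are not taken into account

  Two solutions
     1.  Assign each node to multiple positions on the ring (like in Dynamo)
     2.  Analyze load on the ring and move lightly loaded nodes
         to a new position (like in Chord)

  Cassandra
     Takes approach 2: a node is moved on the ring with loadbalance

  Ring, Loadbalance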
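1. Partitioning: Order Partitioning
  Order Preserving Partitioning (OPP)
    Keys are not hashed
    Token is the key as a UTF8 string
    Range Slice queries are possible

  CollatingOrder-PreservingPartitioning (COPP)
    A variant of OPP
    Keys are ordered by English (US) collation
    (0.5)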
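2. Replication
  Coordinator: the node that owns the key's range

  The Coordinator replicates the data to N-1 Successor nodes

  3 placement strategies
       Rack Unaware
          the N-1 nodes that follow the coordinator on the ring
       Rack Aware
          1 replica in another DC, the remaining N-2 in the same DC on different Racks
       Datacenter Aware
          replicas are placed per DC
          conf/datacentors.properties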
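
As a rough sketch of the Rack Unaware case above, the replica set is the coordinator followed by its N-1 successors on the ring. The function name and node names are hypothetical, and the ring is assumed to be given as a list already ordered by token.

```python
def rack_unaware_replicas(ring_nodes, coordinator, n):
    """Return the coordinator plus its N-1 successors, walking the ring clockwise.

    ring_nodes: node names ordered by token (assumption for this sketch).
    """
    start = ring_nodes.index(coordinator)
    count = min(n, len(ring_nodes))  # cannot place more replicas than nodes
    return [ring_nodes[(start + i) % len(ring_nodes)] for i in range(count)]


print(rack_unaware_replicas(["A", "B", "C", "D"], "C", 3))
```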
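2. Replication: Anti-Entropy
  Anti-Entropy (synchronization between replicas)
  A Merkle Tree is built per CF
       Leaf: hash of a Row (Hash value)
       Inner node: Hash of the Hash values of its children
       Building the tree costs I/O
       Replicas exchange the Merkle Tree and check only the parts that differ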
      
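2. Replication: ZooKeeper
  Apache ZooKeeper is used for coordination (at Facebook?)
       Described in the paper for the Cassandra deployed at Facebook
           A leader is elected among the nodes
           A joining node asks the leader which ranges it replicates
           No node is responsible for more than N-1 ranges
           Range metadata is cached on local disk and in Zookeeper
           ZooKeeper provides the fault-Tolerance

  Zookeeper can be combined with Cassandra for Transaction-like locking
       contrib/mutex/README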
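3. Consistency: Consistency Level (0.6)

  Write
    ANY        written to at least 1 node (a hint counts)
    ONE        written to 1 replica
    QUORUM     written to N/2+1 replicas
    DCQUORUM   QUORUM within the DC
    ALL        written to all replicas

  Read
    ONE        Return the response of the first node; ReadRepair runs afterwards
    QUORUM     Return the newest value once N/2+1 replicas have responded
    ALL        Return the newest value once all replicas have responded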
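3. Consistency: Quorum Protocol
  Gives consistent reads on a System with Eventual Consistency

  W + R > N
       N: number of replicas
       W: number of replicas that must acknowledge a write
       R: number of replicas that must respond to a read

  Combinations that satisfy the Quorum condition
       W=R=Quorum(=N/2+1)
       W=ONE(=1), R=ALL(=N)
       W=ALL, R=ONE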
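
A short check of the W + R > N condition for the combinations listed above. The helper names are hypothetical, and N/2+1 is taken as integer division.

```python
def quorum(n):
    """Quorum size for N replicas: N/2 + 1 with integer division."""
    return n // 2 + 1


def read_sees_latest_write(w, r, n):
    """True when every read set overlaps every write set, i.e. W + R > N."""
    return w + r > n


N = 3
print(read_sees_latest_write(quorum(N), quorum(N), N))  # W = R = Quorum
print(read_sees_latest_write(1, N, N))                  # W = ONE, R = ALL
print(read_sees_latest_write(N, 1, N))                  # W = ALL, R = ONE
print(read_sees_latest_write(1, 1, N))                  # W = R = ONE does not overlap
```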
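4. Data access: request flow
  [Figure: Client -> Proxy -> Data nodes]

  Client
   1.  Sends the request to a Proxy node
   2.  The Proxy forwards it to the Data nodes
   3.  The result is returned to the Client
  Proxy
   1.  Finds the Data nodes for the Key and
       orders them by Network Proximity
   2.  Sends a Message to the Data nodes
   3.  Waits as required by the Consistency Level, then replies to the Client
  Data
   1.  Processes the Message with the handler registered in service.StorageService
   2.  Replies to the Proxy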
4. Data access: message handlers
  RowMutationVerbHandler: Write
  ReadVerbHandler: Read
  Others: RangeSlice, Read Repair, Bootstrap, Gossip
  Handlers are registered in org.apache.cassandra.service.StorageService
4. Data access: Write
     RowMutationVerbHandler
        (no random disk I/O on the write path)

     "Always Writable": sequential Disk I/O only, Lock free

  [Figure: write path from the Proxy to a Data Node]
     Commit Log (Disk)
        • sync append
        • Serialized RowMutation
     MemTable (Memory)
        • <RowKey, CF> Map (ConcurrentSkipListMap)
        • sorted by RowKey
        • async flush to SSTable
     SSTable (Disk)
        • an SSTable is Read Only after Flush
        • one SSTable per Flush
        • Indexes
        • Row Data
        • Bloom Filter
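
The write path in the figure can be sketched in a few lines. Everything here is a simplification under stated assumptions: the `DataNode` class is hypothetical, the commit log is an in-memory list standing in for an append-only file, and the flush is synchronous and triggered by a row count instead of being asynchronous and size-based.

```python
class DataNode:
    """Hypothetical model of a data node: commit log, memtable, SSTables."""

    def __init__(self, flush_threshold=3):
        self.commit_log = []   # stands in for the on-disk append-only log
        self.memtable = {}     # row key -> {column: value}
        self.sstables = []     # immutable, key-sorted tuples
        self.flush_threshold = flush_threshold

    def write(self, row_key, columns):
        # 1. Append the mutation to the commit log first (sequential I/O only).
        self.commit_log.append((row_key, dict(columns)))
        # 2. Apply it to the in-memory table.
        self.memtable.setdefault(row_key, {}).update(columns)
        # 3. Flush when the memtable is "full" (row count is an assumption).
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write rows out sorted by row key; the result is never modified again.
        table = tuple(
            sorted((key, tuple(sorted(cols.items()))) for key, cols in self.memtable.items())
        )
        self.sstables.append(table)
        self.memtable = {}


node = DataNode()
for key in ("k3", "k1", "k2"):
    node.write(key, {"name": key.upper()})
print([key for key, _ in node.sstables[0]])
```

No existing file is ever rewritten on a write, which is why the slide can call the path lock free and always writable.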
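4. Data access: Hinted Handoff
  When a replica node is down at Write time
     The write is kept with a Hint on another node
              (another replica Node, or the Proxy Node)
     The hinted write is handed off once Gossip reports the node alive again
     The Hint is stored in a CF of the SystemTable
     With Consistency Level any, writing only the Hint counts as success
     With levels other than any, a Hint does not count toward the required replicas

  Hinted Handoff works together with Read Repair to restore consistency

  [Figure: Write Msg -> Commit Log -> Mem/SSTable -> With Hint?
           No: done / Yes: Write Hint (Gossip)]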
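4. Data access: Compaction
  Compaction: merges multiple SSTable Files into one File

  Benefits
       Read becomes faster (fewer SSTables have to be searched)
       Disk space is reclaimed (obsolete data is dropped)

  2 kinds
       Minor Compaction
          merges SSTables of similar size
          P[bytes]×4[files] -> Q, Q×4 -> R, R×4 -> S (P = Memtable flush size)
       Major Compaction
          merges all SSTables of a CF
          removes tombstones

  Compacted SSTable files are deleted when the JVM GC runs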
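
A minimal sketch of the merge step in a compaction, assuming each SSTable is modeled as a dict of key -> (timestamp, value). The function name and data layout are hypothetical; the point is only that the newest version of each key survives and the output stays sorted by key.

```python
def compact(sstables):
    """Merge several SSTables into one; for each key the newest timestamp wins."""
    merged = {}
    for table in sstables:
        for key, (timestamp, value) in table.items():
            if key not in merged or timestamp > merged[key][0]:
                merged[key] = (timestamp, value)
    # Keep the result ordered by key, like an SSTable.
    return dict(sorted(merged.items()))


older = {"b": (1, "y"), "a": (1, "x")}
newer = {"a": (2, "z")}
print(compact([older, newer]))
```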

 
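4. Data access: Read
   ReadVerbHandler
   No Lock is needed to read Mem & SSTable
      an SSTable is Read Only, so reads never wait on a Write Lock

   From 0.6
      Row Cache: caches whole rows, per CF
      Key Cache: caches the position of a key inside an SSTable

  [Figure: read path. Proxy -> Closest DataNode: Row Cache, then MemTable (Mem)
           and SSTable on Disk (via Key Cache), Merge into a Row.
           Proxy -> other DataNodes: Digest Query]

• The Closest node is asked for the Real Data
• The other nodes receive a Digest Query
• The Proxy waits as required by the Consistency Level, then Returns
• If a Digest (MD5) does not match the data
         Read Repair is started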
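4. Data access: Read Repair
  Read Repair (repair at read time)
     started when a Digest does not match

     Procedure (on ProxyNode)
       1.  Request the full data from the replicas for Read Repair
       2.  (Resolve the newest version)
       3.  (Write the newest version back to stale replicas)

  Eventual Consistency
    the Closest Node may hold an old Version
               …
    this is fixed over time by Read Repair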
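
The digest comparison and repair can be sketched as below. It is a simplification with hypothetical names: replicas are plain dicts of key -> (timestamp, value), every replica is assumed to hold the key, and the newest value is returned directly, whereas a real read at a low consistency level may answer the client before the repair completes.

```python
import hashlib


def digest(value):
    """MD5 digest of a value, as used for digest queries."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def read_with_repair(replicas, key):
    """replicas[0] is the closest node; the others only answer with a digest."""
    data = replicas[0][key]
    if all(digest(r[key][1]) == digest(data[1]) for r in replicas[1:]):
        return data[1]  # all digests match: no repair needed
    # Mismatch: fetch full versions, pick the newest, write it back to stale replicas.
    newest = max((r[key] for r in replicas), key=lambda versioned: versioned[0])
    for replica in replicas:
        if replica[key] != newest:
            replica[key] = newest
    return newest[1]


nodes = [{"k": (1, "old")}, {"k": (2, "new")}, {"k": (1, "old")}]
print(read_with_repair(nodes, "k"))
```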
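4. Data access: Bloom Filter
  Bloom Filter
    A probabilistic data structure for set membership

       For a word W and a set D it answers "not contained" or "D may contain W"
       "may contain" can be a false positive

  In Cassandra …
    each SSTable has a Bloom Filter over its Row Keys
    on Key lookup it is checked before going to disk, which saves IO

Bloom Filter: algorithm
  Decide whether W is in D: "may be contained" or "not contained"
  Step0
       Prepare k hash functions F1~Fk
       Prepare m-bit arrays ArrayW, ArrayD (initialized to 0)
       Set ArrayD[Fi(d) mod m] to 1 foreach (d in D, i=1,…,k)

  Step1
       Set ArrayW[Fi(W) mod m] to 1 foreach (i = 1,…,k)

  Step2
       AND ArrayW with ArrayD and compare the result with ArrayW
           equal: D "may contain" W
           not equal: D "does not contain" W

  Complexity: O(k)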
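
Steps 0 to 2 above can be written out as a small sketch. The class is hypothetical, and deriving the k hash functions by salting MD5 with the index i is an assumption made for brevity.

```python
import hashlib


class BloomFilter:
    """m-bit array with k hash functions (F1..Fk simulated by salted MD5)."""

    def __init__(self, m=1024, k=3):
        self.m = m
        self.k = k
        self.bits = [0] * m  # ArrayD, initialized to 0

    def _positions(self, item):
        # Fi(item) mod m for i = 1..k
        for i in range(self.k):
            h = hashlib.md5(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(h, "big") % self.m

    def add(self, item):
        # Step0: set the k bits of every element of D.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # Step1 + Step2: all k bits of W must already be set in ArrayD.
        return all(self.bits[pos] for pos in self._positions(item))


row_keys = BloomFilter()
for key in ("row1", "row2"):
    row_keys.add(key)
print(row_keys.might_contain("row1"))
```

A `False` answer is always correct, so the SSTable can be skipped without touching the disk; a `True` answer still requires the real lookup.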
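4. Data access: Delete
  Deletion takes two steps
     1.  The data is marked with a tombstone instead of being removed

     2.  The data is removed physically later
           tombstone & JVM GC
       Data marked with a tombstone is dropped at compaction
       once the Tombstone is older than the GC Time
       (GC Time default: 10 days)
       Step 2. must not run before
         1. has reached every replica,
       which is what the GC Time allows for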
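
The GC Time rule can be illustrated with a tiny sketch. The data layout (key -> (timestamp, value), with `None` marking a tombstone) and the function name are hypothetical; the 10 day grace period follows the slide.

```python
GC_GRACE_SECONDS = 10 * 24 * 60 * 60  # 10 days, the default named on the slide


def purge_tombstones(rows, now):
    """Drop tombstones (value None) that are at least GC_GRACE_SECONDS old."""
    return {
        key: (ts, value)
        for key, (ts, value) in rows.items()
        if value is not None or now - ts < GC_GRACE_SECONDS
    }


now = GC_GRACE_SECONDS
rows = {"a": (0, None), "b": (0, "x"), "c": (now - 1, None)}
print(sorted(purge_tombstones(rows, now)))
```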
5. Membership / failure detection
  Gossip Protocol
      Nodes periodically exchange state information with each other
      Node state
          (JOIN, DEAD, AVAIL)
 
     1. 

     2. 
     3. 
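5. Membership / failure detection: Cassandra Gossip
  Cassandra Gossip
  1.  Send Gossip to 1 random live endpoint
  2.  Send Gossip to 1 random unreachable endpoint
           with probability: unreachableN / (liveN + 1)
           so that a recovered node is found again by Gossip
  3.  If the target of 1. was not a Seed, or liveN < SeedN, also send Gossip to a Seed
      Seed node:
        configured statically.

  Gossip contents
    ApplicationState (JOIN, DEAD, AVAIL)
    HeartBeatState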
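
The three target-selection rules above can be sketched as one function per gossip round. The function and node names are hypothetical, and details such as avoiding a duplicate seed target are left out.

```python
import random


def pick_gossip_targets(live, unreachable, seeds, rng=random):
    """Choose the endpoints to gossip with in one round, following rules 1 to 3."""
    targets = []
    gossiped_to_seed = False
    # 1. One random live endpoint.
    if live:
        target = rng.choice(live)
        targets.append(target)
        gossiped_to_seed = target in seeds
    # 2. One random unreachable endpoint, with probability unreachableN / (liveN + 1).
    if unreachable and rng.random() < len(unreachable) / (len(live) + 1):
        targets.append(rng.choice(unreachable))
    # 3. A seed, if rule 1 did not hit one or fewer nodes are live than there are seeds.
    if seeds and (not gossiped_to_seed or len(live) < len(seeds)):
        targets.append(rng.choice(seeds))
    return targets


print(pick_gossip_targets(["n1", "n2"], ["n3"], ["s1"], random.Random(0)))
```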
6. Internal architecture: SEDA [1/2]
  SEDA (Staged Event-Driven Architecture)
    Processing is split into stages
    Stages hand work to each other by Message Passing
6. Internal architecture: SEDA [2/2]
  In Cassandra
     Stage = Event Queue + Thread Pool
        StageManager keeps a Thread Pool Executor per stage
               public static final String READ_STAGE = "ROW-READ-STAGE";
               public static final String MUTATION_STAGE = "ROW-MUTATION-STAGE";
               public static final String STREAM_STAGE = "STREAM-STAGE";
               public static final String GOSSIP_STAGE = "GS";
               public static final String RESPONSE_STAGE = "RESPONSE-STAGE";
               public static final String AE_SERVICE_STAGE = "AE-SERVICE-STAGE";
               private static final String LOADBALANCE_STAGE = "LOAD-BALANCER-STAGE";

     Event Handler
        one VerbHandler per message type
     Transport
        TCP: data messages
        UDP: Gossip

  Implemented with java.util.concurrent
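
The "Event Queue + Thread Pool" idea can be sketched with Python's standard executor, whose internal work queue plays the role of the event queue. The `Stage` class and the two handler functions are hypothetical stand-ins; only the stage names are taken from the constants above.

```python
from concurrent.futures import ThreadPoolExecutor


class Stage:
    """A named stage: the executor's work queue plus its worker threads."""

    def __init__(self, name, workers):
        self.name = name
        self._pool = ThreadPoolExecutor(max_workers=workers, thread_name_prefix=name)

    def submit(self, handler, message):
        # Enqueue the message; a worker thread of this stage runs the handler.
        return self._pool.submit(handler, message)

    def shutdown(self):
        self._pool.shutdown(wait=True)


def handle_mutation(message):
    # Stand-in for a write handler.
    return ("applied", message)


def handle_read(message):
    # Stand-in for a read handler.
    return ("row", message)


stages = {
    "ROW-MUTATION-STAGE": Stage("ROW-MUTATION-STAGE", 2),
    "ROW-READ-STAGE": Stage("ROW-READ-STAGE", 2),
}

future = stages["ROW-READ-STAGE"].submit(handle_read, "k1")
print(future.result(timeout=5))
```

Giving each stage its own pool means a backlog of reads cannot starve writes of threads, which is the main design point of a staged architecture.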
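7. Cassandra performance evaluation
  e.g.) YCSB (Yahoo Cloud Serving Benchmark)
       Benchmarking Cloud Serving Systems with YCSB, SOCC '10
       http://www.brianfrankcooper.net/pubs/ycsb-v4.pdf
  Benchmark tiers
       Tier1. Performance: latency v.s. throughput
       Tier2. Scalability: number of servers v.s. performance
  Workload parameters
       Operation mix (read, update, …)
       Request distribution (uniform, Zipf distribution, …)
  Systems compared
       Cassandra
       HBase (modeled on Google's Bigtable)
       MySQL Sharding
       PNUTS (Yahoo)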
A Call to Action for Generative AI in 2024Results
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 

Recently uploaded (20)

A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 

Cassandra Study Group (Cassandra勉強会)

  • 2. Cassandra   SPOF   Read Write   Cassandra - A Decentralized Structured Storage System, LADIS '09, Avinash Lakshman, Prashant Malik (Facebook)
  • 3. : Cassandra Architecture   1. Consistent Hashing   2. Anti-Entropy   3. Quorum Protocol   4. Write (HintedHandoff, Compaction); Read (Read Repair, Bloom Filters); Delete (Tombstones)   5. Gossip Protocol   6. SEDA   7.
  • 4. : Client Side   Cassandra API   Client Tools   : loadbalance, compact, flush, …     http://lunarium.info/arc/index.php/Cassandra   GUI Google Code   import/export JSON   RPC   Thrift   Avro Cassandra0.7
  • 5.   Cassandra - A Decentralized Structured Storage System, LADIS '09   (cassandra0.5 )   Apache Cassandra Glossary   Cassandra   http://io.typepad.com/glossary.html   :http://mocchira.posterous.com/apache-cassandra-glossarys-japanese-translati   Cassandra   http://www.publickey1.jp/blog/10/cassandra.html   Slideshare
  • 6. 1.   1.  Random Partitioning × 2.  Order Preserving Partitioning 3.  CollatingOrder-PreservingPartitioning ×
  • 7. 1. : Random Partitioning   Consistent Hashing   Token: (MD5 hash)   0~2^127 hash ring Token   Token < ( ) Token   Data Token ring Data   Zero-hop DHT A   OK A Data: ‘key’ md5(‘key’)=> Replication:2
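The ring lookup on this slide can be sketched in Java as below. This is a minimal illustration under stated assumptions, not Cassandra's partitioner code: the class `HashRingSketch` and its methods are hypothetical names, the token is taken as the absolute value of the MD5 digest (which lands in the 0~2^127 range), and a key is assumed to belong to the first node whose token is greater than or equal to the key's token, with the remaining replicas placed on the following nodes clockwise.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch of a consistent-hashing ring; not Cassandra's own classes.
class HashRingSketch {
    // Node tokens sorted around the ring: token -> node name.
    private final TreeMap<BigInteger, String> ring = new TreeMap<>();

    // Token of a key: absolute value of its 128-bit MD5 digest (0..2^127).
    static BigInteger token(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(digest).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    void addNode(String name, BigInteger nodeToken) {
        ring.put(nodeToken, name);
    }

    // First node at or after the key token, then its successors, wrapping around.
    List<String> replicasFor(BigInteger keyToken, int replicationFactor) {
        List<String> replicas = new ArrayList<>();
        if (ring.isEmpty()) {
            return replicas;
        }
        int wanted = Math.min(replicationFactor, ring.size());
        BigInteger start = ring.ceilingKey(keyToken);
        if (start == null) {
            start = ring.firstKey(); // past the largest token: wrap to the start
        }
        for (String node : ring.tailMap(start, true).values()) {
            if (replicas.size() < wanted) replicas.add(node);
        }
        for (String node : ring.headMap(start, false).values()) {
            if (replicas.size() < wanted) replicas.add(node);
        }
        return replicas;
    }

    public static void main(String[] args) {
        HashRingSketch ring = new HashRingSketch();
        ring.addNode("A", BigInteger.valueOf(10));
        ring.addNode("B", BigInteger.valueOf(20));
        ring.addNode("C", BigInteger.valueOf(30));
        System.out.println(ring.replicasFor(BigInteger.valueOf(15), 2));
    }
}
```

Because every node knows all tokens, any node can compute the replica list locally, which is what makes the lookup zero-hop.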
  • 8. Consistent Hashing   Consistent Hashing × ×   1.  position (like in Dynamo) 2.  ring load position (like in Chord)   Cassandra         loadbalance  
  • 10. 1. : Order Partitioning   Order Preserving Partitioning(OPP)   Hash   Token UTF8   Range Slice   CollatingOrder-PreservingPartitioning(COPP)   OPP   English(US) 0.5
  • 11. 2.   Coordinator   Coordinator N-1 Successor   3   Rack Unaware   coordinator ring N-1   Rack Aware   1 DC N-2 DC Rack   Datacenter Aware   DC   conf/datacentors.properties
  • 12. 2. : Anti-Entropy   Anti-Entropy( )       CF Merkle Tree     Leaf Row (Hash )   Hash   I/O   Merkle Tree check  
  • 13. 2. : ZooKeeper   Apache ZooKeeper (Facebook?)   Cassandra   Facebook ( )   N-1   local disk ZooKeeper cache   ZooKeeper fault tolerance   ZooKeeper Cassandra Transaction   Cassandra   contrib/mutex/README
  • 14. 3. :Consistency Level(0.6 )   Write   Read   ANY   ONE   1   QUORUM   ONE     ×1   Return   QUORUM     × /2+1   DCQUORUM   ReadRepair   QUORUM DC   ALL   ALL  
  • 15. 3. : Quorum Protocol   System Eventual Consistency   W + R > N   :N   :W   :R   Quorum   W=R=Quorum(=N/2+1)   W=ONE(=1), R=ALL(=N)   W=ALL, R=ONE
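The W + R > N condition on this slide can be checked mechanically. The sketch below is only an illustration; `QuorumSketch` and its method names are hypothetical, and N, W and R are taken to mean the replica count, the number of replicas that must acknowledge a write, and the number of replicas consulted on a read.

```java
// Hypothetical helper illustrating the W + R > N rule; not part of Cassandra.
class QuorumSketch {
    // Quorum size for n replicas: n/2 + 1 with integer division.
    static int quorum(int n) {
        return n / 2 + 1;
    }

    // True when every read set must intersect every write set.
    static boolean readSeesLatestWrite(int n, int w, int r) {
        return w + r > n;
    }

    public static void main(String[] args) {
        int n = 3;
        int q = quorum(n);
        System.out.println("W=R=QUORUM: " + readSeesLatestWrite(n, q, q));
        System.out.println("W=ONE, R=ALL: " + readSeesLatestWrite(n, 1, n));
        System.out.println("W=ALL, R=ONE: " + readSeesLatestWrite(n, n, 1));
        System.out.println("W=ONE, R=ONE: " + readSeesLatestWrite(n, 1, 1));
    }
}
```

With N = 3, the three combinations listed on the slide satisfy the inequality, while W = R = ONE does not, so a read at ONE after a write at ONE may return stale data until the replicas converge.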
  • 16. 4. : Data Proxy   Client 1.  Proxy 2.  Data 3.  Client   Proxy 1.  Key Date   Network Proximity 2.  Data Message 3.  Consistency Level Client   Data 1.  Message service.StorageService 2.  Proxy
  • 17. 4. :   RowMutationVerbHandler: Write   ReadVerbHandler: Read   RangeSlice, Read Repair, Bootstrap, Gossip   org.apache.cassandra.service.StorageService
  • 18. 4. : Write ,   RowMutationVerbHandler   ( I/O )   “Always Writable” Disk I/O Lock free Data Node commit log <RowKey, CF> Map (ConcurrentSkiplistMap) async flush MemTable sync •  Memory •  Disk RowKey •  Commit • Serialized RowMutation Log •  SSTable SSTable Read Only Flush •  SSTable Flush • Indexes • Row Data Proxy • Bloom Filter
  • 19. 4. : Hinted Handoff     Write ( Node, Proxy Node)   Gossip   Hint SystemTable CF   Consistency Level any( )     any Hint   Hinted Handoff Read Repair Write Msg Commit No Log Mem/SSTable With Hint? Yes Gossip Write Hint
  • 20. 4. : Compaction   Compaction: SSTable File File       Read ( )   ( )   2   Minor Compaction   SSTable   P[bytes]×4[ ] Q, Q×4 R, R×4 S(P=Memtable )   Major Compaction   CF SSTable   tombstone     JVM GC  
  • 21. 4. : Read   ReadVerbHandler   Lock Mem&SSTable   SSTable Read Only Write Lock   0.6   Row Cache: 1 CF   Key Cache: SSTable ClosestDataNode SSTable Proxy (Key Cache) Real Data Row Disk Data Merge MemTable Mem • Closest (Row Cache) •  Digest Query • Consistency Level DataNode Return Digest Query • Digest(MD5) Read Repair Row Cache
  • 22. 4. :Read Repair   Read Repair( )   Digest   (on ProxyNode) 1.  Read Repair 2.  ( ) 3.  ( )   Eventual Consistency   Closest Node Version …   Read Repair
  • 23. 4. : Bloom Filter   Bloom Filter     W D “ ” or D “ ” false positive     Cassandra …   SSTable Row Key   Key lookup disk check IO
  • 24. Bloom Filter   W D “ ” or “ ”   Step0   k hash F1~Fk   m ArrayW, ArrayD (0 )   ArrayD[Fi(d) mod(m)] 1 foreach(D as d, i=1,…,k)   Step1   ArrayW[Fi(W) mod(m)] 1 foreach(i = 1,…,k)   Step2   ArrayW ArrayD ArrayW ArrayD D W ” ”   D W ” ”   O(k)
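The steps on this slide correspond to the usual Bloom filter operations: set k positions of an m-bit array for every member of D, then test whether all k positions for W are set. The following Java sketch shows that idea under stated assumptions. `BloomFilterSketch` is a hypothetical class, and the k hash functions F1~Fk are simulated by combining two integer hashes (a common shortcut), which is not necessarily how Cassandra derives them.

```java
import java.util.BitSet;

// Hypothetical Bloom filter sketch; the hash derivation is an assumption.
class BloomFilterSketch {
    private final BitSet bits;
    private final int m; // number of bits in the array
    private final int k; // number of hash functions

    BloomFilterSketch(int m, int k) {
        this.m = m;
        this.k = k;
        this.bits = new BitSet(m);
    }

    // i-th array position for a key, i.e. Fi(key) mod m.
    private int index(String key, int i) {
        int h1 = key.hashCode();
        int h2 = Integer.rotateLeft(h1 * 0x9E3779B9, 15) | 1;
        return Math.floorMod(h1 + i * h2, m);
    }

    // Step 0: record a member of the set D.
    void add(String key) {
        for (int i = 0; i < k; i++) {
            bits.set(index(key, i));
        }
    }

    // Steps 1 and 2: false means "definitely absent",
    // true means "possibly present" (false positives can occur).
    boolean mightContain(String key) {
        for (int i = 0; i < k; i++) {
            if (!bits.get(index(key, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BloomFilterSketch filter = new BloomFilterSketch(1024, 3);
        filter.add("row-key-1");
        System.out.println(filter.mightContain("row-key-1"));
    }
}
```

Both operations touch k bits, which matches the O(k) cost noted on the slide. A negative answer lets a read skip an SSTable without any disk I/O.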
  • 25. 4. : Delete   1.  2.    tombstone & JVM GC   tombstone   Tombstone GC (GC Time :10 )   2.   1. GC Time
  • 26. 5. /   Gossip Protocol     (JOIN,DEAD,AVAIL)   1.  2.  3. 
  • 27. 5. /   Cassandra Gossip 1.  1 Gossip 2.  endpoint1 Gossip   : unreachableN /(liveN + 1) Gossip   Gossip 3.  1 Gossip Seed or liveN < SeedN Seed Gossip   Seed : static .   Gossip   ApplicationState(JOIN,DEAD,AVAIL)   HeartBeatState
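The per-round decisions on this slide can be written down as two small functions. This is a hedged sketch rather than Cassandra's Gossiper code: `GossipRoundSketch` and its method names are hypothetical, and step 3 is read here as "gossip to a seed if step 1 did not already reach a seed, or if fewer nodes are live than there are seeds", which is an interpretation of the slide.

```java
// Hypothetical sketch of the per-round gossip decisions described on the slide.
class GossipRoundSketch {
    // Step 2: probability of also gossiping to one unreachable endpoint.
    static double unreachableGossipProbability(int unreachableN, int liveN) {
        return unreachableN / (liveN + 1.0);
    }

    // Step 3 (assumed reading): contact a seed if none was contacted in step 1,
    // or if the live set is smaller than the seed list.
    static boolean shouldGossipToSeed(boolean gossipedToSeed, int liveN, int seedN) {
        return !gossipedToSeed || liveN < seedN;
    }

    public static void main(String[] args) {
        System.out.println(unreachableGossipProbability(1, 3));
        System.out.println(shouldGossipToSeed(true, 5, 2));
    }
}
```

The probability grows as more endpoints become unreachable relative to the live set, so a node that is largely isolated spends more rounds retrying the endpoints it has lost.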
  • 28. 6. : SEDA[1/2]   SEDA(Staged Event-Driven Architecture)       Message Passing   => × ×
  • 29. 6. : SEDA[2/2]   Cassandra   Event Queue+Thread Pool   StageManager Thread Pool Executor   public final static String READ_STAGE = "ROW-READ-STAGE";   public final static String MUTATION_STAGE = "ROW-MUTATION-STAGE";   public final static String STREAM_STAGE = "STREAM-STAGE";   public final static String GOSSIP_STAGE = "GS";   public static final String RESPONSE_STAGE = "RESPONSE-STAGE";   public final static String AE_SERVICE_STAGE = "AE-SERVICE-STAGE";   private static final String LOADBALANCE_STAGE = "LOAD-BALANCER-STAGE";   Event Handler   VerbHandler   TCP   UDP   java.util.concurrent
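A stage in the sense of this slide is an event queue plus a thread pool, looked up by name. The sketch below shows that shape with plain `java.util.concurrent` classes. It is an illustration only: `StageManagerSketch`, `registerStage`, `submit` and `demo` are hypothetical names, and only the stage-name string is taken from the slide.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of named SEDA stages; not Cassandra's StageManager.
class StageManagerSketch {
    static final String READ_STAGE = "ROW-READ-STAGE";

    // Stage name -> thread pool with its own unbounded event queue.
    private final Map<String, ExecutorService> stages = new ConcurrentHashMap<>();

    void registerStage(String name, int threads) {
        stages.put(name, new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>()));
    }

    // Enqueue an event handler on a stage; the caller does not block.
    <T> Future<T> submit(String stage, Callable<T> handler) {
        ExecutorService pool = stages.get(stage);
        if (pool == null) {
            throw new IllegalArgumentException("unknown stage: " + stage);
        }
        return pool.submit(handler);
    }

    void shutdown() {
        for (ExecutorService pool : stages.values()) {
            pool.shutdown();
        }
    }

    // Runs one handler on the read stage and returns its result.
    static String demo() {
        StageManagerSketch manager = new StageManagerSketch();
        manager.registerStage(READ_STAGE, 2);
        try {
            return manager.submit(READ_STAGE, () -> "read:k1").get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            manager.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Giving each stage its own queue and pool means a slow stage backs up only its own queue, and the pool sizes can be tuned per stage.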
  • 30. 7. Cassandra   YCSB (Yahoo Cloud Serving Benchmark)   Benchmarking Cloud Serving Systems with YCSB, SoCC '10   http://www.brianfrankcooper.net/pubs/ycsb-v4.pdf   Tier1. Performance: v.s.   Tier2. Scalability: v.s.   Operation ( , ,…)   ( ,Zipf ,…)   Cassandra   HBase(Google)   MySQL Sharding   PNUTS(Yahoo)