SlideShare a Scribd company logo
1 of 20
Download to read offline
Scaling Cassandra for Big Data


                     Al Tobey
                     Tech Lead, Data and Compute
                     Services
                     Ooyala, Inc.
                     al@ooyala.com
                     @AlTobey
Why Cassandra?

Highly available

Commodity hardware

Horizontal scale

Columnar data model

Open source
What does Ooyala use it for?
Fast access to data generated by Map/Reduce

High availability key/value out of Storm

Cross-device resume (playhead tracking)

ML predictions

Time-series data, raw events & application metrics
The Beginning

Our data is doubling every year

Cluster size: 18 nodes

Biggest CF: 2TB

Repairs becoming a problem

Expired tombstones
First Migration

Upgrade to C* 0.6 to 0.8

Remove expired tombstones

Scrub data and rebuild indexes

Lots of Linux performance tuning

Map/Reduce
Second Migration

Upgrade to Cassandra 1.0

Remove expired tombstones

Update schema

More Linux performance tuning

Map/Reduce - this time using DSE Hadoop
Tuning Highlights
Bloom filter false-positive chance (schema)

Index density (schema)

LeveledCompaction ssTable size (schema)

XFS filesystem bugs (Linux)
  Stick with ext4 if you like to sleep.

NO SWAP!
More Information

cassandra-users mailing list

irc.freenode.net #cassandra / #cassandra-ops

http://www.datastax.com/docs/1.1/index

@AlTobey / al@ooyala.com

Contact me about open positions at Ooyala.
Rejected Slides follow:

An old version of this deck was a lot more
technical. I've added them back for online
posting since people have asked about the
specifics.
Linux: General Observations
●   use a modern kernel, 2.6.32 is ancient
    ○   Running 3.4.11 on new production hardware
    ○   default Ubuntu Lucid / Oneiric kernels in EC2
●   I have yet to use XFS bug-free
    ○   2.6.38 has an especially fun bug
    ○   allocsize=64m allocates 64m always & forever
    ○   echo 1 > /proc/sys/vm/drop_caches
●   Put commit log on a different filesystem
●   btrfs works fine in production
●   Block alignment is hard
    ○   use GPT disk labels and it's generally not an issue
    ○   or just skip disk labels and RAID whole disks
Linux: almost a server OS
/etc/security/limits.conf

*   -   memlock unlimited
*   -   nofile 1048576
*   -   fsize unlimited
*   -   nproc 999999
Linux: I love bufferbloat
/etc/sysctl.conf

kernel.sysrq = 1
kernel.panic = 300
fs.file-max = 1048576
kernel.pid_max = 999999
vm.max_map_count = 1048576
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 65536 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
Ubuntu: FFFFFFFUUUUUUUUUUU
/etc/fstab

/dev/md4 /commit ext4 nobootwait,barrier=1,journal_ioprio=0,rw 0 0
/dev/md7 /srv xfs nobootwait,rw 0 0


 ●   force barriers for journal
 ●   noatime & relatime aren't necessary anymore
      ○  since ~ 2.6.31
 ●   nobootwait is an upstart option
      ○  set this or upstart will troll you at 4am
      ○  mountall hangs on boot for any error without this
      ○  use on both hardware and EC2 unless you love
         using OOB consoles
 ●   As noted, XFS is buggy, so consider ext4.
Linux: Final Adjustments
/etc/rc.local (or whatever you prefer)
●  CFQ disk scheduler
    ○   deadline is still faster, but no cgroup support
    ○   noop is a popular choice in EC2, SSD, and HW
        RAID
●   Tune readahead
    ○   don't go crazy, 64k is a decent choice
    ○   big RA will inflate your bandwidth numbers, but
        really large values will waste IO on unused data
●   If running MD RAID5/6
    ○   echo 16384 >
        /sys/block/$md/md/stripe_cache_size
JVM: ALL THE MEMORY
●   Use Oracle JVM 1.6 for Cassandra
    ○   OpenJDK works, still not recommended
    ○   Use fpm to create packages if you don't have them
●   Default Cassandra GC settings are OK
    ○   -XX:+UseNUMA
         ■ works fine in production
         ■ Apache scripts will use numactl if installed
            ●   DSE does not! (yet)
    ○   Bigger data will need bigger heaps.
         ■ 12G seems to work OK
         ■ 24G works, but approaching limits of JVM
         ■ too little free memory causes excessive
           memtable flushing (more on this later)
Cassandra.(?:ya|f)ml
●   index_interval: 512
     ○ save some memory on indexes
●   compaction_throughput_mb_per_sec: 0
     ○ this can hurt your read latency, but in my experience
       leveled compaction falls behind under very high
       insert loads without this, use a bigger heap to
       compensate?
●   rpc_server_type: hsha
     ○ if you have lots & lots of connections, e.g. from
       Hadoop, saves memory
Cassandra: Schema Tuning
●   Enable compression
    ○   compression_options = {'sstable_compression': 'org.
        apache.cassandra.io.compress.SnappyCompressor'};
●   Examine bloom filter false-positives
    ○   nodetool -h localhost cfstats |grep Bloom
    ○   bloom_filter_fp_chance = 0.1 # diminishing returns
●   Reduce ssTable count
    ○   memory pressure caused frequent memtable flushes
    ○   compaction throttling made it worse
    ○   compaction_strategy_options = {'sstable_size_in_mb':
        256}
●   Give yourself time to repair
    ○   gc_grace = 5184000 # 60 days
    ○   shoot for (node_count * 86400 * 3) to be safe
Future
●   Upgrade all clusters to DSE 2.2
●   Chef cookbook (likely open)
●   Mixing CQL3 and Thrift API access
    ○   all lower case CF names
    ○   WITH COMPACT STORAGE
●   Cassandra 1.2
    ○   native protocol
    ○   JBOD support
    ○   vnodes
    ○   compound row key support in CQL3
MOAR
●   Freenode IRC is a great resource
    ○   #cassandra, #cassandra-ops
●   cassandra-users mailing list
●   DataStax Enterprise
    ○   The Hadoop integration works and is useful
    ○   Still playing with Solr
    ○   OpsCenter is really nice
●   Me:
    ○   @AlTobey on Twitter
    ○   tobert on irc.freenode.net
    ○   https://gist.github.com/tobert
●
More Information (again)

cassandra-users mailing list

irc.freenode.net #cassandra / #cassandra-ops

http://www.datastax.com/docs/1.1/index

@AlTobey / al@ooyala.com

Contact me about open positions at Ooyala.

More Related Content

What's hot

Setting up mongodb sharded cluster in 30 minutes
Setting up mongodb sharded cluster in 30 minutesSetting up mongodb sharded cluster in 30 minutes
Setting up mongodb sharded cluster in 30 minutesSudheer Kondla
 
Ceph Day Beijing - Our journey to high performance large scale Ceph cluster a...
Ceph Day Beijing - Our journey to high performance large scale Ceph cluster a...Ceph Day Beijing - Our journey to high performance large scale Ceph cluster a...
Ceph Day Beijing - Our journey to high performance large scale Ceph cluster a...Danielle Womboldt
 
Integration of Glusterfs in to commvault simpana
Integration of Glusterfs in to commvault simpanaIntegration of Glusterfs in to commvault simpana
Integration of Glusterfs in to commvault simpanaGluster.org
 
Cassandra Summit 2014: Performance Tuning Cassandra in AWS
Cassandra Summit 2014: Performance Tuning Cassandra in AWSCassandra Summit 2014: Performance Tuning Cassandra in AWS
Cassandra Summit 2014: Performance Tuning Cassandra in AWSDataStax Academy
 
Red Hat Gluster Storage Performance
Red Hat Gluster Storage PerformanceRed Hat Gluster Storage Performance
Red Hat Gluster Storage PerformanceRed_Hat_Storage
 
Cassandra Day Chicago 2015: DataStax Enterprise & Apache Cassandra Hardware B...
Cassandra Day Chicago 2015: DataStax Enterprise & Apache Cassandra Hardware B...Cassandra Day Chicago 2015: DataStax Enterprise & Apache Cassandra Hardware B...
Cassandra Day Chicago 2015: DataStax Enterprise & Apache Cassandra Hardware B...DataStax Academy
 
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance BarriersCeph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance BarriersCeph Community
 
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA ArchitectureCeph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA ArchitectureDanielle Womboldt
 
Journey to Stability: Petabyte Ceph Cluster in OpenStack Cloud
Journey to Stability: Petabyte Ceph Cluster in OpenStack CloudJourney to Stability: Petabyte Ceph Cluster in OpenStack Cloud
Journey to Stability: Petabyte Ceph Cluster in OpenStack CloudPatrick McGarry
 
Scylla Summit 2018: In-Memory Scylla - When Fast Storage is Not Fast Enough
Scylla Summit 2018: In-Memory Scylla - When Fast Storage is Not Fast EnoughScylla Summit 2018: In-Memory Scylla - When Fast Storage is Not Fast Enough
Scylla Summit 2018: In-Memory Scylla - When Fast Storage is Not Fast EnoughScyllaDB
 
Ceph Performance and Sizing Guide
Ceph Performance and Sizing GuideCeph Performance and Sizing Guide
Ceph Performance and Sizing GuideJose De La Rosa
 
Ceph Day San Jose - Object Storage for Big Data
Ceph Day San Jose - Object Storage for Big Data Ceph Day San Jose - Object Storage for Big Data
Ceph Day San Jose - Object Storage for Big Data Ceph Community
 
Challenges with Gluster and Persistent Memory with Dan Lambright
Challenges with Gluster and Persistent Memory with Dan LambrightChallenges with Gluster and Persistent Memory with Dan Lambright
Challenges with Gluster and Persistent Memory with Dan LambrightGluster.org
 
Compaction, Compaction Everywhere
Compaction, Compaction EverywhereCompaction, Compaction Everywhere
Compaction, Compaction EverywhereDataStax Academy
 
Exploiting Your File System to Build Robust & Efficient Workflows
Exploiting Your File System to Build Robust & Efficient WorkflowsExploiting Your File System to Build Robust & Efficient Workflows
Exploiting Your File System to Build Robust & Efficient Workflowsjasonajohnson
 
SUSE Storage: Sizing and Performance (Ceph)
SUSE Storage: Sizing and Performance (Ceph)SUSE Storage: Sizing and Performance (Ceph)
SUSE Storage: Sizing and Performance (Ceph)Lars Marowsky-Brée
 

What's hot (19)

Setting up mongodb sharded cluster in 30 minutes
Setting up mongodb sharded cluster in 30 minutesSetting up mongodb sharded cluster in 30 minutes
Setting up mongodb sharded cluster in 30 minutes
 
Ceph Day Beijing - Our journey to high performance large scale Ceph cluster a...
Ceph Day Beijing - Our journey to high performance large scale Ceph cluster a...Ceph Day Beijing - Our journey to high performance large scale Ceph cluster a...
Ceph Day Beijing - Our journey to high performance large scale Ceph cluster a...
 
Integration of Glusterfs in to commvault simpana
Integration of Glusterfs in to commvault simpanaIntegration of Glusterfs in to commvault simpana
Integration of Glusterfs in to commvault simpana
 
Cassandra Summit 2014: Performance Tuning Cassandra in AWS
Cassandra Summit 2014: Performance Tuning Cassandra in AWSCassandra Summit 2014: Performance Tuning Cassandra in AWS
Cassandra Summit 2014: Performance Tuning Cassandra in AWS
 
Cassandra at teads
Cassandra at teadsCassandra at teads
Cassandra at teads
 
ceph-barcelona-v-1.2
ceph-barcelona-v-1.2ceph-barcelona-v-1.2
ceph-barcelona-v-1.2
 
Red Hat Gluster Storage Performance
Red Hat Gluster Storage PerformanceRed Hat Gluster Storage Performance
Red Hat Gluster Storage Performance
 
Stabilizing Ceph
Stabilizing CephStabilizing Ceph
Stabilizing Ceph
 
Cassandra Day Chicago 2015: DataStax Enterprise & Apache Cassandra Hardware B...
Cassandra Day Chicago 2015: DataStax Enterprise & Apache Cassandra Hardware B...Cassandra Day Chicago 2015: DataStax Enterprise & Apache Cassandra Hardware B...
Cassandra Day Chicago 2015: DataStax Enterprise & Apache Cassandra Hardware B...
 
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance BarriersCeph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
 
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA ArchitectureCeph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
 
Journey to Stability: Petabyte Ceph Cluster in OpenStack Cloud
Journey to Stability: Petabyte Ceph Cluster in OpenStack CloudJourney to Stability: Petabyte Ceph Cluster in OpenStack Cloud
Journey to Stability: Petabyte Ceph Cluster in OpenStack Cloud
 
Scylla Summit 2018: In-Memory Scylla - When Fast Storage is Not Fast Enough
Scylla Summit 2018: In-Memory Scylla - When Fast Storage is Not Fast EnoughScylla Summit 2018: In-Memory Scylla - When Fast Storage is Not Fast Enough
Scylla Summit 2018: In-Memory Scylla - When Fast Storage is Not Fast Enough
 
Ceph Performance and Sizing Guide
Ceph Performance and Sizing GuideCeph Performance and Sizing Guide
Ceph Performance and Sizing Guide
 
Ceph Day San Jose - Object Storage for Big Data
Ceph Day San Jose - Object Storage for Big Data Ceph Day San Jose - Object Storage for Big Data
Ceph Day San Jose - Object Storage for Big Data
 
Challenges with Gluster and Persistent Memory with Dan Lambright
Challenges with Gluster and Persistent Memory with Dan LambrightChallenges with Gluster and Persistent Memory with Dan Lambright
Challenges with Gluster and Persistent Memory with Dan Lambright
 
Compaction, Compaction Everywhere
Compaction, Compaction EverywhereCompaction, Compaction Everywhere
Compaction, Compaction Everywhere
 
Exploiting Your File System to Build Robust & Efficient Workflows
Exploiting Your File System to Build Robust & Efficient WorkflowsExploiting Your File System to Build Robust & Efficient Workflows
Exploiting Your File System to Build Robust & Efficient Workflows
 
SUSE Storage: Sizing and Performance (Ceph)
SUSE Storage: Sizing and Performance (Ceph)SUSE Storage: Sizing and Performance (Ceph)
SUSE Storage: Sizing and Performance (Ceph)
 

Viewers also liked

Infrastructure modeling with chef
Infrastructure modeling with chefInfrastructure modeling with chef
Infrastructure modeling with chefCharles Johnson
 
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)jbellis
 
Inside the Chef Push Jobs Service - ChefConf 2015
Inside the Chef Push Jobs Service - ChefConf 2015 Inside the Chef Push Jobs Service - ChefConf 2015
Inside the Chef Push Jobs Service - ChefConf 2015 Chef
 
Why does my choice of storage matter with cassandra?
Why does my choice of storage matter with cassandra?Why does my choice of storage matter with cassandra?
Why does my choice of storage matter with cassandra?Johnny Miller
 
Cassandra at Instagram (August 2013)
Cassandra at Instagram (August 2013)Cassandra at Instagram (August 2013)
Cassandra at Instagram (August 2013)Rick Branson
 
A Detailed Look At cassandra.yaml (Edward Capriolo, The Last Pickle) | Cassan...
A Detailed Look At cassandra.yaml (Edward Capriolo, The Last Pickle) | Cassan...A Detailed Look At cassandra.yaml (Edward Capriolo, The Last Pickle) | Cassan...
A Detailed Look At cassandra.yaml (Edward Capriolo, The Last Pickle) | Cassan...DataStax
 
How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)DataStax Academy
 
DataStax: Backup and Restore in Cassandra and OpsCenter
DataStax: Backup and Restore in Cassandra and OpsCenterDataStax: Backup and Restore in Cassandra and OpsCenter
DataStax: Backup and Restore in Cassandra and OpsCenterDataStax Academy
 
Aggregated queries with Druid on terrabytes and petabytes of data
Aggregated queries with Druid on terrabytes and petabytes of dataAggregated queries with Druid on terrabytes and petabytes of data
Aggregated queries with Druid on terrabytes and petabytes of dataRostislav Pashuto
 
CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...
CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...
CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...Animesh Singh
 
Deploying Docker (Provisioning /w Docker + Chef/Puppet) - DevopsDaysPGH
Deploying Docker (Provisioning /w Docker + Chef/Puppet) - DevopsDaysPGHDeploying Docker (Provisioning /w Docker + Chef/Puppet) - DevopsDaysPGH
Deploying Docker (Provisioning /w Docker + Chef/Puppet) - DevopsDaysPGHErica Windisch
 

Viewers also liked (12)

Infrastructure modeling with chef
Infrastructure modeling with chefInfrastructure modeling with chef
Infrastructure modeling with chef
 
What's new in chef 12
What's new in chef 12 What's new in chef 12
What's new in chef 12
 
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)
 
Inside the Chef Push Jobs Service - ChefConf 2015
Inside the Chef Push Jobs Service - ChefConf 2015 Inside the Chef Push Jobs Service - ChefConf 2015
Inside the Chef Push Jobs Service - ChefConf 2015
 
Why does my choice of storage matter with cassandra?
Why does my choice of storage matter with cassandra?Why does my choice of storage matter with cassandra?
Why does my choice of storage matter with cassandra?
 
Cassandra at Instagram (August 2013)
Cassandra at Instagram (August 2013)Cassandra at Instagram (August 2013)
Cassandra at Instagram (August 2013)
 
A Detailed Look At cassandra.yaml (Edward Capriolo, The Last Pickle) | Cassan...
A Detailed Look At cassandra.yaml (Edward Capriolo, The Last Pickle) | Cassan...A Detailed Look At cassandra.yaml (Edward Capriolo, The Last Pickle) | Cassan...
A Detailed Look At cassandra.yaml (Edward Capriolo, The Last Pickle) | Cassan...
 
How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)
 
DataStax: Backup and Restore in Cassandra and OpsCenter
DataStax: Backup and Restore in Cassandra and OpsCenterDataStax: Backup and Restore in Cassandra and OpsCenter
DataStax: Backup and Restore in Cassandra and OpsCenter
 
Aggregated queries with Druid on terrabytes and petabytes of data
Aggregated queries with Druid on terrabytes and petabytes of dataAggregated queries with Druid on terrabytes and petabytes of data
Aggregated queries with Druid on terrabytes and petabytes of data
 
CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...
CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...
CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...
 
Deploying Docker (Provisioning /w Docker + Chef/Puppet) - DevopsDaysPGH
Deploying Docker (Provisioning /w Docker + Chef/Puppet) - DevopsDaysPGHDeploying Docker (Provisioning /w Docker + Chef/Puppet) - DevopsDaysPGH
Deploying Docker (Provisioning /w Docker + Chef/Puppet) - DevopsDaysPGH
 

Similar to Scaling Cassandra for Big Data

Building Apache Cassandra clusters for massive scale
Building Apache Cassandra clusters for massive scaleBuilding Apache Cassandra clusters for massive scale
Building Apache Cassandra clusters for massive scaleAlex Thompson
 
Top 10 Perl Performance Tips
Top 10 Perl Performance TipsTop 10 Perl Performance Tips
Top 10 Perl Performance TipsPerrin Harkins
 
Hadoop 3 @ Hadoop Summit San Jose 2017
Hadoop 3 @ Hadoop Summit San Jose 2017Hadoop 3 @ Hadoop Summit San Jose 2017
Hadoop 3 @ Hadoop Summit San Jose 2017Junping Du
 
Apache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateApache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateDataWorks Summit
 
Kafka on ZFS: Better Living Through Filesystems
Kafka on ZFS: Better Living Through Filesystems Kafka on ZFS: Better Living Through Filesystems
Kafka on ZFS: Better Living Through Filesystems confluent
 
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, VectorizedData Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, VectorizedHostedbyConfluent
 
How Opera Syncs Tens of Millions of Browsers and Sleeps Well at Night
How Opera Syncs Tens of Millions of Browsers and Sleeps Well at NightHow Opera Syncs Tens of Millions of Browsers and Sleeps Well at Night
How Opera Syncs Tens of Millions of Browsers and Sleeps Well at NightScyllaDB
 
Comparison of foss distributed storage
Comparison of foss distributed storageComparison of foss distributed storage
Comparison of foss distributed storageMarian Marinov
 
An Introduction to Apache Cassandra
An Introduction to Apache CassandraAn Introduction to Apache Cassandra
An Introduction to Apache CassandraSaeid Zebardast
 
ScyllaDB: NoSQL at Ludicrous Speed
ScyllaDB: NoSQL at Ludicrous SpeedScyllaDB: NoSQL at Ludicrous Speed
ScyllaDB: NoSQL at Ludicrous SpeedJ On The Beach
 
Introduction to Docker (as presented at December 2013 Global Hackathon)
Introduction to Docker (as presented at December 2013 Global Hackathon)Introduction to Docker (as presented at December 2013 Global Hackathon)
Introduction to Docker (as presented at December 2013 Global Hackathon)Jérôme Petazzoni
 
Boosting I/O Performance with KVM io_uring
Boosting I/O Performance with KVM io_uringBoosting I/O Performance with KVM io_uring
Boosting I/O Performance with KVM io_uringShapeBlue
 
Optimizing Servers for High-Throughput and Low-Latency at Dropbox
Optimizing Servers for High-Throughput and Low-Latency at DropboxOptimizing Servers for High-Throughput and Low-Latency at Dropbox
Optimizing Servers for High-Throughput and Low-Latency at DropboxScyllaDB
 
Seastar / ScyllaDB, or how we implemented a 10-times faster Cassandra
Seastar / ScyllaDB,  or how we implemented a 10-times faster CassandraSeastar / ScyllaDB,  or how we implemented a 10-times faster Cassandra
Seastar / ScyllaDB, or how we implemented a 10-times faster CassandraTzach Livyatan
 
Comparison of-foss-distributed-storage
Comparison of-foss-distributed-storageComparison of-foss-distributed-storage
Comparison of-foss-distributed-storageMarian Marinov
 
The Proper Care and Feeding of MySQL Databases
The Proper Care and Feeding of MySQL DatabasesThe Proper Care and Feeding of MySQL Databases
The Proper Care and Feeding of MySQL DatabasesDave Stokes
 
Backing up Wikipedia Databases
Backing up Wikipedia DatabasesBacking up Wikipedia Databases
Backing up Wikipedia DatabasesJaime Crespo
 

Similar to Scaling Cassandra for Big Data (20)

Building Apache Cassandra clusters for massive scale
Building Apache Cassandra clusters for massive scaleBuilding Apache Cassandra clusters for massive scale
Building Apache Cassandra clusters for massive scale
 
Top 10 Perl Performance Tips
Top 10 Perl Performance TipsTop 10 Perl Performance Tips
Top 10 Perl Performance Tips
 
Linux Huge Pages
Linux Huge PagesLinux Huge Pages
Linux Huge Pages
 
Hadoop 3 @ Hadoop Summit San Jose 2017
Hadoop 3 @ Hadoop Summit San Jose 2017Hadoop 3 @ Hadoop Summit San Jose 2017
Hadoop 3 @ Hadoop Summit San Jose 2017
 
Apache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateApache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community Update
 
Kafka on ZFS: Better Living Through Filesystems
Kafka on ZFS: Better Living Through Filesystems Kafka on ZFS: Better Living Through Filesystems
Kafka on ZFS: Better Living Through Filesystems
 
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, VectorizedData Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
Data Policies for the Kafka-API with WebAssembly | Alexander Gallego, Vectorized
 
How Opera Syncs Tens of Millions of Browsers and Sleeps Well at Night
How Opera Syncs Tens of Millions of Browsers and Sleeps Well at NightHow Opera Syncs Tens of Millions of Browsers and Sleeps Well at Night
How Opera Syncs Tens of Millions of Browsers and Sleeps Well at Night
 
Advances in GPU Computing
Advances in GPU ComputingAdvances in GPU Computing
Advances in GPU Computing
 
Comparison of foss distributed storage
Comparison of foss distributed storageComparison of foss distributed storage
Comparison of foss distributed storage
 
An Introduction to Apache Cassandra
An Introduction to Apache CassandraAn Introduction to Apache Cassandra
An Introduction to Apache Cassandra
 
Introduction to Galera Cluster
Introduction to Galera ClusterIntroduction to Galera Cluster
Introduction to Galera Cluster
 
ScyllaDB: NoSQL at Ludicrous Speed
ScyllaDB: NoSQL at Ludicrous SpeedScyllaDB: NoSQL at Ludicrous Speed
ScyllaDB: NoSQL at Ludicrous Speed
 
Introduction to Docker (as presented at December 2013 Global Hackathon)
Introduction to Docker (as presented at December 2013 Global Hackathon)Introduction to Docker (as presented at December 2013 Global Hackathon)
Introduction to Docker (as presented at December 2013 Global Hackathon)
 
Boosting I/O Performance with KVM io_uring
Boosting I/O Performance with KVM io_uringBoosting I/O Performance with KVM io_uring
Boosting I/O Performance with KVM io_uring
 
Optimizing Servers for High-Throughput and Low-Latency at Dropbox
Optimizing Servers for High-Throughput and Low-Latency at DropboxOptimizing Servers for High-Throughput and Low-Latency at Dropbox
Optimizing Servers for High-Throughput and Low-Latency at Dropbox
 
Seastar / ScyllaDB, or how we implemented a 10-times faster Cassandra
Seastar / ScyllaDB,  or how we implemented a 10-times faster CassandraSeastar / ScyllaDB,  or how we implemented a 10-times faster Cassandra
Seastar / ScyllaDB, or how we implemented a 10-times faster Cassandra
 
Comparison of-foss-distributed-storage
Comparison of-foss-distributed-storageComparison of-foss-distributed-storage
Comparison of-foss-distributed-storage
 
The Proper Care and Feeding of MySQL Databases
The Proper Care and Feeding of MySQL DatabasesThe Proper Care and Feeding of MySQL Databases
The Proper Care and Feeding of MySQL Databases
 
Backing up Wikipedia Databases
Backing up Wikipedia DatabasesBacking up Wikipedia Databases
Backing up Wikipedia Databases
 

More from DataStax Academy

Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftDataStax Academy
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseDataStax Academy
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraDataStax Academy
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsDataStax Academy
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingDataStax Academy
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackDataStax Academy
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache CassandraDataStax Academy
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready CassandraDataStax Academy
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonDataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1DataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2DataStax Academy
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First ClusterDataStax Academy
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with DseDataStax Academy
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraDataStax Academy
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseDataStax Academy
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraDataStax Academy
 

More from DataStax Academy (20)

Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph Database
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart Labs
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data Modeling
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stack
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache Cassandra
 
Coursera Cassandra Driver
Coursera Cassandra DriverCoursera Cassandra Driver
Coursera Cassandra Driver
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready Cassandra
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First Cluster
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with Dse
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache Cassandra
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core Concepts
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax Enterprise
 
Bad Habits Die Hard
Bad Habits Die Hard Bad Habits Die Hard
Bad Habits Die Hard
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache Cassandra
 
Advanced Cassandra
Advanced CassandraAdvanced Cassandra
Advanced Cassandra
 

Scaling Cassandra for Big Data

  • 1. Scaling Cassandra for Big Data Al Tobey Tech Lead, Data and Compute Services Ooyala, Inc. al@ooyala.com @AlTobey
  • 2. Why Cassandra? Highly available Commodity hardware Horizontal scale Columnar data model Open source
  • 3. What does Ooyala use it for? Fast access to data generated by Map/Reduce High availability key/value out of Storm Cross-device resume (playhead tracking) ML predictions Time-series data, raw events & application metrics
  • 4. The Beginning Our data is doubling every year Cluster size: 18 nodes Biggest CF: 2TB Repairs becoming a problem Expired tombstones
  • 5. First Migration Upgrade to C* 0.6 to 0.8 Remove expired tombstones Scrub data and rebuild indexes Lots of Linux performance tuning Map/Reduce
  • 6. Second Migration Upgrade to Cassandra 1.0 Remove expired tombstones Update schema More Linux performance tuning Map/Reduce - this time using DSE Hadoop
  • 7. Tuning Highlights Bloom filter false-positive chance (schema) Index density (schema) LeveledCompaction ssTable size (schema) XFS filesystem bugs (Linux) Stick with ext4 if you like to sleep. NO SWAP!
  • 8. More Information cassandra-users mailing list irc.freenode.net #cassandra / #cassandra-ops http://www.datastax.com/docs/1.1/index @AlTobey / al@ooyala.com Contact me about open positions at Ooyala.
  • 9. Rejected Slides follow: An old version of this deck was a lot more technical. I've added them back for online posting since people have asked about the specifics.
  • 10. Linux: General Observations ● use a modern kernel, 2.6.32 is ancient ○ Running 3.4.11 on new production hardware ○ default Ubuntu Lucid / Oneiric kernels in EC2 ● I have yet to use XFS bug-free ○ 2.6.38 has an especially fun bug ○ allocsize=64m allocates 64m always & forever ○ echo 1 > /proc/sys/vm/drop_caches ● Put commit log on a different filesystem ● btrfs works fine in production ● Block alignment is hard ○ use GPT disk labels and it's generally not an issue ○ or just skip disk labels and RAID whole disks
  • 11. Linux: almost a server OS /etc/security/limits.conf * - memlock unlimited * - nofile 1048576 * - fsize unlimited * - nproc 999999
  • 12. Linux: I love bufferbloat /etc/sysctl.conf kernel.sysrq = 1 kernel.panic = 300 fs.file-max = 1048576 kernel.pid_max = 999999 vm.max_map_count = 1048576 net.core.rmem_max = 16777216 net.core.wmem_max = 16777216 net.ipv4.tcp_rmem = 4096 65536 16777216 net.ipv4.tcp_wmem = 4096 65536 16777216
  • 13. Ubuntu: FFFFFFFUUUUUUUUUUU /etc/fstab /dev/md4 /commit ext4 nobootwait,barrier=1,journal_ioprio=0,rw 0 0 /dev/md7 /srv xfs nobootwait,rw 0 0 ● force barriers for journal ● noatime & relatime aren't necessary anymore ○ since ~ 2.6.31 ● nobootwait is an upstart option ○ set this or upstart will troll you at 4am ○ mountall hangs on boot for any error without this ○ use on both hardware and EC2 unless you love using OOB consoles ● As noted, XFS is buggy, so consider ext4.
  • 14. Linux: Final Adjustments /etc/rc.local (or whatever you prefer) ● CFQ disk scheduler ○ deadline is still faster, but no cgroup support ○ noop is a popular choice in EC2, SSD, and HW RAID ● Tune readahead ○ don't go crazy, 64k is a decent choice ○ big RA will inflate your bandwidth numbers, but really large values will waste IO on unused data ● If running MD RAID5/6 ○ echo 16384 > /sys/block/$md/md/stripe_cache_size
  • 15. JVM: ALL THE MEMORY ● Use Oracle JVM 1.6 for Cassandra ○ OpenJDK works, still not recommended ○ Use fpm to create packages if you don't have them ● Default Cassandra GC settings are OK ○ -XX:+UseNUMA ■ works fine in production ■ Apache scripts will use numactl if installed ● DSE does not! (yet) ○ Bigger data will need bigger heaps. ■ 12G seems to work OK ■ 24G works, but approaching limits of JVM ■ too little free memory causes excessive memtable flushing (more on this later)
  • 16. Cassandra.(?:ya|f)ml ● index_interval: 512 ○ save some memory on indexes ● compaction_throughput_mb_per_sec: 0 ○ this can hurt your read latency, but in my experience leveled compaction falls behind under very high insert loads without this, use a bigger heap to compensate? ● rpc_server_type: hsha ○ if you have lots & lots of connections, e.g. from Hadoop, saves memory
  • 17. Cassandra: Schema Tuning ● Enable compression ○ compression_options = {'sstable_compression': 'org. apache.cassandra.io.compress.SnappyCompressor'}; ● Examine bloom filter false-positives ○ nodetool -h localhost cfstats |grep Bloom ○ bloom_filter_fp_chance = 0.1 # diminishing returns ● Reduce ssTable count ○ memory pressure caused frequent memtable flushes ○ compaction throttling made it worse ○ compaction_strategy_options = {'sstable_size_in_mb': 256} ● Give yourself time to repair ○ gc_grace = 5184000 # 60 days ○ shoot for (node_count * 86400 * 3) to be safe
  • 18. Future ● Upgrade all clusters to DSE 2.2 ● Chef cookbook (likely open) ● Mixing CQL3 and Thrift API access ○ all lower case CF names ○ WITH COMPACT STORAGE ● Cassandra 1.2 ○ native protocol ○ JBOD support ○ vnodes ○ compound row key support in CQL3
  • 19. MOAR ● Freenode IRC is a great resource ○ #cassandra, #cassandra-ops ● cassandra-users mailing list ● DataStax Enterprise ○ The Hadoop integration works and is useful ○ Still playing with Solr ○ OpsCenter is really nice ● Me: ○ @AlTobey on Twitter ○ tobert on irc.freenode.net ○ https://gist.github.com/tobert ●
  • 20. More Information (again) cassandra-users mailing list irc.freenode.net #cassandra / #cassandra-ops http://www.datastax.com/docs/1.1/index @AlTobey / al@ooyala.com Contact me about open positions at Ooyala.