SlideShare a Scribd company logo
1 of 32
Myths of big partitions
Robert Stupp
Solution Architect @ DataStax, C*-Committer
@snazy
Issues with big partitions before 3.6
• Slow reads
• Compaction failures
• Repair failures
• java.lang.OutOfMemoryError
 fail fast  node down
(Lot of org.apache.cassandra.io.sstable.IndexInfo on heap)
© DataStax, All Rights Reserved. 2
SSTable Components
© DataStax, All Rights Reserved. 3
Data
Primary
Index
Summary
Bloom
Filter
Determine whether an SSTable contains a partition
 bloomFilterFpChance
Partition samples
 minIndexInterval / maxIndexInterval
All partition keys + index samples
 column_index_size_in_kb
All the data
Read from an SSTable
© DataStax, All Rights Reserved. 4
Data
Primary
Index
Summary
Bloom
Filter
1. Check whether partition is in SSTable
2. Find “nearest” partition key
3. Return offset in primary index
4. Find partition
5. Find clustering key
6. Return offset in data file
7. Find, read and return data
Before CASSANDRA-11206
Evaluation of SSTable Components
© DataStax, All Rights Reserved. 6
Data
Primary
Index
Summary
Bloom
Filter
Off-Heap, small  fine
Off-Heap, small-ish  fine
On-Heap,
many small objects, nested structure
 problematic
For CQL since #8099  fine
Primary Index File Layout
© DataStax, All Rights Reserved.
Partition Index SamplesPartition Key Partition Index SamplesPartition Key
es Partition Index SamplesPartition Key Partition Index SPartition Key
Samples Partition Index SamplesPartition Key PartitionPartition Key
Index Samples Partition Index SamplesPartition Key PPartition Key
artition Index Samples Partition Index SamplesPartition Key Partition Key
Partition Index Samples
”from”
Summary
Sampling the Primary Index
© DataStax, All Rights Reserved.
Partition in Data file
Partition Key
Offset in SSTable Data File
column_index_size_in_kb (default: 64kB)
First
Key
Last
Key
First
Key
Last
Key
First
Key
Last
Key
First
Key
Last
Key
First
Key
Last
Key
First
Key
Last
Key
First
Key
Last
Key
How it looks on-heap
© DataStax, All Rights Reserved. 10
IndexedEntry
IndexInfo
firstKey, lastKey, offset, width, deletionInfo
patitionKey*, offset, deletionInfo
* = technically not in IndexedEntry
IndexInfo
firstKey, lastKey, offset, width, deletionInfo
IndexInfo
firstKey, lastKey, offset, width, deletionInfo
…
Primary Index
Structure
© DataStax, All Rights Reserved. 11
IndexedEntry extends RowIndexEntry
DeletionTime
ArrayList
IndexInfo  per 64kB
DeletionTime
BufferClustering
Kind
ByteBuffer[]
ByteBuffer
byte[]
…
BufferClustering
Kind
ByteBuffer[]
ByteBuffer
byte[]
…
# of Java objects:
IndexedEntry
4
IndexInfo (per 64kB)
8 + 4 * clust-key-components
(primitive fields omitted)
Primary Index - some numbers
© DataStax, All Rights Reserved. 12
Approximation on one 16 byte clustering-value:
Partition Size Index Size (heap) # of objects
1MB 3kB > 200 objects
4MB 11kB > 800 objects
64MB 180kB > 13,000 objects
512MB 1.4MB > 106,000 objects
2048MB 5.6MB > 424,000 objects
Disclaimer: numbers are examples and not representative
Reads
• Reads IndexedEntry w/ all IndexInfo
• 2GB partition means: 32,768 IndexInfo,
424,000 objects
• Binary search just needs: 15 IndexInfo (max),
O(log n) ~200 objects
© DataStax, All Rights Reserved. 14 Disclaimer: numbers are examples and not representative
SELECT foo, bar
FROM big_partition_table
WHERE ...
Writes – Flushes & Compactions
IndexedEntry constructed with all IndexInfo
as Java object structure on heap first,
then serialized to disk
© DataStax, All Rights Reserved. 15
106,000
objects
106,000
objects
106,000
objects
106,000
objects
Compacting a 2GB partition
© DataStax, All Rights Reserved. 16
SSTable SSTable SSTable SSTable
SSTable
Key
Cache
Remove
106,000 objects
Remove
106,000 objects
Remove
106,000 objects
Remove
106,000 objects
Add
424,000 objects
Construct
424,000
objects
Disclaimer: numbers are examples and not representative
Reads of big partitions – on heap
• Primary index data deserialized
• Object structure added to key cache
• Other entries evicted from key cache
• Also applies to compaction & repair
© DataStax, All Rights Reserved. 17
Flushes with big partitions – on heap
• Primary index data constructed
• Object structure added to key cache
(for compactions)
• Also applies to compactions
© DataStax, All Rights Reserved. 18
Trivia
How many 2GB partitions fit in the key cache?
© DataStax, All Rights Reserved. 19
2GB partition  5.6MB
100MB
 100/6 = 16
Disclaimer: numbers are examples and not representative
Issues w/ big partitions – TL;DR
• Amount of Java objects
• Additions and evictions to/from key cache
© DataStax, All Rights Reserved. 20
Necessities – TL;DR
• Reduce amount of Java objects
• Reduce GC pressure
• No change in sstable format
i.e. files need to be binary compatible
© DataStax, All Rights Reserved. 22
Approach
• Omit (most) IndexInfo on heap
• Read IndexInfo only when needed
• Serialize primary index via byte buffer
• Objects “never” promoted to Java old gen
(hope so ;) )
© DataStax, All Rights Reserved. 23
Small heap (3GB) test
© DataStax, All Rights Reserved. 24
Before #11206 – duration: 3h, lots of GC, exhausted heap
With #11206 – duration: 1h10, few GC, moderate heap usage
java.lang.
OutOfMemoryError
org.apache.cassandra.io.sstable.LargePartitionsTest
Results
• Promising!
• But:
Performance regression w/ some workloads
© DataStax, All Rights Reserved. 25
Better Approach
• Keep IndexInfo objects for “nicely” sized
partitions on-heap
• Controlled via c.yaml
© DataStax, All Rights Reserved. 26
Doesn’t this mean more disk I/O?
• “Hot” data already in buffer cache
• No change for “cold” partitions
© DataStax, All Rights Reserved. 27
#11206 Benefits
• Reduced heap usage
• Reduced GC pressure
• Improved read and write paths
• Key cache can hold “more” entries
• Moved the bad partition size “barrier”
© DataStax, All Rights Reserved. 28
#11206 Metrics
org.apache.cassandra.metrics:
type=Index,scope=RowIndexEntry
• name=IndexInfoCount
Histogram - # of IndexInfo per IndexedEntry
• name=IndexInfoGets
Histogram - # of ”gets” against single IndexedEntry
• name=IndexedEntrySize
Histogram - serialized size of IndexedEntry
© DataStax, All Rights Reserved. 29
„After #11206, what‘s the
recommended partition size?“
• It still depends – sorry
• IMO we moved the “barrier”
Test with your
data model
and workload
© DataStax, All Rights Reserved. 30
Bad usage of large partitions
• CQL SELECT without clustering key
• i.e. materialize a large partition in memory
• Using the same partition key over a long time
• i.e. access many sstables
© DataStax, All Rights Reserved. 31
#9754
• Changes on-disk primary index format
• Efficient on-disk representation
• Optimized for OS page size
• WIP !
• Fix-Version: 4.x
© DataStax, All Rights Reserved. 33
Thank You!
Q & A
Come to the “experts stand”

More Related Content

What's hot

AWS 환경에서 MySQL BMT
AWS 환경에서 MySQL BMTAWS 환경에서 MySQL BMT
AWS 환경에서 MySQL BMTI Goo Lee
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftAmazon Web Services
 
Caching solutions with Redis
Caching solutions   with RedisCaching solutions   with Redis
Caching solutions with RedisGeorge Platon
 
MongoDB for Coder Training (Coding Serbia 2013)
MongoDB for Coder Training (Coding Serbia 2013)MongoDB for Coder Training (Coding Serbia 2013)
MongoDB for Coder Training (Coding Serbia 2013)Uwe Printz
 
Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
 Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
Spark Operator—Deploy, Manage and Monitor Spark clusters on KubernetesDatabricks
 
Introduction to Apache ZooKeeper
Introduction to Apache ZooKeeperIntroduction to Apache ZooKeeper
Introduction to Apache ZooKeeperSaurav Haloi
 
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...DataStax
 
MongoDB at Scale
MongoDB at ScaleMongoDB at Scale
MongoDB at ScaleMongoDB
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm Chandler Huang
 
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of FacebookTech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of FacebookThe Hive
 
Redis persistence in practice
Redis persistence in practiceRedis persistence in practice
Redis persistence in practiceEugene Fidelin
 
RocksDB Performance and Reliability Practices
RocksDB Performance and Reliability PracticesRocksDB Performance and Reliability Practices
RocksDB Performance and Reliability PracticesYoshinori Matsunobu
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & FeaturesDataStax Academy
 
Cassandra Operations at Netflix
Cassandra Operations at NetflixCassandra Operations at Netflix
Cassandra Operations at Netflixgreggulrich
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkFlink Forward
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkFlink Forward
 
MongoDB WiredTiger Internals
MongoDB WiredTiger InternalsMongoDB WiredTiger Internals
MongoDB WiredTiger InternalsNorberto Leite
 

What's hot (20)

AWS 환경에서 MySQL BMT
AWS 환경에서 MySQL BMTAWS 환경에서 MySQL BMT
AWS 환경에서 MySQL BMT
 
HDFS Erasure Coding in Action
HDFS Erasure Coding in Action HDFS Erasure Coding in Action
HDFS Erasure Coding in Action
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon Redshift
 
Caching solutions with Redis
Caching solutions   with RedisCaching solutions   with Redis
Caching solutions with Redis
 
Redis database
Redis databaseRedis database
Redis database
 
MongoDB for Coder Training (Coding Serbia 2013)
MongoDB for Coder Training (Coding Serbia 2013)MongoDB for Coder Training (Coding Serbia 2013)
MongoDB for Coder Training (Coding Serbia 2013)
 
Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
 Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
Spark Operator—Deploy, Manage and Monitor Spark clusters on Kubernetes
 
Introduction to Apache ZooKeeper
Introduction to Apache ZooKeeperIntroduction to Apache ZooKeeper
Introduction to Apache ZooKeeper
 
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...
 
MongoDB at Scale
MongoDB at ScaleMongoDB at Scale
MongoDB at Scale
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm
 
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of FacebookTech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook
 
Redis persistence in practice
Redis persistence in practiceRedis persistence in practice
Redis persistence in practice
 
RocksDB Performance and Reliability Practices
RocksDB Performance and Reliability PracticesRocksDB Performance and Reliability Practices
RocksDB Performance and Reliability Practices
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & Features
 
Cassandra Operations at Netflix
Cassandra Operations at NetflixCassandra Operations at Netflix
Cassandra Operations at Netflix
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in Flink
 
MongoDB WiredTiger Internals
MongoDB WiredTiger InternalsMongoDB WiredTiger Internals
MongoDB WiredTiger Internals
 
MyRocks Deep Dive
MyRocks Deep DiveMyRocks Deep Dive
MyRocks Deep Dive
 

Similar to Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016

Managing Security At 1M Events a Second using Elasticsearch
Managing Security At 1M Events a Second using ElasticsearchManaging Security At 1M Events a Second using Elasticsearch
Managing Security At 1M Events a Second using ElasticsearchJoe Alex
 
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]Speedment, Inc.
 
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]Malin Weiss
 
Managing Data and Operation Distribution In MongoDB
Managing Data and Operation Distribution In MongoDBManaging Data and Operation Distribution In MongoDB
Managing Data and Operation Distribution In MongoDBJason Terpko
 
Managing data and operation distribution in MongoDB
Managing data and operation distribution in MongoDBManaging data and operation distribution in MongoDB
Managing data and operation distribution in MongoDBAntonios Giannopoulos
 
Fast Big Data Analytics with Spark on Tachyon
Fast Big Data Analytics with Spark on TachyonFast Big Data Analytics with Spark on Tachyon
Fast Big Data Analytics with Spark on TachyonAlluxio, Inc.
 
SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...
SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...
SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...DataStax
 
Optimising Geospatial Queries with Dynamic File Pruning
Optimising Geospatial Queries with Dynamic File PruningOptimising Geospatial Queries with Dynamic File Pruning
Optimising Geospatial Queries with Dynamic File PruningDatabricks
 
Tachyon_meetup_5-28-2015-IBM
Tachyon_meetup_5-28-2015-IBMTachyon_meetup_5-28-2015-IBM
Tachyon_meetup_5-28-2015-IBMShaoshan Liu
 
NYJavaSIG - Big Data Microservices w/ Speedment
NYJavaSIG - Big Data Microservices w/ SpeedmentNYJavaSIG - Big Data Microservices w/ Speedment
NYJavaSIG - Big Data Microservices w/ SpeedmentSpeedment, Inc.
 
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...DataStax
 
Building a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache SolrBuilding a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache SolrRahul Jain
 
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayDatadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayC4Media
 
Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...
Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...
Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...Spark Summit
 
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...Chester Chen
 
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast DataDatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast DataHakka Labs
 
High Performance, Scalable MongoDB in a Bare Metal Cloud
High Performance, Scalable MongoDB in a Bare Metal CloudHigh Performance, Scalable MongoDB in a Bare Metal Cloud
High Performance, Scalable MongoDB in a Bare Metal CloudMongoDB
 
JavaOne 2013: Memory Efficient Java
JavaOne 2013: Memory Efficient JavaJavaOne 2013: Memory Efficient Java
JavaOne 2013: Memory Efficient JavaChris Bailey
 

Similar to Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016 (20)

Managing Security At 1M Events a Second using Elasticsearch
Managing Security At 1M Events a Second using ElasticsearchManaging Security At 1M Events a Second using Elasticsearch
Managing Security At 1M Events a Second using Elasticsearch
 
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
 
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
 
Managing Data and Operation Distribution In MongoDB
Managing Data and Operation Distribution In MongoDBManaging Data and Operation Distribution In MongoDB
Managing Data and Operation Distribution In MongoDB
 
Galaxy Big Data with MariaDB
Galaxy Big Data with MariaDBGalaxy Big Data with MariaDB
Galaxy Big Data with MariaDB
 
Managing data and operation distribution in MongoDB
Managing data and operation distribution in MongoDBManaging data and operation distribution in MongoDB
Managing data and operation distribution in MongoDB
 
Fast Big Data Analytics with Spark on Tachyon
Fast Big Data Analytics with Spark on TachyonFast Big Data Analytics with Spark on Tachyon
Fast Big Data Analytics with Spark on Tachyon
 
Percona FT / TokuDB
Percona FT / TokuDBPercona FT / TokuDB
Percona FT / TokuDB
 
SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...
SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...
SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...
 
Optimising Geospatial Queries with Dynamic File Pruning
Optimising Geospatial Queries with Dynamic File PruningOptimising Geospatial Queries with Dynamic File Pruning
Optimising Geospatial Queries with Dynamic File Pruning
 
Tachyon_meetup_5-28-2015-IBM
Tachyon_meetup_5-28-2015-IBMTachyon_meetup_5-28-2015-IBM
Tachyon_meetup_5-28-2015-IBM
 
NYJavaSIG - Big Data Microservices w/ Speedment
NYJavaSIG - Big Data Microservices w/ SpeedmentNYJavaSIG - Big Data Microservices w/ Speedment
NYJavaSIG - Big Data Microservices w/ Speedment
 
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...
 
Building a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache SolrBuilding a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache Solr
 
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayDatadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
 
Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...
Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...
Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...
 
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
 
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast DataDatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
 
High Performance, Scalable MongoDB in a Bare Metal Cloud
High Performance, Scalable MongoDB in a Bare Metal CloudHigh Performance, Scalable MongoDB in a Bare Metal Cloud
High Performance, Scalable MongoDB in a Bare Metal Cloud
 
JavaOne 2013: Memory Efficient Java
JavaOne 2013: Memory Efficient JavaJavaOne 2013: Memory Efficient Java
JavaOne 2013: Memory Efficient Java
 

More from DataStax

Is Your Enterprise Ready to Shine This Holiday Season?
Is Your Enterprise Ready to Shine This Holiday Season?Is Your Enterprise Ready to Shine This Holiday Season?
Is Your Enterprise Ready to Shine This Holiday Season?DataStax
 
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...DataStax
 
Running DataStax Enterprise in VMware Cloud and Hybrid Environments
Running DataStax Enterprise in VMware Cloud and Hybrid EnvironmentsRunning DataStax Enterprise in VMware Cloud and Hybrid Environments
Running DataStax Enterprise in VMware Cloud and Hybrid EnvironmentsDataStax
 
Best Practices for Getting to Production with DataStax Enterprise Graph
Best Practices for Getting to Production with DataStax Enterprise GraphBest Practices for Getting to Production with DataStax Enterprise Graph
Best Practices for Getting to Production with DataStax Enterprise GraphDataStax
 
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step JourneyWebinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step JourneyDataStax
 
Webinar | How to Understand Apache Cassandra™ Performance Through Read/Writ...
Webinar  |  How to Understand Apache Cassandra™ Performance Through Read/Writ...Webinar  |  How to Understand Apache Cassandra™ Performance Through Read/Writ...
Webinar | How to Understand Apache Cassandra™ Performance Through Read/Writ...DataStax
 
Webinar | Better Together: Apache Cassandra and Apache Kafka
Webinar  |  Better Together: Apache Cassandra and Apache KafkaWebinar  |  Better Together: Apache Cassandra and Apache Kafka
Webinar | Better Together: Apache Cassandra and Apache KafkaDataStax
 
Top 10 Best Practices for Apache Cassandra and DataStax Enterprise
Top 10 Best Practices for Apache Cassandra and DataStax EnterpriseTop 10 Best Practices for Apache Cassandra and DataStax Enterprise
Top 10 Best Practices for Apache Cassandra and DataStax EnterpriseDataStax
 
Introduction to Apache Cassandra™ + What’s New in 4.0
Introduction to Apache Cassandra™ + What’s New in 4.0Introduction to Apache Cassandra™ + What’s New in 4.0
Introduction to Apache Cassandra™ + What’s New in 4.0DataStax
 
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...DataStax
 
Webinar | Aligning GDPR Requirements with Today's Hybrid Cloud Realities
Webinar  |  Aligning GDPR Requirements with Today's Hybrid Cloud RealitiesWebinar  |  Aligning GDPR Requirements with Today's Hybrid Cloud Realities
Webinar | Aligning GDPR Requirements with Today's Hybrid Cloud RealitiesDataStax
 
Designing a Distributed Cloud Database for Dummies
Designing a Distributed Cloud Database for DummiesDesigning a Distributed Cloud Database for Dummies
Designing a Distributed Cloud Database for DummiesDataStax
 
How to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
How to Power Innovation with Geo-Distributed Data Management in Hybrid CloudHow to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
How to Power Innovation with Geo-Distributed Data Management in Hybrid CloudDataStax
 
How to Evaluate Cloud Databases for eCommerce
How to Evaluate Cloud Databases for eCommerceHow to Evaluate Cloud Databases for eCommerce
How to Evaluate Cloud Databases for eCommerceDataStax
 
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...DataStax
 
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...DataStax
 
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...DataStax
 
Datastax - The Architect's guide to customer experience (CX)
Datastax - The Architect's guide to customer experience (CX)Datastax - The Architect's guide to customer experience (CX)
Datastax - The Architect's guide to customer experience (CX)DataStax
 
An Operational Data Layer is Critical for Transformative Banking Applications
An Operational Data Layer is Critical for Transformative Banking ApplicationsAn Operational Data Layer is Critical for Transformative Banking Applications
An Operational Data Layer is Critical for Transformative Banking ApplicationsDataStax
 
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design ThinkingBecoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design ThinkingDataStax
 

More from DataStax (20)

Is Your Enterprise Ready to Shine This Holiday Season?
Is Your Enterprise Ready to Shine This Holiday Season?Is Your Enterprise Ready to Shine This Holiday Season?
Is Your Enterprise Ready to Shine This Holiday Season?
 
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
 
Running DataStax Enterprise in VMware Cloud and Hybrid Environments
Running DataStax Enterprise in VMware Cloud and Hybrid EnvironmentsRunning DataStax Enterprise in VMware Cloud and Hybrid Environments
Running DataStax Enterprise in VMware Cloud and Hybrid Environments
 
Best Practices for Getting to Production with DataStax Enterprise Graph
Best Practices for Getting to Production with DataStax Enterprise GraphBest Practices for Getting to Production with DataStax Enterprise Graph
Best Practices for Getting to Production with DataStax Enterprise Graph
 
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step JourneyWebinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
 
Webinar | How to Understand Apache Cassandra™ Performance Through Read/Writ...
Webinar  |  How to Understand Apache Cassandra™ Performance Through Read/Writ...Webinar  |  How to Understand Apache Cassandra™ Performance Through Read/Writ...
Webinar | How to Understand Apache Cassandra™ Performance Through Read/Writ...
 
Webinar | Better Together: Apache Cassandra and Apache Kafka
Webinar  |  Better Together: Apache Cassandra and Apache KafkaWebinar  |  Better Together: Apache Cassandra and Apache Kafka
Webinar | Better Together: Apache Cassandra and Apache Kafka
 
Top 10 Best Practices for Apache Cassandra and DataStax Enterprise
Top 10 Best Practices for Apache Cassandra and DataStax EnterpriseTop 10 Best Practices for Apache Cassandra and DataStax Enterprise
Top 10 Best Practices for Apache Cassandra and DataStax Enterprise
 
Introduction to Apache Cassandra™ + What’s New in 4.0
Introduction to Apache Cassandra™ + What’s New in 4.0Introduction to Apache Cassandra™ + What’s New in 4.0
Introduction to Apache Cassandra™ + What’s New in 4.0
 
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...
 
Webinar | Aligning GDPR Requirements with Today's Hybrid Cloud Realities
Webinar  |  Aligning GDPR Requirements with Today's Hybrid Cloud RealitiesWebinar  |  Aligning GDPR Requirements with Today's Hybrid Cloud Realities
Webinar | Aligning GDPR Requirements with Today's Hybrid Cloud Realities
 
Designing a Distributed Cloud Database for Dummies
Designing a Distributed Cloud Database for DummiesDesigning a Distributed Cloud Database for Dummies
Designing a Distributed Cloud Database for Dummies
 
How to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
How to Power Innovation with Geo-Distributed Data Management in Hybrid CloudHow to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
How to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
 
How to Evaluate Cloud Databases for eCommerce
How to Evaluate Cloud Databases for eCommerceHow to Evaluate Cloud Databases for eCommerce
How to Evaluate Cloud Databases for eCommerce
 
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...
 
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
 
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...
 
Datastax - The Architect's guide to customer experience (CX)
Datastax - The Architect's guide to customer experience (CX)Datastax - The Architect's guide to customer experience (CX)
Datastax - The Architect's guide to customer experience (CX)
 
An Operational Data Layer is Critical for Transformative Banking Applications
An Operational Data Layer is Critical for Transformative Banking ApplicationsAn Operational Data Layer is Critical for Transformative Banking Applications
An Operational Data Layer is Critical for Transformative Banking Applications
 
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design ThinkingBecoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
 

Recently uploaded

Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfkalichargn70th171
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - InfographicHr365.us smith
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationkaushalgiri8080
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfPower Karaoke
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmSujith Sukumaran
 

Recently uploaded (20)

Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - Infographic
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdf
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalm
 

Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016

  • 1. Myths of big partitions Robert Stupp Solution Architect @ DataStax, C*-Committer @snazy
  • 2. Issues with big partitions before 3.6 • Slow reads • Compaction failures • Repair failures • java.lang.OutOfMemoryError  fail fast  node down (Lot of org.apache.cassandra.io.sstable.IndexInfo on heap) © DataStax, All Rights Reserved. 2
  • 3. SSTable Components © DataStax, All Rights Reserved. 3 Data Primary Index Summary Bloom Filter Determine whether an SSTable contains a partition  bloomFilterFpChance Partition samples  minIndexInterval / maxIndexInterval All partition keys + index samples  column_index_size_in_kb All the data
  • 4. Read from an SSTable © DataStax, All Rights Reserved. 4 Data Primary Index Summary Bloom Filter 1. Check whether partition is in SSTable 2. Find “nearest” partition key 3. Return offset in primary index 4. Find partition 5. Find clustering key 6. Return offset in data file 7. Find, read and return data
  • 6. Evaluation of SSTable Components © DataStax, All Rights Reserved. 6 Data Primary Index Summary Bloom Filter Off-Heap, small  fine Off-Heap, small-ish  fine On-Heap, many small objects, nested structure  problematic For CQL since #8099  fine
  • 7. Primary Index File Layout © DataStax, All Rights Reserved. Partition Index SamplesPartition Key Partition Index SamplesPartition Key es Partition Index SamplesPartition Key Partition Index SPartition Key Samples Partition Index SamplesPartition Key PartitionPartition Key Index Samples Partition Index SamplesPartition Key PPartition Key artition Index Samples Partition Index SamplesPartition Key Partition Key Partition Index Samples ”from” Summary
  • 8. Sampling the Primary Index © DataStax, All Rights Reserved. Partition in Data file Partition Key Offset in SSTable Data File column_index_size_in_kb (default: 64kB) First Key Last Key First Key Last Key First Key Last Key First Key Last Key First Key Last Key First Key Last Key First Key Last Key
  • 9. How it looks on-heap © DataStax, All Rights Reserved. 10 IndexedEntry IndexInfo firstKey, lastKey, offset, width, deletionInfo patitionKey*, offset, deletionInfo * = technically not in IndexedEntry IndexInfo firstKey, lastKey, offset, width, deletionInfo IndexInfo firstKey, lastKey, offset, width, deletionInfo …
  • 10. Primary Index Structure © DataStax, All Rights Reserved. 11 IndexedEntry extends RowIndexEntry DeletionTime ArrayList IndexInfo  per 64kB DeletionTime BufferClustering Kind ByteBuffer[] ByteBuffer byte[] … BufferClustering Kind ByteBuffer[] ByteBuffer byte[] … # of Java objects: IndexedEntry 4 IndexInfo (per 64kB) 8 + 4 * clust-key-components (primitive fields omitted)
  • 11. Primary Index - some numbers © DataStax, All Rights Reserved. 12 Approximation on one 16 byte clustering-value: Partition Size Index Size (heap) # of objects 1MB 3kB > 200 objects 4MB 11kB > 800 objects 64MB 180kB > 13,000 objects 512MB 1.4MB > 106,000 objects 2048MB 5.6MB > 424,000 objects Disclaimer: numbers are examples and not representative
  • 12. Reads • Reads IndexedEntry w/ all IndexInfo • 2GB partition means: 32,768 IndexInfo, 424,000 objects • Binary search just needs: 15 IndexInfo (max), O(log n) ~200 objects © DataStax, All Rights Reserved. 14 Disclaimer: numbers are examples and not representative SELECT foo, bar FROM big_partition_table WHERE ...
  • 13. Writes – Flushes & Compactions IndexedEntry constructed with all IndexInfo as Java object structure on heap first, then serialized to disk © DataStax, All Rights Reserved. 15
  • 14. 106,000 objects 106,000 objects 106,000 objects 106,000 objects Compacting a 2GB partition © DataStax, All Rights Reserved. 16 SSTable SSTable SSTable SSTable SSTable Key Cache Remove 106,000 objects Remove 106,000 objects Remove 106,000 objects Remove 106,000 objects Add 424,000 objects Construct 424,000 objects Disclaimer: numbers are examples and not representative
  • 15. Reads of big partitions – on heap • Primary index data deserialized • Object structure added to key cache • Other entries evicted from key cache • Also applies to compaction & repair © DataStax, All Rights Reserved. 17
  • 16. Flushes with big partitions – on heap • Primary index data constructed • Object structure added to key cache (for compactions) • Also applies to compactions © DataStax, All Rights Reserved. 18
  • 17. Trivia How many 2GB partitions fit in the key cache? © DataStax, All Rights Reserved. 19 2GB partition  5.6MB 100MB  100/6 = 16 Disclaimer: numbers are examples and not representative
  • 18. Issues w/ big partitions – TL;DR • Amount of Java objects • Additions and evictions to/from key cache © DataStax, All Rights Reserved. 20
  • 19.
  • 20. Necessities – TL;DR • Reduce amount of Java objects • Reduce GC pressure • No change in sstable format i.e. files need to be binary compatible © DataStax, All Rights Reserved. 22
  • 21. Approach • Omit (most) IndexInfo on heap • Read IndexInfo only when needed • Serialize primary index via byte buffer • Objects “never” promoted to Java old gen (hope so ;) ) © DataStax, All Rights Reserved. 23
  • 22. Small heap (3GB) test © DataStax, All Rights Reserved. 24 Before #11206 – duration: 3h, lots of GC, exhausted heap With #11206 – duration: 1h10, few GC, moderate heap usage java.lang. OutOfMemoryError org.apache.cassandra.io.sstable.LargePartitionsTest
  • 23. Results • Promising! • But: Performance regression w/ some workloads © DataStax, All Rights Reserved. 25
  • 24. Better Approach • Keep IndexInfo objects for “nicely” sized partitions on-heap • Controlled via c.yaml © DataStax, All Rights Reserved. 26
  • 25. Doesn’t this mean more disk I/O? • “Hot” data already in buffer cache • No change for “cold” partitions © DataStax, All Rights Reserved. 27
  • 26. #11206 Benefits • Reduced heap usage • Reduced GC pressure • Improved read and write paths • Key cache can hold “more” entries • Moved the bad partition size “barrier” © DataStax, All Rights Reserved. 28
  • 27. #11206 Metrics org.apache.cassandra.metrics: type=Index,scope=RowIndexEntry • name=IndexInfoCount Histogram - # of IndexInfo per IndexedEntry • name=IndexInfoGets Histogram - # of ”gets” against single IndexedEntry • name=IndexedEntrySize Histogram - serialized size of IndexedEntry © DataStax, All Rights Reserved. 29
  • 28. „After #11206, what‘s the recommended partition size?“ • It still depends – sorry • IMO we moved the “barrier” Test with your data model and workload © DataStax, All Rights Reserved. 30
  • 29. Bad usage of large partitions • CQL SELECT without clustering key • i.e. materialize a large partition in memory • Using the same partition key over a long time • i.e. access many sstables © DataStax, All Rights Reserved. 31
  • 30.
  • 31. #9754 • Changes on-disk primary index format • Efficient on-disk representation • Optimized for OS page size • WIP ! • Fix-Version: 4.x © DataStax, All Rights Reserved. 33
  • 32. Thank You! Q & A Come to the “experts stand”

Editor's Notes

  1. POLL: C* 2.x C* 3.0.x C* < 3.6 C* >= 3.6 Agenda Why big partitions hurt before 3.6 What CASSANDRA-11206 changed Things to consider
  2. Big means many MB - couple 100s and more - like GB Going to explain why this is an issue w/ C* < 3.6
  3. Compression-Info – Digest – Statistics – TOC
  4. Compression-Info – Digest – Statistics – TOC
  5. Compression-Info – Digest – Statistics – TOC
  6. How a primary index file looks like Read "comes" from Summary Scan until requested Partition is found
  7. Emphasize Java Objects
  8. Issues Many objects Promoted between Java heap generations Evicted key cache entries usually in old gen Huge GC pressure Affects Reads Compaction Repair Flush (usually not an issue)
  9. IndexedEntry including all IndexInfo read as Java object structure IndexedEntry including all IndexInfo placed in key cache DONE for ALL kinds of reads
  10. Global - all sstables - all tables - all keyspaces
  11. Partition sizes: 16k, 1M, 4M, 8M … 512M, 1G, 2G, 4G, 8G with varying total data sizes
  12. Nicely sized partitions (few MB)
  13. Updated on reads