ScyllaDB co-founders Dor Laor and Avi Kivity discuss why they started ScyllaDB, the design decisions they made to achieve no-compromise performance and availability, and give a demo of how to get up and running on Docker.
2. Today we will cover:
+ Intro: Who we are, what we do, who uses it
+ Why we started ScyllaDB
+ How we made design decisions to achieve no-compromise performance and availability
+ Why should you care?
+ Demonstration of how to get up and running in no time on Docker
+ Q&A
3. Introduction
+ Founded by KVM hypervisor creators
+ Q2 2014 - Pivot to the database world
+ Q3 2015 - Came out of stealth at Cassandra Summit 2015; beta release
+ Q1 2016 - General Availability
+ Q3 2016 - First Scylla Summit: 100+ Attendees
+ Q1 2017 - Completed B round
+ $25MM in funding
+ HQs: Palo Alto, CA; Herzliya, Israel
+ 35+ employees, hiring!
4. What we do: Scylla, toward the best NoSQL database
+ > 1 million OPS per node
+ < 1 ms 99th-percentile latency
+ Auto tuned
+ Scale up and out
+ Open source
+ Large community (piggybacks on Cassandra's)
+ Blends into the ecosystem: Spark, Presto, time series, search, ...
6. Today we will cover:
+ Intro: Who we are, what we do, who uses it
+ Why we started ScyllaDB
+ How we made design decisions to achieve no-compromise performance and availability
+ Why should you care?
+ Demonstration of how to get up and running in no time on Docker
+ Q&A
9. Why we started Scylla (cont.)
+ Originally it was about performance/efficiency only
+ Over time, we realized we could deliver more:
  + An SLA between background and foreground tasks
  + Work well on any given hardware (back pressure)
  + Deliver consistent, low 99th-percentile latency
  + Reduced admin effort
  + Low latency even in the face of failures (hot-cache load balancing)
  + High observability
10. Cassandra vs. Scylla

              Cassandra                                Scylla
Throughput:   Cannot utilize multi-core efficiently    Scales linearly - shard per core
Latency:      High due to Java and the JVM's GC        Low and consistent - own cache
Complexity:   Intricate tuning and configuration       Auto-tuned, dynamic scheduling
Admin:        Maintenance impacts performance          SLA guarantee for admin vs. serving
11. Today we will cover:
+ Intro: Who we are, what we do, who uses it
+ Why we started ScyllaDB
+ How we made design decisions to achieve no-compromise performance and availability
+ Why should you care?
+ Demonstration of how to get up and running in no time on Docker
+ Q&A
15. SCYLLA DB: Network Comparison
[Diagram] Traditional stack vs. Seastar's sharded stack.
+ Traditional stack (Cassandra): application threads sit on top of the kernel's TCP/IP stack, scheduler, and shared NIC queues. Result: memory, lock, and cache contention, and a NUMA-unfriendly layout.
+ Seastar's sharded stack: one instance per core of the application, userspace TCP/IP, and task scheduler; each core drives its own NIC queue in poll mode via DPDK (the kernel isn't involved), and cores exchange messages over SMP queues. Result: no contention, linear scaling, NUMA friendly.
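To make the sharded design concrete, here is a minimal sketch of Seastar's shard-per-core model. It is an illustration only, assuming a recent Seastar release (not ScyllaDB source): each core runs its own reactor, and work for a key is routed to the owning shard by explicit message passing rather than shared memory and locks.

```cpp
// Sketch: shard-per-core routing with Seastar (assumed recent API).
#include <seastar/core/app-template.hh>
#include <seastar/core/future.hh>
#include <seastar/core/smp.hh>
#include <functional>
#include <iostream>

// Route a key to its owning shard by hash, then run the work there.
seastar::future<> process_on_owner(int key) {
    unsigned owner = std::hash<int>{}(key) % seastar::smp::count;
    return seastar::smp::submit_to(owner, [key] {
        // Runs on the owning core; shard-local state needs no locking.
        std::cout << "key " << key << " handled by shard "
                  << seastar::this_shard_id() << "\n";
    });
}

int main(int argc, char** argv) {
    seastar::app_template app;  // one reactor per core
    return app.run(argc, argv, [] {
        return process_on_owner(7);
    });
}
```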
16. Scylla has its own task scheduler
[Diagram] Traditional stack vs. Scylla's stack.
+ Traditional stack: one thread and stack per operation. A thread is a function pointer; a stack is a byte array from 64 KB to megabytes. Context-switch cost is high, and large stacks pollute the caches.
+ Scylla's stack: a single scheduler per CPU runs chains of promises and tasks. A promise is a pointer to an eventually computed value; a task is a pointer to a lambda function. No sharing, millions of parallel events.
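The promise/task model above is what Seastar futures look like in code. The sketch below is illustrative (assumed recent Seastar API, not Scylla source): asynchronous work is expressed as small continuations chained with .then(), which the per-core scheduler runs without a dedicated thread or stack per operation.

```cpp
// Sketch: futures and continuations instead of threads and stacks.
#include <seastar/core/app-template.hh>
#include <seastar/core/future.hh>
#include <seastar/core/sleep.hh>
#include <chrono>
#include <iostream>

seastar::future<int> read_value() {
    // Simulate an asynchronous read; no thread blocks while we wait.
    return seastar::sleep(std::chrono::milliseconds(1)).then([] {
        return 42;  // the "eventually computed value" a promise points to
    });
}

int main(int argc, char** argv) {
    seastar::app_template app;
    return app.run(argc, argv, [] {
        return read_value().then([](int v) {
            // This lambda is the "task": a continuation scheduled when the
            // value is ready, with no dedicated stack of its own.
            std::cout << "got " << v << "\n";
        });
    });
}
```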
17. SCYLLA IS DIFFERENT
❏ DMA
❏ Log-structured merge tree
❏ DB-aware cache
❏ Userspace I/O scheduler
❏ NUMA friendly
❏ Log-structured allocator
❏ Zero copy
❏ Thread per core
❏ Lock-free
❏ Task scheduler
❏ Reactor programming
❏ C++14
❏ Multi-queue
❏ Poll mode
❏ Userspace TCP/IP
26. Workload Conditioning in practice
When the disk can't keep up, workload conditioning figures out the right request rate, as sketched below.
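As a rough illustration of the back-pressure idea (my sketch, not Scylla's internal code): bounding the number of in-flight disk writes with a semaphore makes the ingest rate converge to what the disk can sustain. The `disk_units` limiter and `conditioned_write` helper are hypothetical names.

```cpp
// Sketch: back pressure via a concurrency limit (assumed Seastar API).
#include <seastar/core/future.hh>
#include <seastar/core/semaphore.hh>

// Hypothetical per-shard limiter: at most 64 concurrent disk writes.
seastar::semaphore disk_units{64};

seastar::future<> conditioned_write(seastar::future<> (*do_write)()) {
    // with_semaphore waits for a unit, runs the write, then releases it;
    // callers queue up naturally when the disk falls behind, so the
    // request rate settles at what the device can actually absorb.
    return seastar::with_semaphore(disk_units, 1, [do_write] {
        return do_write();
    });
}
```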
27. Today we will cover:
+ Intro: Who we are, what we do, who uses it
+ Why we started ScyllaDB
+ How we made design decisions to achieve no-compromise performance and availability
+ Why should you care?
+ Demonstration of how to get up and running in no time on Docker
+ Q&A
28. Case study: Document column family
+ Outbrain is the world's largest content discovery platform.
+ Over 557 million unique visitors from across the globe.
+ 250 billion personalized content recommendations every month.
29. Outbrain: Cassandra plus Memcached
+ First read from memcached; go to Cassandra on misses.
+ Pain:
  + Stale data from cache
  + Complexity
  + Cold cache -> C* gets the full volume
[Diagram] Read microservice and write process paths.
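For contrast, here is a minimal, self-contained sketch of the cache-aside read path described above. The `fake_cache`/`fake_db` maps are hypothetical stand-ins for memcached and Cassandra clients, not Outbrain's code; the staleness and cold-cache pain points live in exactly these few lines.

```cpp
// Sketch: cache-aside reads (memcached in front of Cassandra).
#include <iostream>
#include <map>
#include <optional>
#include <string>

// Stand-ins for real memcached/Cassandra clients.
static std::map<std::string, std::string> fake_cache;
static std::map<std::string, std::string> fake_db = {{"doc1", "payload"}};

std::optional<std::string> memcached_get(const std::string& key) {
    auto it = fake_cache.find(key);
    if (it == fake_cache.end()) return std::nullopt;
    return it->second;
}

std::string read_document(const std::string& key) {
    if (auto cached = memcached_get(key)) {
        return *cached;            // fast path; may serve stale data
    }
    auto value = fake_db.at(key);  // cold cache -> C* takes the full volume
    fake_cache[key] = value;       // repopulate; invalidation stays hard
    return value;
}

int main() {
    std::cout << read_document("doc1") << "\n";  // miss, then fill
    std::cout << read_document("doc1") << "\n";  // hit
}
```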
30. Scylla/Cassandra side-by-side deployment
+ Writes go in parallel to C* and Scylla
+ Reads are done in parallel:
  + Memcached + Cassandra
  + Scylla (no cache at all)
[Diagram] Read microservice and write process paths.
31. Scylla (w/o cache) vs. Cassandra + Memcached

                   Scylla        Cassandra
Requests/minute    12,000,000    500,000
Avg latency        4 ms          8 ms
Max latency        8 ms          35 ms
32. What does it mean for a non-Cassandra user?
+ Throughput, latency, and scale benefits
+ Wide range of big data integrations: {KairosDB, Spark, JanusGraph, Presto, Kafka, Elastic}
+ Best HA/DR in the industry
+ Stop using caches in front of the database
+ Consolidate HBase, Redis, MySQL, Mongo, and others
33. Today we will cover:
+ Intro: Who we are, what we do, who uses it
+ Why we started ScyllaDB
+ How we made design decisions to achieve no-compromise performance and availability
+ Why should you care?
+ Demonstration of how to get up and running in no time on Docker
+ Q&A
34. Upcoming releases
+ Enterprise release, based on 1.6
+ 1.7 (in ~1 week)
  + Counters (experimental)
  + New intra-node sharding algorithm
  + sstableloader from 2.2/3.x
  + Debian support
+ 1.8 (1+ month)
  + Materialized views
  + Execution blocks (a CPU cache optimization that boosts performance)
  + CPU scheduler - low latency during repairs
  + Partial row cache (for wide-row streaming)