The Best and Worst of Cassandra-stress Tool (Christopher Batey, The Last Pickle) | C* Summit 2016

Cassandra stress
Christopher Batey
@chbatey
The Last Pickle

Overview
● Importance of early realistic testing
● What is cassandra stress
● How you use it

Why do Cassandra
projects fail?

How cassandra projects roll
● New shiny project
● Cassandra is a good fit
● Build application against a single node cassandra cluster
● Go to production
● Fail

Cassandra is not an agile database
● Schema migrations :(
● Data migrations :(
● Must know queries up front

You must iterate on your schema early
● Know your queries
● Know your data
● Test them against a realistic cluster

Cassandra stress
cassandra-stress

Where
<install>/tools/bin/cassandra-stress
<install>/tools/bin/cassandra-stressd
<install>/tools/bin/compaction-stress

Commands
cassandra-stress help
---Commands---
read : Multiple concurrent reads - the cluster must first be populated by a write test
write : Multiple concurrent writes against the cluster
mixed : Interleaving of any basic commands
user : Interleaving of user provided queries, with configurable ratio and distribution
help : Print help for a command or option
10

Command help
cassandra-stress help write
Usage: write n=? [no-warmup] [truncate=?] [cl=?]
OR
Usage: write duration=? [no-warmup] [truncate=?] [cl=?]
err<? (default=0.02) Run until the standard error of the mean is below this fraction
n>? (default=30) Run at least this many iterations before accepting uncertainty convergence
n<? (default=200) Run at most this many iterations before accepting uncertainty convergence
no-warmup Do not warmup the process
truncate=? (default=never) Truncate the table: never, before performing any work, or before each iteration
cl=? (default=LOCAL_ONE) Consistency level to use
n=? Number of operations to perform
duration=? Time to run in (in seconds, minutes or hours)

Example
cassandra-stress write n=100 no-warmup truncate=once cl=ONE

Output
Op rate : 654 op/s [WRITE: 654 op/s]
Partition rate : 654 pk/s [WRITE: 654 pk/s]
Row rate : 654 row/s [WRITE: 654 row/s]
Latency mean : 59.5 ms [WRITE: 59.5 ms]
Latency median : 64.0 ms [WRITE: 64.0 ms]
Latency 95th percentile : 79.3 ms [WRITE: 79.3 ms]
Latency 99th percentile : 84.0 ms [WRITE: 84.0 ms]
Latency 99.9th percentile : 84.9 ms [WRITE: 84.9 ms]
Latency max : 84.9 ms [WRITE: 84.9 ms]
Total partitions : 100 [WRITE: 100]
Total errors : 0 [WRITE: 0]
Total GC count : 0
Total GC memory : 0.000 KiB
Total GC time : 0.0 seconds
Avg GC time : NaN ms
StdDev GC time : 0.0 ms
Total operation time : 00:00:00

Options
---Options---
-pop : Population distribution and intra-partition visit order
-insert : Insert specific options relating to various methods for batching and splitting partition
-rate : Thread count, rate limit or automatic mode (default is auto)
-mode : Thrift or CQL with options
-errors : How to handle errors when encountered during stress
-sample : Specify the number of samples to collect for measuring latency
-schema : Replication settings, compression, compaction, etc.
-node : Nodes to connect to
-log : Where to log progress to, and the interval at which to do it
-transport : Custom transport factories
-port : The port to connect to cassandra nodes on
-sendto : Specify a stress server to send this command to

Options help
cassandra-stress help -mode
Usage: -mode native [unprepared] cql3 [compression=?] [port=?] [user=?] [password=?] [auth-provider=?] [maxPending=?]
[connectionsPerHost=?] [protocolVersion=?]
user=? username
password=? password
unprepared force use of unprepared statements
compression=? (default=none)
port=? (default=9046)
auth-provider=? Fully qualified implementation of com.datastax.driver.core.AuthProvider
maxPending=? (default=) Maximum pending requests per connection
connectionsPerHost=? (default=) Number of connections per host
protocolVersion=? (default=NEWEST_SUPPORTED) CQL Protocol Version

Scenario: tracking your staff
● Get all the activities for a staff member
● Get the latest event for a staff member
● Is Cassandra a good fit?

Defining your schema
table: staff_activities
table_definition: |
CREATE TABLE staff_activities (
name text,
when timeuuid,
what text,
PRIMARY KEY(name, when)
)

Column metadata
columnspec:
- name: name
size: uniform(5..10) # The names of the staff members are between 5-10 characters
population: uniform(1..10) # 10 possible staff members to pick from
- name: when
cluster: uniform(20..500) # Staff members do between 20 and 500 activities
- name: what
size: normal(10..100,50)

Column metadata
- name: name
size: uniform(5..10)
population: uniform(1..10)

Column metadata
- name: when
cluster: uniform(20..500)

Column metadata
- name: what
size: NORMAL(0..500,250,10)

Distributions
cassandra-stress print
Usage: print dist=DIST(?)
dist=DIST(?): A mathematical distribution
EXP(min..max)
EXTREME(min..max,shape)
QEXTREME(min..max,shape,quantas)
GAUSSIAN(min..max,stdvrng)
GAUSSIAN(min..max,mean,stdev)
UNIFORM(min..max)
FIXED(val)
SEQ(min..max)

Distributions
cassandra-stress print dist="UNIFORM(0..100)"
% of samples Range % of total
10.0 10 10.0
20.0 20 20.0
30.0 30 30.0
40.0 40 40.0
50.0 50 50.0
60.0 60 60.0
70.0 70 70.0
80.0 80 80.0
90.0 90 90.0
95.0 95 95.0
99.0 99 99.0
100.0 100 100.0

Distributions
cassandra-stress print dist="FIXED(100)"
10.0 100 100.0
20.0 100 100.0
30.0 100 100.0
40.0 100 100.0
50.0 100 100.0
60.0 100 100.0
70.0 100 100.0
80.0 100 100.0
90.0 100 100.0
95.0 100 100.0
99.0 100 100.0
100.0 100 100.0

Distributions
cassandra-stress print "dist=NORMAL(0..500,250,10)"
10.0 237 47.4
20.0 241 48.2
30.0 244 48.8
40.0 247 49.4
50.0 250 50.0
60.0 252 50.4
70.0 255 51.0
80.0 258 51.6
90.0 262 52.4
95.0 266 53.2
99.0 273 54.6
100.0 500 100.0

Insertion of data
insert:
# we only update a single partition in any given insert
partitions: fixed(1)
# we want to insert a single row per partition and we have between 20 and 500
# rows per partition
select: fixed(1)/500
batchtype: UNLOGGED

Queries
queries:
events:
cql: select * from staff_activities where name = ?
latest_event:
cql: select * from staff_activities where name = ? LIMIT 1

Running
cassandra-stress user profile=./tlp-users.yaml
duration=1h
"ops(insert=1,latest_event=1,events=1)"
truncate=once
cl=LOCAL_QUORUM

Throughput mode (default)
● Keeps increasing the number of clients (threads) until no performance gain

Limit mode
● You specify a target TPS e.g. 10,000 writes / seconds
● Replaced with fixed/throttle mode in trunk

Graphing - adding to existing graph

Overriding protocol version
cassandra-stress -mode native protocolVersion=X

Coordinated omission
● Limit mode replaced with throttle and fixed mode
http://psy-lob-saw.blogspot.co.uk/2016/07/fixing-co-in-cstress.html

Summary
● Not very flexible for inserting data
● No LWT support
● Distributed mode limited

Questions?
Christopher Batey
@chbatey
The Last Pickle

The Best and Worst of Cassandra-stress Tool (Christopher Batey, The Last Pickle) | C* Summit 2016

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (20)

Similar to The Best and Worst of Cassandra-stress Tool (Christopher Batey, The Last Pickle) | C* Summit 2016

Similar to The Best and Worst of Cassandra-stress Tool (Christopher Batey, The Last Pickle) | C* Summit 2016 (20)

More from DataStax

More from DataStax (20)

Recently uploaded

Recently uploaded (20)

The Best and Worst of Cassandra-stress Tool (Christopher Batey, The Last Pickle) | C* Summit 2016