Cassandra Summit 2014: Monitor Everything!
1. About Me
● Sr. Engineer at Pythian
o Lead of Cassandra Practice
#CassandraSummit 2014
● Remote in Minnesota
● Interests
o Java, Clojure, Python dev
o Data science
o Information Security
o Hobbyist electronics
2. About Pythian
Pythian is a global data outsourcing and consulting company that
specializes in optimizing and managing mission-critical data systems.
Pythian blends the world’s leading data experts with advanced, secure
service delivery processes to create the industry’s best standard of care
for its clients.
Since its inception, Pythian has managed some of the world’s largest,
most business-critical data infrastructures.
● Pythian currently manages more than 10,000 systems.
● Pythian currently employs more than 350 people in 25 countries worldwide.
● Pythian was founded in 1997.
3. About Cassandra
● No Single Point of Failure
● Fault Tolerant
● Awesome properties for an operations team that does not want to get up at 3am
4. About Cassandra
● Nothing should be set up and forgotten about
● Easy to do with Cassandra, though
o Fault tolerance on a properly configured setup handles a single node being down or having temporary performance issues
o No back pressure on writes until there is a lot of trouble
5. Utilize the fault tolerance buffer
● Need to observe and react to current issues
● Predict future issues
● Divide this into two approaches
o Proactive
o Reactive
6. Proactive
● Daily and weekly checkups to prevent and predict problems
o Capacity
o Performance bottlenecks
o Data modeling issues
7. Reactive
● Something about best laid plans…
o Hardware failures
o Bugs
o Malicious or Non-Malicious users
● Alarms, Pager Duty
9. Metrics
● Window to the application
o Bridge the gap - Coda Hale
10. Gathering Metrics
SOURCES
● Cassandra
o OpsCenter
o JMX
o Nodetool
● Environment
o Logs
o CPU, Disk, Network
o JVM, GC
11. Metrics
but of course…
Without context, the data is just pretty graphs
12. JMX
● Java Management Extensions
● Complex… very engineered
● Resources represented as objects with
attributes and operations
● Used for monitoring or as input
13. JMX
● The annoying gateway to metrics
○ Poor tooling - requires Java
○ Slow, memory leaks
○ Historically and currently frustrating for ops (pre 2.0.8)
[Diagram: JMX connection handshake between Client (You) and Cassandra]
1. Client: init connection to port 7199
2. Cassandra: reply with hostname:port for the RMI connection, port in 1024-65535
3. Client: gets new hostname:port, drops old connection and attempts to connect
4. Connected!
21. JMX
org.apache.cassandra.metrics:type=
● Cache
● Client
● ClientRequest
● ClientRequestMetrics
● ColumnFamily
● CommitLog
● Compaction
● DroppedMessage
● FileCache
● Keyspace
● Storage
● ThreadPools
22. JMX
org.apache.cassandra.metrics:
type=*,scope=*,name=*
type=ThreadPools,path=*,scope=*,name=*
type=ColumnFamily,keyspace=*,scope=*,name=*
type=Keyspace,keyspace=*,name=*
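The naming patterns on this slide can be turned into a small helper for building full MBean names. This is only a sketch: the function name and the example values (request, ReadStage, Keyspace1, Standard1, ReadLatency) are illustrative choices, and the key order simply follows the patterns shown here.

```python
DOMAIN = "org.apache.cassandra.metrics"


def metric_object_name(metric_type, name, path=None, keyspace=None, scope=None):
    """Build a JMX ObjectName string for a Cassandra metric.

    Key order follows the slide: type, then path or keyspace, then scope,
    then name. Keys left as None are omitted.
    """
    parts = [
        ("type", metric_type),
        ("path", path),
        ("keyspace", keyspace),
        ("scope", scope),
        ("name", name),
    ]
    props = ",".join(f"{key}={value}" for key, value in parts if value is not None)
    return f"{DOMAIN}:{props}"


# Illustrative values only; substitute the pool, keyspace and table you care about.
print(metric_object_name("ThreadPools", "PendingTasks", path="request", scope="ReadStage"))
print(metric_object_name("ColumnFamily", "ReadLatency", keyspace="Keyspace1", scope="Standard1"))
```

The resulting strings can be pasted into any JMX client that accepts an ObjectName.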
23. Metrics
● A toolkit called Metrics, for capturing metrics
o By Coda Hale @ Yammer
● Easy to use
● Popular
24. Types of Metrics
● Gauge
o instantaneous value
● Counter
o number that can be incremented & decremented
● Meter
o rate of events over time (1/5/15 min moving avg)
● Histogram
o representation of statistical distribution
§ 50, 75, 95, 98, 99, 99.9 percentile
§ average, median, min, max, standard deviation
● Timer
o rate of events (meter)
o histogram of duration
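To make two of these types concrete, here is a toy counter and histogram. It is a sketch for intuition only, not the Metrics library's implementation: the class and method names are made up, the histogram keeps every sample (real libraries use a bounded reservoir), and the percentile uses the simple nearest-rank rule.

```python
import math


class Counter:
    """A number that can be incremented and decremented."""

    def __init__(self):
        self.count = 0

    def inc(self, n=1):
        self.count += n

    def dec(self, n=1):
        self.count -= n


class Histogram:
    """Keeps every sample and answers percentile queries by nearest rank."""

    def __init__(self):
        self.samples = []

    def update(self, value):
        self.samples.append(value)

    def percentile(self, p):
        """Smallest sample such that at least p percent of samples are <= it."""
        if not self.samples:
            raise ValueError("no samples recorded")
        ordered = sorted(self.samples)
        rank = max(1, math.ceil(p * len(ordered) / 100))
        return ordered[rank - 1]


latencies_us = Histogram()
for value in range(1, 101):  # pretend latencies of 1..100 microseconds
    latencies_us.update(value)
print(latencies_us.percentile(75))
```

A timer, in these terms, is a histogram of durations paired with a meter that tracks how often it is updated.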
25. JMX
75th percentile is 683 MICROSECONDS
(75% took 683us or less)
One minute rate is 13,915 calls per SECOND
26. JMX
● Overwhelming at first
● Hard to tell what the metrics mean without the source
● The MBeans move around a lot
● Fortunately, there is nodetool
27. Nodetool
● JMX command line wrapper
● Many options
● Operations and diagnostic procedures
● For reactive analysis
o ad hoc, spot checks
28. Nodetool tpstats
nodetool tpstats
org.apache.cassandra.metrics:type=ThreadPools,path={internal|request|transport},scope={*},name={ActiveTasks|PendingTasks|CompletedTasks|CurrentlyBlockedTasks|TotalBlockedTasks}
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                         0         0         113702         0                 0
RequestResponseStage              0         0              0         0                 0
MutationStage                     0         0         164503         0                 0
...
InternalResponseStage             0         0              0         0                 0
HintedHandoff                     0         0              0         0                 0

Message type           Dropped
RANGE_SLICE                  0
READ_REPAIR                  0
...
REQUEST_RESPONSE             0
COUNTER_MUTATION             0
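For spot checks it can help to let a script read this output. The sketch below assumes the layout shown on this slide (a pool name followed by five integer columns, and a message type followed by one integer); the function names are hypothetical, and the dropped count of 1 in the sample is set purely for illustration.

```python
def parse_tpstats(text):
    """Split `nodetool tpstats` output into per-pool stats and dropped-message counts."""
    pools, dropped = {}, {}
    for line in text.splitlines():
        tokens = line.split()
        # Pool rows: a name followed by five integer columns.
        if len(tokens) == 6 and all(t.isdigit() for t in tokens[1:]):
            active, pending, completed, blocked, all_time_blocked = map(int, tokens[1:])
            pools[tokens[0]] = {
                "active": active,
                "pending": pending,
                "completed": completed,
                "blocked": blocked,
                "all_time_blocked": all_time_blocked,
            }
        # Dropped rows: a message type followed by one integer.
        elif len(tokens) == 2 and tokens[1].isdigit():
            dropped[tokens[0]] = int(tokens[1])
    return pools, dropped


def find_trouble(pools, dropped):
    """Return readable findings for backed-up pools and dropped messages."""
    findings = []
    for name, stats in pools.items():
        if stats["pending"] > 0 or stats["blocked"] > 0:
            findings.append(f"{name}: pending={stats['pending']} blocked={stats['blocked']}")
    for message_type, count in dropped.items():
        if count > 0:
            findings.append(f"{message_type}: dropped={count}")
    return findings


# Sample trimmed from the slide; REQUEST_RESPONSE is set to 1 for illustration.
SAMPLE = """\
Pool Name Active Pending Completed Blocked All time blocked
ReadStage 0 0 113702 0 0
MutationStage 0 0 164503 0 0
Message type Dropped
READ_REPAIR 0
REQUEST_RESPONSE 1
"""

pools, dropped = parse_tpstats(SAMPLE)
print(find_trouble(pools, dropped))
```

Header lines and the "..." rows fall through both branches, so they are ignored rather than treated as errors.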
29. Staged Event Driven Architecture
● Decomposes complex event system
● Set of stages (thread pools)
● Queue between each
● Shares a lot of pros and cons with SOA
30. Staged Event Driven Architecture
[Diagram: Client Request, ReadStage threads (x32), RequestResponse threads, ReadRepairStage threads, and the Messaging Service between Node 1 and Node 2; each marker is a task]
31. Staged Event Driven Architecture
● It's easy to overrun the processing capabilities of a stage that is not in the request's feedback loop (e.g. ReadRepairStage).
● No write back pressure
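A toy version of the pattern may make the queueing behaviour easier to picture. This is a sketch of staged processing in general, not Cassandra's code: the Stage class, its methods, and the doubling handler are all invented for illustration, and only the stage names are borrowed from the slides.

```python
import queue
import threading


class Stage:
    """One stage: an inbox queue drained by a fixed pool of worker threads."""

    _STOP = object()  # sentinel telling a worker to exit

    def __init__(self, name, handler, threads, next_stage=None):
        self.name = name
        self.handler = handler
        self.next_stage = next_stage
        # Unbounded queue: submit() never blocks, so producers feel no back pressure.
        self.inbox = queue.Queue()
        self.workers = [threading.Thread(target=self._run) for _ in range(threads)]
        for worker in self.workers:
            worker.start()

    def submit(self, task):
        self.inbox.put(task)

    def pending(self):
        """Approximate queue depth, comparable to a Pending column."""
        return self.inbox.qsize()

    def _run(self):
        while True:
            task = self.inbox.get()
            if task is Stage._STOP:
                return
            result = self.handler(task)
            if self.next_stage is not None:
                self.next_stage.submit(result)

    def shutdown(self):
        """Let queued tasks drain, then stop every worker."""
        for _ in self.workers:
            self.inbox.put(Stage._STOP)
        for worker in self.workers:
            worker.join()


def run_pipeline(keys):
    """Push keys through a two-stage pipeline and return the sorted results."""
    results = []
    respond = Stage("RequestResponse", results.append, threads=2)
    read = Stage("ReadStage", lambda key: key * 2, threads=4, next_stage=respond)
    for key in keys:
        read.submit(key)
    read.shutdown()     # every read is handled and forwarded before this returns
    respond.shutdown()
    return sorted(results)


print(run_pipeline(range(10)))
```

Because the inbox is unbounded, a slow stage simply accumulates pending tasks while upstream stages keep submitting, which is the overrun described above.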
39. Nodetool tpstats
nodetool tpstats
org.apache.cassandra.metrics:type=ThreadPools,path={internal|request|transport},scope={*},name={ActiveTasks|PendingTasks|CompletedTasks|CurrentlyBlockedTasks|TotalBlockedTasks}
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                         0         0         113702         0                 0
RequestResponseStage              0         0              0         0                 0
MutationStage                     0         0         164503         0                 0
...
InternalResponseStage             0         0              0         0                 0
HintedHandoff                     0         0              0         0                 0

Message type           Dropped
RANGE_SLICE                  0
READ_REPAIR                  0
...
REQUEST_RESPONSE             1
COUNTER_MUTATION             0
[Diagram callout: RequestResponse threads]
40. Nodetool tpstats
nodetool tpstats
org.apache.cassandra.metrics:type=ThreadPools,path={internal|request|transport},scope={*},name={ActiveTasks|PendingTasks|CompletedTasks|CurrentlyBlockedTasks|TotalBlockedTasks}
More at:
http://www.evidencebasedit.com/guide-to-cassandra-thread-pools
41. Nodetool cfhistograms
nodetool cfhistograms {keyspace} {table}
org.apache.cassandra.metrics:type=ColumnFamily,keyspace={keyspace},scope={table}
SSTables per Read
1 sstables: 98554
2 sstables: 4534
Write Latency (microseconds)
No Data
Read Latency (microseconds)
10 us: 2
12 us: 17
14 us: 96
17 us: 208
20 us: 677
24 us: 3081
29 us: 4552
35 us: 3559
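Bucketed output like this can be reduced to percentiles with a running total. The sketch below uses only the read latency rows visible on this slide, so the results describe this excerpt rather than a full histogram; the function name is hypothetical, and treating each bucket's offset as the reported latency is a simplifying assumption.

```python
def bucket_percentile(buckets, p):
    """Return the offset of the first bucket where the running count reaches p percent.

    `buckets` is a list of (offset, count) pairs sorted by offset.
    """
    total = sum(count for _, count in buckets)
    if total == 0:
        raise ValueError("no data")
    threshold = p * total / 100
    seen = 0
    for offset, count in buckets:
        seen += count
        if seen >= threshold:
            return offset
    return buckets[-1][0]


# (offset in microseconds, count) rows copied from the slide excerpt.
READ_LATENCY_US = [
    (10, 2), (12, 17), (14, 96), (17, 208),
    (20, 677), (24, 3081), (29, 4552), (35, 3559),
]

for p in (50, 95):
    print(p, bucket_percentile(READ_LATENCY_US, p))
```

With these rows, half of the reads fall in the 29 us bucket or below, and the 95th percentile lands in the 35 us bucket.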
42. Read Write Path mile high overview
[Diagram: writes and reads against the Memtable and SSTable]
48. Nodetool cfhistograms 1.1
nodetool cfhistograms {keyspace} {table}
org.apache.cassandra.metrics:type=ColumnFamily,keyspace={keyspace},scope={table}
Offset      SSTables   Write Latency   Read Latency   Row Size   Column Count
1               3579               0              0          0              0
2                  0               0              0          0              0
. . .
35                 0               0              0          0              0
42                 0               0             27          0              0
50                 0               0            187          0              0
60                 0              10            460          0              0
72                 0             200            689          0              0
86                 0             663            552          0              0
103                0             796            367          0              0
124                0             297            736          0              0
149                0             265            243          0              0
179                0             460            263          0              0
. . .
25109160           0               0              0          0              0
50. Nodetool cfhistograms 2.1
nodetool cfhistograms {keyspace} {table}
org.apache.cassandra.metrics:type=ColumnFamily,keyspace={keyspace},scope={table}
Keyspace/Table histograms
Percentile   SSTables   Write Latency   Read Latency   Partition Size   Cell Count
                             (micros)       (micros)          (bytes)
50%              1.00           10.00         524.00              310            5
75%              1.00           11.75         888.00              310            5
95%              1.00           15.00        4843.75              310            5
98%              1.00           17.00        9658.90              310            5
99%              1.00           19.00       12306.47              310            5
Min              0.00            0.00          68.00               30            0
Max              2.00      1219386.00       45383.00              310            5
51. Nodetool cfstats
nodetool cfstats {-i} {keyspace}.{table}
org.apache.cassandra.metrics:type=ColumnFamily,keyspace={keyspace},scope={table}
Keyspace: Keyspace1
Read Count: 11207
Read Latency: 0.047931114482020164 ms.
Write Count: 17598
Write Latency: 0.053502954881236506 ms.
Pending Tasks: 0
Table: Standard1
SSTable count: 3
Space used (live), bytes: 9088955
Space used (total), bytes: 9088955
Space used by snapshots (total), bytes: 0
SSTable Compression Ratio: 0.3672150946
Memtable cell count: 0
Memtable data size, bytes: 0
Memtable switch count: 3
Local read count: 11207
Local read latency: 0.048 ms
Local write count: 17598
Local write latency: 0.054 ms
Pending tasks: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.00000
Bloom filter space used, bytes: 11688
Compacted partition minimum bytes: 1110
Compacted partition maximum bytes: 126934
Compacted partition mean bytes: 2730
Average live cells per slice: 0.0
Average tombstones per slice: 0.0
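For the daily and weekly checkups, the key/value layout of this output is easy to scan with a script. The sketch below assumes the "name: value" format shown on this slide; the function names and both thresholds are hypothetical and would need tuning for a real cluster.

```python
def parse_cfstats(text):
    """Collect `name: value` lines from `nodetool cfstats` output into a dict."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            stats[key.strip()] = value.strip()
    return stats


def number(stats, key):
    """Read the leading numeric part of a value such as '0.048 ms'."""
    return float(stats[key].split()[0])


def checks(stats, max_sstables=20, max_tombstones_per_slice=100.0):
    """Return warnings for values above the (illustrative) thresholds."""
    warnings = []
    if number(stats, "SSTable count") > max_sstables:
        warnings.append("too many SSTables")
    if number(stats, "Average tombstones per slice") > max_tombstones_per_slice:
        warnings.append("too many tombstones per slice")
    return warnings


# A few lines copied from the slide.
SAMPLE = """\
Table: Standard1
SSTable count: 3
Space used (live), bytes: 9088955
Local read latency: 0.048 ms
Average tombstones per slice: 0.0
"""

stats = parse_cfstats(SAMPLE)
print(number(stats, "Local read latency"), checks(stats))
```

Note that keyspace-level and table-level names differ only slightly (for example "Pending Tasks" and "Pending tasks"), so keys are kept exactly as printed.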
55. Nodetool cfstats
nodetool cfstats {-i} {keyspace}.{table}
org.apache.cassandra.metrics:type=ColumnFamily,keyspace={keyspace},scope={table}
Keyspace: Keyspace1
Read Count: 11207
Read Latency: 0.047931114482020164 ms.
Write Count: 17598
Write Latency: 0.053502954881236506 ms.
Pending Tasks: 0
Table: Standard1
SSTable count: 3
SSTables in each level: [14/4, 1, 0, …, 0]
Space used (live), bytes: 9088955
Space used (total), bytes: 9088955
Space used by snapshots (total), bytes: 0
SSTable Compression Ratio: 0.3672150946
Memtable cell count: 0
Memtable data size, bytes: 0
Memtable switch count: 3
Local read count: 11207
Local read latency: 0.048 ms
Local write count: 17598
Local write latency: 0.054 ms
Pending tasks: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.00000
Bloom filter space used, bytes: 11688
Compacted partition minimum bytes: 1110
Compacted partition maximum bytes: 126934
Compacted partition mean bytes: 2730
Average live cells per slice: 0.0
Average tombstones per slice: 0.0
64. Nodetool proxyhistograms
nodetool proxyhistograms
org.apache.cassandra.metrics:type=ClientRequest,scope={Read|Write|RangeSlice},name=Latency
$ nodetool proxyhistograms
proxy histograms
Read Latency (microseconds)
61214 us: 1
Write Latency (microseconds)
103 us: 22
124 us: 142
149 us: 297
179 us: 1190
215 us: 1823
258 us: 2091
...
65. Nodetool compactionstats
nodetool compactionstats
org.apache.cassandra.metrics:type=Compaction
pending tasks: 1
compaction type   keyspace    table       completed   total      unit    Progress
Compaction        Keyspace1   Standard1   6076415     29605054   bytes   20.06%
Active compaction remaining time : 0h00m03s
69. Nodetool
Much more!!
http://www.datastax.com/documentation/cassandra/2.0/cassandra/tools/toolsNodetool_r.html
70. OpsCenter
● Provides visibility to key metrics
● Alarming
● Basic orchestration and config management
● Constantly improving
● Free*
● Almost zero barrier to getting set up
● Very few reasons not to run it
71. OpsCenter
● Homogeneous tooling with the rest of the stack
o Integrate metrics with what the app is using
o Orchestration and config management
● (paid version) “Good enough”
o A mature environment should have more
72. Reporting Interface
● Default: JMX, Console, Csv, Slf4j
● Addons: Ganglia, Graphite
● Community: Cassandra, StatsD, NewRelic, Splunk, Cloudwatch, Kafka, Riemann, TempDB, Munin, Riak, InfluxDB, Sematext, MongoDB, OpenTSDB, Librato, … MORE
73. Reporting Interface
● Configurable with YAML
o console, csv, ganglia, graphite
● Create a reporter with a premain agent
o compile a new jar with a manifest
o add it to the classpath
o add the javaagent in cassandra-env.sh
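Whatever reporter is configured, what reaches a backend such as Graphite is small: one line per metric in Graphite's plaintext format (path, value, Unix timestamp). The sketch below only formats such lines; it is not the Metrics reporter itself, and the function name and metric path are hypothetical.

```python
import time


def graphite_line(path, value, timestamp=None):
    """Format one metric in Graphite's plaintext format: `path value timestamp`."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


# Hypothetical metric path; 683 is the 75th percentile from the earlier JMX slide.
print(graphite_line("cassandra.node1.ClientRequest.Read.Latency.p75", 683, 1410000000), end="")
```

Lines like this are normally written to the Carbon listener over TCP; the host and port depend on the Graphite installation.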
74. Garbage Collection
● Death, taxes, and a stop-the-world GC
● An issue common to all JVM-based applications
75. Garbage Collection
Enable GC logging
● Virtually no overhead
● Can be very helpful in diagnosing performance issues
76. Garbage Collection
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
JVM_OPTS="$JVM_OPTS -XX:+PrintHeapAtGC"
JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
JVM_OPTS="$JVM_OPTS -XX:+PrintPromotionFailure"
JVM_OPTS="$JVM_OPTS -XX:PrintFLSStatistics=1"
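# Without rotation: one timestamped log per start. Use this line OR the rotation settings below, not both.
# JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc-`date +%s`.log"
# With rotation: use a fixed file name (no date in the name).
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"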
JVM_OPTS="$JVM_OPTS -XX:+UseGCLogFileRotation"
JVM_OPTS="$JVM_OPTS -XX:NumberOfGCLogFiles=10"
JVM_OPTS="$JVM_OPTS -XX:GCLogFileSize=10M"
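Once these flags are on, the pause lines written by -XX:+PrintGCApplicationStoppedTime are the quickest thing to mine. The sketch below assumes the usual HotSpot wording of that line; the function names are hypothetical and the sample lines are made up for illustration.

```python
import re

# Wording emitted by HotSpot when PrintGCApplicationStoppedTime is enabled.
STOPPED = re.compile(
    r"Total time for which application threads were stopped: ([0-9.]+) seconds"
)


def stopped_pauses(lines):
    """Extract every stop-the-world pause, in seconds, from GC log lines."""
    pauses = []
    for line in lines:
        match = STOPPED.search(line)
        if match:
            pauses.append(float(match.group(1)))
    return pauses


def summarize(pauses):
    """Count, total and longest pause; zeros when nothing was found."""
    return {
        "count": len(pauses),
        "total_s": sum(pauses),
        "max_s": max(pauses, default=0.0),
    }


# Made-up sample lines in the shape produced with PrintGCDateStamps enabled.
SAMPLE = [
    "2014-09-11T10:15:01.123-0500: 12.345: Total time for which application threads were stopped: 0.0012340 seconds",
    "2014-09-11T10:15:02.456-0500: 13.678: [GC 13.678: [ParNew: 0K->0K(0K), 0.0100000 secs]]",
    "2014-09-11T10:15:03.789-0500: 14.901: Total time for which application threads were stopped: 0.2500000 seconds",
]

print(summarize(stopped_pauses(SAMPLE)))
```

To run it against a real log, pass an open file object instead of SAMPLE, since the function only iterates over lines.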
78. Garbage Collection
Could be its own talk
Honorable mentions:
● https://github.com/chewiebug/GCViewer
● http://jworks.idv.tw/GcWeb/
● Python, R, Octave
79. Logging
/var/log/cassandra/system.log
o provides a rolling log
o log4j
/var/log/cassandra/output.log
o captured standard error and standard out
o truncated on restart
System Logs
o syslog, dmesg, etc.
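A first pass over system.log can be as simple as counting lines per log level. The sketch below assumes each log entry starts with its level, as in a typical log4j layout; the function name is hypothetical and the sample lines are invented, not real Cassandra output.

```python
from collections import Counter

LEVELS = {"TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL"}


def count_levels(lines):
    """Count log entries per level; continuation lines such as stack traces are skipped."""
    counts = Counter()
    for line in lines:
        tokens = line.split(None, 1)
        if tokens and tokens[0] in LEVELS:
            counts[tokens[0]] += 1
    return counts


# Invented sample lines for illustration only.
SAMPLE = [
    " INFO [main] 2014-09-11 10:15:01,123 Example.java (line 1) starting up",
    " WARN [ScheduledTasks:1] 2014-09-11 10:15:02,456 Example.java (line 2) heap is getting full",
    "ERROR [ReadStage:3] 2014-09-11 10:15:03,789 Example.java (line 3) something failed",
    "java.lang.RuntimeException: example stack trace line",
    " WARN [ScheduledTasks:1] 2014-09-11 10:15:04,012 Example.java (line 2) heap is getting full",
]

print(dict(count_levels(SAMPLE)))
```

A rising WARN or ERROR count between checkups is a cheap signal for deciding which log entries deserve a closer read.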