A presentation about the deployment of an ELK stack at bol.com
At bol.com we use Elasticsearch, Logstash and Kibana in a logsearch system that allows our developers and operations people to easily access and search through log events coming from all layers of our infrastructure.
The presentation explains the initial design and its failures, then the latest design (mid 2014) and its improvements, and finally gives a set of tips on scaling Logstash and Elasticsearch.
These slides were first presented at the Elasticsearch NL meetup on September 22nd 2014 at the Utrecht bol.com HQ.
1. Scaling an ELK stack
Elasticsearch NL meetup
2014.09.22, Utrecht
2. Who am I?
Renzo Tomà
• IT operations
• Linux engineer
• Python developer
• Likes huge streams of raw data
• Designed metrics & logsearch platform
• Married, proud father of two
And you?
4. ELK at bol.com
Logsearch platform.
For developers & operations.
Search & analyze log events using Kibana.
Events from many sources (e.g. syslog, accesslog, log4j, …)
Part of our infrastructure.
Why? Faster root cause analysis, quicker time-to-repair.
5. Real world examples
Case: release of new webshop version.
Nagios alert: jboss processing time.
Metrics: increase in active threads (and proctime).
=> Inconclusive!
Find all HTTP requests to www.bol.com which were slower
than 5 seconds:
@type:apache_access AND @fields.site:"www_bol_com" AND
@fields.responsetimes:[5.000.000 TO *]
=> Hits for 1 URL. Enough for DEV to start its RCA.
6. Real world examples
Case: strange performance spikes on webshop.
Looks bad, but cause unknown.
Find all errors in webshop log4j logging:
@fields.application:wsp AND @fields.level:ERROR
Compare errors before vs during spike. Spot the difference.
=> Spikes caused by timeouts on a backend service.
Metrics correlation: the timeouts were not the cause, but a symptom of a full GC issue.
7. Initial design (mid 2013’ish)
[Architecture diagram] Log events flow from all sources to a central syslog server: servers, routers, firewalls … via syslog; Apache webservers via a tailed accesslog shipped with the remote_syslog pkg; Java webapplications (JVM) via a log4j syslog appender. Logstash acts as syslog server and converts lines into events, into json docs, which it feeds to Elasticsearch, queried through Kibana2.
Using syslog protocol over UDP as transport. Even for accesslog + log4j.
8. Initial attempt #fail
Single logstash instance not fast enough.
Unable to keep up with events created.
High CPU load, due to intensive grokking (regex).
Network buffer overflow. UDP traffic dropped.
Result: missing events.
9. Initial attempt #fail
Log4j events can be multiline (e.g. stacktraces).
Events are sent per line:
100 lines = 100 syslog msgs
Merging by Logstash.
Remember the UDP drops?
Result:
- unparseable events (if the 1st line was missing)
- Swiss cheese: stacktrace lines went missing.
10. Initial attempt #fail
Syslog RFC3164:
“The total length of the packet MUST be 1024 bytes or
less.”
Rich Apache LogFormat + lots of cookies = 4 KB easily.
Anything after byte 1024 got trimmed.
Result: unparseable events (grok pattern mismatch).
11. The only way is up.
Improvement proposals:
- Use queuing to make Logstash horizontally scalable.
- Drop syslog as transport (for non-syslog sources).
- Reduce the amount of grokking: pre-formatting at the source scales better and is less complex.
12. Latest design (mid 2014’ish)
[Architecture diagram] Servers, routers, firewalls … still send syslog to a central syslog server, where a local Logsheep ships the log events. Apache webservers write their accesslog in jsonevent format, also shipped by a local Logsheep. Java webapplications (JVM) use log4j with a jsonevent layout and a log4j redis appender. Lots of other sources feed in the same way. Everything lands in Redis (queue), from which many Logstash instances index into Elasticsearch, queried through Kibana 2 + 3.
Events in jsonevent format. No grokking required.
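For reference, a sketch of such a jsonevent in the old Logstash v1 event schema (which the @type/@fields queries in the earlier slides suggest); all values here are made up:

{
  "@timestamp": "2014-09-22T20:15:03+0200",
  "@type": "apache_access",
  "@message": "GET /p/some-product HTTP/1.1 200",
  "@fields": {
    "site": "www_bol_com",
    "responsetimes": 2417
  }
}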
13. Current status #win
- Logstash: up to 10 instances per env (needed because we still run Logstash 1.1)
- ES cluster (v1.0.1): 6 data + 2 client nodes
- Each datanode has 7 datadisks (striping)
- Indexing at 2k – 4k docs added per second
- Avg. index time: 0.5ms
- Peak: 300M docs = 185GB, per day
- Searches: just a few per hour
- Shardcount: 3 per idx, 1 replica, 3000 total
- Retention: up to 60 days
14. Our lessons learned
Before anything else!
Start collecting metrics so you get a baseline.
No blind tuning. Validate every change fact-based.
Our weapons of choice:
• Graphite
• Diamond (I am a contributor of the ES collector)
• Jcollectd
Alternative: try Marvel.
15. Logstash tip #1
Insert Redis as a queue between the sources and the
Logstash instances:
- Scale Logstash horizontally
- High availability (no events get lost)
[Diagram: sources feed two Redis instances, which feed three Logstash instances.]
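A minimal sketch of the wiring, assuming a single Redis host named redis1 (hostname and key are hypothetical):

# shipper side: push events onto a Redis list
output {
  redis {
    host => "redis1"
    data_type => "list"
    key => "logstash"
  }
}

# indexer side: pop events from the same list
input {
  redis {
    host => "redis1"
    data_type => "list"
    key => "logstash"
  }
}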
16. Logstash tip #2
Tune your workers. Find your chokepoint and
increase its workers to improve throughput.
[Diagram: one input worker and one output worker, with multiple filter workers in between.]
$ top -H -p $(pgrep logstash)
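If the filter stage turns out to be the chokepoint, more filter workers can be started at launch. A sketch for Logstash 1.x (check your version's --help for the exact flag):

# run the agent with 4 filter worker threads
$ bin/logstash agent -f /etc/logstash/indexer.conf -w 4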
17. Logstash tip #3
Grok is very powerful, but CPU intensive. Hard to
write, maintain and debug.
Fix: scale. Increase filterworkers or add more
Logstash instances.
Better: feed Logstash with jsonevent input.
Solutions:
• Log4j: use log4j-jsonevent-layout
• Apache: define json output with LogFormat
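A sketch of the Apache side, loosely based on the untergeek post linked at the end; the field names are assumptions, not our exact production format:

# %D = response time in microseconds (matches the
# @fields.responsetimes:[5.000.000 TO *] query shown earlier)
LogFormat "{ \"@timestamp\": \"%{%Y-%m-%dT%H:%M:%S%z}t\", \"@fields\": { \"clientip\": \"%a\", \"verb\": \"%m\", \"request\": \"%U%q\", \"response\": \"%>s\", \"responsetimes\": %D } }" logstash_json
CustomLog /var/log/httpd/access_json.log logstash_json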
18. Logstash tip #4 (last one)
Use the HTTP protocol in the Elasticsearch output.
Avoid version lock-in!
HTTP may be slower, but newer ES means:
- Lots of new features
- Lots of bug fixes
- Lots of performance improvements
Most important: you decide what versions to use.
Logstash v1.4.2 (June ‘14) requires ES v1.1.1 (April ‘14).
Latest ES version is v1.3.2 (Aug ‘14).
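A minimal output sketch for Logstash 1.4.x (the hostname is hypothetical):

output {
  elasticsearch {
    host => "es-client-1"
    protocol => "http"  # REST instead of the version-locked node/transport protocol
  }
}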
19. Elasticsearch tip #1
Do not download a ‘great’ configuration.
Elasticsearch is very complex. Lots of moving parts.
Lots of different use-cases. Lots of configuration
options. The defaults cannot be optimal for everyone.
Start with defaults:
• Load it (stresstest or pre-launch traffic).
• Check your metrics.
• Find your chokepoint.
• Change one setting.
• Verify and repeat.
20. Elasticsearch tip #2
Increase the ‘index.refresh_interval’ setting.
Refresh: make newly added docs available for
search. Default value: one second. High impact
on heavy indexing systems (like ours).
Change it at runtime & check the metrics:
$ curl -s -XPUT 0:9200/_all/_settings -d '{"index": {"refresh_interval": "5s"}}'
21. Elasticsearch tip #3
Use Curator to keep total shardcount constant.
Uncontrolled shard growth may trigger a sudden
hockey stick effect.
Our setup:
- 6 datanodes
- 6 shards per index
- 3 primary, 3 replica
“One shard per datanode” (YMMV)
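With daily Logstash indices, a constant shard count simply means dropping the oldest index whenever a new one appears. Curator automates that; the manual equivalent (index name hypothetical):

$ curl -s -XDELETE 0:9200/logstash-2014.07.01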
22. Elasticsearch tip #4
Become experienced in rolling cluster restarts:
- to roll out new Elasticsearch releases
- to apply a config setting (e.g. heap, gc, ..)
- because it will solve an incident.
Control concurrency + bandwidth:
cluster.routing.allocation.node_concurrent_recoveries
cluster.routing.allocation.cluster_concurrent_rebalance
indices.recovery.max_bytes_per_sec
Get confident enough to trust doing a rolling restart on a Saturday evening!
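These knobs can be changed at runtime through the cluster settings API; the values below are illustrative, not recommendations:

$ curl -s -XPUT 0:9200/_cluster/settings -d '{
  "transient": {
    "cluster.routing.allocation.node_concurrent_recoveries": 4,
    "cluster.routing.allocation.cluster_concurrent_rebalance": 2,
    "indices.recovery.max_bytes_per_sec": "40mb"
  }
}'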
23. Elasticsearch tip #5 (last one)
Cluster restarts improve later recovery times.
Recovery: compares replica vs primary shard. If
different, recreate the replica. Costly (iowait) and
very time consuming.
But … difference is normal. Primary and replica
have their own segment merge management:
same docs, but different bytes.
After recovery: replica is exact copy of primary.
Note: only works for stale shards (no more updates).
You have a lot of those when using daily Logstash indices.
27. Tools we use
http://redis.io/
Key/value memory store, no-frills queuing, extremely fast.
Used to scale logstash horizontally.
https://github.com/emicklei/log4j-redis-appender
Send log4j event to Redis queue, non-blocking, batch, failover
https://github.com/emicklei/log4j-jsonevent-layout
Format log4j events in logstash event layout.
Why have Logstash do lots of grokking if you can feed it logstash-friendly JSON?
http://untergeek.com/2013/09/11/getting-apache-to-output-json-for-logstash-1-2-x/
Format Apache access logging in logstash event layout. Again: avoid grokking.
https://github.com/bolcom/ (SOON)
Logsheep: custom multi-threaded logtailer / udp listener, sends events to redis.
https://github.com/BrightcoveOS/Diamond/
Great metrics collector framework with an Elasticsearch collector. I am a contributor.
https://github.com/elasticsearch/curator
Tool for automatic Elasticsearch index management (delete, close, optimize, bloom).
Editor's Notes
Log4j: why multiline? Events are sent per line.
Logstash needs to merge them (multiline filter).
Lots of messages + UDP drops = unparseable events + Swiss cheese.