Julian Harty, Sr. Sales Engineer at Splunk, reviews the internals of how a Splunk search is performed, the use of the Job Inspector and the search log, and gives a review of where and when to use certain commands.
2. Disclaimer

During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.
3. Am I in the right Session… and Who is this guy?

Goal of Presentation: Search Optimization
• How the hell do I speed this search up?

Background of your Presenter: Julian Harty
• Splunker for 2+ Years - Variety of installations from 10GB to 100TB+
• Ex-Oracle/MySQL DBA (Recovering)
• Contact info: julian@splunk.com
4. Background – Great to Not So Great

Growth without optimization = suboptimal performance -> our goal: getting great performance at scale

• More Data
• More Users
• New Searches
• Even More Data
• Even More Users
• Even More Searches…

Optimization Steps
5. Challenge – Why so slow? The maturity of a Splunk deployment

Question: Is your environment tuned correctly?
→ Solution: Jiffy Lube Quick Tune Up For Your Splunk Environment – Sean Delaney

Question: Has your deployment been architected correctly?
→ Solution: Architecting And Designing Your Splunk Deployment – Simeon Yep

Question: Are your searches optimized?
→ Solution: Welcome to this session!!!
6. Agenda: Objectives of this Session

• The Basics:
  – Common pitfalls - Best practices and what not to do
  – Take away: Basic steps to a better search
• Beyond the Basics:
  – Search Architecture and Workflow
  – Detailed Search review – using Job Inspector search examples
  – Take away: Job Inspector Cheat-Sheet
• Q&A
12. The Basics: Common Search Behavior

> be=selective AND be=specific | …

Bad Behavior → Good Behavior (Performance Improvement / Comment):
• > * → > foo bar (be selective AND be specific)
• > foo bar → > index=xyz host=web sourcetype=access* source=www foo bar (10-50%: use the index and default fields)
• All Time searches → narrow time range, e.g., -24h@h (30x-365x: limit time range)
• > foo | search bar → > foo bar (30%: combine searches)
• Verbose Mode → Fast/Smart Mode where possible (20-50%: Fast Mode)
• A NOT B → A AND C AND D AND E (5-50%: avoid NOTs)
• Searches over large datasets → use Report Acceleration or Summary Indexing (1000%: Data Models and Report Acceleration; use intelligently)
• Searches over long periods → use Summary Indexing (1000%: use sparingly)
13. The Basics: Common Optimization Mistakes

• Summary indexing is Awesome!
  – Initial reaction - Summarize EVERYTHING!!!
  → Summarizing too much data negates the point
• Report Acceleration = Turbo button
  – Initial reaction - Report Accelerate EVERYTHING!!!
  → Too many searches = skipped search issues
• Data Models are the answer!
  – Initial reaction – everything can be included!
  → Convoluted data models can increase workload
14. OK, But How can you enforce these recommendations?
15. How do you enforce Best Practices?

Architect Perspective:
• User education – Best Practices for Users

Admin Perspective:
Restricting User Controls: Pulling in the reins
• Restricting Role Capabilities
  – Limit index
  – Limit search terms
  – Limit search time range
• Limiting the Power user role
• Restrict number of RT (real-time) + concurrent searches
16. How do you enforce Best Practices?

Admin Perspective:
• Time range defaults (ui-prefs.conf)
• Time range Web dropdown options (times.conf)
17. OK, Now More Advanced Optimization: Let's start with the skinny on How Search Works…
18. How Search Works – Physical Perspective

[Diagram: index directory layout] Each index (e.g., main, _internal, history) contains buckets named db_<latesttime>_<earliesttime>_<id>, for example db_1290057665_1289504696_1. Each bucket holds .tsidx index files, metadata files (Sources.data, SourceTypes.data, Hosts.data), and compressed raw data slices (.gz).
20. Splunk Distributed Search

4 Steps to a Splunk Search: Parse, Fetch, Summarize, Display

• StreamingCommand: Applies a transformation to search results as they travel through the processing pipeline. Examples: eval, rex, where…
• ReportingCommand: Processes search results and generates a reporting data structure. Examples: stats, top, and timechart…
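The streaming-versus-reporting split can be modeled in miniature. A hedged Python sketch (a toy pipeline, not Splunk internals; the event data and function names are invented for illustration):

```python
from collections import Counter

events = [
    {"method": "GET", "status": 200},
    {"method": "POST", "status": 500},
    {"method": "GET", "status": 404},
]

# A "streaming" step, like eval/rex/where: transforms or filters events
# one at a time as they flow through the pipeline, order-preserving.
def where(stream, pred):
    for e in stream:
        if pred(e):
            yield e

# A "reporting" step, like stats/top/timechart: consumes the whole
# stream and emits an aggregate data structure.
def stats_count_by(stream, field):
    return Counter(e[field] for e in stream)

result = stats_count_by(where(events, lambda e: e["status"] != 500), "method")
print(result)  # Counter({'GET': 2})
```

Because streaming steps are per-event, they can run on each indexer; the reporting step is where results must be gathered and merged.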
21. Types of Searches

• Dense
  – Low cardinality
  – Example: sourcetype=access method=GET
• Sparse
  – High cardinality
  – Example: sourcetype=access method=GET action=purchase
• Super Sparse (or Needle in a Haystack)
  – Very high cardinality
  – Example: sourcetype=cisco:asa action=denied src=10.2.3.11
• Rare
  – Use Case: user behavior tracking
  – Example: sourcetype=magicsource | rare
22. Dense Searches (>10% matching results) (scanCount vs eventCount in Job Inspector)

Challenge:
• CPU and I/O-bound
  – Initial spike in CPU due to decompression of raw events.
  – Retrieval rate: 50K events per second per server

Solution:
• Divide and conquer
  – Distribute search to an indexing cluster
  – Parallel compute and merge results
• Report Acceleration or use of Summaries – divide and conquer
  – Report on summarized data vs. raw data

Example: > sourcetype=access_combined method=GET
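The divide-and-conquer idea can be sketched in a few lines. A toy Python model (illustrative only, not Splunk's implementation): each peer computes a partial aggregate over its own events, and the search head merges them.

```python
from collections import Counter

def partial_stats(events):
    """Runs on each peer: count events by HTTP method."""
    return Counter(e["method"] for e in events)

def merge(partials):
    """Runs on the search head: combine per-peer aggregates."""
    total = Counter()
    for p in partials:
        total += p
    return total

# Invented sample data standing in for two indexers' buckets.
peer1 = [{"method": "GET"}, {"method": "GET"}, {"method": "POST"}]
peer2 = [{"method": "GET"}, {"method": "POST"}]

result = merge([partial_stats(peer1), partial_stats(peer2)])
print(result)  # Counter({'GET': 3, 'POST': 2})
```

The key property is that the per-peer work (scan + partial aggregate) parallelizes, while only small partial results cross the network.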
23. Sparse Searches

Challenge:
• CPU-bound
  – Dominant cost is uncompressing *.gz raw data files
  – Sometimes need to read far into a file to retrieve a few events

Solution:
• Avoid cherry picking
  – Be selective about exclusions (avoid “NOT foo” or “field!=value”)
  – Leverage indexed fields
• Filter using whole terms
  – Instead of: > sourcetype=access_combined clientip=192.168.11.*
  – Use: > sourcetype=access_combined clientip=TERM(192.168.11.2)

Example: > sourcetype=access_combined status=404
24. Super Sparse Searches

• “Needle in a Haystack”
• Very I/O intensive
• May take up to 2 seconds to parse each bucket

Example: > sourcetype=access_combined status=404 10.2.1
25. Rare Term Searches

• Bloom Filters*
  – Bloom filters stored in each bucket
  – 50 buckets processed per second
  – I/Os reduced as buckets are excluded: from 100-200 down to just a few
  – 50-100x faster than Super Sparse searches on conventional storage, >1000x faster on SSD (due to random reads)

Example: > sourcetype=access_combined sessionID=1234

* A Bloom filter is a data structure designed to tell you whether or not an element is present in a set
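The footnote's definition can be made concrete. A minimal Python sketch of a Bloom filter (the parameters m and k and the hashing scheme are illustrative choices, not Splunk's); the property that matters for search is that a "no" answer is always correct, so a bucket whose filter reports the term absent can be skipped without reading it:

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: k hash positions over an m-bit array.
    May return false positives, never false negatives."""
    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = [False] * m

    def _positions(self, item):
        # Derive k positions from k salted hashes of the item.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = True

    def might_contain(self, item):
        return all(self.bits[p] for p in self._positions(item))

bf = BloomFilter()
bf.add("sessionID=1234")
print(bf.might_contain("sessionID=1234"))  # True: added terms always match
```

A term like sessionID=9999 that was never added will almost certainly report absent, which is exactly the signal that lets whole buckets be excluded from the read path.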
26. How can I determine if my search is Dense or Sparse? Use Job Inspector…

• scanCount: the number of events that are scanned or read off disk.
• eventCount: the number of events that are returned to the base search.

• For dense searches, scanCount ~= eventCount.
• For sparse searches, scanCount >> eventCount.

Example: > sourcetype=access_combined status=404 81.11.191.113
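The rules of thumb above can be expressed directly. A small Python helper (the thresholds are illustrative, not official Splunk cutoffs beyond the >10% dense figure from the earlier slide):

```python
def classify_search(scan_count, event_count):
    """Rough density classification from Job Inspector counters."""
    if scan_count == 0:
        return "empty"
    ratio = event_count / scan_count
    if ratio > 0.10:       # >10% of scanned events matched: dense
        return "dense"
    if ratio > 0.001:      # illustrative cutoff between sparse tiers
        return "sparse"
    return "super sparse"

print(classify_search(1_000_000, 250_000))  # dense
print(classify_search(1_000_000, 5_000))    # sparse
print(classify_search(1_000_000, 3))        # super sparse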
29. Job Inspector Walkthrough – Search Command

• Rawdata: improving I/O and CPU load
• KV: are field extractions efficient?
• Lookups: used appropriately? Are autolookups causing issues?
• Typer: inefficient eventtypes
• Alias: cascading aliases
30. Reading Job Inspector - search.index

Search.index = Time to parse and read the tsidx files to determine where to read in rawdata

How do you optimize this?
• Improving I/O
31. Reading Job Inspector - search.rawdata

Search.rawdata = Time to read actual events from rawdata files

How do you optimize this?
• Filtering as much as possible
• Add peers
• Allocating more CPU, improving I/O
32. Reading Job Inspector - search.kv

Search.kv = Time taken to apply field extractions to events

How do you optimize this?
Regex optimizations:
• Avoid greedy operators (prefer lazy quantifiers such as .*?)
• Use anchors: ^ and $
• Use non-capturing groups for repeats
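The regex advice above can be seen in a short example. Both patterns below extract the HTTP method from an Apache-style access log line (the log line and both patterns are invented for illustration); the anchored version built from character classes gives the engine far less room to backtrack than the unanchored greedy one:

```python
import re

line = '192.168.11.2 - - [10/Oct/2014:13:55:36] "GET /cart HTTP/1.1" 200 2326'

# Unanchored with greedy .* everywhere: the engine repeatedly overshoots
# and backtracks across the whole line before settling on a match.
slow = re.compile(r'.*"(.*) (.*) HTTP')

# Anchored at ^, with \S+ / [^\]]+ character classes that cannot cross
# their delimiters, so each piece matches in one forward pass.
fast = re.compile(r'^\S+ \S+ \S+ \[[^\]]+\] "(\w+) (\S+) HTTP')

print(slow.search(line).group(1))  # GET
print(fast.search(line).group(1))  # GET
```

Both extract the same fields; the difference shows up as CPU time in search.kv once the pattern runs over millions of events.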
33. Reading Job Inspector - search.lookups

Search.lookups = Time to apply lookups to the search

How do you optimize this?
• Use appropriately (at the end of the search)
• Autolookups may be causing issues
34. Reading Job Inspector - search.typer and tags

Search.typer = Time to apply event types to the search

How do you optimize this?
• Use appropriately
• Remove unused tags and eventtypes
35. Job Inspector Walkthrough – Distributed Search

• Dispatch.createProviderQueue: time to establish connection with peers
• Dispatch.fetch: time spent waiting to fetch events
• Dispatch.evaluate: the time spent parsing the search and setting up the data structures needed to run the search.

How do you optimize this?
• Improving peer connectivity
• Improve bundle replication
• Faster storage
36. Job Inspector Walkthrough – Distributed Search

• Dispatch.stream.remote: time to retrieve events from each remote peer

Issues:
1. Unequal indexer performance
  – Either a hardware mismatch
  – Or uneven distribution of indexes
2. AutoLB issues
37. Job Inspector Conclusions: Search Command Summary

• index: look in tsidx files for where to read in rawdata
• rawdata: read actual events from rawdata files
• kv: apply fields to the events
• filter: filter out events that don’t match (e.g., fields, phrases)
• alias: rename fields according to props.conf
• lookups: create new fields based on existing field values
• typer: assign eventtypes to events
• tags: assign tags to events
38. Job Inspector Conclusion: Distributed Search Summary

• createProviderQueue: the time to connect to all search peers. (Area to review: peer connectivity)
• fetch: the time spent waiting for or fetching events from search peers. (Area to review: faster storage)
• stream.remote: the time spent executing the remote search in a distributed search environment, aggregated across all peers.
• evaluate: the time spent parsing the search and setting up the data structures needed to run the search. (Area to review: possible bundle issues)
39. Additional Key Logfiles related to search

Search log:
• Stored in $SPLUNK_HOME/var/run/splunk/dispatch/
• Detailed analysis of every step taken by the search
• Search ‘stack trace’
41. Stats vs Transaction

Search Goal: compute statistics on the duration of web sessions (JSESSIONID = unique identifier):

Not so Great:
> sourcetype=access_combined | transaction JSESSIONID | chart count by duration span=log2

Much Better:
> sourcetype=access_combined | stats range(_time) as duration by JSESSIONID | chart count by duration span=log2
42. Dedup vs Latest

Search Goal: Return the latest cart action for each web site customer

Not so Great:
> sourcetype=access* | dedup clientip sortby - _time | table clientip, action

Much Better:
> sourcetype=access* | stats latest(action) by clientip

Note: dedup can't be used with report acceleration
43. Joins and Subsearches

Search Goal: Return the latest JSESSIONID across two sourcetypes

Not so Great:
> sourcetype="access_combined" | join type="inner" JSESSIONID [search sourcetype="applogs" | dedup JSESSIONID | table JSESSIONID, clientip, othervalue]

Much Better:
> (sourcetype="access_combined") OR (sourcetype="applogs") | stats latest(*) as * by JSESSIONID
45. In Closing…

1. Implementing Architecture best practices for performance at scale
  • With search behavior in mind…
2. Implementing User Onboarding Best Practices
  • Basic optimization steps
3. Periodic Performance Review
  • Applying acceleration technologies where appropriate
  • Removing unused searches
4. Review additional slides for
  • Search flow detail
  • Optimizing Splunk Web
46. And By the way…

Other Sessions to look out for:
• How to Actually Use Splunk Data Models - David Clawson – Presented on Tuesday; check out the session notes
• Jiffy Lube Tune-Up for your Splunk Deployment - Sean Delaney – Presented on Tuesday; check out the session notes
• Architecting and Sizing your Splunk Environment - Simeon Yep – 2:15-3:15 Today
• Splunk Search Acceleration Technologies – Gerald Kanapathy – 10:30-11:30 Tomorrow

My Contact information: julian@splunk.com @julian_Harty
48. Take Away: Basic Steps to a better search

• Avoid the use of * wherever possible.
• Avoid the use of All Time.
• Avoid subsearches.
• Incorporate the use of default fields (source, sourcetype, host) as well as specific indexes into every search (where possible).
• Use Fast or Smart mode where possible; avoid ‘Verbose’ mode.
• Use Report Acceleration sparingly (and strategically) on reports over large datasets.
• Use Summary Indexing when building reports over time spans beyond the target index retention.
• Use Job Inspector and Search inspector to get more info (hold on for more details!!!)
49. A few notes on how to optimize Splunk Web

• | fields
• Change Segmentation
• Use Fast Mode
• Collapse Timeline
50. Search flow – Local and Distributed

Key Files:
• Info
• Status
• Results
• Preview

Key Flow:
1. Find which Bundle to use
2. Find Buckets to use (time range)
3. LISPY TSIDX search
4. Process + Summarize Events

http://wiki.splunk.com/Community:HowDistSearchWorks