Julian Harty, Sr. Sales Engineer at Splunk, reviews the internals of how a Splunk search is performed, the use of the Job Inspector and the search log, and gives a review of where and when to use certain commands.
2. Disclaimer

During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward-looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.
3. Am I in the right Session… and Who is this guy?

Goal of Presentation: Search Optimization
• How the hell do I speed this search up?

Background of your Presenter: Julian Harty
• Splunker for 2+ Years - Variety of installations from 10GB to 100TB+
• Ex-Oracle/MySQL DBA (Recovering)
• Contact info: julian@splunk.com
4. Background – Great to Not So Great

Growth without optimization = suboptimal performance -> our goal: getting great performance at scale

• More Data
• More Users
• New Searches
• Even More Data
• Even More Users
• Even More Searches…

Optimization Steps
5. Challenge – Why so slow? The maturity of a Splunk deployment

Question: Is your environment tuned correctly?
→ Solution: Jiffy Lube Quick Tune Up For Your Splunk Environment – Sean Delaney

Question: Has your deployment been architected correctly?
→ Solution: Architecting And Designing Your Splunk Deployment – Simeon Yep

Question: Are your searches optimized?
→ Solution: Welcome to this session!!!
6. Agenda: Objectives of this Session

• The Basics:
  – Common pitfalls - Best practices and what not to do
  – Take away: Basic steps to a better search
• Beyond the Basics:
  – Search Architecture and Workflow
  – Detailed Search review – using Job Inspector search examples
  – Take away: Job Inspector Cheat-Sheet
• Q&A
12. The Basics: Common Search Behavior

> be=selective AND be=specific | …

Bad Behavior → Good Behavior (Performance Improvement / Comment):
• > * → > foo bar (be selective AND be specific)
• > foo bar → > index=xyz host=web sourcetype=access* source=www foo bar (10-50%: use the index and default fields)
• All Time searches → narrow time range, e.g., -24h@h (30x-365x: limit time range)
• > foo | search bar → > foo bar (30%: combine searches)
• Verbose Mode → Fast/Smart Mode where possible (20-50%: Fast Mode)
• A NOT B → A AND C AND D AND E (5-50%: avoid NOTs)
• Searches over large datasets → use Report Acceleration or Summary Indexing (1000%: Data Models and Report Acceleration; use intelligently)
• Searches over long periods → use Summary Indexing (1000%: use sparingly)
13. The Basics: Common Optimization Mistakes

• Summary indexing is Awesome!
  – Initial reaction - Summarize EVERYTHING!!!
  → Summarizing too much data negates the point
• Report Acceleration = Turbo button
  – Initial reaction - Report Accelerate EVERYTHING!!!
  → Too many searches = skipped search issues
• Data Models are the answer!
  – Initial reaction – everything can be included!
  → Convoluted data models can increase workload
14. OK, But How can you enforce these recommendations?
15. How do you enforce Best Practices?

Architect Perspective:
• User education – Best Practices for Users

Admin Perspective:
Restricting User Controls: Pulling in the reins
• Restricting Role Capabilities
  – Limit index
  – Limit search terms
  – Limit search time range
• Limiting the Power user role
• Restrict number of RT (real-time) + concurrent searches
16. How do you enforce Best Practices?

Admin Perspective:
• Time range defaults (ui-prefs.conf)
• Time range Web dropdown options (times.conf)
17. OK, Now More Advanced Optimization: Let's start with the skinny on How Search Works…
18. How Search Works – Physical Perspective

[Diagram: index directory layout] Each index (e.g., main, _internal, history) contains buckets named db_<latesttime>_<earliesttime>_<id>, for example db_1290057665_1289504696_1. Each bucket holds .tsidx index files, metadata files (Sources.data, SourceTypes.data, Hosts.data), and compressed raw data slices (.gz).
20. Splunk Distributed Search

4 Steps to a Splunk Search: Parse, Fetch, Summarize, Display

• StreamingCommand: Applies a transformation to search results as they travel through the processing pipeline. Examples: eval, rex, where…
• ReportingCommand: Processes search results and generates a reporting data structure. Examples: stats, top, and timechart…
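The streaming-versus-reporting split can be modeled in miniature. A hedged Python sketch (a toy pipeline, not Splunk internals; the event data and function names are invented for illustration):

```python
from collections import Counter

events = [
    {"method": "GET", "status": 200},
    {"method": "POST", "status": 500},
    {"method": "GET", "status": 404},
]

# A "streaming" step, like eval/rex/where: transforms or filters events
# one at a time as they flow through the pipeline, order-preserving.
def where(stream, pred):
    for e in stream:
        if pred(e):
            yield e

# A "reporting" step, like stats/top/timechart: consumes the whole
# stream and emits an aggregate data structure.
def stats_count_by(stream, field):
    return Counter(e[field] for e in stream)

result = stats_count_by(where(events, lambda e: e["status"] != 500), "method")
print(result)  # Counter({'GET': 2})
```

Because streaming steps are per-event, they can run on each indexer; the reporting step is where results must be gathered and merged.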
21. Types of Searches

• Dense
  – Low cardinality
  – Example: sourcetype=access method=GET
• Sparse
  – High cardinality
  – Example: sourcetype=access method=GET action=purchase
• Super Sparse (or Needle in a Haystack)
  – Very high cardinality
  – Example: sourcetype=cisco:asa action=denied src=10.2.3.11
• Rare
  – Use Case: user behavior tracking
  – Example: sourcetype=magicsource | rare
22. Dense Searches (>10% matching results) (scanCount vs eventCount in Job Inspector)

Challenge:
• CPU and I/O-bound
  – Initial spike in CPU due to decompression of raw events.
  – Retrieval rate: 50K events per second per server

Solution:
• Divide and conquer
  – Distribute search to an indexing cluster
  – Parallel compute and merge results
• Report Acceleration or use of Summaries – divide and conquer
  – Report on summarized data vs. raw data

Example: > sourcetype=access_combined method=GET
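The divide-and-conquer idea can be sketched in a few lines. A toy Python model (illustrative only, not Splunk's implementation): each peer computes a partial aggregate over its own events, and the search head merges them.

```python
from collections import Counter

def partial_stats(events):
    """Runs on each peer: count events by HTTP method."""
    return Counter(e["method"] for e in events)

def merge(partials):
    """Runs on the search head: combine per-peer aggregates."""
    total = Counter()
    for p in partials:
        total += p
    return total

# Invented sample data standing in for two indexers' buckets.
peer1 = [{"method": "GET"}, {"method": "GET"}, {"method": "POST"}]
peer2 = [{"method": "GET"}, {"method": "POST"}]

result = merge([partial_stats(peer1), partial_stats(peer2)])
print(result)  # Counter({'GET': 3, 'POST': 2})
```

The key property is that the per-peer work (scan + partial aggregate) parallelizes, while only small partial results cross the network.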
23. Sparse Searches

Challenge:
• CPU-bound
  – Dominant cost is uncompressing *.gz raw data files
  – Sometimes need to read far into a file to retrieve a few events

Solution:
• Avoid cherry picking
  – Be selective about exclusions (avoid “NOT foo” or “field!=value”)
  – Leverage indexed fields
• Filter using whole terms
  – Instead of: > sourcetype=access_combined clientip=192.168.11.*
  – Use: > sourcetype=access_combined clientip=TERM(192.168.11.2)

Example: > sourcetype=access_combined status=404
24. Super Sparse Searches

• “Needle in a Haystack”
• Very I/O intensive
• May take up to 2 seconds to parse each bucket

Example: > sourcetype=access_combined status=404 10.2.1
25. Rare Term Searches

• Bloom Filters*
  – Bloom filters stored in each bucket
  – 50 buckets processed per second
  – I/Os reduced as buckets are excluded: from 100-200 down to just a few
  – 50-100x faster than Super Sparse searches on conventional storage, >1000x faster on SSD (due to random reads)

Example: > sourcetype=access_combined sessionID=1234

* A Bloom filter is a data structure designed to tell you whether or not an element is present in a set
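The footnote's definition can be made concrete. A minimal Python sketch of a Bloom filter (the parameters m and k and the hashing scheme are illustrative choices, not Splunk's); the property that matters for search is that a "no" answer is always correct, so a bucket whose filter reports the term absent can be skipped without reading it:

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: k hash positions over an m-bit array.
    May return false positives, never false negatives."""
    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = [False] * m

    def _positions(self, item):
        # Derive k positions from k salted hashes of the item.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = True

    def might_contain(self, item):
        return all(self.bits[p] for p in self._positions(item))

bf = BloomFilter()
bf.add("sessionID=1234")
print(bf.might_contain("sessionID=1234"))  # True: added terms always match
```

A term like sessionID=9999 that was never added will almost certainly report absent, which is exactly the signal that lets whole buckets be excluded from the read path.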
26. How can I determine if my search is Dense or Sparse? Use Job Inspector…

• scanCount: the number of events that are scanned or read off disk.
• eventCount: the number of events that are returned to the base search.

• For dense searches, scanCount ~= eventCount.
• For sparse searches, scanCount >> eventCount.

Example: > sourcetype=access_combined status=404 81.11.191.113
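The rules of thumb above can be expressed directly. A small Python helper (the thresholds are illustrative, not official Splunk cutoffs beyond the >10% dense figure from the earlier slide):

```python
def classify_search(scan_count, event_count):
    """Rough density classification from Job Inspector counters."""
    if scan_count == 0:
        return "empty"
    ratio = event_count / scan_count
    if ratio > 0.10:       # >10% of scanned events matched: dense
        return "dense"
    if ratio > 0.001:      # illustrative cutoff between sparse tiers
        return "sparse"
    return "super sparse"

print(classify_search(1_000_000, 250_000))  # dense
print(classify_search(1_000_000, 5_000))    # sparse
print(classify_search(1_000_000, 3))        # super sparse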
29. Job Inspector Walkthrough – Search Command

• Rawdata: improving I/O and CPU load
• KV: are field extractions efficient?
• Lookups: used appropriately? Are autolookups causing issues?
• Typer: inefficient eventtypes
• Alias: cascading aliases
30. Reading Job Inspector - search.index

Search.index = Time to parse and read the tsidx files to determine where to read in rawdata

How do you optimize this?
• Improving I/O
31. Reading Job Inspector - search.rawdata

Search.rawdata = Time to read actual events from rawdata files

How do you optimize this?
• Filtering as much as possible
• Add peers
• Allocating more CPU, improving I/O
32. Reading Job Inspector - search.kv

Search.kv = Time taken to apply field extractions to events

How do you optimize this?
Regex optimizations:
• Avoid greedy operators (prefer lazy quantifiers such as .*?)
• Use anchors: ^ and $
• Use non-capturing groups for repeats
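The regex advice above can be seen in a short example. Both patterns below extract the HTTP method from an Apache-style access log line (the log line and both patterns are invented for illustration); the anchored version built from character classes gives the engine far less room to backtrack than the unanchored greedy one:

```python
import re

line = '192.168.11.2 - - [10/Oct/2014:13:55:36] "GET /cart HTTP/1.1" 200 2326'

# Unanchored with greedy .* everywhere: the engine repeatedly overshoots
# and backtracks across the whole line before settling on a match.
slow = re.compile(r'.*"(.*) (.*) HTTP')

# Anchored at ^, with \S+ / [^\]]+ character classes that cannot cross
# their delimiters, so each piece matches in one forward pass.
fast = re.compile(r'^\S+ \S+ \S+ \[[^\]]+\] "(\w+) (\S+) HTTP')

print(slow.search(line).group(1))  # GET
print(fast.search(line).group(1))  # GET
```

Both extract the same fields; the difference shows up as CPU time in search.kv once the pattern runs over millions of events.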
33. Reading Job Inspector - search.lookups

Search.lookups = Time to apply lookups to the search

How do you optimize this?
• Use appropriately (at the end of the search)
• Autolookups may be causing issues
34. Reading Job Inspector - search.typer and tags

Search.typer = Time to apply event types to the search

How do you optimize this?
• Use appropriately
• Remove unused tags and eventtypes
35. Job Inspector Walkthrough – Distributed Search

• Dispatch.createProviderQueue: time to establish connection with peers
• Dispatch.fetch: time spent waiting to fetch events
• Dispatch.evaluate: the time spent parsing the search and setting up the data structures needed to run the search.

How do you optimize this?
• Improving peer connectivity
• Improve bundle replication
• Faster storage
36. Job Inspector Walkthrough – Distributed Search

• Dispatch.stream.remote: time to retrieve events from each remote peer

Issues:
1. Unequal indexer performance
  – Either a hardware mismatch
  – Or uneven distribution of indexes
2. AutoLB issues
37. Job Inspector Conclusions: Search Command Summary

• index: look in tsidx files for where to read in rawdata
• rawdata: read actual events from rawdata files
• kv: apply fields to the events
• filter: filter out events that don’t match (e.g., fields, phrases)
• alias: rename fields according to props.conf
• lookups: create new fields based on existing field values
• typer: assign eventtypes to events
• tags: assign tags to events
38. Job Inspector Conclusion: Distributed Search Summary

• createProviderQueue: the time to connect to all search peers. (Area to review: peer connectivity)
• fetch: the time spent waiting for or fetching events from search peers. (Area to review: faster storage)
• stream.remote: the time spent executing the remote search in a distributed search environment, aggregated across all peers.
• evaluate: the time spent parsing the search and setting up the data structures needed to run the search. (Area to review: possible bundle issues)
39. Additional Key Logfiles related to search

Search log:
• Stored in $SPLUNK_HOME/var/run/splunk/dispatch/
• Detailed analysis of every step taken by the search
• Search ‘stack trace’
41. Stats vs Transaction

Search Goal: compute statistics on the duration of web sessions (JSESSIONID = unique identifier):

Not so Great:
> sourcetype=access_combined | transaction JSESSIONID | chart count by duration span=log2

Much Better:
> sourcetype=access_combined | stats range(_time) as duration by JSESSIONID | chart count by duration span=log2
42. Dedup vs Latest

Search Goal: Return the latest cart action for each web site customer

Not so Great:
> sourcetype=access* | dedup clientip sortby - _time | table clientip, action

Much Better:
> sourcetype=access* | stats latest(action) by clientip

Note: dedup can't be used with report acceleration
43. Joins and Subsearches

Search Goal: Return the latest JSESSIONID across two sourcetypes

Not so Great:
> sourcetype="access_combined" | join type="inner" JSESSIONID [search sourcetype="applogs" | dedup JSESSIONID | table JSESSIONID, clientip, othervalue]

Much Better:
> (sourcetype="access_combined") OR (sourcetype="applogs") | stats latest(*) as * by JSESSIONID
45. In Closing…

1. Implementing Architecture best practices for performance at scale
  • With search behavior in mind…
2. Implementing User Onboarding Best Practices
  • Basic optimization steps
3. Periodic Performance Review
  • Applying acceleration technologies where appropriate
  • Removing unused searches
4. Review additional slides for
  • Search flow detail
  • Optimizing Splunk Web
46. And By the way…

Other Sessions to look out for:
• How to Actually Use Splunk Data Models - David Clawson – Presented on Tuesday; check out the session notes
• Jiffy Lube Tune-Up for your Splunk Deployment - Sean Delaney – Presented on Tuesday; check out the session notes
• Architecting and Sizing your Splunk Environment - Simeon Yep – 2:15-3:15 Today
• Splunk Search Acceleration Technologies – Gerald Kanapathy – 10:30-11:30 Tomorrow

My Contact information: julian@splunk.com @julian_Harty
48. Take Away: Basic Steps to a better search

• Avoid the use of * wherever possible.
• Avoid the use of All Time.
• Avoid subsearches.
• Incorporate the use of default fields (source, sourcetype, host) as well as specific indexes into every search (where possible).
• Use Fast or Smart mode where possible; avoid ‘Verbose’ mode.
• Use Report Acceleration sparingly (and strategically) on reports over large datasets.
• Use Summary Indexing when building reports over time spans beyond the target index retention.
• Use Job Inspector and Search inspector to get more info (hold on for more details!!!)
49. A few notes on how to optimize Splunk Web

• | fields
• Change Segmentation
• Use Fast Mode
• Collapse Timeline
50. Search flow – Local and Distributed

Key Files:
• Info
• Status
• Results
• Preview

Key Flow:
1. Find which Bundle to use
2. Find Buckets to use (time range)
3. LISPY TSIDX search
4. Process + Summarize Events

http://wiki.splunk.com/Community:HowDistSearchWorks