Splunk is a powerful platform that can harness your machine data and turn it into valuable information, enabling your business to make informed decisions and taking your organization from reactive to proactive. Like any other platform, Splunk is only as powerful as the data it has access to, so in this session we will walk through how to successfully onboard data, with samples ranging from simple to complex. We will also look at how to use common TAs to bring valuable data into Splunk. This session is designed to give you a better understanding of how to onboard data into Splunk, enabling you to unlock the power of your data.
2. • Major components involved in data indexing
• What happens to data within Splunk
• What the data pipeline is & how to influence it
• Shaping data understanding via props.conf
• Configuring data inputs via inputs.conf
• What goes where
• Heavy Forwarders vs. Universal Forwarders
• How to get your data into Splunk (mostly correctly)
~60 minutes from now...
3. What is the Data Onboarding Process?
• Systematic way to bring new data sources into Splunk
• Make sure that new data is instantly usable & has maximum value for users
• Goes hand-in-hand with the User Onboarding process (sold separately)
4. Machine Data > Business Value
[Diagram: index untapped data from any source, type, or volume – online services, web services, servers, security, GPS location, storage, desktops, networks, packaged applications, custom applications, messaging, telecoms, online shopping carts, web clickstreams, databases, energy meters, call detail records, smartphones and devices, RFID – whether on-premises, in a private cloud, or in a public cloud. Ask any question across application delivery; security, compliance and fraud; IT operations; business analytics; and industrial data and the Internet of Things.]
5. Flavors of Machine Data
[Diagram: sample event formats – order processing, Twitter, care IVR, middleware error.]
6. Getting Data Into Splunk
Agent and agent-less approaches for flexibility:
• Agent-less data input
  – syslog (TCP/UDP) from syslog-compatible hosts and network devices
  – WMI (event logs, performance, Active Directory) from Windows hosts
  – Mounted file systems (hostname/mount) from Unix, Linux and Windows hosts
  – Custom apps and scripted API connections (perf, shell, code)
• Splunk Forwarder
  – Local file monitoring: log files, config files, dumps and trace files
  – Windows inputs: event logs, performance counters, registry monitoring, Active Directory monitoring, virtual hosts (Windows hosts)
  – Scripted inputs: shell scripts, custom parsers, batch loading
7. Splunk Data Ingest
[Diagram: several UFs and an HF forwarding to an indexer (IDX), with a search head (SH); the UFs run the Splunk Universal Forwarder, everything else runs Splunk Enterprise (with optional configs).]
Summary: when it comes to "core" Splunk, there are two distinct products: the Splunk Universal Forwarder and Splunk Enterprise. "Everything else" – Indexer, Search Head, License Server, Deployment Server, Cluster Master, Deployer, Heavy Forwarder, etc. – are all instances of Splunk Enterprise with varying configs.
12. Inputs – Where it all starts
• Input processors: Monitor, FIFO, UDP, TCP, Scripted
• No events yet – just a stream of bytes
• Break data stream into 64KB blocks
• Annotate stream with metadata keys (host, source, sourcetype, index, etc.)
• Can happen on UF, HF or indexer
13. Parsing
• Check character set
• Break lines
• Process headers
• Can happen on HF or indexer
14. Aggregation/Merging
• Merge lines for multi-line events
• Identify events (finally!)
• Extract timestamps
• Exclude events based on timestamp (MAX_DAYS_AGO, ..)
• Can happen on HF or indexer
15. Typing
• Do regex replacement (field extraction, punctuation extraction, event routing, host/source/sourcetype overrides)
• Annotate events with metadata keys (host, source, sourcetype, ..)
• Can happen on HF or indexer
16. Indexing
• Output processors: TCP, syslog, HTTP
• indexAndForward
• Sign blocks
• Calculate license volume and throughput metrics
• Index
• [Write to disk] / [forward elsewhere] / ...
• Can happen on HF or indexer
22. Splunk Data Ingest
[Diagram: same topology as before; the UFs are marked "not parsing", the HF and indexer "parsing".]
Note: the data is parsed at the first component that has a parsing engine – and not again. This affects where you put certain props.conf and transforms.conf files (a.k.a. sometimes they go on the forwarder).
24. On-boarding Process
• Identify the specific sourcetype(s) – onboard each separately
• Check for a pre-existing app/TA on splunk.com – don't reinvent the wheel!
• Gather info
  – Where does this data originate/reside? How will Splunk collect it?
  – Which users/groups will need access to this data? Access controls?
  – Determine the indexing volume and data retention requirements
  – Will this data need to drive existing dashboards (ES, PCI, etc.)?
  – Who is the SME for this data?
• Map it out
  – Get a "big enough" sample of the event data
  – Identify and map out fields
  – Assign sourcetype and TA names according to CIM conventions
25. On-boarding Process
1. Dev
  • Create (or use) an app
  • Props / inputs definition
  • Sourcetype definition
  • Use data import wizard
  • Import, tweak, repeat
  • Oneshot
  • [hook up monitor]
2. Test
  • Deploy app
  • Oneshot
  • Validate
  • Hook up monitor
  • Validate
3. Prod
  • Deploy app
  • Validate
  • Monitor
26. Good Hygiene
• General:
  – Use apps for configs
  – Use TAs / add-ons from Splunk if possible
  – Use dev, test, prod (dev can be a laptop; test can be ephemeral)
• UF when possible
• HF only if filtering / transforming is required in foreign land
• Unique sourcetype per event stream
• Don't send data through Search Heads
• Don't send data direct to Indexers
27. Good Hygiene
• inputs.conf
  – As specific as possible
  – Set sourcetype, if possible
  – Don't let Splunk auto-sourcetype (no ...too_small)
  – Specify index if possible
• props.conf
  – Set: TIME_PREFIX, TIME_FORMAT, MAX_TIMESTAMP_LOOKAHEAD
  – Optimally: SHOULD_LINEMERGE = false, LINE_BREAKER, TRUNCATE
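Put together, the hygiene above might look like the following sketch (the path, sourcetype, and index names are hypothetical):

```ini
# inputs.conf – be specific: one file, explicit sourcetype and index
[monitor:///var/log/acme/app.log]
sourcetype = acme:app
index = acme_app

# props.conf – explicit timestamping and line breaking
[acme:app]
# Timestamp sits at the start of each event
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 25
# Break on newlines; don't run the line-merging pass
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
TRUNCATE = 10000
```

Spelling all of this out avoids auto-sourcetyping, wrong timestamps, and the expensive line-merge code path.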
29. Pre-Board
• Identify the specific sourcetype(s) – onboard each separately
• Check for a pre-existing app/TA on splunk.com – don't reinvent the wheel!
• Gather info
  – Where does this data originate/reside? How will Splunk collect it?
  – Which users/groups will need access to this data? Access controls?
  – Determine the indexing volume and data retention requirements
  – Will this data need to drive existing dashboards (ES, PCI, etc.)?
  – Who is the SME for this data?
• Map it out
  – Get a "big enough" sample of the event data
  – Identify and map out fields
  – Assign sourcetype and TA names according to CIM conventions
30. Tangent: What is the CIM and why should I care?
• The Common Information Model (CIM) defines relationships in the underlying data, while leaving the raw machine data intact
• A naming convention for fields, eventtypes & tags
• More advanced reporting and correlation requires that the data be normalized, categorized, and parsed
• CIM-compliant data sources can drive CIM-based dashboards (ES, PCI, others)
31. Build the index-time configs
• Identify necessary configs (inputs, props and transforms) to properly handle:
  – timestamp extraction, timezone, event breaking, sourcetype/host/source assignments
• Do events contain sensitive data (i.e., PII, PAN, etc.)? Create masking transforms if necessary
• Package all index-time configs into the TA
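One common way to mask sensitive values at index time is a SEDCMD in props.conf. A sketch, assuming a hypothetical sourcetype carrying 16-digit card numbers:

```ini
# props.conf – index-time masking; must live where the data is parsed
# "acme:payments" is a made-up sourcetype for illustration.
[acme:payments]
# Replace the first 12 digits of a 16-digit PAN with X's,
# keeping the last 4 for correlation
SEDCMD-mask_pan = s/\d{12}(\d{4})/XXXXXXXXXXXX\1/g
```

Because this runs at parse time, the raw data on disk never contains the full PAN, which is usually what the compliance requirement actually demands.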
32. Tangent: Best & Worst Practices
• Assign sourcetype according to event format; events with similar format should have the same sourcetype
• When do I need a separate index?
  – When the data volume will be very large, or when it will be searched exclusively a lot
  – When access to the data needs to be controlled
  – When the data requires a specific data retention policy
• Resist the temptation to create lots of indexes
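When a separate index is justified (access control or a distinct retention policy), the definition itself is small. A sketch with hypothetical names and a 90-day retention:

```ini
# indexes.conf – deploy to the indexers; "acme_app" is a made-up name
[acme_app]
homePath   = $SPLUNK_DB/acme_app/db
coldPath   = $SPLUNK_DB/acme_app/colddb
thawedPath = $SPLUNK_DB/acme_app/thaweddb
# Retention: freeze (delete or archive) events older than 90 days
frozenTimePeriodInSecs = 7776000
```

Access control then happens by granting roles search access to this index, which is the clean reason to split it out in the first place.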
33. Best & Worst Practices – [monitor]
• Always specify a sourcetype and index
• Be as specific as possible: use /var/log/fubar.log, not /var/log/
• Arrange your monitored filesystems to minimize unnecessary monitored logfiles
• Use a scratch index while testing new inputs
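For example (the sourcetype and index names are illustrative), monitor the one file you care about rather than the whole directory, and point the test input at a scratch index:

```ini
# inputs.conf – specific file, explicit sourcetype, scratch index for testing
[monitor:///var/log/fubar.log]
sourcetype = fubar
index = scratch
```

Once event breaking and timestamps validate cleanly in the scratch index, switch the stanza to the production index.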
34. Best & Worst Practices – [monitor]
• Look out for inadvertent, runaway monitor clauses
• Don't monitor thousands of files unnecessarily – that's the NSA's job
• From the CLI: splunk show monitor
• From your browser: https://your_splunkd:8089/services/admin/inputstatus/TailingProcessor:FileStatus
35. Another Tangent! Your friend, the Data Previewer
• Find & fix index-time problems BEFORE polluting your index
• A try-it-before-you-fry-it interface for figuring out
  – Event breaking
  – Timestamp recognition
  – Timezone assignment
• Provides the necessary props.conf parameter settings
37. Build the search-time configs: eventtypes & tags
• Identify "interesting" events which should be tagged with an existing CIM tag (http://docs.splunk.com/Documentation/CIM/latest/User/Alerts)
• Get a list of all current tags:
  | rest splunk_server=local /services/admin/tags | rename tag_name as tag, field_name_value AS definition, eai:acl.app AS app | eval definition_and_app=definition . " (" . app . ")" | stats values(definition_and_app) as "definitions (app)" by tag | sort +tag
• Get a list of all eventtypes (with associated tags):
  | rest splunk_server=local /services/admin/eventtypes | rename title as eventtype, search AS definition, eai:acl.app AS app | table eventtype definition app tags | sort +eventtype
• Examine the current list of CIM tags. For each "interesting" event, identify which tags should be applied to each. A particular event may have multiple tags.
• Are there new tags which should be created, beyond those in the current CIM tag library? If so, add them to the CIM library
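Once the interesting events and their CIM tags are identified, the wiring is two small files. A sketch, with a hypothetical sourcetype and eventtype name:

```ini
# eventtypes.conf – define the eventtype by a search
[acme_auth_failure]
search = sourcetype=acme:app action=failure

# tags.conf – attach CIM tags to that eventtype
[eventtype=acme_auth_failure]
authentication = enabled
failure = enabled
```

With these in the TA, any CIM-based dashboard that searches `tag=authentication tag=failure` picks up the new source automatically.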
38. Build the search-time configs: extractions & lookups
• Extract "interesting" fields
  – If already in your CIM library, name or alias appropriately
  – If not already in your CIM library, name according to CIM conventions
• Add lookups for missing/desirable fields
  – Lookups may be required to supply CIM-compliant fields/field values (for example, to convert 'sev=42' to 'severity=medium')
  – Make the values more readable for humans
• Put everything into the TA package
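The 'sev=42' to 'severity=medium' conversion above can be done with an automatic lookup. A sketch, with hypothetical file, sourcetype, and field names:

```ini
# transforms.conf – define the lookup table
# severity_lookup.csv has columns: sev,severity (e.g. "42,medium")
[severity_lookup]
filename = severity_lookup.csv

# props.conf – apply it automatically at search time
[acme:app]
LOOKUP-severity = severity_lookup sev OUTPUT severity
```

Because this is search-time config, it belongs in the TA on the search heads and costs nothing at index time.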
39. Keep Going
• Create data models. What will be interesting for end users?
• Document! (Especially the fields, eventtypes & tags)
• Test
  – Does this data drive relevant existing dashboards correctly?
  – Do the data models work properly / produce correct results?
  – Is the TA packaged properly?
• Check with originating user/group; is it OK?
40. Get ready to deploy
• Determine additional Splunk infrastructure required; can existing infrastructure & license support this?
• Will new forwarders be required? If so, initiate CR process(es)
• Will firewall changes be required? If so, initiate CR process(es)
• Will new Splunk roles be required? Create & map to AD roles
• Will new app contexts be required? Create app(s) as necessary
• Will new users be added? Create the accounts
41. Bring it!
• Deploy new search heads & indexers as needed
• Install new forwarders as needed
• Deploy new app & TA to search heads & indexers
• Deploy new TA to relevant forwarders
42. Test & Validate
• All sources reporting?
• Event breaking, timestamp, timezone, host, source, sourcetype?
• Field extractions, aliases, lookups?
• Eventtypes, tags?
• Data model(s)?
• User access?
• Confirm with original requesting user/group: looks OK?
44. Gee, this seems like a lot of work…
• Bring new data sources in correctly the first time
• Reduce the amount of "bad" data in your indexes – and the time spent dealing with it
• Make the new data immediately useful to ALL users – not just the ones who originally requested it
• Allow the data to drive all sorts of dashboards without extra modifications
45. Reference
• What Splunk can monitor: http://docs.splunk.com/Documentation/Splunk/latest/Data/WhatSplunkcanmonitor
• How data moves through Splunk: http://docs.splunk.com/Documentation/Splunk/latest/Deploy/Datapipeline
• Components of the data pipeline: http://docs.splunk.com/Documentation/Splunk/latest/Deploy/Componentsofadistributedenvironment
• Common Information Model app: https://splunkbase.splunk.com/app/1621
• Common Information Model docs: http://docs.splunk.com/Documentation/CIM/latest/User/Overview
• Where do I put configs: http://wiki.splunk.com/Where_do_I_configure_my_Splunk_settings