Your adversaries continue to attack and get into companies. You can no longer rely on alerts from point solutions alone to secure your network. To identify and mitigate these advanced threats, analysts must become proactive in identifying not just indicators, but attack patterns and behavior. In this workshop we will walk through a hands-on exercise with a real-world attack scenario. The workshop will illustrate how advanced correlations from multiple data sources and machine learning can enhance a security analyst’s capability to detect and quickly mitigate advanced attacks.
3. $whoami
> René Agüero raguero@splunk.com
• > 1 year at Splunk – Security Specialist
• Based in Manhattan
• 18 years in security – MCSE NT4.0
• CISSP, MSBA – Information Assurance (forensics, auditing and security)
• Offensive Security
• Exploitation – Metasploit, Web attacks
• Rapid7 SE Director
4. Agenda
• Threat Hunting Basics
• Sysmon Endpoint Data
• Cyber Kill Chain
• Walkthrough of Attack Scenario Using Core Splunk (hands on)
• Lateral Movement
• DNS Exfil
• Enterprise Security Walkthrough
5. Log In Credentials
Birth Month and server URL:
January, February & March https://od-threathunting-01.splunkoxygen.com
April, May & June https://od-threathunting-02.splunkoxygen.com
July and August https://od-threathunting-03.splunkoxygen.com
September and October https://od-threathunting-04.splunkoxygen.com
November and December https://od-threathunting-05.splunkoxygen.com
User: hunter
Pass: pr3datorbos
7. Am I in the right place?
Some familiarity with…
● CSIRT/SOC Operations
● General understanding of Threat Intelligence
● General understanding of DNS, Proxy, and Endpoint types of data
8. This is a hands-on session.
The overview slides are important for building your
“hunt” methodology
10 minutes - Seriously.
10. SANS Threat Hunting Maturity
Ad Hoc Search: 85%
Statistical Analysis: 55%
Visualization Techniques: 50%
Aggregation: 48%
Machine Learning/Data Science: 32%
Source: SANS IR & Threat Hunting Summit 2016
11. Hunting Tools: Internal Data
• IP Addresses: threat intelligence, blacklist, whitelist, reputation monitoring
Tools: Firewalls, proxies, Splunk Stream, Bro, IDS
• Network Artifacts and Patterns: network flow, packet capture, active network connections, historic network connections, ports and services
Tools: Splunk Stream, Bro IDS, FPC, Netflow
• DNS: activity, queries and responses, zone transfer activity
Tools: Splunk Stream, Bro IDS, OpenDNS
• Endpoint – Host Artifacts and Patterns: users, processes, services, drivers, files, registry, hardware, memory, disk activity, file monitoring: hash values, integrity checking and alerts, creation or deletion
Tools: Windows/Linux, Carbon Black, Tanium, Tripwire, Active Directory
• Vulnerability Management Data
Tools: Tripwire IP360, Qualys, Nessus
• User Behavior Analytics: TTPs, user monitoring, time of day, location, HR watchlist
Tools: Splunk UBA (all of the above)
12. Endpoint: Microsoft Sysmon Primer
● TA Available on the App Store
● Great Blog Post to get you started
● Increases the fidelity of Microsoft Logging
Blog Post:
http://blogs.splunk.com/2014/11/24/monitoring-network-traffic-with-sysmon-and-splunk/
14. Sysmon Event Tags
Maps Network Comm to process_id
Process_id creation and mapping to parentprocess_id
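The two mappings above can be sketched in a few lines of Python. This is a minimal illustration, not the Sysmon TA itself: the class names, field names, file paths, and every process ID other than 4768 are assumptions made for the example. It relies only on the fact that Sysmon records process creation (Event ID 1) and network connections (Event ID 3).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProcessCreate:
    """Simplified stand-in for a Sysmon process-creation event (Event ID 1)."""
    process_id: int
    image: str
    parent_process_id: Optional[int]
    user: str


@dataclass
class NetworkConnect:
    """Simplified stand-in for a Sysmon network-connection event (Event ID 3)."""
    process_id: int
    dest_ip: str
    dest_port: int


def process_lineage(pid, processes):
    """Walk parent_process_id links upward and return the image paths found."""
    by_pid = {p.process_id: p for p in processes}
    chain, seen = [], set()
    while pid in by_pid and pid not in seen:  # 'seen' guards against loops
        seen.add(pid)
        proc = by_pid[pid]
        chain.append(proc.image)
        pid = proc.parent_process_id
    return chain


def who_talked_to(dest_ip, connections, processes):
    """Map each process that contacted dest_ip to its chain of ancestors."""
    return {
        c.process_id: process_lineage(c.process_id, processes)
        for c in connections
        if c.dest_ip == dest_ip
    }


# Hypothetical sample data; only PID 4768 and the IP come from the lab.
PROCESSES = [
    ProcessCreate(2100, r"C:\Program Files\PDFReader\reader.exe", None, "cgilbert"),
    ProcessCreate(3200, r"C:\Users\cgilbert\AppData\calc.exe", 2100, "cgilbert"),
    ProcessCreate(4768, r"C:\Users\cgilbert\AppData\svchost.exe", 3200, "cgilbert"),
]
CONNECTIONS = [NetworkConnect(4768, "115.29.46.99", 443)]

print(who_talked_to("115.29.46.99", CONNECTIONS, PROCESSES))
```

Real process IDs are reused over time, so a production version would probably key on Sysmon's ProcessGuid field and a time window rather than on the bare process ID.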
18. Demo Story - Kill Chain Framework
• Successful brute force – download sensitive pdf document
• Weaponize the pdf file with Zeus Malware
• Convincing email sent with weaponized pdf
• Vulnerable pdf reader exploited by malware. Dropper created on machine
• Dropper retrieves and installs the malware
• Persistence via regular outbound comm
• Data Exfiltration
Source: Lockheed Martin
21. APT Transaction Flow Across Data Sources
Data Sources: Threat Intelligence; Endpoint; Network; Email, Proxy, DNS, and Web
Transaction:
• Web: Attacker hacks website, steals .pdf files (Portal.pdf)
• Mail: Attacker creates malware, embeds it in a .pdf, and emails it to the target; the target reads the email and opens the attachment
• Endpoint: .pdf executes & unpacks malware, overwriting and running “allowed” programs: Calc.exe (dropper), Svchost.exe (malware)
• Proxy: http (proxy) session to command & control server
• Gain access to system, create additional environment, conduct business: remote control, steal data, persist in company, rent as botnet
Our investigation begins by detecting high risk communications through the proxy, at the endpoint, and even a DNS call.
23. To begin our investigation, we will start with a quick search to familiarize ourselves with the data sources.
In this demo environment, we have a variety of security relevant data including…
• Web
• DNS
• Proxy
• Firewall
• Endpoint
• Email
24. Take a look at the endpoint data source. We are using the Microsoft Sysmon TA.
We have endpoint visibility into all network communication and can map each connection back to a process.
We also have detailed info on each process and can map it back to the user and parent process.
Let’s get our day started by using threat intel to prioritize our efforts and focus on communication with known high risk entities.
25. We have multiple source IPs communicating to high risk entities identified by these 2 threat sources.
We are seeing high risk communication from multiple data sources.
We see multiple threat intel related events across multiple source types associated with the IP Address of Chris Gilbert. Let’s take a closer look at the IP Address.
We can now see the owner of the system (Chris Gilbert) and that it isn’t a PII or PCI related asset, so there are no immediate business implications that would require informing agencies or external customers within a certain timeframe.
This dashboard is based on event data that contains a threat intel based indicator match (IP Address, domain, etc.). The data is further enriched with CMDB based Asset/identity information.
26. We are now looking at only threat intel related activity for the IP Address associated with Chris Gilbert and see activity spanning endpoint, proxy, and DNS data sources.
Scroll down the dashboard to examine these threat intel events associated with the IP Address.
We then see threat intel related endpoint and proxy events occurring periodically and likely communicating with a known Zeus botnet based on the threat intel source (zeus_c2s).
27. It’s worth mentioning that at this point you could create a ticket to have someone re-image the machine to prevent further damage as we continue our investigation within Splunk.
Within the same dashboard, we have access to very high fidelity endpoint data that allows an analyst to continue the investigation in a very efficient manner. It is important to note that near real-time access to this type of endpoint data is not common within the traditional SOC.
The initial goal of the investigation is to determine whether this communication is malicious or a potential false positive. Expand the endpoint event to continue the investigation.
Proxy related threat intel matches are important for helping us to prioritize our efforts toward initiating an investigation. Further investigation into the endpoint is often very time consuming and often involves multiple internal hand-offs to other teams or needing to access additional systems.
This encrypted proxy traffic is concerning because of the large amount of data (~1.5MB) being transferred, which is common when data is being exfiltrated.
28. Exfiltration of data is a serious concern, and so is outbound communication to an external entity that has a known threat intel indicator, especially when it is encrypted as in this case. Let’s continue the investigation.
Another clue. We also see that svchost.exe should be located in a Windows system directory but this is being run in the user space. Not good.
We immediately see the outbound communication with 115.29.46.99 via https is associated with the svchost.exe process on the Windows endpoint. The process id is 4768. There is a great deal more information from the endpoint as you scroll down, such as the user ID that started the process and the associated CMDB enrichment information.
29. We have a workflow action that will link us to a Process Explorer dashboard and populate it with the process id extracted from the event (4768).
30. This has brought us to the Process Explorer dashboard which lets us view Windows Sysmon endpoint data.
Suspected Malware
This process calls itself “svchost.exe,” a common Windows process, but the path is not the normal path for svchost.exe, which is a common trait of malware attempting to evade detection. We also see it making a DNS query (port 53) then communicating via port 443.
We also can see that the parent process that created this suspicious svchost.exe process is called calc.exe.
Suspected Downloader/Dropper
This is a standard Windows app, but not in its usual directory, telling us that the malware has again spoofed a common file name.
This is very consistent with Zeus behavior. The initial exploitation generally creates a downloader or dropper that will then download the Zeus malware. It seems like calc.exe may be that downloader/dropper.
Let’s continue the investigation by examining the parent process as this is almost certainly a genuine threat and we are now working toward a root cause.
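The "right name, wrong directory" clue on this slide is easy to check mechanically. The sketch below is a minimal illustration; the list of expected directories and the sample paths are assumptions for the example, and a real allowlist would need to reflect your own Windows builds.

```python
from pathlib import PureWindowsPath

# Assumed allowlist: where these Windows binaries normally live.
EXPECTED_DIRS = {
    "svchost.exe": {r"c:\windows\system32", r"c:\windows\syswow64"},
    "calc.exe": {r"c:\windows\system32", r"c:\windows\syswow64"},
}


def is_masquerading(image_path):
    """True when a well-known binary name runs from an unexpected directory."""
    path = PureWindowsPath(image_path)
    expected = EXPECTED_DIRS.get(path.name.lower())
    if expected is None:
        return False  # not a name we track, so no opinion
    return str(path.parent).lower() not in expected


# Hypothetical paths for illustration.
print(is_masquerading(r"C:\Users\cgilbert\AppData\Roaming\svchost.exe"))
print(is_masquerading(r"C:\Windows\System32\svchost.exe"))
```

PureWindowsPath is used so the check parses Windows paths correctly even when the analysis runs on a non-Windows host.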
31. The Parent Process of our suspected downloader/dropper is the legitimate PDF Reader program. This will likely turn out to be the vulnerable app that was exploited in this attack.
Suspected Downloader/Dropper
Suspected Vulnerable App
We have very quickly moved from threat intel related network and endpoint activity to the likely exploitation of a vulnerable app. Click on the parent process to keep investigating.
32. We can see that the PDF Reader process has no identified parent and is the root of the infection.
Scroll down the dashboard to examine activity related to the PDF reader process.
33. Chris opened 2nd_qtr_2014_report.pdf which was an attachment to an email!
We have our root cause! Chris opened a weaponized .pdf file which contained the Zeus malware. It appears to have been delivered via email and we have access to our email logs as one of our important data sources. Let’s copy the filename 2nd_qtr_2014_report.pdf and search a bit further to determine the scope of this compromise.
34. Let’s dig a little further into 2nd_qtr_2014_report.pdf to determine the scope of this compromise.
36. Let’s search through multiple data sources to quickly get a sense for who else may have been exposed to this file.
We will come back to the web activity that contains reference to the pdf file but let’s first look at the email event to determine the scope of this apparent phishing attack.
38. We have access to the email body and can see why this was such a convincing attack. The sender apparently had access to sensitive insider knowledge and hinted at quarterly results.
There is our attachment.
Hold On! That’s not our Domain Name! The spelling is close but it’s missing a “t”. The attacker likely registered a domain name that is very close to the company domain hoping Chris would not notice.
This looks to be a very targeted spear phishing attack as it was sent to only one employee (Chris).
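A near-miss sender domain like the one described above can be flagged with an edit-distance check. The sketch below is one way to do it, assuming the company domain is buttergames.com (the web portal named later in the lab); the misspelled sender domain and the distance threshold are hypothetical, since the slide only says a “t” is missing.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete a character from a
                cur[j - 1] + 1,            # insert a character into a
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def is_lookalike(sender_domain, company_domain, max_distance=2):
    """True for domains that are close to, but not the same as, our own."""
    distance = levenshtein(sender_domain.lower(), company_domain.lower())
    return 0 < distance <= max_distance


# "butergames.com" is a made-up example of a domain missing one "t".
print(levenshtein("butergames.com", "buttergames.com"))
print(is_lookalike("butergames.com", "buttergames.com"))
print(is_lookalike("buttergames.com", "buttergames.com"))
```

Running this against the sender domains in the mail logs would surface typosquatted senders without needing a threat intel match.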
39. Root Cause Recap
[Diagram repeated from slide 21: APT transaction flow across data sources]
We utilized threat intel to detect communication with known high risk indicators and kick off our investigation, then worked backward through the kill chain toward a root cause.
Key to this investigative process is the ability to associate network communications with endpoint process data.
This high value and very relevant ability to work a malware related investigation through to root cause translates into a very streamlined investigative process compared to the legacy SIEM based approach.
40. Let’s revisit the search for additional information on the 2nd_qtr_2014_report.pdf file.
We understand that the file was delivered via email and opened at the endpoint. Why do we see a reference to the file in the access_combined (web server) logs?
Select the access_combined sourcetype to investigate further.
41. The results show 54.211.114.134 has accessed this file from the web portal of buttergames.com.
There is also a known threat intel association with the source IP Address downloading (HTTP GET) the file.
42. Select the IP Address, left-click, then select “New search”. We would like to understand what else this IP Address has accessed in the environment.
43. That’s an abnormally large number of requests sourced from a single IP Address in a ~90 minute window.
This looks like a scripted action given the constant high rate of requests over the below window.
Scroll down the dashboard to examine other interesting fields to further investigate.
Notice the Googlebot useragent string, which is another attempt to avoid raising attention.
44. The requests from 52.211.114.134 are dominated by requests to the login page (wp-login.php). It is not possible for a person to attempt a login this many times in a short period of time – this is clearly a scripted brute force attack.
After successfully gaining access to our website, the attacker downloaded the pdf file, weaponized it with the Zeus malware, then delivered it to Chris Gilbert as a phishing email.
The attacker is also accessing admin pages, which may be an attempt to establish persistence via a backdoor into the web site.
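The brute force pattern above boils down to counting login-page requests per source IP. The sketch below shows that counting step on already-parsed web log records; the field names (clientip, uri_path, method), the POST filter, the threshold, and the sample traffic are all assumptions for illustration.

```python
from collections import Counter


def login_attempts_by_ip(requests, login_path="/wp-login.php"):
    """Count POSTs to the login page for each client IP."""
    return Counter(
        r["clientip"]
        for r in requests
        if r["uri_path"] == login_path and r["method"] == "POST"
    )


def brute_force_suspects(requests, threshold=100):
    """Return the IPs whose login attempts meet or exceed the threshold."""
    counts = login_attempts_by_ip(requests)
    return {ip: n for ip, n in counts.items() if n >= threshold}


# Synthetic traffic: one noisy source, one ordinary user.
SAMPLE = (
    [{"clientip": "52.211.114.134", "uri_path": "/wp-login.php", "method": "POST"}] * 500
    + [{"clientip": "10.0.0.5", "uri_path": "/wp-login.php", "method": "POST"}] * 2
    + [{"clientip": "10.0.0.5", "uri_path": "/index.php", "method": "GET"}] * 40
)

print(brute_force_suspects(SAMPLE))
```

A fixed threshold is the simplest choice; a per-window rate would be closer to the "constant high rate" the dashboard shows.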
45. Kill Chain Analysis Across Data Sources
[Diagram repeated from slide 21: APT transaction flow across data sources]
We began by reviewing threat intel related events for a particular IP address and observed DNS, Proxy, and Endpoint events for a user in Sales.
We continued the investigation by pivoting into the endpoint data source and used a workflow action to determine which process on the endpoint was responsible for the outbound communication.
We traced the svchost.exe Zeus malware back to its parent process ID which was the calc.exe downloader/dropper.
We traced calc.exe back to the vulnerable application PDF Reader.
We were able to see which file was opened by the vulnerable app and determined that the malicious file was delivered to the user via email.
A quick search into the mail logs revealed the details behind the phishing attack and revealed that the scope of the compromise was limited to just the one user.
Once our root cause analysis was complete, we shifted our focus into the web logs to determine that the sensitive pdf file was obtained via a brute force attack against the company website.
Investigation complete! Let’s get this turned over to the Incident Response team.
50. Most famous Lateral Movement attack?
(excluding password re-use)
Pass the Hash!
51. Detecting Legacy PtH
Look for Windows Events:
● Event ID: 4624 or 4625
● Logon type: 3
● Auth package: NTLM
● User account is not a domain logon and is not Anonymous Logon
52. LM Detection: Pass the Hash
source=WinEventLog:Security
EventCode=4624
Authentication_Package=NTLM
Type=Information
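The legacy Pass the Hash criteria can also be expressed as a simple filter over parsed Windows logon events. The sketch below is illustrative only: the dictionary keys, the domain name, and the account names are assumptions, and it reads the last criterion as "neither a domain account nor the anonymous logon account".

```python
def looks_like_legacy_pth(event, corporate_domain="BUTTERGAMES"):
    """Apply the legacy Pass the Hash criteria to one parsed logon event."""
    return (
        event.get("EventCode") in (4624, 4625)                 # logon success or failure
        and event.get("Logon_Type") == 3                       # network logon
        and event.get("Authentication_Package") == "NTLM"
        and event.get("Account_Domain", "").upper() != corporate_domain
        and event.get("Account_Name", "").upper() != "ANONYMOUS LOGON"
    )


# Hypothetical events for illustration.
EVENTS = [
    {"EventCode": 4624, "Logon_Type": 3, "Authentication_Package": "NTLM",
     "Account_Domain": "WKSTN-042", "Account_Name": "localadmin"},
    {"EventCode": 4624, "Logon_Type": 3, "Authentication_Package": "Kerberos",
     "Account_Domain": "BUTTERGAMES", "Account_Name": "cgilbert"},
    {"EventCode": 4624, "Logon_Type": 3, "Authentication_Package": "NTLM",
     "Account_Domain": "NT AUTHORITY", "Account_Name": "ANONYMOUS LOGON"},
]

suspicious = [e for e in EVENTS if looks_like_legacy_pth(e)]
print([e["Account_Name"] for e in suspicious])
```

Only the first event survives the filter: a local account authenticating over the network with NTLM.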
53.
54. Then it got harder
• Pass the Hash tools have improved
• Tracking of jitter, other metrics
• So let’s detect lateral movement differently
55. Network traffic provides source of truth
● I usually talk to 10 hosts
● Then one day I talk to 10,000 hosts
● ALARM!
58. LM Detection: Network Destinations
sourcetype="pan:traffic"
| bucket _time span=1d
| stats count dc(dest) as NumDests by src_ip _time
| stats avg(NumDests) as avg stdev(NumDests) as stdev latest(NumDests) as latest by src_ip
| where latest > 2 * stdev + avg
Find daily average, standard deviation, and most recent
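For readers who want to see the arithmetic behind the search above, here is a small Python sketch of the same idea: per source, compare the most recent daily count of distinct destinations with the average plus two sample standard deviations. The input shape, function name, and numbers are assumptions for illustration.

```python
from statistics import mean, stdev


def unusual_sources(daily_dest_counts, k=2.0):
    """Flag sources whose latest daily count exceeds avg + k * stdev.

    daily_dest_counts maps a source IP to its distinct-destination counts,
    one per day, oldest first.
    """
    flagged = {}
    for src, counts in daily_dest_counts.items():
        if len(counts) < 2:
            continue  # sample standard deviation needs two or more days
        avg, sd, latest = mean(counts), stdev(counts), counts[-1]
        if latest > k * sd + avg:
            flagged[src] = latest
    return flagged


# Hypothetical history: one host suddenly talks to 10,000 destinations.
HISTORY = {
    "10.1.1.5": [10, 10, 10, 10, 10, 10, 10000],
    "10.1.1.6": [10, 12, 9, 11, 10, 12, 11],
}

print(unusual_sources(HISTORY))
```

Because the latest day is part of its own baseline here, as it is in the search, a single spike can only clear the two-sigma bar once there are at least six days of history. Excluding the latest day from the baseline would be a reasonable variation.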
65. DNS exfiltration
DNS exfil tends to be overlooked within an ocean of DNS data. Let’s fix that!
66. DNS exfiltration
FrameworkPOS: a card-stealing program that exfiltrates data from the target’s network by transmitting it as domain name system (DNS) traffic
“But the big difference is the way how stolen data is exfiltrated: the malware used DNS requests!”
https://blog.gdatasoftware.com/2014/10/23942-new-frameworkpos-variant-exfiltrates-data-via-dns-requests
“… few organizations actually keep detailed logs or records of the DNS traffic traversing their networks – making it an ideal way to siphon data from a hacked network.”
http://krebsonsecurity.com/2015/05/deconstructing-the-2014-sally-beauty-breach/#more-30872
67. DNS exfil detection – tricks of the trade
URL Toolbox: https://splunkbase.splunk.com/app/2734/
• Parse URLs & complicated TLDs (Top Level Domain)
• Calculate Shannon Entropy
List of provided lookups
• ut_parse_simple(url)
• ut_parse(url, list) or ut_parse_extended(url, list)
• ut_shannon(word)
• ut_countset(word, set)
• ut_suites(word, sets)
• ut_meaning(word)
• ut_bayesian(word)
• ut_levenshtein(word1, word2)
68. Shannon Entropy
Layman’s definition: a score reflecting the randomness or measure of uncertainty of a string
Examples
• The domain aaaaa.com has a Shannon Entropy score of 1.8 (very low)
• The domain google.com has a Shannon Entropy score of 2.6 (rather low)
• A00wlkj—(-a.aslkn-C.a.2.sk.esasdfasf1111)-890209uC.4.com has a Shannon Entropy score of 3 (rather high)
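The scores above can be reproduced with a few lines of Python. This is a minimal sketch of character-frequency Shannon entropy over the whole string; that URL Toolbox's ut_shannon computes it in exactly this way is an assumption, although the first two example values agree.

```python
import math
from collections import Counter


def shannon_entropy(word):
    """Entropy in bits per character, from the string's character frequencies."""
    if not word:
        return 0.0
    total = len(word)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(word).values()
    )


print(round(shannon_entropy("aaaaa.com"), 2))   # 1.88
print(round(shannon_entropy("google.com"), 2))  # 2.65
```

Strings that use many different characters about equally often score high, which is what encoded or encrypted data stuffed into a subdomain tends to look like.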
71. Detecting Data Exfiltration
… | stats
count
avg(ut_shannon) as avg_sha
avg(sublen) as avg_sublen
stdev(sublen) as stdev_sublen
by ut_domain
| search avg_sha>3 avg_sublen>20 stdev_sublen<2
TIPS
• Leverage our Bro DNS data
• Calculate Shannon Entropy scores
• Calculate subdomain length
• Display count, scores, lengths, deviations
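The aggregation above can be mirrored in plain Python to make the logic explicit: group queries by domain, average the subdomain entropy and length, and keep domains whose subdomains are random-looking, long, and uniform in size. The thresholds come from the search; the input shape, function names, and the synthetic queries are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict
from statistics import mean, stdev


def shannon_entropy(word):
    """Entropy in bits per character, from character frequencies."""
    total = len(word)
    return -sum((c / total) * math.log2(c / total) for c in Counter(word).values())


def exfil_candidates(queries, min_entropy=3.0, min_len=20.0, max_len_stdev=2.0):
    """queries is a list of (subdomain, domain) pairs; returns suspect domains."""
    by_domain = defaultdict(list)
    for subdomain, domain in queries:
        by_domain[domain].append(subdomain)

    suspects = []
    for domain, subs in by_domain.items():
        lengths = [len(s) for s in subs]
        avg_sha = mean(shannon_entropy(s) for s in subs)
        avg_sublen = mean(lengths)
        stdev_sublen = stdev(lengths) if len(lengths) > 1 else 0.0
        if avg_sha > min_entropy and avg_sublen > min_len and stdev_sublen < max_len_stdev:
            suspects.append(domain)
    return suspects


# Synthetic data: 32-character chunks sent to one domain, ordinary names to another.
ALPHABET = "abcdefghijklmnopqrstuvwxyz012345"
QUERIES = [(ALPHABET[i:] + ALPHABET[:i], "mooo.com") for i in range(20)]
QUERIES += [("www", "example.com"), ("mail", "example.com"), ("api", "example.com")]

print(exfil_candidates(QUERIES))
```

A high query count per domain, as the results slide points out, is a further signal that could be added as a fourth condition.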
72. Detecting Data Exfiltration
RESULTS
• Exfiltrating data requires many DNS requests – look for high counts
• DNS exfiltration to mooo.com and chickenkiller.com
73. Summary: DNS exfiltration
● Exfiltration by DNS and ICMP is a very common technique
● Many organizations do not analyze DNS activity – do not be like them!
● No DNS logs? No Splunk Stream? Look at FW byte counts
86. Asset Investigator, enter “192.168.56.102”
Change to “Today” if needed
Data from asset framework
Configurable Swimlanes
Darker=more events
All happened around same time
94. ML Toolkit & Showcase
• Splunk Supported framework for building ML Apps
– Get it for free: http://tiny.cc/splunkmlapp
• Leverages Python for Scientific Computing (PSC) add-on:
– Open-source Python data science ecosystem
– NumPy, SciPy, scikit-learn, pandas, statsmodels
• Showcase use cases: Predict Hard Drive Failure, Server Power Consumption, Application Usage, Customer Churn & more
• Standard algorithms out of the box:
– Supervised: Logistic Regression, SVM, Linear Regression, Random Forest, etc.
– Unsupervised: KMeans, DBSCAN, Spectral Clustering, PCA, KernelPCA, etc.
• Implement one of 300+ algorithms by editing Python scripts
98. Splunk UBA Use Cases
ACCOUNT TAKEOVER
• Privileged account compromise
• Data exfiltration
LATERAL MOVEMENT
• Pass-the-hash kill chain
• Privilege escalation
SUSPICIOUS ACTIVITY
• Misuse of credentials
• Geo-location anomalies
MALWARE ATTACKS
• Hidden malware activity
BOTNET, COMMAND & CONTROL
• Malware beaconing
• Data leakage
USER & ENTITY BEHAVIOR ANALYTICS
• Suspicious behavior by accounts or
devices
EXTERNAL THREATS
INSIDER THREATS
99. Splunk User Behavior Analytics (UBA)
• ~100% of breaches involve valid credentials (Mandiant Report)
• Need to understand normal & anomalous behaviors for ALL users
• UBA detects Advanced Cyberattacks and Malicious Insider Threats
• Lots of ML under the hood:
– Behavior Baselining & Modeling
– Anomaly Detection (30+ models)
– Advanced Threat Detection
• E.g., Data Exfil Threat:
– “Saw this strange login & data transfer for user kwestin at 3am in China…”
– Surface threat to SOC Analysts
105. OWASP 2013 Top 10
[10] Unvalidated redirects and forwards
[9] Using components with known vulnerabilities
[8] Cross-site request forgery
[7] Missing function level access control
[6] Sensitive data exposure
[5] Security misconfiguration
[4] Insecure direct object reference
[3] Cross-site scripting (XSS)
[2] Broken authentication and session management
110. The anatomy of a SQL injection attack
A legitimate user might supply:
admin@admin.sys
1234
An attacker might supply:
xxx@xxx.xxx' OR 1 = 1 -- '
xxx
The resulting query:
SELECT * FROM users WHERE email='xxx@xxx.xxx' OR 1 = 1 -- ' AND password='xxx';
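To see why the attacker's input works, the sketch below builds the same kind of query against an in-memory SQLite database, first by string concatenation and then with bound parameters. The table layout, the second account, and the function names are assumptions for illustration, and real systems should not store passwords in plain text as this toy does.

```python
import sqlite3


def make_db():
    """Create a throwaway in-memory users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (email TEXT, password TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("admin@admin.sys", "1234"), ("chris@example.com", "s3cret")],
    )
    return conn


def login_vulnerable(conn, email, password):
    """Builds SQL by pasting user input into the string: injectable."""
    query = f"SELECT * FROM users WHERE email='{email}' AND password='{password}'"
    return conn.execute(query).fetchall()


def login_parameterized(conn, email, password):
    """Passes user input as bound parameters: it is treated as data only."""
    return conn.execute(
        "SELECT * FROM users WHERE email=? AND password=?", (email, password)
    ).fetchall()


conn = make_db()
attack_email = "xxx@xxx.xxx' OR 1 = 1 -- '"

print(len(login_vulnerable(conn, attack_email, "xxx")))     # every row comes back
print(len(login_parameterized(conn, attack_email, "xxx")))  # no rows come back
```

In the vulnerable version the `OR 1 = 1` clause matches every row and the `--` turns the password check into a comment, which is exactly the shape of the query on the slide.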
112. What have we here?
Our learning environment consists of:
• A bunch of publicly accessible, single-instance Splunk servers
• Each with ~5.5M events, from real environments but massaged:
• Windows Security events
• Apache web access logs
• Bro DNS & HTTP
• Palo Alto traffic logs
• Some other various bits
114. https://splunkbase.splunk.com/app/1528/
Search for possible SQL injection in your events:
• Looks for patterns in the URI query field to see if anyone has injected them with SQL statements
• Flags URI query fields whose length is more than 2.5 standard deviations above the average length of your URI query field
Macros used
• sqlinjection_pattern(sourcetype, uri query field)
• sqlinjection_stats(sourcetype, uri query field)
115. Regular Expression FTW
sqlinjection_rex is a search macro. It contains:
(?<injection>(?i)select.*?from|union.*?select|'$|delete.*?from|update.*?set|alter.*?table|([%27|'](%20)*=(%20)*[%27|'])|\w*[%27|']or)
Which means: In the string we are given, look for ANY of the following matches and put that
into the “injection” field.
• Anything containing SELECT followed by FROM
• Anything containing UNION followed by SELECT
• Anything with a ‘ at the end
• Anything containing DELETE followed by FROM
• Anything containing UPDATE followed by SET
• Anything containing ALTER followed by TABLE
• A %27 OR a ‘, then any number of %20, then an =, then any number of %20, and then a %27 OR a ‘
• Note: %27 is encoded “’” and %20 is encoded <space>
• Any amount of word characters followed by a %27 OR a ‘ and then “or”
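The matching rules listed above can be tried out in Python. The sketch below is an adaptation, not the macro itself: Python's re module needs different syntax for case-insensitivity, and the quote alternatives are written as explicit groups, so treat the exact pattern as an assumption. The sample query strings are made up.

```python
import re

QUOTE = r"(?:%27|')"  # URL-encoded or literal single quote
SPACE = r"(?:%20)"    # URL-encoded space

SQLI_PATTERN = re.compile(
    r"select.*?from|union.*?select|'$|delete.*?from|update.*?set|alter.*?table"
    rf"|{QUOTE}{SPACE}*={SPACE}*{QUOTE}"
    rf"|\w*{QUOTE}or",
    re.IGNORECASE,
)


def find_injection(uri_query):
    """Return the first suspicious fragment in a URI query string, or None."""
    match = SQLI_PATTERN.search(uri_query)
    return match.group(0) if match else None


print(find_injection("id=1%20UNION%20SELECT%20password%20FROM%20users"))
print(find_injection("user=admin'or%201=1"))
print(find_injection("q=shoes&page=2"))
```

Patterns like this catch the obvious cases only; encoded or obfuscated payloads will slip past, which is one reason the app also looks at unusual query lengths.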
118. Summary: Web attacks/SQL injection
● SQL injection provides attackers with easy access to data
● Detecting advanced SQL injection is hard – use an app!
● Understand where SQLi is happening on your network and put a stop to it.
● Augment your WAF with enterprise-wide Splunk searches.
Editor's Notes
Now unfortunately, you do need a modern laptop with a modern browser to participate. You can probably get away with a Surface or something like that, but iPads, old browsers, and especially IBM PCjr’s will not work. (don’t laugh – I actually had one of those.)
We are going to get hands on, and I want to make sure we have enough time to go through the exercises. But I need to frame up why this is important first, so bear with me.
-Why did we decide to do a hands on session
-compelling presentations, especially from Splunk customers but….
-we want to get our hands on the software and experience it for ourselves
Great Blog Post covering the Microsoft Sysmon TA (download, config, install, usage):
http://blogs.splunk.com/2014/11/24/monitoring-network-traffic-with-sysmon-and-splunk/
The App TS is located here:
https://apps.splunk.com/apps/#/search/sysmon/page/1
Given a general timeframe and a network 5-tuple (src_ip, dest_ip, src_port, dest_port, and protocol) we can query our mountain of endpoint data VERY QUICKLY to figure out who/what was responsible for the communication.
Armed with an understanding of the general methodologies that the attacker utilizes in modern Advanced Persistent Threats (APT), we can better equip ourselves to defend and disrupt this type of attack. The above phases are described by the Lockheed Martin Cyber Kill Chain. Our goal is to make certain that we choose our data sources wisely such that we have coverage across all of the phases to give us the best chance of defending ourselves. We overlay both CMDB and Threat Intelligence based lookups to enrich that data, which adds context to help prioritize our efforts and acts as a detection mechanism in this example.
Keep in mind, this exercise is focused on investigations (hunting) and not on detection. Threat Intelligence will likely not look as clean as it does in this lab and will usually contain a good deal of false positives depending on the fidelity of the threat intelligence source.
Here is the scenario that we will be walking through in this lab.
So to find all of the “known” threats – that’s pretty simple. We’re going to grab all of the data from all of the things in your environment that are regularly updated with knowledge about known threats. All the traditional security stuff. IDS/IPS. Anti-malware defenses. DLP, Vulnerability scans. SIEM technology. And of course we will collect firewall data and auth data.
But what about unknown threats? How do we find those? Well we need to look at a much bigger set of data, and then find the unusual patterns in that data. So we want to look at things like threat intel, email, web, desktops – the first four items on the top line are what we’ll focus on in this hands-on exercise. But note that Splunk can collect a whole lot more data that we believe is extremely security relevant – especially when it comes to detecting those unknown threats.
Cymru is pronounced (“cum-ree”)
Recalling back to the magical UBA solution that David just showed with the three high-level stages of the kill chain:
We started with an intrusion through the use of SQL injection, expansion through lateral movement, and now we will discuss exfiltration via DNS.
Here are Dave’s credentials in clear text; <CLICK>
And what it looks like encoded <CLICK>
An attacker could send that encoded string, containing Dave’s credentials, via a DNS request bound for his own DNS server.
By the way – there are other ways to exfil data, too – you can use TXT records and chunk data into that and tunnel it. Our examples here however will exfil data right in the domain name.
Raise your hand if you log your DNS data? Do you look at it?
The point is: DNS data tends to be too chatty to log and is often overlooked. The same goes for an ICMP content body.
There are a couple of retail breaches that we have called out here.
We will be posting all session slides to Slideshare, so you can view these articles at your leisure.
FrameworkPOS is a packaged tool that hackers can buy and put into their hacking arsenal along with dnscat.
Just like the hacker community has its tools, our Splunk community does too!
URL Toolbox can parse URLs and complicated top level domains by dissecting a domain into its parts, such as subdomain, domain name and top level domain. It can also perform functions like shannon entropy, counting, suites, meaning ratio, bayesian analysis, etc.
But our focus today will be on the use of Shannon Entropy.
The Shannon Entropy will give us a numeric representation for how much randomness there is in the letters / numbers used in a subdomain.
It is a good indication of Domain Generation Algorithms (DGA) or of data exfiltration because compressed and/or encrypted data will have very high entropy.
Let’s all go to the DNS Exfil menu and select Detecting Data Exfiltration from the dropdown.
Again, using pre-built content, we can parse out the query and domain components and calculate the Shannon Entropy for each subdomain. Since most data exfiltration occurs via subdomain, that’s where we will be focusing our search.
If you recall back to the tornado slide, the attacker used encoded credentials as a subdomain for the DNS request.
Everything I’m going through up here has been pretty well documented in a word doc. You can use the link here to get that doc, or if you’re really interested in it later come see me. You won’t need it right now though.
Each of you has creds – there are 10 fairly large Amazon EC2 instances that have been provisioned for this exercise and if we’re at capacity there will be 12 of you on each. Now’s a good time to try hitting that URL and logging into Splunk.
Large volumes of data exfiltration tend to have relatively similar data length because it is chunking big segments of data. A low standard deviation will help indicate that this is large volume data transfer.
How many DNS requests would it take to exfiltrate a MB of data? How about a GB? This is why we look for high counts.
Here, an 18K text file is being exfiltrated to chickenkiller.com and a 20MB+ zip file is being exfiltrated to mooo.com.
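The question of how many requests it takes can be answered with back-of-the-envelope arithmetic. The sketch below assumes Base32 encoding (5 bytes carried per 8 characters) and one 60-character label of payload per query; both are illustrative assumptions, chosen to stay under the 63-character limit DNS places on a single label, and a megabyte is taken as one million bytes.

```python
def requests_needed(payload_bytes, label_chars=60):
    """Estimate DNS queries needed to carry payload_bytes in one label each.

    Assumes Base32, which carries 5 bytes in every 8 characters.
    """
    bytes_per_request = label_chars * 5 // 8
    return -(-payload_bytes // bytes_per_request)  # ceiling division


print(requests_needed(1_000_000))      # 1 MB
print(requests_needed(1_000_000_000))  # 1 GB
```

Under these assumptions a megabyte takes roughly 27 thousand queries and a gigabyte roughly 27 million, which is why a high request count to a single domain stands out.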
In summary, don’t overlook the value in your DNS and ICMP data. If you don’t log it, you could consider grabbing the data right off the wire with Splunk Stream. If you don’t have logs or Stream, at a minimum, trend your firewall byte counts.
This should look familiar to you. What we’re doing here is giving a starting point for any Security Analyst to understand at a high level what’s going on in the environment. A single pane of glass, if you will, for all security data.
Everything we are seeing here is customizable – the panels, the indicators, via standard Splunk functionality.
Most of the data on this dashboard is centered on Notable Events. Notable Events are a concept unique to Splunk with ES – there’s an entire Notable Event framework that allows us to perform simple or complex correlations, and then create events by analyzing disparate events from disparate sources.
Notable Events in ES are categorized into various high-level security domains: access, audit, identity, network, and threat. We’ll see those categories throughout the app.
You can see Splunk Sparklines here – these little green lines. These are great for detecting quick trends in the security events – a continuous line means something constant, which could be a heartbeat or a scripted attack. A spike could be a single attack or maybe just someone fat-fingering their password a few times.
We’ll drill into some of these incidents in a few minutes, but let’s continue on with our tour. How does all this data get into Splunk?
Here we have a simple dashboard showing us all sorts of detail about recent malware activity in the environment. Like Security Posture, this is high level information, but more granular about a certain security domain (Malware, which is under Endpoint). We have these “centers” throughout ES for things like Access, Traffic, Intrusions, Updates, Vulnerabilities, and many other security-relevant areas, and you can investigate them later.
For now, let’s drill into two of the “top infections” to see CIM at work. Looking at this dashboard we can’t tell that we actually have three different endpoint protection systems feeding data into Splunk: Sophos, Trend Micro, and Symantec Endpoint Protection. Splunk normalizes the data at search time, according to CIM, to populate this dashboard and the others.
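To make the normalization idea concrete, here is a minimal Python sketch of mapping vendor-specific field names onto common ones before counting infections. The vendor field names (ThreatName, VirusName, Risk_Name and so on) and the FIELD_ALIASES table are illustrative assumptions, not the actual field aliases shipped with any Splunk add-on; only the general idea of "many vendor names, one common name" comes from the notes.

```python
from collections import Counter

# Hypothetical vendor-to-common field mappings. Real CIM aliases are defined
# per add-on and will differ from these illustrative names.
FIELD_ALIASES = {
    "sophos": {"ThreatName": "signature", "ComputerName": "dest"},
    "trendmicro": {"VirusName": "signature", "Host": "dest"},
    "symantec": {"Risk_Name": "signature", "Computer_Name": "dest"},
}


def normalize(vendor, raw_event):
    """Return a copy of raw_event with vendor field names mapped to common ones."""
    aliases = FIELD_ALIASES[vendor]
    normalized = {aliases.get(key, key): value for key, value in raw_event.items()}
    normalized["vendor"] = vendor  # keep the origin so we can drill back down
    return normalized


def top_infections(events):
    """Count infections by the common 'signature' field, most frequent first."""
    return Counter(event["signature"] for event in events).most_common()


if __name__ == "__main__":
    events = [
        normalize("sophos", {"ThreatName": "Mal/Packer", "ComputerName": "ws1"}),
        normalize("symantec", {"Risk_Name": "Mal/Packer", "Computer_Name": "ws2"}),
        normalize("trendmicro", {"VirusName": "TROJ_GEN", "Host": "ws3"}),
    ]
    for signature, count in top_infections(events):
        print(signature, count)
```

Because the mapping is applied when the data is read rather than when it is stored, the raw vendor events stay untouched, which is why the original logs remain a click away.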
Click on Mal/Packer, and you’ll see that this infection was detected by Sophos. The raw logs are literally a click away:
The main reason why this risk framework is important is that it gets you away from writing specific rules for specific threats or assets. You don’t need 1,000 correlation rules anymore – you simply can elevate risk scores on whatever object you want, based on the behavior you’re seeing in the environment. So the idea here is, a correlation rule fires, and then a risk modifier takes effect and changes the risk score based on cumulative scoring of whatever else has happened to that user, or system, or other object.
On the dashboard, we can define filters to find a particular system or user or timeframe.
Note the natural language descriptions (in the screenshot they are medium and low). We track how your overall risk scoring is doing over time, and constantly re-calculate the baseline. Got a lot of activity going on that isn’t “normal” for that timeframe and you might see things going from “increasing minimally” to “extremely increasing” – all based on what the historical norm is.
We can of course see which objects have the highest risk and which correlation rules are contributing the most to the highest risk.
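The cumulative scoring described above can be sketched in a few lines of Python. This is an illustration of the concept only, not how ES stores or computes risk; the RiskIndex class, its method names, and the rule names in the usage example are all hypothetical.

```python
from collections import defaultdict


class RiskIndex:
    """Toy cumulative risk tracker keyed by (object type, object name)."""

    def __init__(self):
        self._scores = defaultdict(int)   # (obj_type, obj) -> total risk
        self._by_rule = defaultdict(int)  # rule name -> total risk contributed

    def apply_modifier(self, obj_type, obj, rule, score):
        """Called when a correlation rule fires: add its score to the object."""
        self._scores[(obj_type, obj)] += score
        self._by_rule[rule] += score

    def score(self, obj_type, obj):
        """Current cumulative risk for one object (0 if never seen)."""
        return self._scores.get((obj_type, obj), 0)

    def top_objects(self, n=5):
        """Objects with the highest cumulative risk."""
        return sorted(self._scores.items(), key=lambda kv: kv[1], reverse=True)[:n]

    def top_rules(self, n=5):
        """Rules contributing the most risk overall."""
        return sorted(self._by_rule.items(), key=lambda kv: kv[1], reverse=True)[:n]


if __name__ == "__main__":
    risk = RiskIndex()
    risk.apply_modifier("system", "ws1", "malware_detected", 60)
    risk.apply_modifier("system", "ws1", "threat_list_match", 40)
    risk.apply_modifier("user", "alice", "excessive_failed_logins", 20)
    print(risk.top_objects())
    print(risk.top_rules())
```

The point of the design is that no single rule has to describe the whole attack: each rule only adds its own score, and the object that accumulates the most risk rises to the top.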
On the dashboard, we can see that we’re using the power of Splunk search to match artifacts in our incoming data against IoCs we find in our threat feeds. Splunk de-duplicates the threat feeds so that if an artifact shows up in multiple feeds you don’t get duplicate notifications.
We can filter the display by threat_group, which is essentially the source of the IoCs. This could be something commercial like ThreatStream or ThreatConnect or Norse, something open-source like SANS or iblocklist, or something from your ISAC that is delivered over a TAXII feed in STIX format.
The threat collection shows that we can use various IoCs to match up against artifacts in our data – IP addresses, domain names, URLs, filenames, certificate common names and organizations, email addresses, registry keys – as long as it can be defined in your incoming feed or locally, you can use it as an IoC.
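As a rough illustration of the matching and de-duplication just described, the Python sketch below merges several feeds into one collection keyed by IoC type and value, then checks event fields against it. The feed names, field names, and the two function names are assumptions made for the example; the real threat intelligence framework in ES is considerably more involved.

```python
def build_threat_collection(feeds):
    """Merge feeds into {(ioc_type, value): set of threat groups}.

    An indicator that appears in several feeds is stored once, with every
    feed that listed it recorded against the single entry.
    """
    collection = {}
    for threat_group, iocs in feeds.items():
        for ioc_type, value in iocs:
            key = (ioc_type, value.lower())
            collection.setdefault(key, set()).add(threat_group)
    return collection


def match_events(events, collection):
    """Return one match per event field that equals a known indicator."""
    matches = []
    for event in events:
        for field, value in event.items():
            key = (field, str(value).lower())
            if key in collection:
                matches.append({
                    "event": event,
                    "ioc_type": field,
                    "value": value,
                    "threat_groups": sorted(collection[key]),
                })
    return matches


if __name__ == "__main__":
    feeds = {
        "commercial_feed": [("dest_ip", "203.0.113.7"), ("domain", "bad.example")],
        "isac_feed": [("dest_ip", "203.0.113.7")],
    }
    events = [
        {"src_ip": "10.0.0.5", "dest_ip": "203.0.113.7"},
        {"src_ip": "10.0.0.6", "dest_ip": "198.51.100.2"},
    ]
    for match in match_events(events, build_threat_collection(feeds)):
        print(match["value"], match["threat_groups"])
```

Here the address listed by both feeds produces a single match that names both sources, rather than two separate notifications.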
You can see the most active threat sources, and if you scroll down, you can see the most recent matches against your threat feeds.
How are these configured? Let’s go to the configuration, and see.
In version 3.1 of Enterprise Security we introduced a full Risk Analysis framework. This is unique because we allow you to assign an arbitrary risk number, that means something to you, based on a notable event. You can assign risk to a user, or to a system, or to some other object that you see in the environment – perhaps a particular piece of malware is considered risky to you so you elevate the risk on the malware “object” itself.
Let’s bring up the Risk Analysis page associated with Advanced Threat:
Rounding out the Threat Intelligence capabilities are the Threat Artifacts browser, which allows us to search through all of the artifacts stored in ES:
We don’t have time to go through each and every one of the advanced threat capabilities in the ES app. However, let’s just see that up here under Advanced Threat we have some very interesting capabilities:
Some of the most useful ones are Protocol Intelligence, which leverages wire data from things like Splunk Stream, NetFlow, and Bro; Access Anomalies and User Activity, which are very useful for detecting possible insider threats; and New Domain Analysis, which analyzes traffic patterns and DNS queries to domains, and then tells you if you have devices communicating with recently registered garbage domains (often associated with DGAs). Again – this is something you can go through on your own time.
Asset Investigator shows us, at the top, all of the things we know about this asset from sources such as CMDBs or Active Directory. It also has multiple “swimlanes” that visually show you what’s been going on with the asset:
We can see Threat List, Exec File, IDS, and Notable Events associated with this asset, most of those happening right around the same time (this was likely the time of infection).
More Beaker than Bunsen
Supervised learning is where you have existing LABELS in the data to help you out.
Example: If you’re training a model for CUSTOMER CHURN, historically you know which customers stayed and which left. You can build a model to correlate historical churn with other features in the data. Then you can PREDICT churn for each customer based on everything they’re doing in real-time and have done in the past.
Unsupervised learning is where you have NO LABELS to help you out. You have to figure out the patterns yourself.
Example: If you’re trying to do BEHAVIORAL ANALYTICS, you might just have a big confusing pile of IT & Security data to wade through. Unsupervised learning is the art & science of finding PATTERNS, BASELINES and ANOMALIES in the data. Once you understand all this (that’s hard!) you can try to predict possible INCIDENTS and THREATS.
Good ML involves FEEDBACK loops. Best bet is to incorporate INCIDENT RESPONSE data and learn from what analysts have done in the past.
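One of the simplest ways to picture the baseline-and-anomaly idea is a z-score check: learn what "normal" looks like from history, then flag values that sit far from it. The Python sketch below assumes hourly event counts as input and a threshold of three standard deviations; both are illustrative choices, and this is not the algorithm used by the ML Toolkit or any Splunk product.

```python
from statistics import mean, stdev


def find_anomalies(history, recent, threshold=3.0):
    """Flag values in `recent` that deviate strongly from the `history` baseline.

    history: past observations used to learn the baseline (needs >= 2 values).
    recent: new observations to score against that baseline.
    threshold: how many standard deviations count as anomalous (an assumption).
    """
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat baseline: anything different is unusual.
        return [value for value in recent if value != baseline]
    return [value for value in recent if abs(value - baseline) / spread > threshold]


if __name__ == "__main__":
    hourly_logins = [100, 102, 98, 101, 99, 100, 103, 97]
    print(find_anomalies(hourly_logins, [101, 250, 99]))
```

No labels are needed here: nothing tells the code which hours were attacks, it only knows what the past looked like. Feeding confirmed incidents back in to tune the threshold is where the feedback loop comes in.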
Free app! Toolkit & PSC are both free. Go to ML App link above, and click Documentation. Links for all distros are there.
Q: Why standalone SH?
A: Don’t want ML exploration & production to bring down other Splunk workloads
Can use standalone 6.4 SH with older version SH cluster & indexers.
Re: 100% of breaches involve valid credentials:
"Mandiant is the leader in incident response. They are the best of the best. They're brought in to deal with the largest, highest profile, most damaging breaches. When you read a news headline about a large organization being compromised, there is a great chance that Mandiant is working behind the scenes to eradicate the attackers from the environment. In a recent yearly report, when they looked across all the very damaging attacks they responded to, they noticed that valid credentials were used at some point in every single one of them. Why do we care? Well, we care because it means that we cannot use simple techniques like counting failed logins to detect an attack. In fact, based on this stat, there may not even be any failed logins to count! Instead we need to be much smarter. For example, how can we look at 1000 successful logins and determine which of them was the malicious one? We do that through behavior analytics, baseline/outlier, ML, etc."
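To illustrate the "which of 1000 successful logins is the odd one" question, here is a small Python sketch of a per-user baseline: it records which source hosts each user has logged in from before and flags successful logins from anywhere else. The (user, source) tuple format and the function names are assumptions for the example, and a first-seen check like this is only one of many possible behavioral signals.

```python
from collections import defaultdict


def build_baseline(historical_logins):
    """Map each user to the set of sources they have successfully logged in from."""
    baseline = defaultdict(set)
    for user, source in historical_logins:
        baseline[user].add(source)
    return baseline


def flag_unusual(logins, baseline):
    """Return successful logins whose source was never seen for that user."""
    return [
        (user, source)
        for user, source in logins
        if source not in baseline.get(user, set())
    ]


if __name__ == "__main__":
    history = [("alice", "ws1"), ("alice", "ws1"), ("bob", "ws2"), ("bob", "vpn")]
    today = [("alice", "ws1"), ("bob", "vpn"), ("alice", "dc01")]
    print(flag_unusual(today, build_baseline(history)))
```

Every login in the example succeeded, so counting failures would show nothing; the only thing that stands out is a valid credential being used from a place it has never been used before.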
The goal of the OWASP (Open Web Application Security Project) Top Ten is to raise awareness about application security by identifying some of the most critical risks facing organizations today. This list is refreshed every three years – we’re currently waiting on the 2016 one. These were the top 10 through 2 for 2013. What is #1?
These are examples of injection flaws that occur when untrusted data is sent to an interpreter as part of a command or query.
The attacker essentially tricks the interpreter into executing unintended commands or bypassing authorization/authentication.
Most of us are familiar with SQL injection attacks, the risks they pose to our environments and maybe even the inner workings of the threat itself, because they have been around for nearly two decades. However, the attack types can vary, and mitigating them efficiently requires an understanding of the basic types of SQL injection attacks.
SQL injection has been around so long…
…so long, that this is what Google looked like when SQL injection first became a topic of discussion: 1998. With 18 years of ways to prevent SQLi, you would think that this isn’t a problem anymore, right?
Wrong. Check out this survey conducted by Imperva – ostensibly to sell their web application firewalls, but still. This survey was just conducted last year…we can see that SQL injection attempts appear to happen in concentrated bursts, perhaps the result of specific campaigns. Regardless of when they happen, SQL injection attacks remain some of the most damaging to organizations in terms of records compromised and lost value.
Attribution: http://www.imperva.com/docs/HII_Web_Application_Attack_Report_Ed6.pdf
In fact, you could even say that no matter how much technologies change, injection vulnerabilities seem to remain! Last year’s SANS holiday hack challenge prominently featured injection vulnerabilities in server-side JS framework Node.JS and popular nosql database MongoDB. If you like today’s exercise, you will love the Holiday Hack Challenge.
Attribution: https://quest.holidayhackchallenge.com/ and https://holidayhackchallenge.com/
A SQL injection needs just two conditions to exist – a relational database that uses SQL, and a user controllable input which is directly used in an SQL query.
An attacker would get into your organization through unsanitized user input.
Instead of entering a “real” username or password, the hacker will enter a SQL command (or a JSON string in the case of a NoSQL database). As a result, the malicious code, the example SQL query shown here, is then sent to the database.
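The two conditions above are easy to reproduce. The Python sketch below uses an in-memory SQLite database with a made-up users table to show a login check built by string concatenation being bypassed, and the same check holding up when written with query parameters. The table, the credentials, and the payload are illustrative assumptions, not the query shown on the slide.

```python
import sqlite3


def make_db():
    """Create a throwaway in-memory database with one demo user."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
    # Plaintext password for demonstration only; never store passwords this way.
    conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")
    return conn


def login_vulnerable(conn, username, password):
    """User input is pasted straight into the SQL text (the vulnerable pattern)."""
    query = (
        "SELECT COUNT(*) FROM users WHERE username = '" + username
        + "' AND password = '" + password + "'"
    )
    return conn.execute(query).fetchone()[0] > 0


def login_parameterized(conn, username, password):
    """User input is passed as data via placeholders, never as SQL text."""
    query = "SELECT COUNT(*) FROM users WHERE username = ? AND password = ?"
    return conn.execute(query, (username, password)).fetchone()[0] > 0


if __name__ == "__main__":
    conn = make_db()
    payload = "' OR '1'='1"  # closes the quoted string, then adds an always-true clause
    print("vulnerable:", login_vulnerable(conn, "alice", payload))
    print("parameterized:", login_parameterized(conn, "alice", payload))
```

In the vulnerable version the payload turns the WHERE clause into a condition that is true for every row, so the check passes without the real password. With placeholders the same payload is compared as a literal string and simply fails to match.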
Cybercriminals use injection attacks against databases to export data, delete accounts, create bogus accounts, and modify data. Injection attacks can even be used to initiate a Denial of Service (DoS) attack.
There have been some notable SQL injection attacks in our recent history that have exfiltrated entire user databases, prompting system-wide password resets, such as…
And so far this year – we’ve had at least 39 publicly disclosed breaches covered in the media that involved SQLi. You can just google “SQL injection hall of shame” to pull this up. By the way – this number was “9” when I did this slide back in April….
Attribution: The Code Curmudgeon Blog http://codecurmudgeon.com/wp/sql-injection-hall-of-shame/
A little bit about our environment
Everything I’m going through up here has been pretty well documented in a Word doc. You can use the link here to get that doc, or if you’re really interested in it later come see me. You won’t need it right now though.
Each of you has creds – there are 10 fairly large Amazon EC2 instances that have been provisioned for this exercise and if we’re at capacity there will be 12 of you on each. Now’s a good time to try hitting that URL and logging into Splunk.
The answer is Splunk apps! Apps come with pre-built content, like dashboards, reports, alerts and workflows that anyone can leverage as a starting point or template and adapt them accordingly to their environment. We have a community of 11,000+ Splunk customers and this is one of the 1,000+ apps on SplunkBase (stats as of Feb 2016).
New Splunk users often try to reinvent the wheel, when it turns out it is already there. In this case, a Splunk engineer wrote an app that searches for SQLi, and then published it on SplunkBase. He provides a broader regex that we are going to leverage for our next example.
In that app one of the things provided is a regular expression pre-created for you to find common patterns in your data that could indicate a SQL injection attack. So all we need to do is call the macro when we do the search.
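For a feel of what pattern-based detection looks like, here is a small Python sketch that URL-decodes a query string and tests it against a handful of common SQL injection shapes. The regular expression is a simplified illustration written for this example; it is not the regex or macro shipped with the SplunkBase app, and a real pattern set would be far broader.

```python
import re
from urllib.parse import unquote_plus

# Simplified, hypothetical patterns: a UNION ... SELECT pair, a quoted
# always-true comparison such as ' OR '1'='1, and a stacked DROP TABLE.
SQLI_PATTERN = re.compile(
    r"(\bunion\b.{0,40}\bselect\b"
    r"|'\s*or\s+'?\d+'?\s*=\s*'?\d+"
    r"|;\s*drop\s+table\b)",
    re.IGNORECASE,
)


def looks_like_sqli(uri_query):
    """Return True if the decoded query string matches a known SQLi shape."""
    decoded = unquote_plus(uri_query)  # attackers often URL-encode payloads
    return bool(SQLI_PATTERN.search(decoded))


if __name__ == "__main__":
    samples = [
        "q=splunk+threat+hunting&page=2",
        "id=1%27%20OR%20%271%27%3D%271",
        "id=1+UNION+SELECT+user,password+FROM+users",
    ]
    for sample in samples:
        print(looks_like_sqli(sample), sample)
```

Wrapping a pattern like this in a reusable function is the same convenience the app's macro provides: the hard-to-read regex is written once, and every search just calls it by name.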
If you’d like, you can take a look at the dashboards in the SQL injection app on your own time.
In summary, why do SQL injection attacks continue to be among the most common? BECAUSE THEY STILL WORK!!
Don’t waste your precious cycles reinventing the wheel. Search for an existing app, or even contribute one to the community.
While there are great Web Application Firewalls and similar products that will sit inline and detect or even block these attacks, you will still benefit from looking through all of your data for any signs of SQL injection, and Splunk can help.
If you have a specific question or use case that you want to run by us, come find us at the Splunk for Security table outside of Salon 9.
We’re going to do some hands-on here, but before we do, let me set the stage a bit. How many of you have had your hands on Splunk before? OK – for the purposes of this discussion, you all are ninjas.
For the newbies – or for anyone who wants a little more help – the app we’re going to use this afternoon is very self-guided. You can literally just read along and see the search examples without having to type any Splunk searches.
This is what that looks like – the search is shown, and the results below.
You will notice that in the app there’s this “step by step” button. Don’t press it – what it does is it allows you to step through the search line by line – great for learning, but not great for this shared environment where you’re all hitting the same servers.
This is what that step by step looks like. You can use it later on your own time.
Now if you are a ninja you can simply cut and paste the searches into the Search bar within the app – that’s what I will do up here.
So what I will be doing is mousing over the search…