SlideShare a Scribd company logo
1 of 41
Extending Splunk with
Machine-learning Predictive Analytics

Rich Collier
Solutions Architect
rich@prelert.com
Why Machine Learning?
• Overcome limitations of human analysis
• Auto-learn baseline behavior using proper
modeling
• Detect anomalous behavior
Why Machine Learning?
• Overcome limitations of human analysis
• Auto-learn baseline behavior using proper
modeling
• Detect anomalous behavior
Overcoming limitations of
Human Analysis
• Judging what’s “normal” is not always easy

• Humans don’t always choose the right
techniques
IPTables (firewall)
• How to find most anomalous users (aggressive
brute force attackers)?
• Here is a typical (manual) process
Step 1) Search

Questions:
What’s normal?
What about that
spike?
Probably should try to visualize counts by SRC over time…
Step 2) stats
command, sort by
count

Question: How to
show as a function
of time, not just
overall?
Step 3) add
bucketing for
breakdown by time

Question: What is
an anomalous
count per bucket?
100? 1000?
10,000? Maybe we should try to use some more stats?
Step 4) add some
“basic” statistical
analysis:
avg +/- 2
Question: How to
show the individual
“outliers” (and not
lose the concept of
time)?
Step 5) use
eventstats to repair
time problem and
add “where” clause
to only show those
outside of +/-2
Question: Are these
161 results accurate?
(I hope you didn’t build an
alert and get 161 of them!)
Problem: Statistical modeling is
INCORRECT for this data
– (-75) events doesn’t make
sense for avg - 2
– how much confidence do
you have in avg + 2 ?
Result:
• Wrong model= false
positives/negatives
The Problem: +/-2
assumes data is
Gaussian (Bell Curve)
Clearly, this data is
better fit by a
Poisson curve
Examples of Non-Gaussian Data
status=503
Memory Utilization

CPU load

status=404
Revenue Transactions
One More Problem…
• Even if the demonstrated technique was
accurate:
– Still need to persist what you’ve learned “so far”
so that you don’t have to keep re-inspecting
historical data as new data comes in
– This requires you to manually write/read
information into a summary index
Why Machine Learning?
• Overcome limitations of human analysis
• Auto-learn baseline behavior using proper
modeling
• Detect anomalous behavior
First, an Analogy
• How could I accurately predict how much
Postal-mail you are likely to get delivered to
your home tomorrow?
I Would…
• Watch your mail delivery for a while
– 1 day?
– 1 week?
– 1 month?
– 1 year?

• Use my observations to create a…
Average?
Std. Deviation?
Probability Distribution Function?
A Probability Distribution Function!
% likelihood (probability)

Best for my house

pieces of mail per day
A Probability Distribution Function!
% likelihood (probability)

College Student?

pieces of mail per day
% likelihood (probability)

A Probability Distribution Function!

My Mom

pieces of mail per day
Using Machine Learning
to build a Probability Distribution Function
• PDF must be built specifically for each
“instance”
• PDF should be constructed automatically
merely by watching the data
Using Machine Learning
to build a Probability Distribution Function

23
Now what?
Why Machine Learning?
• Overcome limitations of human analysis
• Auto-learn baseline behavior using proper
modeling
• Detect anomalous behavior
Finding “what’s unexpected”…
Your job is often looking for unexpected change in your
environment, either proactively through monitoring or
reactively through diagnostics/troubleshooting
% likelihood (probability)

Using the PDF to Find
What is Unexpected
zero pieces
of mail?
fifteen
pieces of
mail?

pieces of mail per day
Relate back to data in Splunk
• # Pieces of mail = # events of a certain type
– number of failed logins
– number of errors of different types
– number of events with certain status codes
– etc.

• Or, performance metrics
– response time
– utilization %
Back to our Example!
• Prelert Anomaly Detective
– Automatically, and correctly
models data via self-learning
– Applies sophisticated
Bayesian techniques
– Persists “on-going” analysis
to allow real-time alerting
– Makes it easy to use
3 significant alerts, not 161!
• Results are:
– Accurate outliers
– Automatically clustered
and scored by their
probabilistic “unlikelihood”
– Relevant in time, easy to
make alerts
– Clickable for drill-down
• Drill-downs:
– Automatically constructs
useful search syntax and
time selection
– Shows anomalies in
context of the original data
– Serve as a possible
jumping-off point for
subsequent manual mining
Automated Anomaly Detection

• Less time searching & troubleshooting
• Proactive trustworthy alerts without
thresholds
• Auto-discovers the previously unknown
Automated Anomaly Detection for
splunk>

Additional
Use Cases
Use Case
• Data sources:
– App logs
– Network performance
– SQL-Server metrics

• Prelert identifies
network discards that
cause app to
disconnect from DB

Correlating Anomalies
Across Data Types
Use Case
• Data source: Netstat
• Prelert finds a rare FTP
connection from a
server that doesn’t
normally use FTP

Servers making
unusual TCP connections
Use Case
• Data source: Custom
logs
• Prelert identifies unusual
$0.60 transaction –
traced to bug in currency
conversion

Revenue
Transactions
Use Case
• Data source:
BlueCoat proxy
• Prelert identifies
users abusing
Internet privileges
gambling sites

porn sites

Clients pervasively
visiting rare URLs
Use Case
• Response time of
online bank website
• Prelert alerts on
spikes without the
need to create a
single threshold

Monitoring Performance
w/o Thresholds
Use Case
• Data source: BlueCoat
proxy
• Prelert identifies client
attempting to exploit an
outside IIS webserver

Unusual outbound
traffic rates
Automated Anomaly Detection
for splunk>

More Related Content

What's hot

Anomaly Detection Using the CLA
Anomaly Detection Using the CLAAnomaly Detection Using the CLA
Anomaly Detection Using the CLA
Numenta
 

What's hot (20)

Numenta Anomaly Benchmark - SF Data Science Meetup
Numenta Anomaly Benchmark - SF Data Science Meetup Numenta Anomaly Benchmark - SF Data Science Meetup
Numenta Anomaly Benchmark - SF Data Science Meetup
 
Detecting Anomalies in Streaming Data
Detecting Anomalies in Streaming DataDetecting Anomalies in Streaming Data
Detecting Anomalies in Streaming Data
 
Anomaly Detection Using the CLA
Anomaly Detection Using the CLAAnomaly Detection Using the CLA
Anomaly Detection Using the CLA
 
Data pipelines and anomaly detection
Data pipelines and anomaly detectionData pipelines and anomaly detection
Data pipelines and anomaly detection
 
Predictive Analytics with Numenta Machine Intelligence
Predictive Analytics with Numenta Machine IntelligencePredictive Analytics with Numenta Machine Intelligence
Predictive Analytics with Numenta Machine Intelligence
 
Five Things I Learned While Building Anomaly Detection Tools - Toufic Boubez ...
Five Things I Learned While Building Anomaly Detection Tools - Toufic Boubez ...Five Things I Learned While Building Anomaly Detection Tools - Toufic Boubez ...
Five Things I Learned While Building Anomaly Detection Tools - Toufic Boubez ...
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learning
 
A Fast Decision Rule Engine for Anomaly Detection
A Fast Decision Rule Engine for Anomaly DetectionA Fast Decision Rule Engine for Anomaly Detection
A Fast Decision Rule Engine for Anomaly Detection
 
Science of Anomaly Detection
Science of Anomaly Detection Science of Anomaly Detection
Science of Anomaly Detection
 
Data Science unit 2 By: Professor Lili Saghafi
Data Science unit 2 By: Professor Lili SaghafiData Science unit 2 By: Professor Lili Saghafi
Data Science unit 2 By: Professor Lili Saghafi
 
Streaming Analytics: It's Not the Same Game
Streaming Analytics: It's Not the Same GameStreaming Analytics: It's Not the Same Game
Streaming Analytics: It's Not the Same Game
 
Putting the Magic in Data Science
Putting the Magic in Data SciencePutting the Magic in Data Science
Putting the Magic in Data Science
 
Ml masterclass
Ml masterclassMl masterclass
Ml masterclass
 
Vissec2014
Vissec2014Vissec2014
Vissec2014
 
Azure machine learning
Azure machine learningAzure machine learning
Azure machine learning
 
Top 10 Data Science Practioner Pitfalls - Mark Landry
Top 10 Data Science Practioner Pitfalls - Mark LandryTop 10 Data Science Practioner Pitfalls - Mark Landry
Top 10 Data Science Practioner Pitfalls - Mark Landry
 
Modern Machine Learning Infrastructure and Practices
Modern Machine Learning Infrastructure and PracticesModern Machine Learning Infrastructure and Practices
Modern Machine Learning Infrastructure and Practices
 
Data Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAMLData Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAML
 
Probabilistic Programming: Why, What, How, When?
Probabilistic Programming: Why, What, How, When?Probabilistic Programming: Why, What, How, When?
Probabilistic Programming: Why, What, How, When?
 
End-to-End Machine Learning Project
End-to-End Machine Learning ProjectEnd-to-End Machine Learning Project
End-to-End Machine Learning Project
 

Viewers also liked

Microsoft DAT203.2x - Principles of Machine Learning
Microsoft DAT203.2x - Principles of Machine LearningMicrosoft DAT203.2x - Principles of Machine Learning
Microsoft DAT203.2x - Principles of Machine Learning
Ralph Marion Victa
 
Splunk Dynamic lookup
Splunk Dynamic lookupSplunk Dynamic lookup
Splunk Dynamic lookup
Splunk
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
Lior Rokach
 

Viewers also liked (10)

Real World Machine Learning at Orbitz, Strata 2011
Real World Machine Learning at Orbitz, Strata 2011Real World Machine Learning at Orbitz, Strata 2011
Real World Machine Learning at Orbitz, Strata 2011
 
Maxymizely - On-page Conversion Rate Optiimization via A/B testing and Machin...
Maxymizely - On-page Conversion Rate Optiimization via A/B testing and Machin...Maxymizely - On-page Conversion Rate Optiimization via A/B testing and Machin...
Maxymizely - On-page Conversion Rate Optiimization via A/B testing and Machin...
 
Microsoft DAT203.2x - Principles of Machine Learning
Microsoft DAT203.2x - Principles of Machine LearningMicrosoft DAT203.2x - Principles of Machine Learning
Microsoft DAT203.2x - Principles of Machine Learning
 
Splunk Dynamic lookup
Splunk Dynamic lookupSplunk Dynamic lookup
Splunk Dynamic lookup
 
All The Ways Your Workforce Will Benefit From Facilities Management Software
All The Ways Your Workforce Will Benefit From Facilities Management SoftwareAll The Ways Your Workforce Will Benefit From Facilities Management Software
All The Ways Your Workforce Will Benefit From Facilities Management Software
 
SplunkLive! Warsaw 2016 - Machine Learning
SplunkLive! Warsaw 2016 - Machine LearningSplunkLive! Warsaw 2016 - Machine Learning
SplunkLive! Warsaw 2016 - Machine Learning
 
Pragmatic machine learning for the real world
Pragmatic machine learning for the real worldPragmatic machine learning for the real world
Pragmatic machine learning for the real world
 
MongoDB & Machine Learning
MongoDB & Machine LearningMongoDB & Machine Learning
MongoDB & Machine Learning
 
Machine Learning and Real-World Applications
Machine Learning and Real-World ApplicationsMachine Learning and Real-World Applications
Machine Learning and Real-World Applications
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 

Similar to SplunkLive! Prelert Session - Extending Splunk with Machine Learning

ICTER 2014 Invited Talk: Large Scale Data Processing in the Real World: from ...
ICTER 2014 Invited Talk: Large Scale Data Processing in the Real World: from ...ICTER 2014 Invited Talk: Large Scale Data Processing in the Real World: from ...
ICTER 2014 Invited Talk: Large Scale Data Processing in the Real World: from ...
Srinath Perera
 

Similar to SplunkLive! Prelert Session - Extending Splunk with Machine Learning (20)

Building High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsBuilding High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning Applications
 
Defcon 21-pinto-defending-networks-machine-learning by pseudor00t
Defcon 21-pinto-defending-networks-machine-learning by pseudor00tDefcon 21-pinto-defending-networks-machine-learning by pseudor00t
Defcon 21-pinto-defending-networks-machine-learning by pseudor00t
 
Machine Learning SPPU Unit 1
Machine Learning SPPU Unit 1Machine Learning SPPU Unit 1
Machine Learning SPPU Unit 1
 
Machine learning basics by akanksha bali
Machine learning basics by akanksha baliMachine learning basics by akanksha bali
Machine learning basics by akanksha bali
 
Machine learning basics
Machine learning basics Machine learning basics
Machine learning basics
 
The math behind big systems analysis.
The math behind big systems analysis.The math behind big systems analysis.
The math behind big systems analysis.
 
Deep learning introduction
Deep learning introductionDeep learning introduction
Deep learning introduction
 
Nondeterministic Software for the Rest of Us
Nondeterministic Software for the Rest of UsNondeterministic Software for the Rest of Us
Nondeterministic Software for the Rest of Us
 
Unit 1-ML (1) (1).pptx
Unit 1-ML (1) (1).pptxUnit 1-ML (1) (1).pptx
Unit 1-ML (1) (1).pptx
 
ICTER 2014 Invited Talk: Large Scale Data Processing in the Real World: from ...
ICTER 2014 Invited Talk: Large Scale Data Processing in the Real World: from ...ICTER 2014 Invited Talk: Large Scale Data Processing in the Real World: from ...
ICTER 2014 Invited Talk: Large Scale Data Processing in the Real World: from ...
 
Simplify Your Life with CQRS
Simplify Your Life with CQRSSimplify Your Life with CQRS
Simplify Your Life with CQRS
 
MACHINE LEARNING PRESENTATION (ARTIFICIAL INTELLIGENCE)
MACHINE LEARNING PRESENTATION (ARTIFICIAL INTELLIGENCE)MACHINE LEARNING PRESENTATION (ARTIFICIAL INTELLIGENCE)
MACHINE LEARNING PRESENTATION (ARTIFICIAL INTELLIGENCE)
 
Nicola Pagni - Anomaly Detection in Elasticsearch
Nicola Pagni - Anomaly Detection in ElasticsearchNicola Pagni - Anomaly Detection in Elasticsearch
Nicola Pagni - Anomaly Detection in Elasticsearch
 
Observability for Emerging Infra (what got you here won't get you there)
Observability for Emerging Infra (what got you here won't get you there)Observability for Emerging Infra (what got you here won't get you there)
Observability for Emerging Infra (what got you here won't get you there)
 
Rise of the machines -- Owasp israel -- June 2014 meetup
Rise of the machines -- Owasp israel -- June 2014 meetupRise of the machines -- Owasp israel -- June 2014 meetup
Rise of the machines -- Owasp israel -- June 2014 meetup
 
BIG DATA AND MACHINE LEARNING
BIG DATA AND MACHINE LEARNINGBIG DATA AND MACHINE LEARNING
BIG DATA AND MACHINE LEARNING
 
Machinr Learning and artificial_Lect1.pdf
Machinr Learning and artificial_Lect1.pdfMachinr Learning and artificial_Lect1.pdf
Machinr Learning and artificial_Lect1.pdf
 
Essential concepts for machine learning
Essential concepts for machine learning Essential concepts for machine learning
Essential concepts for machine learning
 
Hacking Predictive Modeling - RoadSec 2018
Hacking Predictive Modeling - RoadSec 2018Hacking Predictive Modeling - RoadSec 2018
Hacking Predictive Modeling - RoadSec 2018
 
Artificial Intelligence - Overview
Artificial Intelligence - OverviewArtificial Intelligence - Overview
Artificial Intelligence - Overview
 

More from Splunk

More from Splunk (20)

.conf Go 2023 - Data analysis as a routine
.conf Go 2023 - Data analysis as a routine.conf Go 2023 - Data analysis as a routine
.conf Go 2023 - Data analysis as a routine
 
.conf Go 2023 - How KPN drives Customer Satisfaction on IPTV
.conf Go 2023 - How KPN drives Customer Satisfaction on IPTV.conf Go 2023 - How KPN drives Customer Satisfaction on IPTV
.conf Go 2023 - How KPN drives Customer Satisfaction on IPTV
 
.conf Go 2023 - Navegando la normativa SOX (Telefónica)
.conf Go 2023 - Navegando la normativa SOX (Telefónica).conf Go 2023 - Navegando la normativa SOX (Telefónica)
.conf Go 2023 - Navegando la normativa SOX (Telefónica)
 
.conf Go 2023 - Raiffeisen Bank International
.conf Go 2023 - Raiffeisen Bank International.conf Go 2023 - Raiffeisen Bank International
.conf Go 2023 - Raiffeisen Bank International
 
.conf Go 2023 - På liv og død Om sikkerhetsarbeid i Norsk helsenett
.conf Go 2023 - På liv og død Om sikkerhetsarbeid i Norsk helsenett .conf Go 2023 - På liv og død Om sikkerhetsarbeid i Norsk helsenett
.conf Go 2023 - På liv og død Om sikkerhetsarbeid i Norsk helsenett
 
.conf Go 2023 - Many roads lead to Rome - this was our journey (Julius Bär)
.conf Go 2023 - Many roads lead to Rome - this was our journey (Julius Bär).conf Go 2023 - Many roads lead to Rome - this was our journey (Julius Bär)
.conf Go 2023 - Many roads lead to Rome - this was our journey (Julius Bär)
 
.conf Go 2023 - Das passende Rezept für die digitale (Security) Revolution zu...
.conf Go 2023 - Das passende Rezept für die digitale (Security) Revolution zu....conf Go 2023 - Das passende Rezept für die digitale (Security) Revolution zu...
.conf Go 2023 - Das passende Rezept für die digitale (Security) Revolution zu...
 
.conf go 2023 - Cyber Resilienz – Herausforderungen und Ansatz für Energiever...
.conf go 2023 - Cyber Resilienz – Herausforderungen und Ansatz für Energiever....conf go 2023 - Cyber Resilienz – Herausforderungen und Ansatz für Energiever...
.conf go 2023 - Cyber Resilienz – Herausforderungen und Ansatz für Energiever...
 
.conf go 2023 - De NOC a CSIRT (Cellnex)
.conf go 2023 - De NOC a CSIRT (Cellnex).conf go 2023 - De NOC a CSIRT (Cellnex)
.conf go 2023 - De NOC a CSIRT (Cellnex)
 
conf go 2023 - El camino hacia la ciberseguridad (ABANCA)
conf go 2023 - El camino hacia la ciberseguridad (ABANCA)conf go 2023 - El camino hacia la ciberseguridad (ABANCA)
conf go 2023 - El camino hacia la ciberseguridad (ABANCA)
 
Splunk - BMW connects business and IT with data driven operations SRE and O11y
Splunk - BMW connects business and IT with data driven operations SRE and O11ySplunk - BMW connects business and IT with data driven operations SRE and O11y
Splunk - BMW connects business and IT with data driven operations SRE and O11y
 
Splunk x Freenet - .conf Go Köln
Splunk x Freenet - .conf Go KölnSplunk x Freenet - .conf Go Köln
Splunk x Freenet - .conf Go Köln
 
Splunk Security Session - .conf Go Köln
Splunk Security Session - .conf Go KölnSplunk Security Session - .conf Go Köln
Splunk Security Session - .conf Go Köln
 
Data foundations building success, at city scale – Imperial College London
 Data foundations building success, at city scale – Imperial College London Data foundations building success, at city scale – Imperial College London
Data foundations building success, at city scale – Imperial College London
 
Splunk: How Vodafone established Operational Analytics in a Hybrid Environmen...
Splunk: How Vodafone established Operational Analytics in a Hybrid Environmen...Splunk: How Vodafone established Operational Analytics in a Hybrid Environmen...
Splunk: How Vodafone established Operational Analytics in a Hybrid Environmen...
 
SOC, Amore Mio! | Security Webinar
SOC, Amore Mio! | Security WebinarSOC, Amore Mio! | Security Webinar
SOC, Amore Mio! | Security Webinar
 
.conf Go 2022 - Observability Session
.conf Go 2022 - Observability Session.conf Go 2022 - Observability Session
.conf Go 2022 - Observability Session
 
.conf Go Zurich 2022 - Keynote
.conf Go Zurich 2022 - Keynote.conf Go Zurich 2022 - Keynote
.conf Go Zurich 2022 - Keynote
 
.conf Go Zurich 2022 - Platform Session
.conf Go Zurich 2022 - Platform Session.conf Go Zurich 2022 - Platform Session
.conf Go Zurich 2022 - Platform Session
 
.conf Go Zurich 2022 - Security Session
.conf Go Zurich 2022 - Security Session.conf Go Zurich 2022 - Security Session
.conf Go Zurich 2022 - Security Session
 

Recently uploaded

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 

Recently uploaded (20)

Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 

SplunkLive! Prelert Session - Extending Splunk with Machine Learning

  • 1. Extending Splunk with Machine-learning Predictive Analytics Rich Collier Solutions Architect rich@prelert.com
  • 2. Why Machine Learning? • Overcome limitations of human analysis • Auto-learn baseline behavior using proper modeling • Detect anomalous behavior
  • 3. Why Machine Learning? • Overcome limitations of human analysis • Auto-learn baseline behavior using proper modeling • Detect anomalous behavior
  • 4. Overcoming limitations of Human Analysis • Judging what’s “normal” is not always easy • Humans don’t always choose the right techniques
  • 5. IPTables (firewall) • How to find most anomalous users (aggressive brute force attackers)? • Here is a typical (manual) process
  • 6. Step 1) Search Questions: What’s normal? What about that spike? Probably should try to visualize counts by SRC over time…
  • 7. Step 2) stats command, sort by count Question: How to show as a function of time, not just overall?
  • 8. Step 3) add bucketing for breakdown by time Question: What is an anomalous count per bucket? 100? 1000? 10,000? Maybe we should try to use some more stats?
  • 9. Step 4) add some “basic” statistical analysis: avg +/- 2 Question: How to show the individual “outliers” (and not lose the concept of time)?
  • 10. Step 5) use eventstats to repair time problem and add “where” clause to only show those outside of +/-2 Question: Are these 161 results accurate? (I hope you didn’t build an alert and get 161 of them!)
  • 11. Problem: Statistical modeling is INCORRECT for this data – (-75) events doesn’t make sense for avg - 2 – how much confidence do you have in avg + 2 ? Result: • Wrong model= false positives/negatives
  • 12. The Problem: +/-2 assumes data is Gaussian (Bell Curve) Clearly, this data is better fit by a Poisson curve
  • 13. Examples of Non-Gaussian Data status=503 Memory Utilization CPU load status=404 Revenue Transactions
  • 14. One More Problem… • Even if the demonstrated technique was accurate: – Still need to persist what you’ve learned “so far” so that you don’t have to keep re-inspecting historical data as new data comes in – This requires you to manually write/read information into a summary index
  • 15. Why Machine Learning? • Overcome limitations of human analysis • Auto-learn baseline behavior using proper modeling • Detect anomalous behavior
  • 16. First, an Analogy • How could I accurately predict how much Postal-mail you are likely to get delivered to your home tomorrow?
  • 17. I Would… • Watch your mail delivery for a while – 1 day? – 1 week? – 1 month? – 1 year? • Use my observations to create a…
  • 19. A Probability Distribution Function! % likelihood (probability) Best for my house pieces of mail per day
  • 20. A Probability Distribution Function! % likelihood (probability) College Student? pieces of mail per day
  • 21. % likelihood (probability) A Probability Distribution Function! My Mom pieces of mail per day
  • 22. Using Machine Learning to build a Probability Distribution Function • PDF must be built specifically for each “instance” • PDF should be constructed automatically merely by watching the data
  • 23. Using Machine Learning to build a Probability Distribution Function 23
  • 25. Why Machine Learning? • Overcome limitations of human analysis • Auto-learn baseline behavior using proper modeling • Detect anomalous behavior
  • 26. Finding “what’s unexpected”… Your job is often looking for unexpected change in your environment, either proactively through monitoring or reactively through diagnostics/troubleshooting
  • 27. % likelihood (probability) Using the PDF to Find What is Unexpected zero pieces of mail? fifteen pieces of mail? pieces of mail per day
  • 28. Relate back to data in Splunk • # Pieces of mail = # events of a certain type – number of failed logins – number of errors of different types – number of events with certain status codes – etc. • Or, performance metrics – response time – utilization %
  • 29. Back to our Example!
  • 30. • Prelert Anomaly Detective – Automatically, and correctly models data via self-learning – Applies sophisticated Bayesian techniques – Persists “on-going” analysis to allow real-time alerting – Makes it easy to use 3 significant alerts, not 161!
  • 31. • Results are: – Accurate outliers – Automatically clustered and scored by their probabilistic “unlikelihood” – Relevant in time, easy to make alerts – Clickable for drill-down
  • 32. • Drill-downs: – Automatically constructs useful search syntax and time selection – Shows anomalies in context of the original data – Serve as a possible jumping-off point for subsequent manual mining
  • 33. Automated Anomaly Detection • Less time searching & troubleshooting • Proactive trustworthy alerts without thresholds • Auto-discovers the previously unknown
  • 34. Automated Anomaly Detection for splunk> Additional Use Cases
  • 35. Use Case • Data sources: – App logs – Network performance – SQL-Server metrics • Prelert identifies network discards that cause app to disconnect from DB Correlating Anomalies Across Data Types
  • 36. Use Case • Data source: Netstat • Prelert finds a rare FTP connection from a server that doesn’t normally use FTP Servers making unusual TCP connections
  • 37. Use Case • Data source: Custom logs • Prelert identifies unusual $0.60 transaction – traced to bug in currency conversion Revenue Transactions
  • 38. Use Case • Data source: BlueCoat proxy • Prelert identifies users abusing Internet privileges gambling sites porn sites Clients pervasively visiting rare URLs
  • 39. Use Case • Response time of online bank website • Prelert alerts on spikes without the need to create a single threshold Monitoring Performance w/o Thresholds
  • 40. Use Case • Data source: BlueCoat proxy • Prelert identifies client attempting to exploit an outside IIS webserver Unusual outbound traffic rates

Editor's Notes

  1. [no audio here]
  2. Probability of data comes in all shapes and sizes – rarely does it fit a nice bell curve