© 2010 Cisco and/or its affiliates. All rights reserved.
Detecting Hacks:
Anomaly Detection on
Networking Data
James Sirota (@JamesSirota)
Lead Data Scientist – Managed Threat Defense
Chester Parrott (@ParrottSquawk)
Data Scientist – Managed Threat Defense
June 2015
© 2015 Cisco and/or its affiliates. All rights reserved.
In the next few minutes…
•  Defense in Depth for Big Data
•  Network Anomaly Detection Overview
•  Volume Anomaly Detection
•  Feature Anomaly Detection
•  Model Architecture
•  Deployment on OpenSOC Platform
•  Questions
Who are we?
Big Data
Security
Analytics
Open Source
Managed Service
The New Defense-In-Depth
Defense Strategy
Static Sandboxing
Threat Intel Feeds
Rules Engines
Volume-Based
Feature-Based
NLP-Based
Token Clustering
User Profiling
Asset Profiling
Interaction Profiling
Dynamic Sandboxing
Malware Classifiers
Script Classifiers
Perimeter Monitoring
Web Scraping
Social Media Analytics
Model Validators
Training Set Generation
Signature Matching
Rules-Based Matching
Network Anomaly Detection
Log Anomaly Detection
Behavioral Anomaly Detection
Malware Family
Script Family
Scraping Honeypots
Misuse Detection
Intrusion Detection
Supervised Class.
Look-Ahead Analytics
Legacy Mindset
Generic Threats, Targeted Threats, Future Threats
Network Anomaly Detection
Network Anomaly Detection
Volume-Based
Feature-Based
Statistical Process Control
Frequency Domain
Time Series Forecasting
Information Theory
Principal Component Analysis
Sketch-Based
3-sigma algorithms
Exponential Smoothing
ARIMA
Fast Fourier Transform
Wavelets
Entropy
Subspace
Heavy Hitters
Set Cardinality
Probability Models
Markov Models
Bayes Nets
Unsupervised ML
Clustering
Density
Proximity
Anomalous Traffic Patterns
Interrelationships between Features
Volume-Based vs. Feature-Based
Telemetry                        Volume-Based   Feature-Based
Encrypted Traffic (Raw Packet)   YES            NO
Raw Packet + Header Metadata     YES            YES
Machine Exhaust Data             YES (online)   NO
DPI Metadata                     NO             YES
Netflow                          YES            YES
Enrichment Metadata              YES            YES
Application Logs                 YES            YES
Other Alerts                     NO*            YES
Anomaly Detection: 3-Phase Process
Unstructured Data
Identify
Anomaly
Classify
Alert
Examine + Reinforce
Training Set
Historical Context
Phase 1: Identify
Unstructured Data
Understanding of Normal
Anomaly A
Anomaly B
Anomaly C
Anomaly (N)
Phase 2: Classify
Full Packet Telemetry DPI Telemetry Telemetry (N) Outcome
Volume Anomaly
Entropy Anomaly
Feature (x)
Heavy Hitters Anomaly
Volume Anomaly
Cardinality Anomaly
Feature (x)
Protocol Anomaly
Feature (x)
Anomaly (A) Anomaly (B) Anomaly (N) Class Label
x x x x x x x Port Scan
x x x x x False Positive
x x x x Network Scan
x x x x Port Scan
x x x x False Positive
x x x x x x DDoS
Phase 3: Examine + Reinforce
Full Packet Telemetry DPI Telemetry Telemetry (N) Outcome
Volume Anomaly
Entropy Anomaly
Feature (x)
Heavy Hitters Anomaly
Volume Anomaly
Cardinality Anomaly
Feature (x)
Protocol Anomaly
Feature (x)
Anomaly (A) Anomaly (B) Anomaly (N) Class Label
x x x x x x x Port Scan
x x x x x False Positive
x x x x Network Scan
x x x x False Positive
x x x x x x DDoS
x x x x x x False Positive
x x x x x x False Positive
x x x x False Positive
x x x x x x DDoS
Basic Anomalies
Anomaly               Definition
Alpha Flows           Large volume point-to-point flows
DoS                   Denial of service (distributed or single source)
Flash Crowd           Large volume of traffic to a single destination from a large number of sources
Port Scan             Probe to many destination ports on a small number of destination addresses
Network Scan          Probe to many destination addresses on a small number of destination ports
Outage Events         Traffic shifts because of equipment failures or maintenance
Plateau Behavior      Behavior caused by traffic reaching environmental limits
Point-to-Multipoint   Traffic from a single source to many destinations, e.g., content distribution
Worms                 Scanning by worms for vulnerable hosts, which is a special case of network scan
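The Port Scan and Network Scan definitions above differ only in which destination field fans out. A minimal sketch of that distinction, assuming flows arrive as (src_ip, dst_ip, dst_port) tuples; the function name and the fan-out threshold are hypothetical and not taken from the talk:

```python
from collections import defaultdict

def classify_scans(flows, fanout_threshold=100):
    """Label sources as 'Port Scan' or 'Network Scan' from (src_ip, dst_ip, dst_port) flows.

    fanout_threshold is a hypothetical cut-off chosen for illustration.
    """
    ports_by_pair = defaultdict(set)   # (src, dst_ip)   -> distinct destination ports
    hosts_by_port = defaultdict(set)   # (src, dst_port) -> distinct destination addresses
    for src, dst_ip, dst_port in flows:
        ports_by_pair[(src, dst_ip)].add(dst_port)
        hosts_by_port[(src, dst_port)].add(dst_ip)

    labels = {}
    # Many ports on one destination address: port scan.
    for (src, _), ports in ports_by_pair.items():
        if len(ports) >= fanout_threshold:
            labels[src] = "Port Scan"
    # Many addresses on one destination port: network scan.
    for (src, _), hosts in hosts_by_port.items():
        if len(hosts) >= fanout_threshold:
            labels.setdefault(src, "Network Scan")
    return labels

# Synthetic flows for illustration only.
port_scan = [("10.0.0.5", "192.168.1.10", p) for p in range(1, 201)]
net_scan = [("10.0.0.6", f"192.168.1.{h}", 22) for h in range(1, 201)]
normal = [("10.0.0.7", "192.168.1.10", 443)] * 50
labels = classify_scans(port_scan + net_scan + normal)
print(labels)
```

Exact sets are used here for readability; at stream scale the same counts would more likely come from approximate structures such as the sketches discussed later in the deck.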
Batch Analytics
Normalcy Models
Implementation
MAP MAP MAP
Time Series DB
Key: assetID-metricID-Bin
RED RED RED
Asset Bin Value
Server 1 15 5pt *
Server 2 15 5pt *
Server (N) 15 5pt *
assetID-metricID-Bin : 5pt
Telemetry
Anomaly?
* 5-point summary (5pt):
1.  the sample minimum (smallest observation)
2.  the lower quartile or first quartile
3.  the median (middle value)
4.  the upper quartile or third quartile
5.  the sample maximum (largest observation)
Table Name: Metric ID (Cumulative Volume)
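The normalcy model above stores one 5-point summary per assetID-metricID-Bin key. A minimal sketch of that idea in plain Python, standing in for the map and reduce steps; the record layout, the function names, and the Tukey-fence rule used to answer "Anomaly?" are assumptions, since the slide does not state the decision rule:

```python
import statistics
from collections import defaultdict

def five_point_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of observations."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))

def build_normalcy_model(records):
    """Summarize (asset_id, metric_id, time_bin, value) records per key."""
    grouped = defaultdict(list)
    for asset_id, metric_id, time_bin, value in records:
        # "Map" step: emit each value under its assetID-metricID-Bin key.
        grouped[f"{asset_id}-{metric_id}-{time_bin}"].append(value)
    # "Reduce" step: collapse each key's values into a 5-point summary.
    return {key: five_point_summary(vals) for key, vals in grouped.items()}

def is_anomalous(model, key, value, k=1.5):
    """Assumed rule: flag values outside the fences Q1 - k*IQR and Q3 + k*IQR."""
    _, q1, _, q3, _ = model[key]
    iqr = q3 - q1
    return value < q1 - k * iqr or value > q3 + k * iqr

# Hypothetical history for one asset, one metric, hour-of-day bin 15.
history = [("server1", "volume", 15, v) for v in [10, 12, 11, 13, 12, 14, 11, 12]]
model = build_normalcy_model(history)
print(model["server1-volume-15"])
print(is_anomalous(model, "server1-volume-15", 40))
```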
Batch Analytics
Forecasting Models
Forecast
Forecasting Algorithm
(ARIMA/Holt-Winters, …)
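As a rough illustration of a forecasting model, the sketch below uses simple exponential smoothing, which is the most basic member of the family that Holt-Winters extends. It is a stand-in chosen for brevity, not ARIMA or Holt-Winters, and the function name and smoothing factor are assumptions:

```python
def ses_forecasts(series, alpha=0.3):
    """One-step-ahead forecasts from simple exponential smoothing.

    forecasts[t] is the prediction for series[t]; the final element is the
    prediction for the next, not yet observed, value.
    """
    level = series[0]
    forecasts = [level]              # nothing to learn from yet
    for actual in series:
        level = alpha * actual + (1 - alpha) * level
        forecasts.append(level)
    return forecasts

# Synthetic traffic volumes; the last bin contains a surge.
traffic = [100, 102, 98, 101, 99, 300]
forecasts = ses_forecasts(traffic, alpha=0.3)
residuals = [actual - predicted for actual, predicted in zip(traffic, forecasts)]
print(residuals)
```

The residuals (actual minus forecast) are what a downstream threshold test would examine; a large residual means the observed volume departed from what the model expected.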
Implementation
MAP MAP MAP
Time Series DB
Key: assetID-metricID-Bin
RED RED RED
Key: assetID-metricID-Bin:
[Expected | STD]
Telemetry
Anomaly?
Asset Bin Value
Server 1 15 EX |STD
Server 2 15 EX |STD
Server (N) 15 EX |STD
Table Name: Metric ID (Cumulative Volume)
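The lookup implied by the table above can be sketched as follows: each assetID-metricID-Bin key maps to a pre-computed expected value and standard deviation, and incoming telemetry is compared against them. The dictionary standing in for the reference cache, the function name, and the 3-sigma cut-off are assumptions for illustration:

```python
def exceeds_forecast(reference_cache, asset_id, metric_id, time_bin, observed, n_sigma=3.0):
    """Return True when observed deviates from the expected value by more than n_sigma * STD."""
    key = f"{asset_id}-{metric_id}-{time_bin}"
    expected, std = reference_cache[key]
    return abs(observed - expected) > n_sigma * std

# Hypothetical pre-computed [Expected | STD] entry for one key.
reference_cache = {"server1-volume-15": (1200.0, 50.0)}

print(exceeds_forecast(reference_cache, "server1", "volume", 15, observed=1500.0))
print(exceeds_forecast(reference_cache, "server1", "volume", 15, observed=1250.0))
```

Keeping the expected values in a cache means the online path only does a key lookup and one comparison per message, which fits the batch pre-compute and online alert split described in the deployment slide.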
Time Series DB
Batch Model Deployment
Step 1: Bootstrap: Stream Data
Unstructured Data
OpenSOC
OpenSOC JSON
Step 2: Pre-Compute Expected Values (Batch)
Timestamp
HIVE
Time Series DB MR/Spark MR/Spark MR/Spark
Step 3: Generate Alerts (Online)
Unstructured Data
OpenSOC
Expected Values
Reference Cache
Time Series DB
OpenSOC JSON
Timestamp
HIVE
Alert ES
Expected Values
Reference
Cache
Online Analytics
Data Preparation
Deseasonalizer
AV CMA RAT UF RF DV
Online Analytics
Other things to check for
Trend:
Seasonal Variability:
Evolution of Regularities:
Online Processing
3-Sigma Algorithms
Micro Forecasting
Histogram Bins
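A 3-sigma check can run online without storing history if the mean and variance are updated incrementally. The sketch below uses Welford's running-variance update; the class name, the warm-up length, and the choice to keep flagged values out of the baseline are assumptions rather than details from the talk:

```python
import math

class RunningThreeSigma:
    """Online n-sigma detector backed by Welford's running mean and variance."""

    def __init__(self, n_sigma=3.0, warmup=10):
        self.n_sigma = n_sigma
        self.warmup = warmup      # observations to absorb before flagging anything
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0             # running sum of squared deviations from the mean

    def update(self, value):
        """Return True if value is anomalous against the statistics seen so far."""
        anomalous = False
        if self.count >= self.warmup:
            std = math.sqrt(self.m2 / (self.count - 1))
            anomalous = abs(value - self.mean) > self.n_sigma * std
        if not anomalous:
            # Design choice: only normal values update the baseline.
            self.count += 1
            delta = value - self.mean
            self.mean += delta / self.count
            self.m2 += delta * (value - self.mean)
        return anomalous

# Synthetic stream with one spike at position 10.
stream = [100, 101, 99, 102, 98, 100, 101, 99, 100, 102, 250, 101]
detector = RunningThreeSigma()
flags = [detector.update(v) for v in stream]
print(flags)
```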
Frequency Domain
High
•  Trendless
•  Noise
•  Spikes represent Anomalies
Medium
•  Flatter
•  Finer-grained Trends
Low
•  Seasonal & ‘Peaky’
•  Weekly/Daily Trends
Frequency Domain – Wavelet Separation
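To make the band separation concrete, here is a minimal sketch using the Haar wavelet, the simplest wavelet. The talk does not say which wavelet was used, so Haar and the function names are assumptions. Each level splits the signal into a smooth approximation (lower frequency) and a detail band (higher frequency), and a short spike shows up as a large coefficient in the highest-frequency band:

```python
import math

def haar_step(signal):
    """One Haar level: return (approximation, detail) coefficient lists."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    root2 = math.sqrt(2)
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) / root2 for a, b in pairs]
    detail = [(a - b) / root2 for a, b in pairs]
    return approx, detail

def haar_decompose(signal, levels):
    """Return [detail_level_1, ..., detail_level_n, final_approximation]."""
    bands = []
    current = list(signal)
    for _ in range(levels):
        current, detail = haar_step(current)
        bands.append(detail)
    bands.append(current)
    return bands

# Synthetic flat traffic with a single spike at index 5.
traffic = [4, 4, 4, 4, 4, 12, 4, 4]
high, medium, low = haar_decompose(traffic, levels=2)
spike_pair = max(range(len(high)), key=lambda i: abs(high[i]))
print(spike_pair, high)
```

The largest high-band coefficient sits at pair index 2, which covers samples 4 and 5 of the input, so the spike can be localized from the high band alone.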
Online Model Deployment
Time Series DB
Step 1: Bootstrap: Stream Data
Unstructured Data
OpenSOC
OpenSOC JSON
Step 2: Generate Adjuster
Timestamp
HIVE
Time Series DB
MR/Spark
Adjuster / Decomposer
Step 3: Generate Alerts (Online)
Unstructured Data
OpenSOC
Time Series DB
OpenSOC JSON
Timestamp
HIVE
Alert ES
Adjuster
Decomposer
MR/Spark
MR/Spark
Feature-Based Anomaly Detection
Continuous Numeric Features*
•  Continuous Numeric Feature - can take on any value between its minimum value and its maximum value
•  Normalization - adjusting values measured on different scales to a notionally common scale
1.  Proximity-Based Techniques
Example: K-Nearest Neighbors (KNN)
2.  Clustering
Example: K-Means
3.  Density-Based
Sample Anomalies Detected
MPS Anomaly   KBps Anomaly   Possible Explanation
TOO HIGH      TOO LOW        Port Scan, Network Scan
TOO HIGH      TOO HIGH       DDoS
TOO LOW       TOO HIGH       Control Traffic Anomaly
OK            OK             No Anomaly
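A proximity-based detector over continuous features can be sketched in a few lines: normalize each feature to a common scale, then score every point by its distance to its nearest neighbors. The sample values, the reading of the two features as a message rate and KBps, and the function names are all assumptions for illustration:

```python
import math

def min_max_normalize(points):
    """Rescale each feature (column) to [0, 1] so no single unit dominates the distance."""
    columns = list(zip(*points))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]   # avoid dividing by zero
    return [tuple((v - lo) / sp for v, lo, sp in zip(p, lows, spans)) for p in points]

def knn_scores(points, k=3):
    """Anomaly score per point: mean Euclidean distance to its k nearest other points."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

# Hypothetical (message rate, KBps) samples; the last one is TOO HIGH / TOO LOW.
samples = [(100, 800), (110, 820), (95, 790), (105, 810), (98, 805), (900, 60)]
scores = knn_scores(min_max_normalize(samples), k=3)
most_anomalous = max(range(len(samples)), key=scores.__getitem__)
print(most_anomalous, scores)
```

Without the normalization step the feature with the larger numeric range would dominate the distance, which is why the slide lists normalization alongside the techniques.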
Feature-Based Anomaly Detection
Categorical Features *
•  Categorical Features - can take on one of a limited, and usually fixed, number of possible values
•  Stream Sketch - algorithm produces an approximate answer based on a summary of the data stream in memory
Example: Protocol {UDP|FTP|HTTP|…}, GEO-MET {PHOENIX | DALLAS | LONDON| …}, …
Count-Min (CM) Sketch: number of occurrences of the element in a stream (Heavy Hitters)
Why not count? Protocol: 42k elements per asset. GeoMet: 246k per asset
Time Series DB
Categorical Data
CM Sketch
Heavy Hitters
Asset Bin Value
Server 1 15 HH
Server 2 15 HH
Server (N) 15 HH
MR
Table Name: Protocol
Unstructured Data
CM Sketch
Alert
Expected: {HTTP, UDP, FTP, DNS}
ACTUAL: {DNS, ICMP, HTTP, FTP}
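The expected-versus-actual comparison above can be sketched with a small Count-Min sketch. This is an illustrative implementation, not OpenSOC code: the class, the hash construction, the table dimensions, and the synthetic stream are assumptions. A Count-Min sketch never underestimates a count, and it does not remember which items it has seen, so the candidate set is tracked separately here:

```python
import hashlib

class CountMinSketch:
    """Fixed-size frequency summary; estimates can overcount but never undercount."""

    def __init__(self, width=2048, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, item):
        # One independent-looking hash per row, derived by salting with the row number.
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row, col in self._cells(item):
            self.table[row][col] += count

    def estimate(self, item):
        return min(self.table[row][col] for row, col in self._cells(item))

def heavy_hitters(sketch, candidates, top_n=4):
    """Return the top_n candidates by estimated count."""
    return set(sorted(candidates, key=sketch.estimate, reverse=True)[:top_n])

# Synthetic protocol stream for one asset and time bin.
stream = ["HTTP"] * 500 + ["ICMP"] * 400 + ["DNS"] * 300 + ["FTP"] * 200 + ["UDP"] * 5
sketch = CountMinSketch()
for protocol in stream:
    sketch.add(protocol)

expected = {"HTTP", "UDP", "FTP", "DNS"}          # heavy hitters from the batch model
actual = heavy_hitters(sketch, set(stream))
unexpected = actual - expected
print(unexpected)
```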
Feature-Based Anomaly Detection
Feature Ratios
HyperLogLog: approximating the number of distinct elements in a multiset
Useful Ratio: # distinct elements / total elements [0-1]
•  t-digest - structure for accurate online accumulation of rank-based statistics such as quantiles and trimmed means
Unstructured Data
HyperLogLog
Distinct
Src_port
Dst_port
Src_ip
Dst_ip
Storm Bolt
Src_port
Dst_port
Src_ip
Dst_ip
Ack Total
Ratios
Digest *
Alert
FEATURE    DT RATIO Anomaly   Possible Reason
SRC_IP     ~1/~0              Flash Crowd/DDoS
SRC_PORT   ~1/~0              Failure Probing/App Hijack
DST_IP     ~1/~0              Network Scan/DDoS
DST_PORT   ~1/~0              Port Scan/Footprinting
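The distinct-to-total ratio can be sketched with a compact HyperLogLog. This is a simplified illustration (64-bit hash, linear-counting correction for small cardinalities, no sparse representation), and the class, the register count, and the synthetic streams are assumptions rather than details from the talk:

```python
import hashlib
import math

class HyperLogLog:
    """Approximate distinct counter using 2**p small registers."""

    def __init__(self, p=10):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
        index = h >> (64 - self.p)                    # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)         # remaining bits
        rank = (64 - self.p) - rest.bit_length() + 1  # position of the leftmost 1-bit
        if rank > self.registers[index]:
            self.registers[index] = rank

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)         # bias constant, valid for m >= 128
        estimate = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            estimate = self.m * math.log(self.m / zeros)
        return estimate

def distinct_ratio(values):
    """Approximate (# distinct elements / total elements), clamped to [0, 1]."""
    hll = HyperLogLog()
    total = 0
    for value in values:
        hll.add(value)
        total += 1
    return min(1.0, hll.count() / total) if total else 0.0

# Synthetic destination-port streams: a sweep over many ports vs. steady HTTPS traffic.
scan_ratio = distinct_ratio(range(1, 1001))
steady_ratio = distinct_ratio([443] * 1000)
print(scan_ratio, steady_ratio)
```

A ratio near 1 means almost every element is new, and a ratio near 0 means a handful of values dominate; both extremes appear in the table above as candidate anomalies.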
Feature-Based Anomaly Detection
Correlation - Information Theory
•  Information Theory - study of fundamental limits on signal processing, compression, and storage
•  Entropy - a measure of unpredictability of information content
Unstructured Data
Anomaly-Free
Training Set
Entropy
Summarizer
Entropy
Src_port
Dst_port
Src_ip
Dst_ip
Time Bin (n)
           SRC_IP   SRC_PORT   DST_IP   DST_PORT
SRC_IP     -        .95        .85      .75
SRC_PORT   -        -          .97      .76
DST_IP     -        -          -        .98
DST_PORT   -        -          -        -
MR
Alert
Time Bin (n)
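The entropy of a feature within one time bin can be computed directly from value counts. A minimal sketch, assuming the baseline entropy comes from an anomaly-free training set as the slide describes; the function names, the tolerance, and the sample port lists are hypothetical:

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of values."""
    counts = Counter(values)
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())

def entropy_alert(baseline_entropy, observed_values, tolerance=0.5):
    """Flag a time bin whose entropy moves more than tolerance bits from the baseline."""
    return abs(shannon_entropy(observed_values) - baseline_entropy) > tolerance

# Destination ports seen in a normal time bin vs. a bin containing a port sweep.
normal_ports = [80, 443, 80, 443, 53, 80, 443, 80]
scan_ports = list(range(1000, 1008))

baseline = shannon_entropy(normal_ports)
print(baseline, shannon_entropy(scan_ports))
print(entropy_alert(baseline, scan_ports))
```

Entropy rises when values spread out (a sweep across ports) and falls when they concentrate (many sources hitting one destination), so a shift in either direction is worth flagging.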
Principal Component Analysis (PCA)
•  Feature Selection Algorithm
•  Dimensionality Reduction
•  E.g. 4 features
•  ServerA (A)
•  ServerB (B)
•  ServerC (C)
•  Cumulative = A + B + C
PCA – Component Construction
Component   ServerA Traffic   ServerB Traffic   ServerC Traffic   Cumulative   σ
PC1         -0.5052803        -0.4990556        -0.4816276        -0.5134882   0.0135
PC2          0.2801275         0.4611079        -0.8395562         0.0636666   0.5773
PC3          0.6867089        -0.6988557        -0.1441834         0.138718    0.5773
PC4         -0.4411929        -0.2234362        -0.2058916         0.8444132   0.5773
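Component construction is a weighted sum: each principal component score is the dot product of an observation with that component's loadings. The sketch below copies the loadings from the slide and applies them to one observation. The observation values are hypothetical, and it is an assumption that the inputs were standardized before the loadings were computed:

```python
# Loadings copied from the slide: (ServerA, ServerB, ServerC, Cumulative) per component.
LOADINGS = {
    "PC1": (-0.5052803, -0.4990556, -0.4816276, -0.5134882),
    "PC2": (0.2801275, 0.4611079, -0.8395562, 0.0636666),
    "PC3": (0.6867089, -0.6988557, -0.1441834, 0.138718),
    "PC4": (-0.4411929, -0.2234362, -0.2058916, 0.8444132),
}

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def project(observation):
    """Map one (A, B, C, Cumulative) observation onto every principal component."""
    return {name: dot(weights, observation) for name, weights in LOADINGS.items()}

# Hypothetical standardized traffic values for a single time bin.
scores = project((0.2, -0.1, 0.4, 0.3))
print(scores)

# Sanity check: the loading vectors should be close to unit length and mutually orthogonal.
for name, weights in LOADINGS.items():
    print(name, round(dot(weights, weights), 4))
```

Because the loading vectors are approximately orthonormal, the four scores are just the same observation expressed in a rotated coordinate system; anomalies are then sought in the components that normally carry little variance.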
PCA – Component Separation
Putting it All Together: OpenSOC
RAW Transform Enrich Alert
(Rules-Based)
Enriched
Filter Aggregators
Router Model 1 Scorer
HIVE + Hbase
Long-Term Data Store
Flume Kafka Storm
Model 2
Model n
OpenSOC-Streaming
OpenSOC-Aggregation
OpenSOC-ML
SOC Alert Consumers
UI
Web Services
Secure Gateway Services
External Alert Consumers
Big Data Stores
Elastic Search
Real-Time Index and Search
Hbase
OpenTSDB
Titan Graph
Alerts
ES/HIVE
Alerts Store
Remedy
Ticketing System
We are hiring…
•  Data Scientists (Security)
•  Aspiring Data Scientists
•  Security/Networking Experience Required
•  Software Engineering Experience Required
•  PhD not required
•  Background in stats or ML not required
•  Security Researchers
*Please contact us via LinkedIn with your profile
Book idea…
Security Analytics on Hadoop
•  Anomaly Detection
•  Targeted Models
•  Deployment Best Practices
•  Alerts
•  Visualization Techniques
•  Etc…
If interested in contributing, please contact James Sirota on LinkedIn
OpenSOC Resources (@ProjectOpenSOC)
Github Repo
•  https://github.com/OpenSOC/opensoc
Slides
•  http://www.slideshare.net/JamesSirota
•  https://speakerdeck.com/jsirota
Corporate Blogs
•  http://blogs.cisco.com/author/jamessirota
•  http://blogs.cisco.com/security/opensoc-an-open-commitment-to-security
Contributor Blogs
•  https://medium.com/@jamessirota
•  parrottsquawk.com
Thank you.

Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
 
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in NoidaBuds n Tech IT Solutions: Top-Notch Web Services in Noida
Buds n Tech IT Solutions: Top-Notch Web Services in Noida
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commerce
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief Utama
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 

Detecting Hacks: Anomaly Detection on Networking Data

  • 1. © 2010 Cisco and/or its affiliates. All rights reserved. Detecting Hacks: Anomaly Detection on Networking Data. James Sirota (@JamesSirota), Lead Data Scientist – Managed Threat Defense. Chester Parrott (@ParrottSquawk), Data Scientist – Managed Threat Defense. June 2015
  • 2. In the next few minutes…
      •  Defense in Depth for Big Data
      •  Network Anomaly Detection Overview
      •  Volume Anomaly Detection
      •  Feature Anomaly Detection
      •  Model Architecture
      •  Deployment on OpenSOC Platform
      •  Questions
  • 3. Who are we? Big Data; Security; Analytics; Open Source; Managed Service
  • 4. The New Defense-In-Depth (diagram labels): Defense Strategy; Static Sandboxing; Threat Intel Feeds; Rules Engines; Volume-Based; Feature-Based; NLP-Based; Token Clustering; User Profiling; Asset Profiling; Interaction Profiling; Dynamic Sandboxing; Malware Classifiers; Script Classifiers; Perimeter Monitoring; Web Scraping; Soc. Media Analytics; Model Validators; Training Set Generation; Signature Matching; Rules-Based Matching; Network Anomaly Detection; Log Anomaly Detection; Behavioral Anomaly Detection; Malware Family; Script Family; Scraping; Honeypots; Misuse Detection; Intrusion Detection; Supervised Class.; Look-Ahead Analytics; Legacy Mindset; Generic Threats; Targeted Threats; Future Threats
  • 5. Network Anomaly Detection (diagram labels): Network Anomaly Detection; Volume-Based; Feature-Based; Statistical Process Control; Frequency Domain; Time series Forecasting; Information Theory; Principal Component Analysis; Sketch-Based; 3-sigma algorithms; Exponential Smoothing; ARIMA; Fast Fourier Transform; Wavelets; Entropy; Subspace; Heavy Hitters; Set Cardinality; Probability Models; Markov Models; Bayes Nets; Unsupervised ML; Clustering; Density; Proximity; Anomalous Traffic Patterns; Interrelationships between Features
  • 6. Volume-Based vs. Feature-Based
      Telemetry                        Volume-Based   Feature-Based
      Encrypted Traffic (Raw Packet)   YES            NO
      Raw Packet + Header Metadata     YES            YES
      Machine Exhaust Data             YES (online)   NO
      DPI Metadata                     NO             YES
      Netflow                          YES            YES
      Enrichment Metadata              YES            YES
      Application Logs                 YES            YES
      Other Alerts                     NO*            YES
  • 7. Anomaly Detection: 3-Phase Process (diagram labels): Unstructured Data; Identify; Anomaly; Classify; Alert; Examine + Reinforce; Training Set; Historical Context
  • 8. Phase 1: Identify (diagram labels): Unstructured Data; Understanding of Normal; Anomaly A; Anomaly B; Anomaly C; Anomaly (N)
  • 9. Phase 2: Classify
      Column groups: Full Packet Telemetry; DPI Telemetry; Telemetry (N); Outcome
      Columns: Volume Anomaly; Entropy Anomaly; Feature (x); Heavy Hitters Anomaly; Volume Anomaly; Cardinality Anomaly; Feature (x); Protocol Anomaly; Feature (x); Anomaly (A); Anomaly (B); Anomaly (N); Class Label
      Rows (marks, then class label):
      x x x x x x x   Port Scan
      x x x x x       False Positive
      x x x x         Network Scan
      x x x x         Port Scan
      x x x x         False Positive
      x x x x x x     DDoS
  • 10. Phase 3: Examine + Reinforce
      Column groups: Full Packet Telemetry; DPI Telemetry; Telemetry (N); Outcome
      Columns: Volume Anomaly; Entropy Anomaly; Feature (x); Heavy Hitters Anomaly; Volume Anomaly; Cardinality Anomaly; Feature (x); Protocol Anomaly; Feature (x); Anomaly (A); Anomaly (B); Anomaly (N); Class Label
      Rows (marks, then class label):
      x x x x x x x   Port Scan
      x x x x x       False Positive
      x x x x         Network Scan
      x x x x         False Positive
      x x x x x x     DDoS
      x x x x x x     False Positive
      x x x x x x     False Positive
      x x x x         False Positive
      x x x x x x     DDoS
  • 11. Basic Anomalies
      Anomaly: Definition
      Alpha Flows: Large volume point-to-point flows
      DoS: Denial of service (distributed or single source)
      Flash Crowd: Large volume of traffic to a single destination from a large number of sources
      Port Scan: Probe to many destination ports on a small number of destination addresses
      Network Scan: Probe to many destination addresses on a small number of destination ports
      Outage Events: Traffic shifts because of equipment failures or maintenance
      Plateau Behavior: Behavior caused by traffic reaching environmental limits
      Point-to-Multipoint: Traffic from a single source to many destinations, e.g., content distribution
      Worms: Scanning by worms for vulnerable hosts, which is a special case of network scan
  • 12. Batch Analytics: Normalcy Models
  • 13. Implementation
      Diagram labels: MAP, MAP, MAP; Time Series DB; Key: assetID-metricID-Bin; RED, RED, RED; assetID-metricID-Bin : 5pt; Telemetry; Anomaly?
      Table Name: Metric ID (Cumulative Volume)
      Asset        Bin   Value
      Server 1     15    5pt *
      Server 2     15    5pt *
      Server (N)   15    5pt *
      * 5-point summary (5pt):
      1.  the sample minimum (smallest observation)
      2.  the lower quartile or first quartile
      3.  the median (middle value)
      4.  the upper quartile or third quartile
      5.  the sample maximum (largest observation)
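As a rough illustration of the 5-point summary above, the sketch below computes the summary for one assetID-metricID-Bin key and tests a new observation against it. The slide does not say how the summary is turned into an "Anomaly?" decision, so the Tukey-fence rule (`k = 1.5`), the key name, and the sample history are all assumptions made for this example.

```python
import statistics

def five_point_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of at least two numbers."""
    ordered = sorted(values)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return (ordered[0], q1, median, q3, ordered[-1])

def is_anomalous(value, summary, k=1.5):
    """Assumed rule: flag values outside Q1 - k*IQR .. Q3 + k*IQR."""
    _, q1, _, q3, _ = summary
    iqr = q3 - q1
    return value < q1 - k * iqr or value > q3 + k * iqr

# Hypothetical history of cumulative volume for one asset in time bin 15.
history = [100, 110, 120, 130, 140, 150, 160, 170, 180]
model = {"server1-cumulativeVolume-15": five_point_summary(history)}

summary = model["server1-cumulativeVolume-15"]
print(summary)                     # the stored 5pt value for this key
print(is_anomalous(500, summary))  # far above the upper fence
print(is_anomalous(150, summary))  # inside the normal range
```

In a batch job each reducer would emit one such summary per key; the lookup at alert time is then a single dictionary (or time series DB) read.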
  • 14. Batch Analytics: Forecasting Models. Forecast; Forecasting Algorithm (ARIMA/Holt-Winters, …)
  • 15. Implementation
      Diagram labels: MAP, MAP, MAP; Time Series DB; Key: assetID-metricID-Bin; RED, RED, RED; Key: assetID-metricID-Bin: [Expected | STD]; Telemetry; Anomaly?
      Table Name: Metric ID (Cumulative Volume)
      Asset        Bin   Value
      Server 1     15    EX | STD
      Server 2     15    EX | STD
      Server (N)   15    EX | STD
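The [Expected | STD] lookup above can be sketched as follows. Simple exponential smoothing stands in here for the ARIMA/Holt-Winters forecasters named on the previous slide, and the population standard deviation of the history stands in for the forecast error; both substitutions, along with the key name, the sample data, and the `k = 3` threshold, are assumptions for illustration only.

```python
import statistics

def exponential_smoothing_forecast(series, alpha=0.5):
    """One-step-ahead forecast from simple exponential smoothing."""
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level

def build_reference_cache(history_by_key, alpha=0.5):
    """Batch step: map each assetID-metricID-Bin key to (expected, std)."""
    return {
        key: (exponential_smoothing_forecast(series, alpha),
              statistics.pstdev(series))
        for key, series in history_by_key.items()
    }

def is_anomalous(cache, key, observed, k=3.0):
    """Online step: flag observations more than k std away from expected."""
    expected, std = cache[key]
    return abs(observed - expected) > k * std

# Hypothetical history for one key.
history = {"server1-cumulativeVolume-15": [90, 110, 90, 110]}
cache = build_reference_cache(history)

print(cache["server1-cumulativeVolume-15"])
print(is_anomalous(cache, "server1-cumulativeVolume-15", 200))
print(is_anomalous(cache, "server1-cumulativeVolume-15", 105))
```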
  • 16. Batch Model Deployment (diagram labels, in extraction order)
      Time Series DB
      Step 1: Bootstrap: Stream Data. Unstructured Data; OpenSOC; OpenSOC JSON
      Step 2: Pre-Compute Expected Values (Batch). Timestamp; HIVE; Time Series DB; MR/Spark; MR/Spark; MR/Spark
      Step 3: Generate Alerts (Online). Unstructured Data; OpenSOC; Expected Values Reference Cache; Time Series DB; OpenSOC JSON; Timestamp; HIVE; Alert; ES; Expected Values Reference Cache
  • 17. Online Analytics: Data Preparation. Deseasonalizer: AV; CMA; RAT; UF; RF; DV
  • 18. Online Analytics: Other things to check for. Trend; Seasonal Variability; Evolution of Regularities
  • 19. Online Processing: 3-Sigma Algorithms; Micro Forecasting; Histogram Bins
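One way an online 3-sigma check could look is sketched below. It keeps a running mean and variance with Welford's update, so no history has to be stored. The class name, the warm-up length, and the choice to fold every observation (including flagged ones) into the statistics are assumptions of this sketch, not details taken from the slides.

```python
import math

class RunningThreeSigma:
    """Streaming k-sigma detector using Welford's running mean/variance."""

    def __init__(self, k=3.0, warmup=5):
        self.k = k            # number of standard deviations tolerated
        self.warmup = warmup  # observations required before scoring starts
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0         # running sum of squared deviations from the mean

    def update(self, x):
        """Score x against the statistics so far, then fold x into them."""
        anomalous = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            anomalous = abs(x - self.mean) > self.k * std
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return anomalous

detector = RunningThreeSigma()
stream = [10, 11, 9, 10, 11, 9, 10, 10, 50]  # hypothetical metric values
flags = [detector.update(x) for x in stream]
print(flags)
```

A production detector would probably exclude or down-weight flagged points so that a sustained attack does not become the new baseline.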
  • 20. Frequency Domain
      High
      •  Trendless
      •  Noise
      •  Spikes represent Anomalies
      Medium
      •  Flatter
      •  Finer-grained Trends
      Low
      •  Seasonal & ‘Peaky’
      •  Weekly/Daily Trends
  • 21. Frequency Domain – Wavelet Separation
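The slides do not name the wavelet family used, so the following is only a minimal sketch of the separation idea using an unnormalized Haar transform (pairwise averages and differences). The averages carry the low-frequency trend and the differences carry the high-frequency detail, where a spike stands out. The function names and the sample signal are hypothetical, and the input length is assumed to be divisible by 2 raised to `levels`.

```python
def haar_step(signal):
    """Split a signal into pairwise averages (low) and differences (high)."""
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) / 2 for a, b in pairs]
    detail = [(a - b) / 2 for a, b in pairs]
    return approx, detail

def haar_decompose(signal, levels):
    """Apply haar_step repeatedly; details[0] is the highest-frequency band."""
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details

# Hypothetical flat traffic with one spike at index 5.
signal = [4, 4, 4, 4, 4, 20, 4, 4]
approx, details = haar_decompose(signal, levels=2)
print(approx)      # coarse, low-frequency view
print(details[0])  # highest-frequency band: the spike shows as a large value
```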
  • 22. Online Model Deployment (diagram labels, in extraction order)
      Time Series DB
      Step 1: Bootstrap: Stream Data. Unstructured Data; OpenSOC; OpenSOC JSON
      Step 2: Generate Adjuster. Timestamp; HIVE; Time Series DB; MR/Spark; Adjuster / Decomposer
      Step 3: Generate Alerts (Online). Unstructured Data; OpenSOC; Time Series DB; OpenSOC JSON; Timestamp; HIVE; Alert; ES; Adjuster; Decomposer; MR/Spark; MR/Spark
  • 23. Feature-Based Anomaly Detection: Continuous Numeric Features *
      •  Continuous Numeric Feature - can take on any value between its minimum value and its maximum value
      •  Normalization - adjusting values measured on different scales to a notionally common scale
      1.  Proximity-Based Techniques. Example: K-Nearest Neighbors (KNN)
      2.  Clustering. Example: K-Means
      3.  Density-Based
      Sample Anomalies Detected
      MPS Anomaly   KBps Anomaly   Possible Explanation
      TOO HIGH      TOO LOW        Port Scan, Network Scan
      TOO HIGH      TOO HIGH       DDoS
      TOO LOW       TOO HIGH       Control Traffic Anomaly
      OK            OK             No Anomaly
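To make the normalization and proximity-based steps above concrete, here is a small sketch that min-max normalizes two continuous features and scores each observation by the distance to its k-th nearest neighbor. The feature pair (an MPS value and a KBps value), the sample rows, and `k = 2` are illustrative assumptions; the slides do not specify the scoring rule.

```python
import math

def min_max_normalize(rows):
    """Rescale every column to [0, 1] so features share a common scale."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]

def knn_scores(points, k=2):
    """Anomaly score = distance to the k-th nearest other point."""
    scores = []
    for i, p in enumerate(points):
        distances = sorted(math.dist(p, q)
                           for j, q in enumerate(points) if j != i)
        scores.append(distances[k - 1])
    return scores

# Hypothetical (MPS, KBps) observations; the last row has MPS far too high
# and KBps low, the pattern the table associates with scanning.
rows = [(100, 50), (110, 55), (105, 52), (95, 48), (1000, 40)]
scores = knn_scores(min_max_normalize(rows))
most_anomalous = scores.index(max(scores))
print(most_anomalous, [round(s, 3) for s in scores])
```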
  • 24. Feature-Based Anomaly Detection: Categorical Features *
      •  Categorical Features - can take on one of a limited, and usually fixed, number of possible values
      •  Stream Sketch - algorithm produces an approximate answer based on a summary of the data stream in memory
      Example: Protocol {UDP | FTP | HTTP | …}, GEO-MET {PHOENIX | DALLAS | LONDON | …}, …
      Count-Min (CM) Sketch: number of occurrences of the element in a stream (Heavy Hitters)
      Why not count? Protocol: 42k elements per asset. GeoMet: 246k per asset
      Diagram labels: Time Series DB; Categorical Data; CM Sketch; Heavy Hitters; MR; Unstructured Data; CM Sketch; Alert
      Table Name: Protocol
      Asset        Bin   Value
      Server 1     15    HH
      Server 2     15    HH
      Server (N)   15    HH
      Expected: {HTTP, UDP, FTP, DNS}
      ACTUAL: {DNS, ICMP, HTP, FTP}
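A Count-Min sketch of the kind described above might be sketched like this. The table dimensions, the SHA-256-based row hashing, the heavy-hitter threshold, and the sample stream are assumptions chosen for readability rather than details from the talk.

```python
import hashlib

class CountMinSketch:
    """Fixed-size counter table; estimates never undercount an item."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _columns(self, item):
        # One column per row, derived from a row-salted hash of the item.
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row, col in self._columns(item):
            self.table[row][col] += count

    def estimate(self, item):
        return min(self.table[row][col] for row, col in self._columns(item))

def heavy_hitters(sketch, candidates, threshold):
    """Candidates whose estimated count reaches the threshold."""
    return {c for c in candidates if sketch.estimate(c) >= threshold}

# Hypothetical protocol stream for one asset and time bin.
sketch = CountMinSketch()
for protocol, count in [("HTTP", 50), ("DNS", 30), ("ICMP", 1)]:
    sketch.add(protocol, count)

expected = {"HTTP", "UDP", "FTP", "DNS"}
actual = heavy_hitters(sketch, {"HTTP", "DNS", "ICMP"}, threshold=1)
print(sorted(actual - expected))  # heavy hitters not in the expected set
```

Comparing the observed heavy-hitter set against the expected set, as in the slide's Expected/ACTUAL example, is what would drive the alert.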
  • 25. Feature-Based Anomaly Detection: Feature Ratios
      HyperLogLog: approximating the number of distinct elements in a multiset
      Useful Ratio: # distinct elements / total elements [0-1]
      •  Digest - structure for accurate on-line accumulation of rank-based statistics such as quantiles and trimmed means
      Diagram labels: Unstructured Data; HyperLogLog; Distinct; Src_port; Dst_port; Src_ip; Dst_ip; Storm Bolt; Src_port; Dst_port; Src_ip; Dst_ip; Ack; Total; Ratios; Digest *; Alert
      FEATURE    DT RATIO Anomaly   Possible Reason
      SRC_IP     ~1/~0              Flash Crowd/DDoS
      SRC_PORT   ~1/~0              Failure Probing/App Hijack
      DST_IP     ~1/~0              Network Scan/DDoS
      DST_PORT   ~1/~0              Port Scan/Footprinting
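The distinct-to-total ratio above can be approximated with a compact HyperLogLog. The version below is a simplified sketch: it uses 1024 registers, a SHA-256-derived 64-bit hash, and only the small-range correction, all of which are choices made for this example. The sample address lists are hypothetical.

```python
import hashlib
import math

class HyperLogLog:
    """Approximate distinct counter with 2**p registers."""

    def __init__(self, p=10):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        digest = hashlib.sha256(str(item).encode()).digest()
        h = int.from_bytes(digest[:8], "big")            # 64-bit hash
        index = h >> (64 - self.p)                        # first p bits
        rest = h & ((1 << (64 - self.p)) - 1)             # remaining bits
        rank = (64 - self.p) - rest.bit_length() + 1      # leftmost 1-bit position
        self.registers[index] = max(self.registers[index], rank)

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)      # small-range correction
        return raw

def distinct_ratio(values, p=10):
    """Estimated distinct elements / total elements, clamped to [0, 1]."""
    hll = HyperLogLog(p)
    for v in values:
        hll.add(v)
    return min(1.0, hll.count() / len(values))

# Hypothetical SRC_IP streams: a crowd of unique sources vs. five repeat sources.
crowd = [f"10.0.{i // 256}.{i % 256}" for i in range(2000)]
steady = [f"10.0.0.{i % 5}" for i in range(2000)]
print(round(distinct_ratio(crowd), 3), round(distinct_ratio(steady), 4))
```

A ratio near 1 for SRC_IP is the pattern the table associates with a flash crowd or DDoS, while a ratio near 0 means a handful of sources dominate the bin.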
  • 26. Feature-Based Anomaly Detection: Correlation - Information Theory
      •  Information Theory - study of fundamental limits on signal processing, compression, and storage
      •  Entropy - a measure of unpredictability of information content
      Diagram labels: Unstructured Data; Anomaly-Free Training Set; Entropy Summarizer; Entropy; Src_port; Dst_port; Src_ip; Dst_ip; Time Bin (n); MR; Alert; Time Bin (n)
                  SRC_IP   SRC_PORT   DST_IP   DST_PORT
      SRC_IP      -        .95        .85      .75
      SRC_PORT    -        -          .97      .76
      DST_IP      -        -          -        .98
      DST_PORT    -        -          -        -
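The entropy measure defined above can be computed per feature and per time bin as in the sketch below. It covers only the entropy calculation, not the pairwise values shown in the table, and the sample destination-port lists are hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy in bits of the empirical distribution of values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def normalized_entropy(values):
    """Entropy scaled to [0, 1] by the maximum possible for the distinct count."""
    distinct = len(set(values))
    if distinct <= 1:
        return 0.0
    return shannon_entropy(values) / math.log2(distinct)

# Hypothetical DST_PORT values observed in one time bin.
normal_bin = [80, 80, 80, 80]      # traffic concentrated on one port
scan_bin = [21, 22, 23, 25]        # every packet hits a different port
print(shannon_entropy(normal_bin), shannon_entropy(scan_bin))
```

A sudden rise in DST_PORT entropy against the anomaly-free baseline is the kind of shift a port scan would produce.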
  • 27. Principal Component Analysis (PCA)
      •  Feature Selection Algorithm
      •  Dimensionality Reduction
      •  E.g. 4 features
      •  ServerA (A)
      •  ServerB (B)
      •  ServerC (C)
      •  Cumulative = A + B + C
  • 28. PCA – Component Construction
      Feature           PC1          PC2          PC3          PC4
      ServerA Traffic   -0.5052803   0.2801275    0.6867089    -0.4411929
      ServerB Traffic   -0.4990556   0.4611079    -0.6988557   -0.2234362
      ServerC Traffic   -0.4816276   -0.8395562   -0.1441834   -0.2058916
      Cumulative        -0.5134882   0.0636666    0.138718     0.8444132
      σ                 0.0135       0.5773       0.5773       0.5773
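Each component above is a weighted sum of the four features, so a sample can be projected onto the components with a dot product. The sketch below uses the loadings from the slide; the assumption that inputs are already standardized, the feature order (ServerA, ServerB, ServerC, Cumulative), and the sample vector are illustrative.

```python
import math

# Loadings copied from the slide, ordered (ServerA, ServerB, ServerC, Cumulative).
LOADINGS = {
    "PC1": (-0.5052803, -0.4990556, -0.4816276, -0.5134882),
    "PC2": (0.2801275, 0.4611079, -0.8395562, 0.0636666),
    "PC3": (0.6867089, -0.6988557, -0.1441834, 0.138718),
    "PC4": (-0.4411929, -0.2234362, -0.2058916, 0.8444132),
}

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project(sample, loadings=LOADINGS):
    """Component scores for one (assumed standardized) traffic sample."""
    return {name: dot(weights, sample) for name, weights in loadings.items()}

# Hypothetical standardized sample: ServerC is unusually busy.
scores = project((0.1, -0.2, 3.0, 1.0))
print({name: round(value, 3) for name, value in scores.items()})

# Sanity check: the loading vectors are close to orthonormal.
print(round(math.sqrt(dot(LOADINGS["PC1"], LOADINGS["PC1"])), 4))
print(round(dot(LOADINGS["PC1"], LOADINGS["PC2"]), 4))
```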
  • 29. PCA – Component Separation
  • 30. PCA – Component Separation
  • 31. Putting it All Together: OpenSOC (diagram labels): RAW; Transform; Enrich; Alert (Rules-Based); Enriched; Filter; Aggregators; Router; Model 1; Scorer; HIVE + Hbase Long-Term Data Store; Flume; Kafka; Storm; Model 2; Model n; OpenSOC-Streaming; OpenSOC-Aggregation; OpenSOC-ML; SOC Alert Consumers; UI; UI; UI; UI; UI; Web Services; Secure Gateway Services; External Alert Consumers; Big Data Stores; Elastic Search Real-Time Index and Search; Hbase; OpenTSDB; Titan Graph; Alerts; ES/HIVE Alerts Store; Remedy Ticketing System
  • 32. We are hiring…
      •  Data Scientists (Security)
      •  Aspiring Data Scientists
      •  Security/Networking Experience Required
      •  Software Engineering Experience Required
      •  PhD not required
      •  Background in stats or ML not required
      •  Security Researchers
      *Please contact us via LinkedIn with your profile
  • 33. Book idea… Security Analytics on Hadoop
      •  Anomaly Detection
      •  Targeted Models
      •  Deployment Best Practices
      •  Alerts
      •  Visualization Techniques
      •  Etc…
      If interested in contributing please contact James Sirota on LinkedIn
  • 34. OpenSOC Resources (@ProjectOpenSOC)
      Github Repo
      •  https://github.com/OpenSOC/opensoc
      Slides
      •  http://www.slideshare.net/JamesSirota
      •  https://speakerdeck.com/jsirota
      Corporate Blogs
      •  http://blogs.cisco.com/author/jamessirota
      •  http://blogs.cisco.com/security/opensoc-an-open-commitment-to-security
      Contributor Blogs
      •  https://medium.com/@jamessirota
      •  parrottsquawk.com