© 2010 Cisco and/or its affiliates. All rights reserved.
Detecting Hacks: Anomaly Detection on Networking Data
James Sirota (@JamesSirota)
Lead Data Scientist – Managed Threat Defense
Chester Parrott (@ParrottSquawk)
Data Scientist – Managed Threat Defense
June 2015
© 2015 Cisco and/or its affiliates. All rights reserved. 2
In the next few minutes…
• Defense in Depth for Big Data
• Network Anomaly Detection Overview
• Volume Anomaly Detection
• Feature Anomaly Detection
• Model Architecture
• Deployment on OpenSOC Platform
• Questions
Who are we?
Big Data
Security
Analytics
Open Source
Managed Service
The New Defense-In-Depth
Defense Strategy
Static Sandboxing
Threat Intel Feeds
Rules Engines
Volume-Based
Feature-Based
NLP-Based
Token Clustering
User Profiling
Asset Profiling
Interaction Profiling
Dynamic Sandboxing
Malware Classifiers
Script Classifiers
Perimeter Monitoring
Web Scraping
Soc. Media Analytics
Model Validators
Training Set Generation
Signature Matching
Rules-Based Matching
Network Anomaly Detection
Log Anomaly Detection
Behavioral Anomaly Detection
Malware Family
Script Family
Scraping
Honeypots
Misuse Detection
Intrusion Detection
Supervised Class.
Look-Ahead Analytics
Legacy Mindset
Generic Threats, Targeted Threats, Future Threats
Network Anomaly Detection
Volume-Based
Feature-Based
Statistical Process Control
Frequency Domain
Time Series Forecasting
Information Theory
Principal Component Analysis
Sketch-Based
3-sigma algorithms
Exponential Smoothing
ARIMA
Fast Fourier Transform
Wavelets
Entropy
Subspace
Heavy Hitters
Set Cardinality
Probability Models
Markov Models
Bayes Nets
Unsupervised ML
Clustering
Density
Proximity
Anomalous Traffic Patterns
Interrelationships between Features
Volume-Based vs. Feature-Based
Telemetry                        Volume-Based   Feature-Based
Encrypted Traffic (Raw Packet)   YES            NO
Raw Packet + Header Metadata     YES            YES
Machine Exhaust Data             YES (online)   NO
DPI Metadata                     NO             YES
Netflow                          YES            YES
Enrichment Metadata              YES            YES
Application Logs                 YES            YES
Other Alerts                     NO*            YES
Anomaly Detection: 3-Phase Process
Unstructured Data
Identify
Anomaly
Classify
Alert
Examine + Reinforce
Training Set
Historical Context
Phase 1: Identify
Unstructured Data
Understanding of Normal
Anomaly A
Anomaly B
Anomaly C
Anomaly (N)
Phase 2: Classify
Telemetry columns: Full Packet Telemetry, DPI Telemetry, Telemetry (N), Outcome
Indicator columns: Volume Anomaly, Entropy Anomaly, Feature (x), Heavy Hitters Anomaly, Volume Anomaly, Cardinality Anomaly, Feature (x), Protocol Anomaly, Feature (x)
Anomaly (A)   Anomaly (B)   Anomaly (N)   Class Label
x x x x x x x   Port Scan
x x x x x       False Positive
x x x x         Network Scan
x x x x         Port Scan
x x x x         False Positive
x x x x x x     DDoS
Phase 3: Examine + Reinforce
Telemetry columns: Full Packet Telemetry, DPI Telemetry, Telemetry (N), Outcome
Indicator columns: Volume Anomaly, Entropy Anomaly, Feature (x), Heavy Hitters Anomaly, Volume Anomaly, Cardinality Anomaly, Feature (x), Protocol Anomaly, Feature (x)
Anomaly (A)   Anomaly (B)   Anomaly (N)   Class Label
x x x x x x x   Port Scan
x x x x x       False Positive
x x x x         Network Scan
x x x x         False Positive
x x x x x x     DDoS
x x x x x x     False Positive
x x x x x x     False Positive
x x x x         False Positive
x x x x x x     DDoS
Basic Anomalies
Anomaly               Definition
Alpha Flows           Large volume point-to-point flows
DoS                   Denial of service (distributed or single source)
Flash Crowd           Large volume of traffic to a single destination from a large number of sources
Port Scan             Probe to many destination ports on a small number of destination addresses
Network Scan          Probe to many destination addresses on a small number of destination ports
Outage Events         Traffic shifts because of equipment failures or maintenance
Plateau Behavior      Behavior caused by traffic reaching environmental limits
Point-to-Multipoint   Traffic from a single source to many destinations, e.g., content distribution
Worms                 Scanning by worms for vulnerable hosts, which is a special case of network scan
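The port scan and network scan definitions above differ only in which dimension fans out, so they can be sketched as a count of distinct destination ports and addresses per source. This is a minimal illustration, not the presenters' implementation: the flow tuple layout, the function name `classify_scans`, and the thresholds `many` and `few` are all assumptions.

```python
from collections import defaultdict

def classify_scans(flows, many=20, few=3):
    """Label each source as 'port scan', 'network scan', or None.

    flows: iterable of (src_ip, dst_ip, dst_port) tuples (assumed layout).
    many/few: illustrative thresholds, not values from the slides.
    """
    dst_ips = defaultdict(set)
    dst_ports = defaultdict(set)
    for src, dst, port in flows:
        dst_ips[src].add(dst)
        dst_ports[src].add(port)

    labels = {}
    for src in dst_ips:
        n_ips, n_ports = len(dst_ips[src]), len(dst_ports[src])
        if n_ports >= many and n_ips <= few:
            labels[src] = "port scan"      # many ports, few addresses
        elif n_ips >= many and n_ports <= few:
            labels[src] = "network scan"   # many addresses, few ports
        else:
            labels[src] = None
    return labels

# Hypothetical traffic: one port scanner, one network scanner, one normal host.
flows = (
    [("10.0.0.1", "192.168.1.5", p) for p in range(1, 101)]
    + [("10.0.0.2", f"192.168.1.{h}", 22) for h in range(1, 101)]
    + [("10.0.0.3", "192.168.1.9", 443)]
)
labels = classify_scans(flows)
```

Fixed thresholds like these are only a starting point; the later slides replace them with learned baselines per asset.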
Batch Analytics
Normalcy Models
Implementation
MAP MAP MAP
Time Series DB
Key: assetID-metricID-Bin
RED RED RED
Table Name: Metric ID (Cumulative Volume)
Asset        Bin   Value
Server 1     15    5pt *
Server 2     15    5pt *
Server (N)   15    5pt *
assetID-metricID-Bin : 5pt
Telemetry
Anomaly?
* 5-point summary (5pt):
1. the sample minimum (smallest observation)
2. the lower quartile or first quartile
3. the median (middle value)
4. the upper quartile or third quartile
5. the sample maximum (largest observation)
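The 5-point summary keyed by assetID-metricID-Bin can be sketched as below. The slide does not say how telemetry is compared against the summary, so the interquartile-range fence in `is_anomalous` (with `k=1.5`) is an assumption, as are the key string and the sample values.

```python
import statistics

def five_point_summary(values):
    """Return (min, lower quartile, median, upper quartile, max)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))

def is_anomalous(value, summary, k=1.5):
    """Assumed rule: flag values outside the quartiles by more than k * IQR."""
    _, q1, _, q3, _ = summary
    iqr = q3 - q1
    return value < q1 - k * iqr or value > q3 + k * iqr

# Hypothetical history for one key of the form assetID-metricID-Bin.
history = {"server1-bytes-15": [100, 120, 110, 130, 125, 115, 105, 135, 128]}
summaries = {key: five_point_summary(vals) for key, vals in history.items()}

summary = summaries["server1-bytes-15"]
spike_flagged = is_anomalous(400, summary)
normal_flagged = is_anomalous(120, summary)
```

In the batch design on the slide, the summaries would be computed in the reduce step and only the lookup and comparison would run against live telemetry.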
Batch Analytics
Forecasting Models
Forecast
Forecasting Algorithm (ARIMA/Holt-Winters, …)
Implementation
MAP MAP MAP
Time Series DB
Key: assetID-metricID-Bin
RED RED RED
Key: assetID-metricID-Bin: [Expected | STD]
Telemetry
Anomaly?
Table Name: Metric ID (Cumulative Volume)
Asset        Bin   Value
Server 1     15    EX | STD
Server 2     15    EX | STD
Server (N)   15    EX | STD
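With an [Expected | STD] pair stored per key, the online check reduces to a lookup and a deviation test. The sketch below assumes a z-score rule with a threshold `k`; the slide only shows "Anomaly?", so the rule, the cache contents, and the function name are illustrative.

```python
def check_against_forecast(observed, expected, std, k=3.0):
    """Return (z_score, is_anomaly) for one observation.

    Assumed rule: anomalous when the observation is more than
    k standard deviations away from the forecast value.
    """
    if std == 0:
        differs = observed != expected
        return (float("inf") if differs else 0.0, differs)
    z = (observed - expected) / std
    return z, abs(z) > k

# Hypothetical reference cache: key -> (expected, std).
reference_cache = {"server1-bytes-15": (1200.0, 50.0)}

expected, std = reference_cache["server1-bytes-15"]
z_high, flag_high = check_against_forecast(1400.0, expected, std)
z_ok, flag_ok = check_against_forecast(1250.0, expected, std)
```

The forecasting model itself (ARIMA, Holt-Winters, or another) stays in the batch layer; only its precomputed outputs are needed here.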
Time Series DB
Batch Model Deployment
Step 1: Bootstrap: Stream Data
Unstructured Data
OpenSOC
OpenSOC JSON
Step 2: Pre-Compute Expected Values (Batch)
Timestamp
HIVE
Time Series DB
MR/Spark MR/Spark MR/Spark
Step 3: Generate Alerts (Online)
Unstructured Data
OpenSOC
Expected Values
Reference Cache
Time Series DB
OpenSOC JSON
Timestamp
HIVE
Alert ES
Expected Values
Reference
Cache
Online Analytics
Data Preparation
Deseasonalizer
AV CMA RAT UF RF DV
Online Analytics
Other things to check for
Trend:
Seasonal Variability:
Evolution of Regularities:
Online Processing
3-Sigma Algorithms
Micro Forecasting
Histogram Bins
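A 3-sigma check can run online if the mean and variance are maintained incrementally. The sketch below uses Welford's running-variance update, which is one common choice and not necessarily what the presenters used; the class name, the `warmup` length, and the sample stream are assumptions.

```python
class ThreeSigmaDetector:
    """Online k-sigma detector using Welford's running mean and variance."""

    def __init__(self, k=3.0, warmup=10):
        self.k = k            # number of standard deviations tolerated
        self.warmup = warmup  # observations required before scoring
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0         # running sum of squared deviations

    def update(self, x):
        """Score x against the statistics so far, then fold x in."""
        anomalous = False
        if self.n >= self.warmup:
            std = (self.m2 / (self.n - 1)) ** 0.5
            anomalous = std > 0 and abs(x - self.mean) > self.k * std
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return anomalous

# Hypothetical metric stream with one spike.
stream = [10, 11, 9, 10, 12, 10, 9, 11, 10, 10, 11, 50, 10]
detector = ThreeSigmaDetector()
flags = [detector.update(x) for x in stream]
```

Note that this version folds flagged values into the running statistics, which inflates the variance after a spike; excluding them is a reasonable variation.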
Frequency Domain
High
• Trendless
• Noise
• Spikes represent Anomalies
Medium
• Flatter
• Finer-grained Trends
Low
• Seasonal & ‘Peaky’
• Weekly/Daily Trends
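The split into low, medium, and high frequency bands can be illustrated with a plain discrete Fourier transform. This is a sketch under stated assumptions: it uses a naive O(n²) DFT from the standard library rather than an FFT library, and the cut-off indices `low_cut` and `high_cut` and the synthetic series are made up for the example.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (fine for short series)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft_real(coeffs):
    """Inverse DFT, returning the real part of each sample."""
    n = len(coeffs)
    return [sum(coeffs[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def split_bands(x, low_cut, high_cut):
    """Split x into low/medium/high band signals that sum back to x."""
    n = len(x)
    coeffs = dft(x)
    bands = {"low": [0j] * n, "medium": [0j] * n, "high": [0j] * n}
    for k, c in enumerate(coeffs):
        freq = min(k, n - k)  # bins k and n-k carry the same frequency
        if freq <= low_cut:
            name = "low"
        elif freq <= high_cut:
            name = "medium"
        else:
            name = "high"
        bands[name][k] = c
    return {name: idft_real(b) for name, b in bands.items()}

# Synthetic series: a 24-sample seasonal cycle plus one spike at t = 30.
n = 48
series = [10 * math.sin(2 * math.pi * t / 24) for t in range(n)]
series[30] += 8.0
bands = split_bands(series, low_cut=2, high_cut=8)
spike_at = max(range(n), key=lambda t: abs(bands["high"][t]))
```

The seasonal cycle lands entirely in the low band, so the largest value left in the high band marks the spike.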
Frequency Domain – Wavelet Separation
Online Model Deployment
Time Series DB
Step 1: Bootstrap: Stream Data
Unstructured Data
OpenSOC
OpenSOC JSON
Step 2: Generate Adjuster
Timestamp
HIVE
Time Series DB
MR/Spark
Adjuster / Decomposer
Step 3: Generate Alerts (Online)
Unstructured Data
OpenSOC
Time Series DB
OpenSOC JSON
Timestamp
HIVE
Alert ES
Adjuster
Decomposer
MR/Spark
MR/Spark
Feature-Based Anomaly Detection
Continuous Numeric Features*
• Continuous Numeric Feature - can take on any value between its minimum value and its maximum value
• Normalization - adjusting values measured on different scales to a notionally common scale
1. Proximity Based Techniques
Example: K-Nearest Neighbors (KNN)
2. Clustering
Example: K-Means
3. Density-Based
Sample Anomalies Detected
MPS Anomaly   KBps Anomaly   Possible Explanation
TOO HIGH      TOO LOW        Port Scan, Network Scan
TOO HIGH      TOO HIGH       DDoS
TOO LOW       TOO HIGH       Control Traffic Anomaly
OK            OK             No Anomaly
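Normalization followed by a proximity-based score can be sketched as follows. The example uses min-max scaling and the distance to the k-th nearest neighbour as the anomaly score; both are common choices rather than something the slides specify, and the (MPS, KBps) sample values are hypothetical.

```python
import math

def min_max_normalize(rows):
    """Scale every column to [0, 1] so features share a common scale."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]

def knn_scores(points, k=2):
    """Anomaly score per point: distance to its k-th nearest neighbour."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

# Hypothetical (MPS, KBps) readings per time bin; the last one is unusual.
samples = [(100, 500), (110, 520), (95, 480), (105, 510), (900, 490)]
scores = knn_scores(min_max_normalize(samples), k=2)
outlier = max(range(len(samples)), key=scores.__getitem__)
```

Without the normalization step the KBps column would dominate the distance simply because its raw values are larger.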
Feature-Based Anomaly Detection
Categorical Features *
• Categorical Features - can take on one of a limited, and usually fixed, number of possible values
• Stream Sketch - algorithm produces an approximate answer based on a summary of the data stream in memory
Example: Protocol {UDP|FTP|HTTP|…}, GEO-MET {PHOENIX | DALLAS | LONDON| …}, …
Count-Min (CM) Sketch : number of occurrences of the element in a stream (Heavy Hitters)
Why not count? Protocol: 42k elements per asset. GeoMet: 246k per asset
Categorical Data
CM Sketch
Heavy Hitters
MR
Time Series DB
Table Name: Protocol
Asset        Bin   Value
Server 1     15    HH
Server 2     15    HH
Server (N)   15    HH
Unstructured Data
CM Sketch
Alert
Expected: {HTTP, UDP, FTP, DNS}
ACTUAL: {DNS, ICMP, HTP, FTP}
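A Count-Min sketch and the expected-versus-actual heavy hitter comparison can be sketched as below. The sketch dimensions, the salted BLAKE2 hashing, and the protocol counts are assumptions for illustration; the `seen` set is kept only because the categorical domain in this toy example is tiny.

```python
import hashlib

class CountMinSketch:
    """Approximate counter: estimates never undercount, may overcount."""

    def __init__(self, width=256, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One independent hash per row, derived by salting with the row number.
        digest = hashlib.blake2b(item.encode(), digest_size=8,
                                 salt=bytes([row])).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        return min(self.table[row][self._index(item, row)]
                   for row in range(self.depth))

def heavy_hitters(stream, top_n=4):
    """Return the top_n items by estimated count."""
    sketch = CountMinSketch()
    seen = set()  # candidate tracking; acceptable only for small domains
    for item in stream:
        sketch.add(item)
        seen.add(item)
    ranked = sorted(seen, key=lambda s: (-sketch.estimate(s), s))
    return set(ranked[:top_n])

# Hypothetical protocol stream for one asset and time bin.
expected = {"HTTP", "UDP", "FTP", "DNS"}
stream = ["DNS"] * 50 + ["ICMP"] * 40 + ["HTTP"] * 30 + ["FTP"] * 20 + ["UDP"] * 2
actual = heavy_hitters(stream)
unexpected = actual - expected
```

An alert would fire when `unexpected` is non-empty, mirroring the Expected versus ACTUAL comparison on the slide.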
Feature-Based Anomaly Detection
Feature Ratios
HyperLogLog: approximating the number of distinct elements in a multiset
Useful Ratio: # distinct elements / total elements [0-1]
• t-Digest: structure for accurate on-line accumulation of rank-based statistics such as quantiles and trimmed means
Unstructured Data
HyperLogLog
Distinct
Src_port
Dst_port
Src_ip
Dst_ip
Storm Bolt
Src_port
Dst_port
Src_ip
Dst_ip
Ack
Total
Ratios
Digest *
Alert
FEATURE    DT RATIO Anomaly   Possible Reason
SRC_IP     ~1/~0              Flash Crowd/DDoS
SRC_PORT   ~1/~0              Failure Probing/App Hijack
DST_IP     ~1/~0              Network Scan/DDoS
DST_PORT   ~1/~0              Port Scan/Footprinting
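The distinct-to-total ratio per feature can be sketched as follows. To keep the example short and exact, it counts distinct values with a Python set as a stand-in for HyperLogLog; a real deployment would swap in an approximate counter. The event field names and sample events are assumptions.

```python
def distinct_ratios(events,
                    features=("src_ip", "src_port", "dst_ip", "dst_port")):
    """Return distinct/total per feature, each in [0, 1].

    Uses exact sets in place of HyperLogLog (illustrative stand-in).
    """
    total = len(events)
    return {
        f: (len({e[f] for e in events}) / total if total else 0.0)
        for f in features
    }

# Hypothetical time bin: one source probing port 22 on 100 different hosts.
events = [
    {"src_ip": "10.0.0.2", "src_port": 40000 + i,
     "dst_ip": f"192.168.1.{i}", "dst_port": 22}
    for i in range(100)
]
ratios = distinct_ratios(events)
```

Here the DST_IP ratio is near 1 while SRC_IP and DST_PORT are near 0, the pattern the table associates with a network scan.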
Feature-Based Anomaly Detection
Correlation - Information Theory
• Information Theory - study of fundamental limits on signal processing, compression, and storage
• Entropy- a measure of unpredictability of information content
Unstructured Data
Anomaly-Free Training Set
Entropy Summarizer
Entropy
Src_port
Dst_port
Src_ip
Dst_ip
Time Bin (n)
           SRC_IP   SRC_PORT   DST_IP   DST_PORT
SRC_IP     -        .95        .85      .75
SRC_PORT   -        -          .97      .76
DST_IP     -        -          -        .98
DST_PORT   -        -          -        -
MR
Alert
Time Bin (n)
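The per-feature entropy for one time bin can be sketched with Shannon entropy, normalized by the maximum possible entropy so values fall in [0, 1]. The normalization and the sample data are assumptions; the sketch covers only the per-feature entropy step, not the pairwise matrix shown on the slide.

```python
import math
from collections import Counter

def normalized_entropy(values):
    """Shannon entropy of the value distribution, scaled to [0, 1].

    0 means every observation is identical; 1 means all observed
    values are equally likely.
    """
    counts = Counter(values)
    total = len(values)
    if len(counts) <= 1:
        return 0.0
    h = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return h / math.log2(len(counts))

# Hypothetical time bin during a port scan: one target, every port different.
dst_ips = ["192.168.1.5"] * 100
dst_ports = list(range(1, 101))
ip_entropy = normalized_entropy(dst_ips)
port_entropy = normalized_entropy(dst_ports)
```

Comparing these values per time bin against those from the anomaly-free training set is what would surface a shift such as concentrated DST_IP with dispersed DST_PORT.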
Principal Component Analysis (PCA)
• Feature Selection Algorithm
• Dimensionality Reduction
• E.g. 4 features
• ServerA (A)
• ServerB (B)
• ServerC (C)
• Cumulative = A + B + C
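For the four-feature example above, the first principal component can be sketched with a covariance matrix and power iteration. This is an illustrative method choice using only the standard library, not the presenters' tooling, and the traffic values are hypothetical. Power iteration assumes the starting vector is not orthogonal to the leading component, which holds here because all covariances are positive.

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix of the columns of rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]

def first_component(cov, iterations=200):
    """Return (unit loading vector, variance) of the leading component."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    variance = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                   for i in range(d))
    return v, variance

# Hypothetical per-bin traffic for ServerA, ServerB, ServerC.
traffic = [(10, 20, 30), (12, 19, 33), (11, 22, 29), (15, 25, 35), (9, 18, 28)]
rows = [(a, b, c, a + b + c) for a, b, c in traffic]  # add Cumulative = A + B + C

cov = covariance_matrix(rows)
loadings, variance = first_component(cov)
```

Because Cumulative is an exact sum of the other three features, the data has at most three independent directions, which is the kind of redundancy dimensionality reduction exploits.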
PCA – Component Construction
Feature           PC1          PC2          PC3          PC4
ServerA Traffic   -0.5052803   0.2801275    0.6867089    -0.4411929
ServerB Traffic   -0.4990556   0.4611079    -0.6988557   -0.2234362
ServerC Traffic   -0.4816276   -0.8395562   -0.1441834   -0.2058916
Cumulative        -0.5134882   0.0636666    0.138718     0.8444132
σ                 0.0135       0.5773       0.5773       0.5773
PCA – Component Separation
PCA – Component Separation
Putting it All Together: OpenSOC
RAW Transform Enrich Alert
(Rules-Based)
Enriched
Filter Aggregators
Router Model 1 Scorer
HIVE + Hbase
Long-Term Data Store
Flume Kafka Storm
Model 2
Model n
OpenSOC-Streaming
OpenSOC-Aggregation
OpenSOC-ML
SOC Alert Consumers
UI UI UI UI UI
Web Services
Secure Gateway Services
External Alert Consumers
Big Data Stores
Elastic Search
Real-Time Index and
Search
Hbase
OpenTSDB
Titan Graph
Alerts
ES/HIVE
Alerts Store
Remedy
Ticketing System
We are hiring…
• Data Scientists (Security)
• Aspiring Data Scientists
• Security/Networking Experience Required
• Software Engineering Experience Required
• PhD not required
• Background in stats or ML not required
• Security Researchers
*Please contact us via LinkedIn with your profile
Book idea…
Security Analytics on Hadoop
• Anomaly Detection
• Targeted Models
• Deployment Best Practices
• Alerts
• Visualization Techniques
• Etc…
If interested in contributing please contact James Sirota on LinkedIn
Thank you.

More Related Content

What's hot

Analyzing 1.2 Million Network Packets per Second in Real-time
Analyzing 1.2 Million Network Packets per Second in Real-timeAnalyzing 1.2 Million Network Packets per Second in Real-time
Analyzing 1.2 Million Network Packets per Second in Real-timeDataWorks Summit
 
Simplificando la seguridad en entornos de nube híbridos con el Security Fabri...
Simplificando la seguridad en entornos de nube híbridos con el Security Fabri...Simplificando la seguridad en entornos de nube híbridos con el Security Fabri...
Simplificando la seguridad en entornos de nube híbridos con el Security Fabri...Cristian Garcia G.
 
Analytics Driven SIEM Workshop
Analytics Driven SIEM WorkshopAnalytics Driven SIEM Workshop
Analytics Driven SIEM WorkshopSplunk
 
Artificial Intelligence and Cybersecurity
Artificial Intelligence and CybersecurityArtificial Intelligence and Cybersecurity
Artificial Intelligence and CybersecurityOlivier Busolini
 
Strategies for Managing OT Cybersecurity Risk
Strategies for Managing OT Cybersecurity RiskStrategies for Managing OT Cybersecurity Risk
Strategies for Managing OT Cybersecurity RiskMighty Guides, Inc.
 
Institucional proofpoint
Institucional proofpointInstitucional proofpoint
Institucional proofpointvoliverio
 
IBM QRadar Security Intelligence Overview
IBM QRadar Security Intelligence OverviewIBM QRadar Security Intelligence Overview
IBM QRadar Security Intelligence OverviewCamilo Fandiño Gómez
 
NTXISSACSC2 - Advanced Persistent Threat (APT) Life Cycle Management Monty Mc...
NTXISSACSC2 - Advanced Persistent Threat (APT) Life Cycle Management Monty Mc...NTXISSACSC2 - Advanced Persistent Threat (APT) Life Cycle Management Monty Mc...
NTXISSACSC2 - Advanced Persistent Threat (APT) Life Cycle Management Monty Mc...North Texas Chapter of the ISSA
 
Apache Kafka for Cybersecurity and SIEM / SOAR Modernization
Apache Kafka for Cybersecurity and SIEM / SOAR ModernizationApache Kafka for Cybersecurity and SIEM / SOAR Modernization
Apache Kafka for Cybersecurity and SIEM / SOAR ModernizationKai Wähner
 
Software security engineering
Software security engineeringSoftware security engineering
Software security engineeringAHM Pervej Kabir
 
Elastic SIEM (Endpoint Security)
Elastic SIEM (Endpoint Security)Elastic SIEM (Endpoint Security)
Elastic SIEM (Endpoint Security)Kangaroot
 
Helping Small Companies Leverage CTI with an Open Source Threat Mapping
Helping Small Companies Leverage CTI with an Open Source Threat MappingHelping Small Companies Leverage CTI with an Open Source Threat Mapping
Helping Small Companies Leverage CTI with an Open Source Threat MappingMITRE - ATT&CKcon
 
Siem ppt
Siem pptSiem ppt
Siem pptkmehul
 
Real-time Twitter Sentiment Analysis and Image Recognition with Apache NiFi
Real-time Twitter Sentiment Analysis and Image Recognition with Apache NiFiReal-time Twitter Sentiment Analysis and Image Recognition with Apache NiFi
Real-time Twitter Sentiment Analysis and Image Recognition with Apache NiFiTimothy Spann
 
Detection and Response Roles
Detection and Response RolesDetection and Response Roles
Detection and Response RolesFlorian Roth
 

What's hot (20)

Security Information Event Management - nullhyd
Security Information Event Management - nullhydSecurity Information Event Management - nullhyd
Security Information Event Management - nullhyd
 
Building the Security Operations and SIEM Use CAse
Building the Security Operations and SIEM Use CAseBuilding the Security Operations and SIEM Use CAse
Building the Security Operations and SIEM Use CAse
 
Analyzing 1.2 Million Network Packets per Second in Real-time
Analyzing 1.2 Million Network Packets per Second in Real-timeAnalyzing 1.2 Million Network Packets per Second in Real-time
Analyzing 1.2 Million Network Packets per Second in Real-time
 
Cisco OpenSOC
Cisco OpenSOCCisco OpenSOC
Cisco OpenSOC
 
Simplificando la seguridad en entornos de nube híbridos con el Security Fabri...
Simplificando la seguridad en entornos de nube híbridos con el Security Fabri...Simplificando la seguridad en entornos de nube híbridos con el Security Fabri...
Simplificando la seguridad en entornos de nube híbridos con el Security Fabri...
 
Security Information and Event Managemen
Security Information and Event ManagemenSecurity Information and Event Managemen
Security Information and Event Managemen
 
Analytics Driven SIEM Workshop
Analytics Driven SIEM WorkshopAnalytics Driven SIEM Workshop
Analytics Driven SIEM Workshop
 
Artificial Intelligence and Cybersecurity
Artificial Intelligence and CybersecurityArtificial Intelligence and Cybersecurity
Artificial Intelligence and Cybersecurity
 
Strategies for Managing OT Cybersecurity Risk
Strategies for Managing OT Cybersecurity RiskStrategies for Managing OT Cybersecurity Risk
Strategies for Managing OT Cybersecurity Risk
 
Institucional proofpoint
Institucional proofpointInstitucional proofpoint
Institucional proofpoint
 
IBM QRadar Security Intelligence Overview
IBM QRadar Security Intelligence OverviewIBM QRadar Security Intelligence Overview
IBM QRadar Security Intelligence Overview
 
NTXISSACSC2 - Advanced Persistent Threat (APT) Life Cycle Management Monty Mc...
NTXISSACSC2 - Advanced Persistent Threat (APT) Life Cycle Management Monty Mc...NTXISSACSC2 - Advanced Persistent Threat (APT) Life Cycle Management Monty Mc...
NTXISSACSC2 - Advanced Persistent Threat (APT) Life Cycle Management Monty Mc...
 
Apache Kafka for Cybersecurity and SIEM / SOAR Modernization
Apache Kafka for Cybersecurity and SIEM / SOAR ModernizationApache Kafka for Cybersecurity and SIEM / SOAR Modernization
Apache Kafka for Cybersecurity and SIEM / SOAR Modernization
 
Software security engineering
Software security engineeringSoftware security engineering
Software security engineering
 
Elastic SIEM (Endpoint Security)
Elastic SIEM (Endpoint Security)Elastic SIEM (Endpoint Security)
Elastic SIEM (Endpoint Security)
 
Helping Small Companies Leverage CTI with an Open Source Threat Mapping
Helping Small Companies Leverage CTI with an Open Source Threat MappingHelping Small Companies Leverage CTI with an Open Source Threat Mapping
Helping Small Companies Leverage CTI with an Open Source Threat Mapping
 
Baselining Logs
Baselining LogsBaselining Logs
Baselining Logs
 
Siem ppt
Siem pptSiem ppt
Siem ppt
 
Real-time Twitter Sentiment Analysis and Image Recognition with Apache NiFi
Real-time Twitter Sentiment Analysis and Image Recognition with Apache NiFiReal-time Twitter Sentiment Analysis and Image Recognition with Apache NiFi
Real-time Twitter Sentiment Analysis and Image Recognition with Apache NiFi
 
Detection and Response Roles
Detection and Response RolesDetection and Response Roles
Detection and Response Roles
 

Viewers also liked

Anomaly Detection using Spark MLlib and Spark Streaming
Anomaly Detection using Spark MLlib and Spark StreamingAnomaly Detection using Spark MLlib and Spark Streaming
Anomaly Detection using Spark MLlib and Spark StreamingKeira Zhou
 
Anomaly Detection with Apache Spark
Anomaly Detection with Apache SparkAnomaly Detection with Apache Spark
Anomaly Detection with Apache SparkCloudera, Inc.
 
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino BusaReal-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino BusaSpark Summit
 
Gentlest Introduction to Tensorflow - Part 2
Gentlest Introduction to Tensorflow - Part 2Gentlest Introduction to Tensorflow - Part 2
Gentlest Introduction to Tensorflow - Part 2Khor SoonHin
 
Anomaly Detection Via PCA
Anomaly Detection Via PCAAnomaly Detection Via PCA
Anomaly Detection Via PCADeepak Kumar
 
Neural Networks, Spark MLlib, Deep Learning
Neural Networks, Spark MLlib, Deep LearningNeural Networks, Spark MLlib, Deep Learning
Neural Networks, Spark MLlib, Deep LearningAsim Jalis
 
Electricity price forecasting with Recurrent Neural Networks
Electricity price forecasting with Recurrent Neural NetworksElectricity price forecasting with Recurrent Neural Networks
Electricity price forecasting with Recurrent Neural NetworksTaegyun Jeon
 
Large Scale Deep Learning with TensorFlow
Large Scale Deep Learning with TensorFlow Large Scale Deep Learning with TensorFlow
Large Scale Deep Learning with TensorFlow Jen Aman
 
Kind of big data in info sec
Kind of big data in info secKind of big data in info sec
Kind of big data in info secBen Finke
 
Autonomous Analytics
Autonomous AnalyticsAutonomous Analytics
Autonomous AnalyticsAnodot
 
Containers - (Austin Cloud Meetup April 2016)
Containers - (Austin Cloud Meetup April 2016)Containers - (Austin Cloud Meetup April 2016)
Containers - (Austin Cloud Meetup April 2016)Derrick Wippler
 
Machine Learning for Threat Detection
Machine Learning for Threat DetectionMachine Learning for Threat Detection
Machine Learning for Threat DetectionNapier University
 
RSA Conference 2016: Who Are You? From Meat to Electrons and Back Again
RSA Conference 2016: Who Are You? From Meat to Electrons and Back AgainRSA Conference 2016: Who Are You? From Meat to Electrons and Back Again
RSA Conference 2016: Who Are You? From Meat to Electrons and Back AgainMike Schwartz
 
Analise NetFlow in Real Time
Analise NetFlow in Real TimeAnalise NetFlow in Real Time
Analise NetFlow in Real TimePiotr Perzyna
 
Some Important Streaming Algorithms You Should Know About-(Ted Dunning, MapR)
Some Important Streaming Algorithms You Should Know About-(Ted Dunning, MapR)Some Important Streaming Algorithms You Should Know About-(Ted Dunning, MapR)
Some Important Streaming Algorithms You Should Know About-(Ted Dunning, MapR)Spark Summit
 
"Building Anomaly Detection For Large Scale Analytics", Yonatan Ben Shimon, A...
"Building Anomaly Detection For Large Scale Analytics", Yonatan Ben Shimon, A..."Building Anomaly Detection For Large Scale Analytics", Yonatan Ben Shimon, A...
"Building Anomaly Detection For Large Scale Analytics", Yonatan Ben Shimon, A...Dataconomy Media
 
Insights into Customer Behavior from Clickstream Data by Ronald Nowling
Insights into Customer Behavior from Clickstream Data by Ronald NowlingInsights into Customer Behavior from Clickstream Data by Ronald Nowling
Insights into Customer Behavior from Clickstream Data by Ronald NowlingSpark Summit
 
Final report ethical hacking
Final report ethical hackingFinal report ethical hacking
Final report ethical hackingsamprada123
 
Optical network architecture
Optical network architectureOptical network architecture
Optical network architectureSiddharth Singh
 
Types of sql injection attacks
Types of sql injection attacksTypes of sql injection attacks
Types of sql injection attacksRespa Peter
 

Viewers also liked (20)

Anomaly Detection using Spark MLlib and Spark Streaming
Anomaly Detection using Spark MLlib and Spark StreamingAnomaly Detection using Spark MLlib and Spark Streaming
Anomaly Detection using Spark MLlib and Spark Streaming
 
Anomaly Detection with Apache Spark
Anomaly Detection with Apache SparkAnomaly Detection with Apache Spark
Anomaly Detection with Apache Spark
 
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino BusaReal-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
Real-Time Anomoly Detection with Spark MLib, Akka and Cassandra by Natalino Busa
 
Gentlest Introduction to Tensorflow - Part 2
Gentlest Introduction to Tensorflow - Part 2Gentlest Introduction to Tensorflow - Part 2
Gentlest Introduction to Tensorflow - Part 2
 
Anomaly Detection Via PCA
Anomaly Detection Via PCAAnomaly Detection Via PCA
Anomaly Detection Via PCA
 
Neural Networks, Spark MLlib, Deep Learning
Neural Networks, Spark MLlib, Deep LearningNeural Networks, Spark MLlib, Deep Learning
Neural Networks, Spark MLlib, Deep Learning
 
Electricity price forecasting with Recurrent Neural Networks
Electricity price forecasting with Recurrent Neural NetworksElectricity price forecasting with Recurrent Neural Networks
Electricity price forecasting with Recurrent Neural Networks
 
Large Scale Deep Learning with TensorFlow
Large Scale Deep Learning with TensorFlow Large Scale Deep Learning with TensorFlow
Large Scale Deep Learning with TensorFlow
 
Kind of big data in info sec
Kind of big data in info secKind of big data in info sec
Kind of big data in info sec
 
Autonomous Analytics
Autonomous AnalyticsAutonomous Analytics
Autonomous Analytics
 
Containers - (Austin Cloud Meetup April 2016)
Containers - (Austin Cloud Meetup April 2016)Containers - (Austin Cloud Meetup April 2016)
Containers - (Austin Cloud Meetup April 2016)
 
Machine Learning for Threat Detection
Machine Learning for Threat DetectionMachine Learning for Threat Detection
Machine Learning for Threat Detection
 
RSA Conference 2016: Who Are You? From Meat to Electrons and Back Again
RSA Conference 2016: Who Are You? From Meat to Electrons and Back AgainRSA Conference 2016: Who Are You? From Meat to Electrons and Back Again
RSA Conference 2016: Who Are You? From Meat to Electrons and Back Again
 
Analise NetFlow in Real Time
Analise NetFlow in Real TimeAnalise NetFlow in Real Time
Analise NetFlow in Real Time
 
Some Important Streaming Algorithms You Should Know About-(Ted Dunning, MapR)
Some Important Streaming Algorithms You Should Know About-(Ted Dunning, MapR)Some Important Streaming Algorithms You Should Know About-(Ted Dunning, MapR)
Some Important Streaming Algorithms You Should Know About-(Ted Dunning, MapR)
 
"Building Anomaly Detection For Large Scale Analytics", Yonatan Ben Shimon, A...
"Building Anomaly Detection For Large Scale Analytics", Yonatan Ben Shimon, A..."Building Anomaly Detection For Large Scale Analytics", Yonatan Ben Shimon, A...
"Building Anomaly Detection For Large Scale Analytics", Yonatan Ben Shimon, A...
 
Insights into Customer Behavior from Clickstream Data by Ronald Nowling
Insights into Customer Behavior from Clickstream Data by Ronald NowlingInsights into Customer Behavior from Clickstream Data by Ronald Nowling
Insights into Customer Behavior from Clickstream Data by Ronald Nowling
 
Final report ethical hacking
Final report ethical hackingFinal report ethical hacking
Final report ethical hacking
 
Optical network architecture
Optical network architectureOptical network architecture
Optical network architecture
 
Types of sql injection attacks
Types of sql injection attacksTypes of sql injection attacks
Types of sql injection attacks
 

Similar to Detecting Hacks: Anomaly Detection on Networking Data

Detecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking DataDetecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking DataJames Sirota
 
Swisscom Network Analytics
Swisscom Network AnalyticsSwisscom Network Analytics
Swisscom Network Analyticsconfluent
 
Model-driven Telemetry: The Foundation of Big Data Analytics
Model-driven Telemetry: The Foundation of Big Data AnalyticsModel-driven Telemetry: The Foundation of Big Data Analytics
Model-driven Telemetry: The Foundation of Big Data AnalyticsCisco Canada
 
Swisscom Network Analytics Data Mesh Architecture - ETH Viscon - 10-2022.pdf
Swisscom Network Analytics Data Mesh Architecture - ETH Viscon - 10-2022.pdfSwisscom Network Analytics Data Mesh Architecture - ETH Viscon - 10-2022.pdf
Swisscom Network Analytics Data Mesh Architecture - ETH Viscon - 10-2022.pdfThomasGraf40
 
Botprobe - Reducing network threat intelligence big data
Botprobe - Reducing network threat intelligence big data Botprobe - Reducing network threat intelligence big data
Botprobe - Reducing network threat intelligence big data DATA SECURITY SOLUTIONS
 
breed_python_tx_redacted
breed_python_tx_redactedbreed_python_tx_redacted
breed_python_tx_redactedRyan Breed
 
Network Telemetry: Pushing Boundaries
Network Telemetry: Pushing BoundariesNetwork Telemetry: Pushing Boundaries
Network Telemetry: Pushing BoundariesRam (Ramki) Krishnan
 
Reactive Stream Processing in Industrial IoT using DDS and Rx
Reactive Stream Processing in Industrial IoT using DDS and RxReactive Stream Processing in Industrial IoT using DDS and Rx
Reactive Stream Processing in Industrial IoT using DDS and RxSumant Tambe
 
Model driven telemetry
Model driven telemetryModel driven telemetry
Model driven telemetryCisco Canada
 
Monitoring ICS Communications
Monitoring ICS CommunicationsMonitoring ICS Communications
Monitoring ICS CommunicationsDigital Bond
 
YOW2018 Cloud Performance Root Cause Analysis at Netflix
YOW2018 Cloud Performance Root Cause Analysis at NetflixYOW2018 Cloud Performance Root Cause Analysis at Netflix
YOW2018 Cloud Performance Root Cause Analysis at NetflixBrendan Gregg
 
ONF & iSDX Webinar
ONF & iSDX WebinarONF & iSDX Webinar
ONF & iSDX WebinarKatie Hyman
 
ING CoreIntel - collect and process network logs across data centers in near ...
ING CoreIntel - collect and process network logs across data centers in near ...ING CoreIntel - collect and process network logs across data centers in near ...
ING CoreIntel - collect and process network logs across data centers in near ...Evention
 
Introduction To NIDS
Introduction To NIDSIntroduction To NIDS
Introduction To NIDSMichael Boman
 
TechWiseTV Workshop: Encrypted Traffic Analytics
TechWiseTV Workshop: Encrypted Traffic Analytics TechWiseTV Workshop: Encrypted Traffic Analytics
TechWiseTV Workshop: Encrypted Traffic Analytics Robb Boyd
 
Proposal for System Analysis and Desing
Proposal for System Analysis and DesingProposal for System Analysis and Desing
Proposal for System Analysis and DesingMd Khaza Main Uddin
 
MMIX Peering Forum and MMNOG 2020: Packet Analysis for Network Security
MMIX Peering Forum and MMNOG 2020: Packet Analysis for Network SecurityMMIX Peering Forum and MMNOG 2020: Packet Analysis for Network Security
MMIX Peering Forum and MMNOG 2020: Packet Analysis for Network SecurityAPNIC
 
Toolkit Titans - Crafting a Cutting-Edge, Open-Source Security Operations Too...
Toolkit Titans - Crafting a Cutting-Edge, Open-Source Security Operations Too...Toolkit Titans - Crafting a Cutting-Edge, Open-Source Security Operations Too...
Toolkit Titans - Crafting a Cutting-Edge, Open-Source Security Operations Too...Brandon DeVault
 

Detecting Hacks: Anomaly Detection on Networking Data

  • 1. Detecting Hacks: Anomaly Detection on Networking Data
      James Sirota (@JamesSirota), Lead Data Scientist – Managed Threat Defense
      Chester Parrott (@ParrottSquawk), Data Scientist – Managed Threat Defense
      June 2015
      © 2010 Cisco and/or its affiliates. All rights reserved.
  • 2. In the next few minutes…
      • Defense in Depth for Big Data
      • Network Anomaly Detection Overview
      • Volume Anomaly Detection
      • Feature Anomaly Detection
      • Model Architecture
      • Deployment on OpenSOC Platform
      • Questions
      © 2015 Cisco and/or its affiliates. All rights reserved.
  • 3. Who are we? Big Data Security Analytics Open Source Managed Service
  • 4. The New Defense-In-Depth Defense Strategy Static Sandboxing Threat Intel Feeds Rules Engines Volume-Based Feature-Based NLP-Based Token Clustering User Profiling Asset Profiling Interaction Profiling Dynamic Sandboxing Malware Classifiers Script Classifiers Perimeter Monitoring Web Scraping Soc. Media Analytics Model Validators Training Set Generation Signature Matching Rules-Based Matching Network Anomaly Detection Log Anomaly Detection Behavioral Anomaly Detection Malware Family Script Family Scraping Honeypots Misuse Detection Intrusion Detection Supervised Class. Look-Ahead Analytics Legacy Mindset Generic Threats Targeted Threats Future Threats
  • 5. Network Anomaly Detection
      Volume-Based (Anomalous Traffic Patterns)
        • Statistical Process Control: 3-sigma algorithms
        • Time series Forecasting: Exponential Smoothing, ARIMA
        • Frequency Domain: Fast Fourier Transform, Wavelets
      Feature-Based (Interrelationships between Features)
        • Information Theory: Entropy
        • Principal Component Analysis: Subspace
        • Sketch-Based: Heavy Hitters, Set Cardinality
        • Probability Models: Markov Models, Bayes Nets
        • Unsupervised ML: Clustering, Density, Proximity
  • 6. Volume-Based vs. Feature-Based
      Telemetry | Volume-Based | Feature-Based
      Encrypted Traffic (Raw Packet) | YES | NO
      Raw Packet + Header Metadata | YES | YES
      Machine Exhaust Data | YES (online) | NO
      DPI Metadata | NO | YES
      Netflow | YES | YES
      Enrichment Metadata | YES | YES
      Application Logs | YES | YES
      Other Alerts | NO* | YES
  • 7. Anomaly Detection: 3-Phase Process Unstructured Data Identify Anomaly Classify Alert Examine + Reinforce Training Set Historical Context
  • 8. Phase 1: Identify Unstructured Data Understanding of Normal Anomaly A Anomaly B Anomaly C Anomaly (N)
  • 9. Phase 2: Classify
      Column groups: Full Packet Telemetry, DPI Telemetry, Telemetry (N), Outcome
      Columns: Volume Anomaly, Entropy Anomaly, Feature (x), Heavy Hitters Anomaly, Volume Anomaly, Cardinality Anomaly, Feature (x), Protocol Anomaly, Feature (x), Anomaly (A), Anomaly (B), Anomaly (N), Class Label
      x x x x x x x | Port Scan
      x x x x x | False Positive
      x x x x | Network Scan
      x x x x | Port Scan
      x x x x | False Positive
      x x x x x x | DDoS
  • 10. Phase 3: Examine + Reinforce
      Column groups: Full Packet Telemetry, DPI Telemetry, Telemetry (N), Outcome
      Columns: Volume Anomaly, Entropy Anomaly, Feature (x), Heavy Hitters Anomaly, Volume Anomaly, Cardinality Anomaly, Feature (x), Protocol Anomaly, Feature (x), Anomaly (A), Anomaly (B), Anomaly (N), Class Label
      x x x x x x x | Port Scan
      x x x x x | False Positive
      x x x x | Network Scan
      x x x x | False Positive
      x x x x x x | DDoS
      x x x x x x | False Positive
      x x x x x x | False Positive
      x x x x | False Positive
      x x x x x x | DDoS
  • 11. Basic Anomalies
      Anomaly | Definition
      Alpha Flows | Large volume point-to-point flows
      DoS | Denial of service (distributed or single source)
      Flash Crowd | Large volume of traffic to a single destination from a large number of sources
      Port Scan | Probe to many destination ports on a small number of destination addresses
      Network Scan | Probe to many destination addresses on a small number of destination ports
      Outage Events | Traffic shifts because of equipment failures or maintenance
      Plateau Behavior | Behavior caused by traffic reaching environmental limits
      Point-to-Multipoint | Traffic from a single source to many destinations, e.g., content distribution
      Worms | Scanning by worms for vulnerable hosts, which is a special case of network scan
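The Port Scan and Network Scan definitions on slide 11 can be read as a simple counting rule over flow records. The sketch below is only an illustration under stated assumptions: the `(src, dst_ip, dst_port)` flow tuple, the function name `classify_scans`, and the `many`/`few` thresholds are hypothetical and do not come from the slides.

```python
from collections import defaultdict

def classify_scans(flows, many=100, few=5):
    """Label each source by the slide-11 definitions.

    flows: iterable of (src, dst_ip, dst_port) tuples (hypothetical record shape).
    many/few: illustrative thresholds for "many" and "a small number of".
    """
    dst_ips = defaultdict(set)
    dst_ports = defaultdict(set)
    for src, dst_ip, dst_port in flows:
        dst_ips[src].add(dst_ip)
        dst_ports[src].add(dst_port)

    labels = {}
    for src in dst_ips:
        n_ips, n_ports = len(dst_ips[src]), len(dst_ports[src])
        if n_ports >= many and n_ips <= few:
            labels[src] = "Port Scan"      # many ports, few addresses
        elif n_ips >= many and n_ports <= few:
            labels[src] = "Network Scan"   # many addresses, few ports
    return labels

# One source probing 200 ports on one host, one probing port 22 on 200 hosts,
# and one ordinary client that should receive no label.
flows = [("10.0.0.1", "192.0.2.10", port) for port in range(1, 201)]
flows += [("10.0.0.2", f"192.0.2.{host}", 22) for host in range(1, 201)]
flows += [("10.0.0.3", "192.0.2.10", 443)]
labels = classify_scans(flows)
```

Exact sets are used here for clarity; on a high-volume stream the distinct counts would more plausibly come from a cardinality sketch.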
  • 12. Batch Analytics Normalcy Models
  • 13. Implementation
      MAP MAP MAP, RED RED RED
      Time Series DB Key: assetID-metricID-Bin
      Table Name: Metric ID (Cumulative Volume)
      Asset | Bin | Value
      Server 1 | 15 | 5pt *
      Server 2 | 15 | 5pt *
      Server (N) | 15 | 5pt *
      assetID-metricID-Bin : 5pt
      Telemetry Anomaly?
      * 5-point summary (5pt):
        1. the sample minimum (smallest observation)
        2. the lower quartile or first quartile
        3. the median (middle value)
        4. the upper quartile or third quartile
        5. the sample maximum (largest observation)
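A minimal sketch of how the 5-point summary from slide 13 could be computed and used as a normalcy model. The slide does not state the rule that turns a summary into an anomaly decision, so the interquartile-range fence used in `is_anomalous` is an assumption, as are the function names and the sample values.

```python
import statistics

def five_point_summary(values):
    """Return (min, first quartile, median, third quartile, max) for a history of values."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))

def is_anomalous(value, summary, k=1.5):
    """Assumed decision rule: flag values outside the k * IQR fences around the quartiles."""
    _, q1, _, q3, _ = summary
    iqr = q3 - q1
    return value < q1 - k * iqr or value > q3 + k * iqr

# Hypothetical history for one assetID-metricID-Bin key.
history = [10, 12, 11, 13, 12, 11, 10, 14, 12]
models = {"server1-volume-15": five_point_summary(history)}

summary = models["server1-volume-15"]
typical = is_anomalous(12, summary)
spike = is_anomalous(40, summary)
```

In a batch setting the dictionary stands in for the time series table keyed by assetID-metricID-Bin; the map/reduce job would fill it once per training run.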
  • 14. Batch Analytics Forecasting Models Forecast Forecasting Algorithm (ARIMA/Holt-Winters, …)
  • 15. Implementation
      MAP MAP MAP, RED RED RED
      Time Series DB Key: assetID-metricID-Bin
      Key: assetID-metricID-Bin: [Expected | STD]
      Table Name: Metric ID (Cumulative Volume)
      Asset | Bin | Value
      Server 1 | 15 | EX | STD
      Server 2 | 15 | EX | STD
      Server (N) | 15 | EX | STD
      Telemetry Anomaly?
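The lookup implied by slide 15 (a key of assetID-metricID-Bin mapping to an expected value and a standard deviation) might look like the sketch below. The cache contents, the helper names, and the choice of a k-sigma threshold are assumptions made for illustration; the slides do not specify the comparison.

```python
def make_key(asset_id, metric_id, time_bin):
    """Build the assetID-metricID-Bin key shown on the slide."""
    return f"{asset_id}-{metric_id}-{time_bin}"

# Hypothetical pre-computed forecasts: key -> (expected value, standard deviation).
reference_cache = {
    make_key("server1", "volume", 15): (1200.0, 100.0),
}

def check(asset_id, metric_id, time_bin, observed, k=3.0):
    """Return True/False for anomalous/normal, or None when no forecast exists."""
    entry = reference_cache.get(make_key(asset_id, metric_id, time_bin))
    if entry is None:
        return None
    expected, std = entry
    return abs(observed - expected) > k * std

normal = check("server1", "volume", 15, 1250.0)
anomalous = check("server1", "volume", 15, 1600.0)
unknown = check("server9", "volume", 15, 1600.0)
```

Returning `None` for a missing key keeps "no model yet" distinct from "normal", which matters while the batch job is still bootstrapping.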
  • 16. Batch Model Deployment
      Time Series DB
      Step 1: Bootstrap: Stream Data Unstructured Data OpenSOC OpenSOC JSON
      Step 2: Pre-Compute Expected Values (Batch) Timestamp HIVE Time Series DB MR/Spark MR/Spark MR/Spark
      Step 3: Generate Alerts (Online) Unstructured Data OpenSOC Expected Values Reference Cache Time Series DB OpenSOC JSON Timestamp HIVE Alert ES Expected Values Reference Cache
  • 17. Online Analytics Data Preparation Deseasonalizer AV CMA RAT UF RF DV
  • 18. Online Analytics Other things to check for Trend: Seasonal Variability: Evolution of Regularities:
  • 19. Online Processing 3-Sigma Algorithms Micro Forecasting Histogram Bins
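Slide 19 names 3-sigma algorithms for online processing without giving details. One common way to apply a 3-sigma rule to a stream is to keep a running mean and variance (Welford's method) and test each new point before absorbing it. The class name, the warm-up length, and the decision not to absorb flagged points are assumptions of this sketch, not statements about the authors' implementation.

```python
import math

class RunningThreeSigma:
    """Streaming k-sigma detector using Welford's running mean/variance."""

    def __init__(self, k=3.0, warmup=10):
        self.k = k
        self.warmup = warmup  # points to absorb before any alerting
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0         # sum of squared deviations from the mean

    def update(self, x):
        """Return True if x deviates more than k sigma from the points seen so far."""
        anomalous = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            anomalous = abs(x - self.mean) > self.k * std
        if not anomalous:
            # Assumed design choice: anomalies do not update the baseline.
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)
        return anomalous

detector = RunningThreeSigma()
stream = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101, 99, 250, 100]
flags = [detector.update(x) for x in stream]
```

Only the value 250 is flagged; the point after it is judged against a baseline that the spike did not contaminate.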
  • 20. Frequency Domain
      High
        • Trendless
        • Noise
        • Spikes represent Anomalies
      Medium
        • Flatter
        • Finer-grained Trends
      Low
        • Seasonal & ‘Peaky’
        • Weekly/Daily Trends
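To make the band split on slide 20 concrete, the sketch below separates a synthetic series into a low-frequency part (the seasonal cycle) and a high-frequency part (where an injected spike stands out). It uses a naive discrete Fourier transform from the standard library rather than an FFT library, and the series, band edges, and function names are all assumptions for illustration.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def band(x, lo, hi):
    """Rebuild the part of x whose frequency (cycles per window) lies in [lo, hi)."""
    n = len(x)
    spectrum = dft(x)
    # min(k, n - k) maps each bin and its mirrored negative frequency to one value.
    kept = [c if lo <= min(k, n - k) < hi else 0 for k, c in enumerate(spectrum)]
    return [(sum(kept[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]

n = 64
signal = [10 * math.sin(2 * math.pi * 2 * t / n) for t in range(n)]  # slow seasonal cycle
signal[40] += 25                                                     # injected spike

low = band(signal, 0, 4)             # seasonal component
high = band(signal, 4, n // 2 + 1)   # residual where spikes show up
spike_at = max(range(n), key=lambda t: abs(high[t]))
```

The two bands partition the spectrum, so `low` and `high` sum back to the original series; a wavelet decomposition would give a similar separation with better time localization.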
  • 21. Frequency Domain – Wavelet Separation
  • 22. Online Model Deployment
      Time Series DB
      Step 1: Bootstrap: Stream Data Unstructured Data OpenSOC OpenSOC JSON
      Step 2: Generate Adjuster Timestamp HIVE Time Series DB MR/Spark Adjuster / Decomposer
      Step 3: Generate Alerts (Online) Unstructured Data OpenSOC Time Series DB OpenSOC JSON Timestamp HIVE Alert ES Adjuster Decomposer MR/Spark MR/Spark
  • 23. Feature-Based Anomaly Detection: Continuous Numeric Features*
      • Continuous Numeric Feature - can take on any value between its minimum value and its maximum value
      • Normalization - adjusting values measured on different scales to a notionally common scale
      1. Proximity-Based Techniques. Example: K-Nearest Neighbors (KNN)
      2. Clustering. Example: K-Means
      3. Density-Based
      Sample Anomalies Detected
      MPS Anomaly | KBps Anomaly | Possible Explanation
      TOO HIGH | TOO LOW | Port Scan, Network Scan
      TOO HIGH | TOO HIGH | DDoS
      TOO LOW | TOO HIGH | Control Traffic Anomaly
      OK | OK | No Anomaly
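As an illustration of the normalization and proximity-based steps on slide 23, the sketch below min-max scales two numeric features and scores each point by its distance to its k-th nearest neighbor. The two columns stand in for the MPS and KBps features in the slide's table; the sample values, `k`, and the function names are assumptions.

```python
import math

def min_max_normalize(rows):
    """Scale every column to [0, 1] so features on different scales are comparable."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [tuple((v - lo) / (hi - lo) if hi > lo else 0.0
                  for v, lo, hi in zip(row, lows, highs))
            for row in rows]

def knn_scores(points, k=3):
    """Anomaly score = distance to the k-th nearest neighbor (larger is more isolated)."""
    scores = []
    for i, p in enumerate(points):
        distances = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(distances[k - 1])
    return scores

# Hypothetical (MPS, KBps) observations; the last row is high MPS with low KBps.
rows = [(100, 500), (110, 520), (95, 480), (105, 510), (98, 495), (102, 505), (900, 50)]
scores = knn_scores(min_max_normalize(rows))
most_anomalous = scores.index(max(scores))
```

Without the normalization step the KBps column would dominate the distance simply because its raw values are larger.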
  • 24. Feature-Based Anomaly Detection: Categorical Features*
      • Categorical Features - can take on one of a limited, and usually fixed, number of possible values
      • Stream Sketch - algorithm produces an approximate answer based on a summary of the data stream in memory
      Example: Protocol {UDP|FTP|HTTP|…}, GEO-MET {PHOENIX | DALLAS | LONDON| …}, …
      Count-Min (CM) Sketch: number of occurrences of the element in a stream (Heavy Hitters)
      Why not count? Protocol: 42k elements per asset. GeoMet: 246k per asset
      Time Series DB, Categorical Data, CM Sketch, Heavy Hitters, MR
      Table Name: Protocol
      Asset | Bin | Value
      Server 1 | 15 | HH
      Server 2 | 15 | HH
      Server (N) | 15 | HH
      Unstructured Data, CM Sketch, Alert
      Expected: {HTTP, UDP, FTP, DNS} ACTUAL: {DNS, ICMP, HTP, FTP}
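A minimal Count-Min sketch, written to illustrate the heavy-hitters comparison on slide 24. The width, depth, SHA-256 based hashing, and the separately tracked candidate set are assumptions of this sketch; the slides do not describe the implementation used on OpenSOC.

```python
import hashlib

class CountMinSketch:
    """Fixed-size frequency summary; estimates never undercount."""

    def __init__(self, width=2048, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # One independent-looking hash per row, derived by salting with the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += count

    def estimate(self, item):
        return min(self.table[row][self._index(row, item)] for row in range(self.depth))

def heavy_hitters(sketch, candidates, top_n):
    """Top-n candidates by estimated count (the candidate set is tracked outside the sketch)."""
    return set(sorted(candidates, key=sketch.estimate, reverse=True)[:top_n])

# Hypothetical protocol counts observed for one asset in one time bin.
observed = {"DNS": 400, "ICMP": 300, "HTTP": 200, "FTP": 100, "UDP": 5}
sketch = CountMinSketch()
for protocol, count in observed.items():
    sketch.add(protocol, count)

expected = {"HTTP", "UDP", "FTP", "DNS"}   # expected set as shown on the slide
actual = heavy_hitters(sketch, observed.keys(), top_n=4)
unexpected = actual - expected
```

The memory used is `width * depth` counters regardless of how many distinct elements appear, which is the point of the slide's "Why not count?" note.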
  • 25. Feature-Based Anomaly Detection: Feature Ratios
      • HyperLogLog: approximating the number of distinct elements in a multiset
      • Useful Ratio: # distinct elements / total elements [0-1]
      • t-Digest - structure for accurate on-line accumulation of rank-based statistics such as quantiles and trimmed means
      Unstructured Data HyperLogLog Distinct Src_port Dst_port Src_ip Dst_ip Storm Bolt Src_port Dst_port Src_ip Dst_ip Ack Total Ratios Digest * Alert
      FEATURE | DT RATIO Anomaly | Possible Reason
      SRC_IP | ~1/~0 | Flash Crowd/DDoS
      SRC_PORT | ~1/~0 | Failure Probing/App Hijack
      DST_IP | ~1/~0 | Network Scan/DDoS
      DST_PORT | ~1/~0 | Port Scan/Footprinting
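The distinct-to-total ratio on slide 25 can be sketched with a small HyperLogLog. This is a simplified textbook version (raw estimate plus the small-range linear-counting correction), not the library used by the authors; the precision `p`, the SHA-256 hash, the helper `distinct_ratio`, and the sample port streams are assumptions.

```python
import hashlib
import math

class HyperLogLog:
    """Simplified HyperLogLog with 2**p registers."""

    def __init__(self, p=10):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        h = int.from_bytes(hashlib.sha256(str(item).encode()).digest()[:8], "big")
        width = 64 - self.p
        index = h >> width                     # first p bits pick the register
        rest = h & ((1 << width) - 1)          # remaining bits
        rank = width - rest.bit_length() + 1   # position of the leftmost 1-bit
        self.registers[index] = max(self.registers[index], rank)

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)  # bias constant for m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)  # linear counting for small sets
        return raw

def distinct_ratio(values):
    """Approximate (# distinct elements / total elements), clamped to [0, 1]."""
    hll = HyperLogLog()
    total = 0
    for value in values:
        hll.add(value)
        total += 1
    return min(1.0, hll.count() / total)

scan_ratio = distinct_ratio(range(1, 5001))           # 5000 flows, every dst_port different
normal_ratio = distinct_ratio([80, 443, 53] * 1000)   # 3000 flows over three dst_ports
```

A DST_PORT ratio near 1 matches the Port Scan/Footprinting row of the table, while everyday traffic concentrated on a few ports gives a ratio near 0.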
  • 26. Feature-Based Anomaly Detection: Correlation - Information Theory
    • Information theory: study of fundamental limits on signal processing, compression, and storage
    • Entropy: a measure of unpredictability of information content
    [Diagram labels: Unstructured Data; Anomaly-Free Training Set; Entropy Summarizer; Entropy (Src_port, Dst_port, Src_ip, Dst_ip); Time Bin (n); MR; Alert]
    Time Bin (n):
             | SRC_IP | SRC_PORT | DST_IP | DST_PORT
    SRC_IP   | -      | .95      | .85    | .75
    SRC_PORT | -      | -        | .97    | .76
    DST_IP   | -      | -        | -      | .98
    DST_PORT | -      | -        | -      | -
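As a hedged illustration of the entropy summary per time bin, the sketch below computes Shannon entropy of one feature's empirical distribution. The function name and the two sample bins are assumptions for illustration; how the deck's pipeline compares entropies across features is not shown here.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy in bits of the empirical distribution of `values`."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical destination ports seen in two time bins.
normal_bin = [80] * 90 + [443] * 10   # traffic concentrated on two ports
scan_bin = list(range(1, 101))        # every event hits a different port

normal_h = shannon_entropy(normal_bin)
scan_h = shannon_entropy(scan_bin)
print(round(normal_h, 3), round(scan_h, 3))
```

A concentrated distribution gives entropy near zero and a dispersed one gives entropy near log2 of the number of distinct values, so a sudden jump or drop relative to an anomaly-free training set is a usable signal.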
  • 27. Principal Component Analysis (PCA)
    • Feature Selection Algorithm
    • Dimensionality Reduction
    • E.g. 4 features
      • ServerA (A)
      • ServerB (B)
      • ServerC (C)
      • Cumulative = A + B + C
  • 28. PCA – Component Construction
    Feature | PC1 | PC2 | PC3 | PC4
    ServerA Traffic | -0.5052803 | 0.2801275 | 0.6867089 | -0.4411929
    ServerB Traffic | -0.4990556 | 0.4611079 | -0.6988557 | -0.2234362
    ServerC Traffic | -0.4816276 | -0.8395562 | -0.1441834 | -0.2058916
    Cumulative | -0.5134882 | 0.0636666 | 0.138718 | 0.8444132
    σ | 0.0135 | 0.5773 | 0.5773 | 0.5773
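Each component on this slide is a weighted sum of the four features, so scoring an observation is a dot product with the loadings. The sketch below uses the slide's loadings. The observation vector is hypothetical and is assumed to be centred and scaled the same way as the data the loadings were fitted on, which the slide does not specify.

```python
# Loadings copied from the slide; feature order is
# (ServerA Traffic, ServerB Traffic, ServerC Traffic, Cumulative).
LOADINGS = {
    "PC1": (-0.5052803, -0.4990556, -0.4816276, -0.5134882),
    "PC2": (0.2801275, 0.4611079, -0.8395562, 0.0636666),
    "PC3": (0.6867089, -0.6988557, -0.1441834, 0.138718),
    "PC4": (-0.4411929, -0.2234362, -0.2058916, 0.8444132),
}

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def component_scores(observation):
    """Project one (already normalized) observation onto every component."""
    return {name: dot(weights, observation) for name, weights in LOADINGS.items()}

# Hypothetical normalized observation for one time bin.
observation = (0.2, -0.1, 0.3, 0.4)
scores = component_scores(observation)
for name, value in scores.items():
    print(name, round(value, 4))
```

The loading vectors have unit length and are mutually orthogonal, so the four scores are simply the same observation expressed in a rotated coordinate system.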
  • 29. PCA – Component Separation
  • 30. PCA – Component Separation
  • 31. Putting it All Together: OpenSOC
    [Architecture diagram; labels as extracted]
    Pipeline stages: RAW, Transform, Enrich, Alert (Rules-Based), Enriched, Filter, Aggregators, Router, Model 1, Model 2, Model n, Scorer
    Platform: Flume, Kafka, Storm; HIVE + HBase (Long-Term Data Store)
    Modules: OpenSOC-Streaming, OpenSOC-Aggregation, OpenSOC-ML
    Consumers: SOC Alert Consumers (UI, Web Services), Secure Gateway Services, External Alert Consumers, Remedy Ticketing System
    Big Data Stores: Elasticsearch (Real-Time Index and Search), HBase, OpenTSDB, Titan Graph
    Alerts: ES/HIVE Alerts Store
  • 32. We are hiring…
    • Data Scientists (Security)
    • Aspiring Data Scientists
    • Security/Networking Experience Required
    • Software Engineering Experience Required
    • PhD not required
    • Background in stats or ML not required
    • Security Researchers
    *Please contact us via LinkedIn with your profile
  • 33. Book idea… Security Analytics on Hadoop
    • Anomaly Detection
    • Targeted Models
    • Deployment Best Practices
    • Alerts
    • Visualization Techniques
    • Etc…
    If interested in contributing, please contact James Sirota on LinkedIn

Editor's Notes

  1. Views data as a continuous signal.
     Split signal into frequency bands:
     • High: short-term spikes
     • Low: long-term trends
     Fourier Transforms (FFT):
     • Classically based and rigid
     • Has an inverse function, so the original signal can be reconstructed
     Wavelet Analysis:
     • More recently studied
     • Handles discontinuities and spikes better than FFT
     • Preferred over FFT
     FFT
     * Fourier analysis is the process of decomposing a complex periodic waveform into a set of sinusoids with different amplitudes, frequencies and phases. The sum of these sinusoids can exactly match the original waveform. This lossless transform presents a new perspective of the signal under study (in the frequency domain), which has proved useful in many applications.
     * The Inverse Discrete Fourier Transform (IDFT) is used to reconstruct the signal in the time domain; DFT and IDFT can be efficiently implemented by using the FFT.
     * Filter out the low frequency components in the link traffic time series. In general, low frequency components capture the daily and weekly traffic patterns, while high frequency components represent the sudden changes in traffic behavior.
     Wavelet Analysis
     * Considered superior to traditional Fourier methods where the signal contains transients such as discontinuities and sharp spikes.
     * Wavelet techniques are one of the most up-to-date modeling tools to exploit both non-stationary and long-range dependence.
     ftp://net9.cs.utexas.edu/pub/techreports/tr05-38.pdf (Comparison of PCA to FFT & Wavelets & ARIMA)
     http://cegroup.ece.tamu.edu/techpubs/2003/TAMU-ECE-2003-03.pdf (wavelet feature based, batch/real-time, correlation of port-numbers)
     http://www.cs.cmu.edu/~srini/15-744/readings/BKPR02.pdf (wavelet signal based; generic features, exposes anomalies even when nested in large amounts of other traffic [noise])
     http://www.cse.sc.edu/~huangct/wens06.pdf (framework paper, shows various feature-based wavelets and their ability to detect anomalies)
     http://pages.cs.wisc.edu/~pb/paper_imw_02.pdf (wavelets & spline filters)
     http://users.ece.gatech.edu/~jic/anomaly-book-chap-09.pdf (2.2, background info)
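The note about filtering out low-frequency components can be sketched as a high-pass filter in the frequency domain. This is an illustration under stated assumptions: a plain O(n²) DFT stands in for an FFT to keep the example dependency-free, and the hourly traffic series, the injected spike, and the cutoff are all hypothetical.

```python
import cmath
import math

def dft(signal):
    """Discrete Fourier Transform (direct O(n^2) form, standing in for an FFT)."""
    n = len(signal)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal))
            for k in range(n)]

def idft(spectrum):
    """Inverse DFT; returns the real part of the reconstructed signal."""
    n = len(spectrum)
    return [sum(c * cmath.exp(2j * cmath.pi * k * t / n) for k, c in enumerate(spectrum)).real / n
            for t in range(n)]

def high_pass(signal, cutoff):
    """Remove every component at or below `cutoff` cycles per window."""
    n = len(signal)
    spectrum = dft(signal)
    for k in range(n):
        freq = min(k, n - k)  # bins k and n - k carry the same frequency
        if freq <= cutoff:
            spectrum[k] = 0
    return idft(spectrum)

# Hypothetical window: 4 days of hourly samples with a daily cycle and one spike.
n = 96
signal = [100 + 30 * math.sin(2 * math.pi * t / 24) for t in range(n)]
signal[50] += 80
residual = high_pass(signal, cutoff=4)  # 4 cycles per window is the daily pattern
spike_at = max(range(n), key=lambda t: abs(residual[t]))
print(spike_at, round(residual[spike_at], 2))
```

With the mean level and the daily cycle removed, the residual is dominated by the sudden change, which can then be thresholded. Because the transform is invertible, applying `idft(dft(signal))` recovers the original series.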
  2. Feature selection algorithm
     • Reduces dimensionality
     • Iteratively selects uncorrelated features with the most variance
     • Applicable to traffic volume and other features: Source/Destination IP Address, Source/Destination Ports, Packet Size
     • Volume complements feature distribution
     • Batch method of selecting representative data
     • Sensitive to input parameters due to temporal correlation
     * Functional combination of features
     * Variance (accounts for as much of the variability of the data as possible)
     http://conferences.sigcomm.org/sigcomm/2004/papers/p405-lakhina111.pdf (volume-based, separates into normal component and noisy component [which contains spikes])
     http://machinelearning.wustl.edu/mlpapers/paper_files/NIPS2006_868.pdf
     http://www.dtic.mil/dtic/tr/fulltext/u2/a465712.pdf (application to intra-network anomaly detection toward addressing scalability concerns, stochastic matrix perturbation theory; claims upper bound to false positives)
     http://db.cs.berkeley.edu/papers/infocom07-pca.pdf (stochastic matrix perturbation theory, reduces communication cost by 80-90%, allowing smaller time buckets than Lakhina by detecting at the sensor level, volume based)
     https://ics.forth.gr/netlab/mobile/Bibliography/LoadBalancing/LB/PCA_Anomaly_Deytection.pdf (sensitivity of PCA to number of principal components, volume based)
     http://hal.univ-savoie.fr/file/index/docid/620090/filename/infocom2009.pdf (shows that temporal correlation of data breaks PCA, extending work by Ringberg; feature based, uses smoothing filter, shows application to stochastic processes; better results by removing low-to-mid frequency trends [daily, weekly])
     http://users.ece.gatech.edu/~jic/anomaly-book-chap-09.pdf (2.4, background info)
     http://www.researchgate.net/profile/Monowar_Bhuyan/publication/260521527_Network_Anomaly_Detection_Methods_Systems_and_Tools/links/00b49539bad485a81b000000.pdf (Recent Survey Paper)