Looking out for anomalies!
Sevvandi Kandanaarachchi, Rob Hyndman, Hideya Ochiai, Asha Rao
Why anomalies?
• They tell a different story
• Fraudulent credit card transactions amongst billions of
legitimate transactions
• Computer network intrusions
• Astronomical anomalies – solar flares
• Weather anomalies – tsunamis
• Stock market anomalies – heralding a crash?
• Important to detect anomalies in a timely manner
Current challenges
AD methods rank observations in terms of anomalousness
• They don’t identify anomalies
• So, the user needs to define a threshold and identify anomalies
High false positives
• Do not want an “alarm factory” – confidence in the system goes down
Parameters need to be defined by the user
• But expert knowledge is needed
Overview
A real world application
• Computer network security
lookout – an anomaly detection method
• Uses topological data analysis/persistent homology
• Extreme value theory
• Kernel density estimates
Sevvandi Kandanaarachchi, Rob Hyndman
Preprint - https://bit.ly/lookoutliers
Lookout – leave one out kde for outlier detection
Kernel density estimation (KDE)
• A density estimation technique using kernels
• A set of points on the real line
• Placing the kernel at every point
• Kernel density estimate $f(x, h) = \frac{1}{nh} \sum_{i} K\left(\frac{x - X_i}{h}\right)$, where $K$ is the kernel function
• ℎ - the bandwidth parameter
• https://mathisonian.github.io/kde/
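A minimal Python sketch of the estimate above, for illustration only. The function names, the sample points and the choice of a Gaussian kernel as the default are assumptions, not taken from the slides.

```python
import math

def gaussian(u):
    """Standard normal density, used here as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def kde(x, data, h, kernel=gaussian):
    """f(x, h) = 1 / (n h) * sum_i K((x - X_i) / h)."""
    n = len(data)
    return sum(kernel((x - xi) / h) for xi in data) / (n * h)

# Hypothetical points on the real line: three close together, one isolated.
points = [0.0, 0.5, 1.0, 5.0]
densities = [kde(p, points, h=1.0) for p in points]
```

The isolated point at 5.0 receives a lower density than the points in the group, which is the property the anomaly detection method relies on.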
KDE for anomaly detection
• What do we want?
• Anomalies to have much lower kde values than other points.
• Why?
• Because anomalies are in low density regions.
• The literature on bandwidth selection focusses on representing the data
• Minimize MISE (Mean Integrated Squared Error)
• But this doesn’t work for anomaly detection.
Bandwidth, KDE and anomalies
• Anomalies in the middle
• Indices 1001–1010
• Increasing bandwidth of KDE
• Lowest 10 KDE points (their indices)
• Want anomalies to have lowest KDE
Indices of the 10 lowest KDE points (rows) for each bandwidth (columns):
Bandwidth:  0.05   0.2    0.35   0.5    0.65   0.8    0.95   1.1    1.25   1.4
            232    232    1010   1010   1006   1006   1006   495    495    495
            1010   446    1001   1001   1009   1009   1009   843    843    843
            424    1010   1008   1008   1005   1005   1005   486    486    486
            359    495    1004   1004   1002   1002   1002   1006   979    166
            963    1001   1003   1002   1004   1004   1004   1009   166    979
            814    975    1002   1003   1007   1007   1007   1005   948    948
            70     1008   1007   1007   1003   1003   1003   1002   964    964
            257    799    1006   1006   1008   1001   1001   1004   832    832
            511    843    1009   1009   1001   1008   1008   1007   110    147
            458    511    1005   1005   1010   1010   1010   1003   147    110
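The experiment behind this table can be sketched in Python as follows. The data set is a hypothetical stand-in (two clusters built from normal quantiles, with 10 anomalies packed between them), the Gaussian kernel is an assumption, and the indices are reported 1-based so the anomalies are 1001 to 1010 as on the slide. It is not the authors' data or code.

```python
import math
from statistics import NormalDist

def gauss_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde_at_points(data, h):
    """KDE value at every data point (O(n^2), fine for about 1000 points)."""
    n = len(data)
    return [sum(gauss_kernel((x - xi) / h) for xi in data) / (n * h) for x in data]

def lowest_kde_indices(data, h, k=10):
    """1-based indices of the k points with the lowest KDE value."""
    dens = kde_at_points(data, h)
    order = sorted(range(len(data)), key=lambda i: dens[i])
    return [i + 1 for i in order[:k]]

# Hypothetical data: two clusters of 500 points (normal quantiles centred at
# 0 and 10) followed by 10 anomalies around 5, i.e. indices 1001-1010.
left = [NormalDist(0.0, 1.0).inv_cdf((i + 0.5) / 500) for i in range(500)]
right = [x + 10.0 for x in left]
anomalies = [4.95 + 0.1 * i / 9 for i in range(10)]
data = left + right + anomalies

lowest_small = lowest_kde_indices(data, h=0.05)  # tight bandwidth
lowest_large = lowest_kde_indices(data, h=1.0)   # wider bandwidth
```

On this toy data the tight bandwidth ranks points from the cluster tails lowest, because the 10 anomalies sit close enough together to build up their own density; the wider bandwidth smooths that away and the anomalies take the 10 lowest places.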
Bandwidth, KDE and anomalies
• The bandwidth minimising MISE is 0.018
So we want a bigger bandwidth for anomaly detection.
But not too big!
How do we select a bandwidth appropriate for anomaly detection?
In comes persistent homology
• Methodology in topological data analysis
Connected components and holes
Dimension 0 – connected components
Dimension 1 - holes
With an anomaly
Dimension 0 – connected components
We are interested in . . .
• The end-point diameter (death diameter) sequences
• We want the maximum gap
• Diameter that starts the maximum gap = 𝑑
• ℎ = 5 𝑑 for Epanechnikov kernel
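A rough Python sketch of the gap search described above. It assumes a Vietoris-Rips filtration, for which the dimension 0 death diameters coincide with the edge lengths of a minimum spanning tree; the function names and the sample points are hypothetical. The scaling from 𝑑 to the bandwidth ℎ is left out.

```python
import math

def death_diameters(points):
    """Sorted dimension-0 death diameters, computed as minimum spanning tree
    edge lengths with Prim's algorithm (O(n^2), fine for small n)."""
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # shortest distance from each point to the tree
    best[0] = 0.0
    deaths = []
    for step in range(n):
        # add the point closest to the current tree
        j = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[j] = True
        if step > 0:
            deaths.append(best[j])  # a component merges at this diameter
        for i in range(n):
            if not in_tree[i]:
                d = math.dist(points[j], points[i])
                if d < best[i]:
                    best[i] = d
    return sorted(deaths)

def max_gap_diameter(deaths):
    """Death diameter that starts the largest gap in the sorted sequence."""
    gaps = [b - a for a, b in zip(deaths, deaths[1:])]
    k = max(range(len(gaps)), key=gaps.__getitem__)
    return deaths[k]

# Hypothetical 1-D data (as 1-tuples): four nearby points and one far away.
pts = [(0.0,), (0.1,), (0.25,), (0.45,), (3.0,)]
deaths = death_diameters(pts)
d = max_gap_diameter(deaths)
```

Here the death diameters are roughly 0.1, 0.15, 0.2 and 2.55, so the largest gap starts at about 0.2, the last diameter at which the bulk of the data is still merging.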
Using this bandwidth
• Compute the kde values
• Anomalies will have very low kde values
• We can rank the anomalies using the low kde values
• Low kde – anomalous
• High kde – not anomalous
But, we want to identify anomalies!
Just because the kde is low, is it an anomaly?
We want to have a cut off!
For that we use Extreme Value Theory!
EVT – Peak Over Threshold method (POT)
• Pick a threshold – the 90th percentile
• Model the exceedances
• Generalized Pareto distribution (GPD)
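The three steps can be sketched in Python as below. The method-of-moments fit is a stand-in chosen to keep the example dependency-free; the slides do not say how the GPD is fitted, and the function names and the exponential test sample are assumptions.

```python
import math
import random
import statistics

def pot_exceedances(values, q=0.90):
    """Threshold at the empirical q-quantile and the excesses above it."""
    ordered = sorted(values)
    u = ordered[int(q * (len(ordered) - 1))]
    return u, [v - u for v in values if v > u]

def fit_gpd_moments(excess):
    """Method-of-moments GPD fit (assumed estimator): returns (shape, scale).
    Uses mean = scale / (1 - shape) and var = mean^2 / (1 - 2 * shape)."""
    m = statistics.fmean(excess)
    v = statistics.variance(excess)
    ratio = m * m / v
    return 0.5 * (1.0 - ratio), 0.5 * m * (1.0 + ratio)

def gpd_survival(y, shape, scale):
    """P(Y > y) for a generalized Pareto distribution located at 0."""
    if y <= 0.0:
        return 1.0
    if abs(shape) < 1e-12:
        return math.exp(-y / scale)  # exponential limit as shape -> 0
    base = 1.0 + shape * y / scale
    return base ** (-1.0 / shape) if base > 0.0 else 0.0

# Hypothetical sample: exponential data, whose excesses over any threshold
# are again exponential (GPD with shape 0 and scale 1).
random.seed(0)
sample = [random.expovariate(1.0) for _ in range(20000)]
u, excess = pot_exceedances(sample)
shape, scale = fit_gpd_moments(excess)
```

For this sample the fitted shape should come out close to 0 and the scale close to 1, up to sampling noise.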
Method lookout
• Fit a GPD using the kde values
• Then use the leave one out kde values to determine the probability of points according to the GPD
• We have a set of probabilities
• Low probabilities are more likely to be anomalies
• Have a pre-defined cut off 𝛼, this is your threshold
• If 𝑝𝑟𝑜𝑏𝑎𝑏𝑖𝑙𝑖𝑡𝑦(𝑥𝑖) < 𝛼, then 𝑥𝑖 is an anomaly.
• So you can identify anomalies.
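The steps above can be sketched end to end in Python. This is a simplified illustration, not the lookout package: the bandwidth is passed in by hand rather than chosen by persistent homology, the kernel is Gaussian, the GPD is fitted to −log(kde) by the method of moments, and points below the threshold get probability 1. All of these choices, the function names and the toy data are assumptions.

```python
import math
import statistics
from statistics import NormalDist

def gauss_kernel(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde_and_loo(data, h):
    """KDE and leave-one-out KDE evaluated at every data point."""
    n = len(data)
    k0 = gauss_kernel(0.0)
    full, loo = [], []
    for i, x in enumerate(data):
        others = sum(gauss_kernel((x - xj) / h)
                     for j, xj in enumerate(data) if j != i)
        full.append((others + k0) / (n * h))  # includes the point itself
        loo.append(others / ((n - 1) * h))    # the point is left out
    return full, loo

def gpd_survival(y, shape, scale):
    """P(Y > y) for a generalized Pareto distribution located at 0."""
    if y <= 0.0:
        return 1.0
    if abs(shape) < 1e-12:
        return math.exp(-y / scale)
    base = 1.0 + shape * y / scale
    return base ** (-1.0 / shape) if base > 0.0 else 0.0

def lookout_sketch(data, h, alpha=0.05, q=0.90):
    full, loo = kde_and_loo(data, h)
    # Low density becomes a large value, so anomalies sit in the upper tail.
    y_full = [-math.log(max(v, 1e-300)) for v in full]
    y_loo = [-math.log(max(v, 1e-300)) for v in loo]
    u = sorted(y_full)[int(q * (len(y_full) - 1))]
    excess = [y - u for y in y_full if y > u]
    m, v = statistics.fmean(excess), statistics.variance(excess)
    shape = 0.5 * (1.0 - m * m / v)       # method-of-moments GPD fit
    scale = 0.5 * m * (1.0 + m * m / v)
    probs = [gpd_survival(y - u, shape, scale) for y in y_loo]
    flagged = [i for i, p in enumerate(probs) if p < alpha]
    return probs, flagged

# Hypothetical data: 200 normal quantiles plus one far-away point (index 200).
data = [NormalDist().inv_cdf((i + 0.5) / 200) for i in range(200)] + [8.0]
probs, flagged = lookout_sketch(data, h=0.5)
```

Leaving a point out matters most for isolated points: the far-away point keeps a modest kde value only because of its own kernel, and once that is removed its density collapses and its probability under the GPD becomes very small. Extreme points in the tails of the main cluster may also fall below 𝛼 in this simplified version.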
Example
• Lookout outliers with 𝛼 = 0.05
Outlier   Probability
1001      0.02344059
1002      0.02513530
1003      0.02501901
1004      0.02504691
1005      0.02654359
1006      0.02636139
1007      0.02625216
1008      0.02452614
1009      0.02644570
1010      0.02283989
Practical advantages of lookout
The user does not need to specify a bandwidth parameter
• The user can be anyone – not necessarily a mathematician
EVT based methods have low false positive rates
• Attractive for many applications
• Not an alarm factory
For the mathematician/statistician in me
• Coming together of
• Topological data analysis
• Extreme Value Theory
• Kernel density estimates
• To find anomalies
Anomaly persistence
Anomaly Persistence
• What if a data-point is identified as an anomaly for different bandwidth values?
• Visual representation of anomaly persistence
• Big picture
Application: Computer Networks Security
Honeyboost: Boosting honeypot performance with data fusion and anomaly detection – Sevvandi Kandanaarachchi, Hideya Ochiai, Asha Rao
Preprint - https://arxiv.org/abs/2105.02526
LAN Security Monitoring
• ‘LAN-Security Monitoring Device’ to capture suspicious/malicious activities that happen inside a LAN (Local Area Network).
• Though it is not a real camera, it works like a ‘cyber-space surveillance camera’.
• It captures all the broadcast packets, and direct packets to the monitoring device.
• Devices on the LAN: smartphones, printer, smart appliances, data server
• The LAN-Security Monitoring Device is a honeypot - a trap for attackers
Honeypot data
• ARP data – a big shout out to everyone (broadcast to the network)
• These nodes do not access the honeypot
• Who has got this address – I need to communicate to you
• Generally not a suspicious activity
• But malicious nodes can also make ARP calls
• TCP and UDP data – targeted at the honeypot
• These nodes have accessed the honeypot using TCP/UDP protocols
• Oooh suspicious!
A bit more on honeypots
• An intruder can be there without accessing the honeypot
• Limited vision of honeypots
• Honeypots are never stand alone security devices
• Identifying anomalous nodes is important - Honeyboost
Generally . . .
• Anomalies detected based on individual packets – packet-based
• Packet features separately for each packet
• Of all the traffic, which packets are anomalous
• Our contribution: we find anomalous nodes – node-based
• Features of nodes using the traffic – using multivariate time series
• Of all the nodes, which nodes are anomalous
Varying-dimensional time series
• Different protocols have different header features
• Finding anomalies from varying dimensional time series
• 200 computers/nodes = 200 varying-dimensional time series
• Which one is anomalous, if at all?
Pipeline: varying-dimensional time series for each node → window model and process → multivariate time series → compute features → feature space for all nodes → Lookout
Node A – varying-dimensional time series written as a multivariate time series:
Timestamp   Protocol   ARP count   ARP degree   TCP PC1   TCP PC2   UDP PC1   UDP PC2
30          ARP        10          12           0         0         0         0
55          TCP        0           0            -2.15     1.75      0         0
85          UDP        0           0            0         0         3.56      0.45
MV time series for each node gets transformed to a point in $R^{17}$, the feature space for all nodes
Features
• The total length of line segments in $R^6$
• The maximum time difference
• Number of protocols used
• Number of TCP calls/UDP calls
• Total length of line segments in each protocol space
• Line of best fit in each protocol space
• Sum of errors squared for the line of best fit
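A few of these features can be sketched in Python using the Node A rows shown earlier. The reading of "total length of line segments" as the summed Euclidean distance between consecutive rows in the six feature columns is an assumption, as are the function names and the three TCP points used for the line fit.

```python
import math

# Node A rows from the slide: (timestamp, protocol, six feature columns:
# ARP count, ARP degree, TCP PC1, TCP PC2, UDP PC1, UDP PC2).
node_a = [
    (30, "ARP", (10.0, 12.0, 0.0, 0.0, 0.0, 0.0)),
    (55, "TCP", (0.0, 0.0, -2.15, 1.75, 0.0, 0.0)),
    (85, "UDP", (0.0, 0.0, 0.0, 0.0, 3.56, 0.45)),
]

def path_length(points):
    """Total length of the line segments joining consecutive points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

def line_fit_sse(xy):
    """Least-squares line through (x, y) pairs: (slope, intercept, SSE)."""
    n = len(xy)
    mx = sum(x for x, _ in xy) / n
    my = sum(y for _, y in xy) / n
    sxx = sum((x - mx) ** 2 for x, _ in xy)
    sxy = sum((x - mx) * (y - my) for x, y in xy)
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in xy)
    return slope, intercept, sse

def node_features(rows):
    times = [t for t, _, _ in rows]
    protocols = [p for _, p, _ in rows]
    return {
        "total_length": path_length([f for _, _, f in rows]),
        "max_time_diff": max(b - a for a, b in zip(times, times[1:])),
        "n_protocols": len(set(protocols)),
        "n_tcp": protocols.count("TCP"),
        "n_udp": protocols.count("UDP"),
    }

features = node_features(node_a)
# Hypothetical points in (TCP PC1, TCP PC2) space, not taken from the slides.
slope, intercept, sse = line_fit_sse([(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)])
```

Each node's time series is reduced to one such feature vector, so nodes with different numbers of packets and protocols become comparable points in a common space.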
[Figure: axes TCP PC1 and TCP PC2]
Findings
• Suspicious nodes that do not access the honeypot
[Figure: feature space for all nodes with Lookout applied; two nodes are annotated "This node does not access the honeypot"]
Insights
• Identify some nodes before they access the honeypot
• Gain insights – find anomalies and look back at the original data
• Anomaly has set suspicious flags – PSH flag and URG flag
• PSH flag – PUSH flag – push packet to the application layer
• URG flag – URGENT flag – treat packet as urgent
• Why set these when accessing the honeypot?
• Can be used to derive new rules
Summary
• Lookout - an EVT-based method to find anomalies (using TDA)
• An application in computer network security
• R package lookout is on CRAN
• Both preprints available
• https://bit.ly/lookoutliers
• https://arxiv.org/abs/2105.02526
Thank you!

Industrialised data - the key to AI success.pdf
 

Anomaly Detection Method Looks Out for Network Intrusions

  • 1. Looking out for anomalies! Sevvandi Kandanaarachchi, Rob Hyndman, Hideya Ochiai, Asha Rao
  • 2. Why anomalies? • They tell a different story • Fraudulent credit card transactions amongst billions of legitimate transactions • Computer network intrusions • Astronomical anomalies – solar flares • Weather anomalies – tsunamis • Stock market anomalies – heralding a crash? • Important to detect anomalies in a timely manner
  • 3. Current challenges AD methods rank observations in terms of anomalousness • They don’t identify anomalies • So, the user needs to define a threshold and identify anomalies High false positives • Do not want an “alarm factory” – confidence in the system goes down Parameters need to be defined by the user • But expert knowledge is needed
  • 4. Overview A real world application Computer network security lookout – an anomaly detection method Uses topological data analysis/persistent homology Extreme value theory Kernel density estimates
  • 5. Sevvandi Kandanaarachchi, Rob Hyndman Preprint - https://bit.ly/lookoutliers Lookout – leave one out kde for outlier detection
  • 6. Kernel density estimation (KDE) • A density estimation technique using kernels • A set of points on the real line • Placing the kernel at every point • Kernel density estimate: $f(x, h) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right)$ • $h$ - the bandwidth parameter • https://mathisonian.github.io/kde/
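As a minimal sketch of the estimate on slide 6, the following Python evaluates the KDE at a single point. The Epanechnikov kernel is chosen only because it is the kernel named later in the deck; the names `epanechnikov` and `kde` and the sample values are illustrative and are not taken from the lookout package.

```python
def epanechnikov(u: float) -> float:
    """Epanechnikov kernel K(u) = 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kde(x: float, data: list[float], h: float) -> float:
    """Kernel density estimate f(x, h) = 1/(n h) * sum_i K((x - X_i) / h)."""
    n = len(data)
    return sum(epanechnikov((x - xi) / h) for xi in data) / (n * h)


# Made-up sample: three close points and one isolated point.
sample = [0.0, 0.1, 0.2, 5.0]
print(kde(0.1, sample, h=0.5))  # point in the dense region: larger value
print(kde(5.0, sample, h=0.5))  # isolated point: smaller value
```

The isolated point gets a lower density than the clustered ones, which is the property the following slides rely on.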
  • 7. KDE for anomaly detection • What do we want? • Anomalies to have much lower kde values than other points. • Why? • Because anomalies are in low density regions. • The literature on bandwidth selection focusses on representing the data • Minimize MISE (Mean Integrated Square Error) • But, this doesn’t work for us.
  • 8. Bandwidth, KDE and anomalies • Anomalies in the middle • Indices 1001–1010 • Increasing bandwidth of KDE • Lowest 10 KDE points (their indices) • Want anomalies to have lowest KDE
      Columns are bandwidths; each column lists the indices of the 10 points with the lowest KDE values.
      0.05   0.2    0.35   0.5    0.65   0.8    0.95   1.1    1.25   1.4
      232    232    1010   1010   1006   1006   1006   495    495    495
      1010   446    1001   1001   1009   1009   1009   843    843    843
      424    1010   1008   1008   1005   1005   1005   486    486    486
      359    495    1004   1004   1002   1002   1002   1006   979    166
      963    1001   1003   1002   1004   1004   1004   1009   166    979
      814    975    1002   1003   1007   1007   1007   1005   948    948
      70     1008   1007   1007   1003   1003   1003   1002   964    964
      257    799    1006   1006   1008   1001   1001   1004   832    832
      511    843    1009   1009   1001   1008   1008   1007   110    147
      458    511    1005   1005   1010   1010   1010   1003   147    110
  • 9. Bandwidth, KDE and anomalies • The bandwidth minimising MISE is 0.018 • Increasing bandwidth of KDE • Lowest 10 KDE points (their indices) • Want anomalies to have lowest KDE (same table of bandwidths 0.05 to 1.4 as on slide 8)
  • 10. So we want a bigger bandwidth for anomaly detection. But not too big!
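The trade-off on slides 8 to 10 can be reproduced on a toy example. This is a sketch on made-up data (two clusters with a single point between them), not the dataset behind the slides, and it again assumes an Epanechnikov kernel. It checks whether the middle point has the lowest KDE value for a moderate and for a very large bandwidth.

```python
def epanechnikov(u):
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kde_at_points(data, h):
    """KDE value at every data point for bandwidth h."""
    n = len(data)
    return [sum(epanechnikov((x - xi) / h) for xi in data) / (n * h) for x in data]


# Toy data (an assumption for illustration): two clusters and one anomaly in the middle.
left = [(-30 + i) / 10 for i in range(11)]   # -3.0, -2.9, ..., -2.0
right = [(20 + i) / 10 for i in range(11)]   # 2.0, 2.1, ..., 3.0
data = left + right + [0.0]
anomaly_index = len(data) - 1


def lowest_index(h):
    """Index of the point with the lowest KDE value."""
    values = kde_at_points(data, h)
    return min(range(len(data)), key=values.__getitem__)


for h in (0.5, 5.0):
    print(h, lowest_index(h) == anomaly_index)
# prints "0.5 True" then "5.0 False"
```

With h = 5.0 the kernel placed at the middle point reaches both clusters, so a point at the edge of a cluster ends up with the lowest density instead. This is the "not too big" part of slide 10.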
  • 11. How do we select a bandwidth appropriate for anomaly detection?
  • 12. In comes persistent homology • Methodology in topological data analysis
  • 13. Connected components and holes Dimension 0 – connected components Dimension 1 - holes
  • 14. With an anomaly Dimension 0 – connected components
  • 15. We are interested in . . . • The end-point diameter (death diameters) sequences • We want the maximum gap • Diameter that starts the maximum gap = 𝑑 • ℎ = 5 𝑑 for Epanechnikov kernel
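The bandwidth rule on slide 15 can be sketched as follows. The sketch assumes a Vietoris-Rips filtration whose parameter is the pairwise distance, in which case the dimension-0 death diameters are the edge lengths of a minimum spanning tree. The function names are hypothetical, and the multiplier is exposed as a parameter with the value 5 exactly as it appears on the slide.

```python
import math


def death_diameters(points):
    """Dimension-0 death diameters, computed as minimum spanning tree
    edge lengths with Prim's algorithm."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # shortest known distance from each point to the tree
    best[0] = 0.0
    deaths = []
    for step in range(n):
        j = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[j] = True
        if step > 0:
            deaths.append(best[j])  # a connected component merges at this diameter
        for i in range(n):
            if not in_tree[i]:
                best[i] = min(best[i], math.dist(points[j], points[i]))
    return sorted(deaths)


def bandwidth_from_max_gap(points, factor=5.0):
    """Return factor * d, where d is the death diameter that starts the largest gap.
    Needs at least three points so that at least one gap exists."""
    deaths = death_diameters(points)
    gaps = [b - a for a, b in zip(deaths, deaths[1:])]
    k = max(range(len(gaps)), key=gaps.__getitem__)
    return factor * deaths[k]


points = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,)]
print(death_diameters(points))         # [1.0, 1.0, 1.0, 7.0]
print(bandwidth_from_max_gap(points))  # 5.0
```

In this toy example the isolated point at 10 creates the single large death diameter, so the largest gap starts at d = 1.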
  • 16. Using this bandwidth • Compute the kde values • Anomalies will have very low kde values • We can rank the anomalies using the low kde values • Low kde – anomalous • High kde – not anomalous
  • 17. But, we want to identify anomalies! Just because the kde is low, is it an anomaly?
  • 18. We want to have a cut off! For that we use Extreme Value Theory!
  • 19. EVT – Peak Over Threshold method (POT) • Pick a threshold – 90% • Model the exceedances • Generalized Pareto distribution
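A minimal sketch of the peaks-over-threshold step on slide 19, in plain Python. It makes two simplifying assumptions that are not from the slides: the 90% threshold is a simple empirical quantile, and the Generalized Pareto distribution (GPD) is fitted by the method of moments rather than by maximum likelihood. All function names and the data are illustrative.

```python
import math
import statistics


def peaks_over_threshold(values, q=0.90):
    """Return the empirical q-quantile and the exceedances above it."""
    ordered = sorted(values)
    threshold = ordered[int(q * len(ordered))]  # simple empirical quantile
    exceedances = [v - threshold for v in values if v > threshold]
    return threshold, exceedances


def fit_gpd_moments(exceedances):
    """Method-of-moments estimates (shape xi, scale sigma) of a GPD.
    Uses mean = sigma / (1 - xi) and variance = sigma^2 / ((1 - xi)^2 (1 - 2 xi))."""
    m = statistics.mean(exceedances)
    s2 = statistics.variance(exceedances)
    ratio = m * m / s2
    return 0.5 * (1.0 - ratio), 0.5 * m * (1.0 + ratio)


def gpd_survival(y, xi, sigma):
    """P(Y > y) for a GPD with shape xi and scale sigma."""
    if y <= 0.0:
        return 1.0
    if abs(xi) < 1e-12:
        return math.exp(-y / sigma)  # exponential limit as xi -> 0
    base = 1.0 + xi * y / sigma
    return base ** (-1.0 / xi) if base > 0.0 else 0.0  # 0 beyond the upper endpoint


values = [float(i * i) for i in range(100)]  # made-up data
threshold, exc = peaks_over_threshold(values)
xi, sigma = fit_gpd_moments(exc)
print(threshold, len(exc))  # 8100.0 9
print(gpd_survival(500.0, xi, sigma))
```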
  • 20. Method lookout • Fit a GPD using the kde values • Then use the leave one out kde values to determine the probability of points according to the GPD • We have a set of probabilities • Low probabilities are more likely to be anomalies • Have a pre-defined cut off 𝛼, this is your threshold • If 𝑝𝑟𝑜𝑏𝑎𝑏𝑖𝑙𝑖𝑡𝑦(𝑥𝑖) < 𝛼, then 𝑥𝑖 is an anomaly. • So you can identify anomalies.
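The steps on slide 20 can be put together in a rough, self-contained sketch. Several choices here are assumptions rather than the authors' implementation: the bandwidth is passed in by hand, a Gaussian kernel is used so that leave-one-out densities stay positive, the GPD is fitted to the exceedances of -log(kde) by the method of moments, and the names `neg_log_density` and `lookout_sketch` are made up. The R package lookout on CRAN is the reference implementation.

```python
import math
import statistics
from statistics import NormalDist


def gaussian(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def neg_log_density(data, h, leave_one_out):
    """-log of the KDE at each data point, optionally leaving that point out."""
    n = len(data)
    out = []
    for i, x in enumerate(data):
        total = sum(gaussian((x - xj) / h) for j, xj in enumerate(data)
                    if not (leave_one_out and j == i))
        density = total / ((n - 1 if leave_one_out else n) * h)
        out.append(-math.log(density) if density > 0.0 else math.inf)
    return out


def gpd_survival(y, xi, sigma):
    """P(Y > y) for a GPD with shape xi and scale sigma."""
    if y <= 0.0:
        return 1.0
    if abs(xi) < 1e-12:
        return math.exp(-y / sigma)
    base = 1.0 + xi * y / sigma
    return base ** (-1.0 / xi) if base > 0.0 else 0.0


def lookout_sketch(data, h, alpha=0.05, q=0.90):
    # 1. Fit a GPD to the exceedances of -log(kde) above the q-quantile.
    y = neg_log_density(data, h, leave_one_out=False)
    threshold = sorted(y)[int(q * len(y))]
    exc = [v - threshold for v in y if v > threshold]
    m, s2 = statistics.mean(exc), statistics.variance(exc)
    xi, sigma = 0.5 * (1.0 - m * m / s2), 0.5 * m * (1.0 + m * m / s2)
    # 2. Score the leave-one-out values against the fitted GPD.
    z = neg_log_density(data, h, leave_one_out=True)
    probs = [gpd_survival(v - threshold, xi, sigma) for v in z]
    # 3. Points with probability below alpha are declared anomalies.
    return probs, [i for i, p in enumerate(probs) if p < alpha]


# Made-up data: 200 evenly spaced normal quantiles plus one isolated point at 8.0.
data = [NormalDist().inv_cdf((i + 0.5) / 200) for i in range(200)] + [8.0]
probs, flagged = lookout_sketch(data, h=0.5)
print(200 in flagged)  # the isolated point (index 200) is flagged
```

Leaving a point out matters most for isolated points: without its own kernel, the density at the isolated point collapses, which pushes its GPD probability towards zero.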
  • 21. Example • Lookout outliers with 𝛼 = 0.05
      Outlier   Probability
      1001      0.02344059
      1002      0.02513530
      1003      0.02501901
      1004      0.02504691
      1005      0.02654359
      1006      0.02636139
      1007      0.02625216
      1008      0.02452614
      1009      0.02644570
      1010      0.02283989
  • 22. Practical advantages of lookout • The user does not need to specify a bandwidth parameter: the user can be anyone – not necessarily a mathematician • EVT-based methods have low false positive rates: attractive for many applications, not an alarm factory
  • 23. For the mathematician/statistician in me • Coming together of • Topological data analysis • Extreme Value Theory • Kernel density estimates • To find anomalies
  • 25. Anomaly Persistence • What if a data-point is identified as an anomaly for different bandwidth values? • Visual representation of anomaly persistence • Big picture
  • 26. Application: Computer Networks Security Honeyboost: Boosting honeypot performance with data fusion and anomaly detection – Sevvandi Kandanaarachchi, Hideya Ochiai, Asha Rao Preprint - https://arxiv.org/abs/2105.02526
  • 27. LAN Security Monitoring • ‘LAN-Security Monitoring Device’ to capture suspicious/malicious activities that happen inside a LAN (Local Area Network). • Though it is not a real camera, it works like a ‘cyber-space surveillance camera’: it captures all the broadcast packets, and direct packets to the monitoring device. [Figure: a LAN connecting smartphones, a printer, smart appliances, a data server and the LAN-Security Monitoring Device]
  • 28. LAN Security Monitoring • ‘LAN-Security Monitoring Device’ to capture suspicious/malicious activities that happen inside a LAN (Local Area Network). [Figure: the same LAN (smartphones, printer, smart appliances, data server, LAN-Security Monitoring Device), annotated "Honeypot - a trap for attackers"]
  • 29. Honeypot data • ARP data – a big shout out to everyone (broadcast to the network) • These nodes do not access the honeypot • Who has got this address – I need to communicate to you • Generally not a suspicious activity • But malicious nodes can also make ARP calls • TCP and UDP data – targeted at the honeypot • These nodes have accessed the honeypot using TCP/UDP protocols • Oooh suspicious!
  • 30. A bit more on honeypots • An intruder can be there without accessing the honeypot • Limited vision of honeypots • Honeypots are never stand alone security devices • Identifying anomalous nodes is important - Honeyboost
  • 31. Generally . . . • Anomalies detected based on individual packets – packet-based • Packet features separately for each packet • Of all the traffic, which packets are anomalous • Our contribution: we find anomalous nodes – node-based • Features of nodes using the traffic – using multivariate time series • Of all the nodes, which nodes are anomalous
  • 32. Varying-dimensional time series • Different protocols have different header features • Finding anomalies from varying dimensional time series • 200 computers/nodes = 200 varying-dimensional time series • Which one is anomalous, if at all? [Figure: time series plotted against time]
  • 33. [Pipeline diagram] Varying-dimensional time series for each node → (Window model and process) → multivariate time series → (Compute features) → Feature space for all nodes → Lookout
  • 34. Varying-dimensional time series for each node → multivariate time series. Node A:
      Timestamp   Protocol   ARP count   ARP degree   TCP PC1   TCP PC2   UDP PC1   UDP PC2
      30          ARP        10          12           0         0         0         0
      55          TCP        0           0            -2.15     1.75      0         0
      85          UDP        0           0            0         0         3.56      0.45
  • 35. multivariate time series → Compute features → Feature space for all nodes (Node A table as on slide 34) • The MV time series for each node gets transformed to a point in $R^{17}$
  • 36. Features • The total length of line segments in $R^{6}$ • The maximum time difference • Number of protocols used • Number of TCP calls/UDP calls • Total length of line segments in each protocol space • Line of best fit in each protocol space • Sum of errors squared for the line of best fit [Plot axes: TCP PC1, TCP PC2]
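Three of the features on slide 36 can be sketched in a few lines of Python. The exact definitions used in Honeyboost are not given on the slides, so this sketch assumes that "line segments" join consecutive observations, that the time difference is taken between consecutive timestamps, and that the line of best fit is an ordinary least-squares fit (for example PC2 against PC1). The function names are hypothetical.

```python
import math


def total_path_length(points):
    """Sum of Euclidean lengths of the segments joining consecutive observations."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def max_time_difference(timestamps):
    """Largest gap between consecutive timestamps."""
    return max(b - a for a, b in zip(timestamps, timestamps[1:]))


def best_fit_line(xs, ys):
    """Least-squares line y = slope * x + intercept and its sum of squared errors."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse


# Node A rows from slide 34:
# (ARP count, ARP degree, TCP PC1, TCP PC2, UDP PC1, UDP PC2)
timestamps = [30, 55, 85]
rows = [(10, 12, 0, 0, 0, 0), (0, 0, -2.15, 1.75, 0, 0), (0, 0, 0, 0, 3.56, 0.45)]
print(total_path_length(rows))          # path length in the 6-dimensional space
print(max_time_difference(timestamps))  # 30
```

Computing such summaries for each node turns its time series into one fixed-length feature vector, which is what lets lookout compare nodes directly.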
  • 37. Findings • Suspicious nodes that do not access the honeypot [Figure: feature space for all nodes → Lookout; two flagged nodes are each annotated "This node does not access the honeypot"]
  • 38. Insights • Identify some nodes before they access the honeypot • Gain insights – find anomalies and look back at the original data • Anomaly has set suspicious flags – PSH flag and URG flag • PSH flag – PUSH flag – push packet to the application layer • URG flag – URGENT flag – treat packet as urgent. Why, when accessing the honeypot? • Can be used to derive new rules
  • 39. Summary • Lookout - an EVT-based method to find anomalies (using TDA) • An application in computer network security • R package lookout is on CRAN • Both preprints available • https://bit.ly/lookoutliers • https://arxiv.org/abs/2105.02526