SlideShare a Scribd company logo
1 of 19
Download to read offline
Adam Gibson
adam@skymind.io
● Brief overview of Network Intrusion
● Anomaly Detection in general
● Alternative example as case study
● Why Deep Learning for this? Isn’t it overkill?
● Example walk through
● Demo
● End
Network Intrusion with Deep Learning
● Network Intrusion: Detect the hackers. Finding
malicious traffic.
● Malicious traffic attempting to compromise servers
for financial gain or other reasons
● Traffic could be internal or external
● Bad actors exists in and outside your network
Brief overview of Network Intrusion
● Anomaly Detection: Finding a rare pattern among
normal patterns
● These “patterns” are defined as “any activity over
time”
● Other examples include credit card fraud, identity
theft, or broken hardware
Anomaly Detection in General
Simbox fraud for
telco
● Costs telco over 3 billion yearly
● Route calls for free over a carrier network
● Need to mine raw call detail records to find
● Find and cluster fraudulent CDRs with
autoencoders (unsupervised)
● Beats current rules and supervised based
approaches
● Deep Learning is good at surfacing patterns in
behavior
● We look at the billions of dollars of R&D being spent
by adtech to surface *your* behavior to target you
with ads
● Your behavior can just be “tends to watch cat
pictures on youtube at 2pm” - google wants to know
that
● We typically think of Deep Learning for media
Why Deep Learning for this? Isn’t it overkill?
We start with a real-time data source: logs streaming in from Wifi, Ethernet,
the Network...
Data Source
An NGINX web server is recording HTTP requests and sending them
straight to a stream such as streamsets or kafka
Data Source Data Spout
The data lands in a data lake, a unified file system like MapR's NFS or HDFS,
where it is available for analytics.
Data Source Data Spout
Data Source
At this step, we can begin preprocessing the raw data. Up to and including
this step, everything has run on CPUs, because they involve a constant
stream of small operations.
Data Spout
Data Source
Preprocessing the data includes steps like normalization; standardizing
diverse data to a uniform format; shuffling log columns; running find and
replace; and translating log elements to a binary format.
Data Spout
Data Source Data Spout
From there, the vectorized logs are fed into a neural net in batches, and
each batch is dispatched to a different neural net model on a distributed
cluster.
In this case, we've trained a neural net to detect network intrusion
anomalies as part of a project with Canonical.
Data Source Data Spout
The last three stages, and above all training, run on GPUs.
Training Inference DataVizLake ETL
Data Source Data Spout
From there, the vectorized logs are fed into a neural net in batches, and
each batch is dispatched to a different neural net model on a distributed
cluster.
In this case, we've trained a neural net to detect network intrusion
anomalies as part of a project with Canonical.
New data can be fed to the trained models from multiple sources via
Kafka/MAPR Streams or via a REST API. All connected to TensorRT.
Data Source Data Spout Training Inference DataVizLake ETL
TensorRT
For example, web logs record the user's browser: Internet Explorer, Google Chrome or
Mozilla Firefox. These three browsers have to be represented with numbers in a vector.
The method of representation is called one-hot encoding, where a data scientist will use
three digits and call IE 100, Chrome 010 and Firefox 001. Which is how browsers are
engineered to become numerical features that can be monitored for significance.
They will be added to a vector that contains other information about users, such as the
region in which their IP address is located. Geographic regions, too, are converted into
binary and stacked on top of other log elements.
Over the course of training, neural nets learn how significant or irrelevant those features
are for the detection of anomalies and assign them weights.
Big Data Analytics Tokyo
Big Data Analytics Tokyo

More Related Content

What's hot

Spark Summit EU talk by Ahsan Javed Awan
Spark Summit EU talk by Ahsan Javed AwanSpark Summit EU talk by Ahsan Javed Awan
Spark Summit EU talk by Ahsan Javed AwanSpark Summit
 
Skymind Open Power Summit ISV Round Table
Skymind Open Power Summit ISV Round TableSkymind Open Power Summit ISV Round Table
Skymind Open Power Summit ISV Round TableAdam Gibson
 
Distributed deep learning
Distributed deep learningDistributed deep learning
Distributed deep learningMehdi Shibahara
 
Dl4j in the wild
Dl4j in the wildDl4j in the wild
Dl4j in the wildAdam Gibson
 
Productionizing dl from the ground up
Productionizing dl from the ground upProductionizing dl from the ground up
Productionizing dl from the ground upAdam Gibson
 
PyCon HK 2018 - Heterogeneous job processing with Apache Kafka
PyCon HK 2018 - Heterogeneous job processing with Apache Kafka PyCon HK 2018 - Heterogeneous job processing with Apache Kafka
PyCon HK 2018 - Heterogeneous job processing with Apache Kafka Hua Chu
 
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016MLconf
 
Deep Learning on Apache Spark
Deep Learning on Apache SparkDeep Learning on Apache Spark
Deep Learning on Apache SparkDash Desai
 
Deploy Deep Learning Models with TensorFlow + Lambda
Deploy Deep Learning Models with TensorFlow + LambdaDeploy Deep Learning Models with TensorFlow + Lambda
Deploy Deep Learning Models with TensorFlow + LambdaGreg Werner
 
SKIL - Dl4j in the wild meetup
SKIL - Dl4j in the wild meetupSKIL - Dl4j in the wild meetup
SKIL - Dl4j in the wild meetupAdam Gibson
 
Impala presentation ahad rana
Impala presentation ahad ranaImpala presentation ahad rana
Impala presentation ahad ranaData Con LA
 
Tactical Data Science Tips: Python and Spark Together
Tactical Data Science Tips: Python and Spark TogetherTactical Data Science Tips: Python and Spark Together
Tactical Data Science Tips: Python and Spark TogetherDatabricks
 
Netflix machine learning
Netflix machine learningNetflix machine learning
Netflix machine learningAmer Ather
 
From R Script to Production Using rsparkling with Navdeep Gill
From R Script to Production Using rsparkling with Navdeep GillFrom R Script to Production Using rsparkling with Navdeep Gill
From R Script to Production Using rsparkling with Navdeep GillDatabricks
 
IMCSummit 2015 - Day 1 Developer Track - Spark After Dark: Generating High Qu...
IMCSummit 2015 - Day 1 Developer Track - Spark After Dark: Generating High Qu...IMCSummit 2015 - Day 1 Developer Track - Spark After Dark: Generating High Qu...
IMCSummit 2015 - Day 1 Developer Track - Spark After Dark: Generating High Qu...In-Memory Computing Summit
 
Scalable Deep Learning Platform On Spark In Baidu
Scalable Deep Learning Platform On Spark In BaiduScalable Deep Learning Platform On Spark In Baidu
Scalable Deep Learning Platform On Spark In BaiduJen Aman
 
Improving ad hoc and production workflows at Stitch Fix
Improving ad hoc and production workflows at Stitch FixImproving ad hoc and production workflows at Stitch Fix
Improving ad hoc and production workflows at Stitch FixStitch Fix Algorithms
 

What's hot (20)

Spark Summit EU talk by Ahsan Javed Awan
Spark Summit EU talk by Ahsan Javed AwanSpark Summit EU talk by Ahsan Javed Awan
Spark Summit EU talk by Ahsan Javed Awan
 
Skymind Open Power Summit ISV Round Table
Skymind Open Power Summit ISV Round TableSkymind Open Power Summit ISV Round Table
Skymind Open Power Summit ISV Round Table
 
Distributed deep learning
Distributed deep learningDistributed deep learning
Distributed deep learning
 
Scrappy
ScrappyScrappy
Scrappy
 
Dl4j in the wild
Dl4j in the wildDl4j in the wild
Dl4j in the wild
 
Productionizing dl from the ground up
Productionizing dl from the ground upProductionizing dl from the ground up
Productionizing dl from the ground up
 
PyCon HK 2018 - Heterogeneous job processing with Apache Kafka
PyCon HK 2018 - Heterogeneous job processing with Apache Kafka PyCon HK 2018 - Heterogeneous job processing with Apache Kafka
PyCon HK 2018 - Heterogeneous job processing with Apache Kafka
 
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
Arun Rathinasabapathy, Senior Software Engineer, LexisNexis at MLconf ATL 2016
 
Deep Learning on Apache Spark
Deep Learning on Apache SparkDeep Learning on Apache Spark
Deep Learning on Apache Spark
 
Deploy Deep Learning Models with TensorFlow + Lambda
Deploy Deep Learning Models with TensorFlow + LambdaDeploy Deep Learning Models with TensorFlow + Lambda
Deploy Deep Learning Models with TensorFlow + Lambda
 
SKIL - Dl4j in the wild meetup
SKIL - Dl4j in the wild meetupSKIL - Dl4j in the wild meetup
SKIL - Dl4j in the wild meetup
 
Lighthouse
LighthouseLighthouse
Lighthouse
 
Impala presentation ahad rana
Impala presentation ahad ranaImpala presentation ahad rana
Impala presentation ahad rana
 
Tactical Data Science Tips: Python and Spark Together
Tactical Data Science Tips: Python and Spark TogetherTactical Data Science Tips: Python and Spark Together
Tactical Data Science Tips: Python and Spark Together
 
Netflix machine learning
Netflix machine learningNetflix machine learning
Netflix machine learning
 
From R Script to Production Using rsparkling with Navdeep Gill
From R Script to Production Using rsparkling with Navdeep GillFrom R Script to Production Using rsparkling with Navdeep Gill
From R Script to Production Using rsparkling with Navdeep Gill
 
IMCSummit 2015 - Day 1 Developer Track - Spark After Dark: Generating High Qu...
IMCSummit 2015 - Day 1 Developer Track - Spark After Dark: Generating High Qu...IMCSummit 2015 - Day 1 Developer Track - Spark After Dark: Generating High Qu...
IMCSummit 2015 - Day 1 Developer Track - Spark After Dark: Generating High Qu...
 
Scalable Deep Learning Platform On Spark In Baidu
Scalable Deep Learning Platform On Spark In BaiduScalable Deep Learning Platform On Spark In Baidu
Scalable Deep Learning Platform On Spark In Baidu
 
Optimizing Spark
Optimizing SparkOptimizing Spark
Optimizing Spark
 
Improving ad hoc and production workflows at Stitch Fix
Improving ad hoc and production workflows at Stitch FixImproving ad hoc and production workflows at Stitch Fix
Improving ad hoc and production workflows at Stitch Fix
 

Viewers also liked

Deep Learning with GPUs in Production - AI By the Bay
Deep Learning with GPUs in Production - AI By the BayDeep Learning with GPUs in Production - AI By the Bay
Deep Learning with GPUs in Production - AI By the BayAdam Gibson
 
Distributed deep rl on spark strata singapore
Distributed deep rl on spark   strata singaporeDistributed deep rl on spark   strata singapore
Distributed deep rl on spark strata singaporeAdam Gibson
 
Wrangleconf Big Data Malaysia 2016
Wrangleconf Big Data Malaysia 2016Wrangleconf Big Data Malaysia 2016
Wrangleconf Big Data Malaysia 2016Adam Gibson
 
Deep learning in production with the best
Deep learning in production   with the bestDeep learning in production   with the best
Deep learning in production with the bestAdam Gibson
 
Going bananas with recursion schemes for fixed point data types
Going bananas with recursion schemes for fixed point data typesGoing bananas with recursion schemes for fixed point data types
Going bananas with recursion schemes for fixed point data typesPawel Szulc
 
Van laarhoven lens
Van laarhoven lensVan laarhoven lens
Van laarhoven lensNaoki Aoyama
 
Reducing Boilerplate and Combining Effects: A Monad Transformer Example
Reducing Boilerplate and Combining Effects: A Monad Transformer ExampleReducing Boilerplate and Combining Effects: A Monad Transformer Example
Reducing Boilerplate and Combining Effects: A Monad Transformer ExampleConnie Chen
 
Preparing for distributed system failures using akka #ScalaMatsuri
Preparing for distributed system failures using akka #ScalaMatsuriPreparing for distributed system failures using akka #ScalaMatsuri
Preparing for distributed system failures using akka #ScalaMatsuriTIS Inc.
 
Akka-chan's Survival Guide for the Streaming World
Akka-chan's Survival Guide for the Streaming WorldAkka-chan's Survival Guide for the Streaming World
Akka-chan's Survival Guide for the Streaming WorldKonrad Malawski
 
The state of sbt 0.13, sbt server, and sbt 1.0 (ScalaMatsuri ver)
The state of sbt 0.13, sbt server, and sbt 1.0 (ScalaMatsuri ver)The state of sbt 0.13, sbt server, and sbt 1.0 (ScalaMatsuri ver)
The state of sbt 0.13, sbt server, and sbt 1.0 (ScalaMatsuri ver)Eugene Yokota
 
Akka Cluster and Auto-scaling
Akka Cluster and Auto-scalingAkka Cluster and Auto-scaling
Akka Cluster and Auto-scalingIkuo Matsumura
 
Deadly Code! (seriously) Blocking & Hyper Context Switching Pattern
Deadly Code! (seriously) Blocking & Hyper Context Switching PatternDeadly Code! (seriously) Blocking & Hyper Context Switching Pattern
Deadly Code! (seriously) Blocking & Hyper Context Switching Patternchibochibo
 
Make your programs Free
Make your programs FreeMake your programs Free
Make your programs FreePawel Szulc
 
Scala Warrior and type-safe front-end development with Scala.js
Scala Warrior and type-safe front-end development with Scala.jsScala Warrior and type-safe front-end development with Scala.js
Scala Warrior and type-safe front-end development with Scala.jstakezoe
 
Hadoop summit 2016
Hadoop summit 2016Hadoop summit 2016
Hadoop summit 2016Adam Gibson
 
Recurrent nets and sensors
Recurrent nets and sensorsRecurrent nets and sensors
Recurrent nets and sensorsAdam Gibson
 
Deep learning with Hortonworks and Apache Spark - Hortonworks technical workshop
Deep learning with Hortonworks and Apache Spark - Hortonworks technical workshopDeep learning with Hortonworks and Apache Spark - Hortonworks technical workshop
Deep learning with Hortonworks and Apache Spark - Hortonworks technical workshopHortonworks
 

Viewers also liked (20)

Deep Learning with GPUs in Production - AI By the Bay
Deep Learning with GPUs in Production - AI By the BayDeep Learning with GPUs in Production - AI By the Bay
Deep Learning with GPUs in Production - AI By the Bay
 
Distributed deep rl on spark strata singapore
Distributed deep rl on spark   strata singaporeDistributed deep rl on spark   strata singapore
Distributed deep rl on spark strata singapore
 
Wrangleconf Big Data Malaysia 2016
Wrangleconf Big Data Malaysia 2016Wrangleconf Big Data Malaysia 2016
Wrangleconf Big Data Malaysia 2016
 
Deep learning in production with the best
Deep learning in production   with the bestDeep learning in production   with the best
Deep learning in production with the best
 
Going bananas with recursion schemes for fixed point data types
Going bananas with recursion schemes for fixed point data typesGoing bananas with recursion schemes for fixed point data types
Going bananas with recursion schemes for fixed point data types
 
Van laarhoven lens
Van laarhoven lensVan laarhoven lens
Van laarhoven lens
 
Reducing Boilerplate and Combining Effects: A Monad Transformer Example
Reducing Boilerplate and Combining Effects: A Monad Transformer ExampleReducing Boilerplate and Combining Effects: A Monad Transformer Example
Reducing Boilerplate and Combining Effects: A Monad Transformer Example
 
Preparing for distributed system failures using akka #ScalaMatsuri
Preparing for distributed system failures using akka #ScalaMatsuriPreparing for distributed system failures using akka #ScalaMatsuri
Preparing for distributed system failures using akka #ScalaMatsuri
 
Akka-chan's Survival Guide for the Streaming World
Akka-chan's Survival Guide for the Streaming WorldAkka-chan's Survival Guide for the Streaming World
Akka-chan's Survival Guide for the Streaming World
 
Pratical eff
Pratical effPratical eff
Pratical eff
 
The state of sbt 0.13, sbt server, and sbt 1.0 (ScalaMatsuri ver)
The state of sbt 0.13, sbt server, and sbt 1.0 (ScalaMatsuri ver)The state of sbt 0.13, sbt server, and sbt 1.0 (ScalaMatsuri ver)
The state of sbt 0.13, sbt server, and sbt 1.0 (ScalaMatsuri ver)
 
Akka Cluster and Auto-scaling
Akka Cluster and Auto-scalingAkka Cluster and Auto-scaling
Akka Cluster and Auto-scaling
 
Scala Matsuri 2017
Scala Matsuri 2017Scala Matsuri 2017
Scala Matsuri 2017
 
Deadly Code! (seriously) Blocking & Hyper Context Switching Pattern
Deadly Code! (seriously) Blocking & Hyper Context Switching PatternDeadly Code! (seriously) Blocking & Hyper Context Switching Pattern
Deadly Code! (seriously) Blocking & Hyper Context Switching Pattern
 
Make your programs Free
Make your programs FreeMake your programs Free
Make your programs Free
 
Scala Warrior and type-safe front-end development with Scala.js
Scala Warrior and type-safe front-end development with Scala.jsScala Warrior and type-safe front-end development with Scala.js
Scala Warrior and type-safe front-end development with Scala.js
 
Hadoop summit 2016
Hadoop summit 2016Hadoop summit 2016
Hadoop summit 2016
 
Recurrent nets and sensors
Recurrent nets and sensorsRecurrent nets and sensors
Recurrent nets and sensors
 
Deep learning with Hortonworks and Apache Spark - Hortonworks technical workshop
Deep learning with Hortonworks and Apache Spark - Hortonworks technical workshopDeep learning with Hortonworks and Apache Spark - Hortonworks technical workshop
Deep learning with Hortonworks and Apache Spark - Hortonworks technical workshop
 
Deep Learning using Spark and DL4J for fun and profit
Deep Learning using Spark and DL4J for fun and profitDeep Learning using Spark and DL4J for fun and profit
Deep Learning using Spark and DL4J for fun and profit
 

Similar to Big Data Analytics Tokyo

Network security monitoring elastic webinar - 16 june 2021
Network security monitoring   elastic webinar - 16 june 2021Network security monitoring   elastic webinar - 16 june 2021
Network security monitoring elastic webinar - 16 june 2021Mouaz Alnouri
 
Proposal for System Analysis and Desing
Proposal for System Analysis and DesingProposal for System Analysis and Desing
Proposal for System Analysis and DesingMd Khaza Main Uddin
 
Network Project Report
Network Project ReportNetwork Project Report
Network Project ReportTiffany Graham
 
Multi port network ethernet performance improvement techniques
Multi port network ethernet performance improvement techniquesMulti port network ethernet performance improvement techniques
Multi port network ethernet performance improvement techniquesIJARIIT
 
For your final step, you will synthesize the previous steps and la
For your final step, you will synthesize the previous steps and laFor your final step, you will synthesize the previous steps and la
For your final step, you will synthesize the previous steps and laShainaBoling829
 
Network Analysis Mini Project 2.pptx
Network Analysis Mini Project 2.pptxNetwork Analysis Mini Project 2.pptx
Network Analysis Mini Project 2.pptxtalkaton
 
Network Analysis Mini Project 2.pdf
Network Analysis Mini Project 2.pdfNetwork Analysis Mini Project 2.pdf
Network Analysis Mini Project 2.pdftalkaton
 
Network traffic analysis with cyber security
Network traffic analysis with cyber securityNetwork traffic analysis with cyber security
Network traffic analysis with cyber securityKAMALI PRIYA P
 
Lecture notes -001
Lecture notes -001Lecture notes -001
Lecture notes -001Eric Rotich
 
Chapter 1 organizing data vantage domain action and validity
Chapter 1  organizing data  vantage domain action and validityChapter 1  organizing data  vantage domain action and validity
Chapter 1 organizing data vantage domain action and validityPhu Nguyen
 
Security Delivery Platform: Best practices
Security Delivery Platform: Best practicesSecurity Delivery Platform: Best practices
Security Delivery Platform: Best practicesMihajlo Prerad
 
Basic Foundation For Cybersecurity
Basic Foundation For CybersecurityBasic Foundation For Cybersecurity
Basic Foundation For CybersecurityMohammed Adam
 
Dist sniffing & scanning project
Dist sniffing & scanning projectDist sniffing & scanning project
Dist sniffing & scanning projectRishu Seth
 
Introduction to cyber forensics
Introduction to cyber forensicsIntroduction to cyber forensics
Introduction to cyber forensicsAnpumathews
 
Protecting location privacy in sensor networks against a global eavesdropper
Protecting location privacy in sensor networks against a global eavesdropperProtecting location privacy in sensor networks against a global eavesdropper
Protecting location privacy in sensor networks against a global eavesdropperShakas Technologies
 
Protecting location privacy in sensor networks against a global eavesdropper
Protecting location privacy in sensor networks against a global eavesdropperProtecting location privacy in sensor networks against a global eavesdropper
Protecting location privacy in sensor networks against a global eavesdropperShakas Technologies
 
Database techniques for resilient network monitoring and inspection
Database techniques for resilient network monitoring and inspectionDatabase techniques for resilient network monitoring and inspection
Database techniques for resilient network monitoring and inspectionTELKOMNIKA JOURNAL
 
Cyber_Threat_Intelligent_Cyber_Operation_Contest
Cyber_Threat_Intelligent_Cyber_Operation_ContestCyber_Threat_Intelligent_Cyber_Operation_Contest
Cyber_Threat_Intelligent_Cyber_Operation_Contestnkrafacyberclub
 

Similar to Big Data Analytics Tokyo (20)

Network security monitoring elastic webinar - 16 june 2021
Network security monitoring   elastic webinar - 16 june 2021Network security monitoring   elastic webinar - 16 june 2021
Network security monitoring elastic webinar - 16 june 2021
 
Proposal for System Analysis and Desing
Proposal for System Analysis and DesingProposal for System Analysis and Desing
Proposal for System Analysis and Desing
 
Network Project Report
Network Project ReportNetwork Project Report
Network Project Report
 
Multi port network ethernet performance improvement techniques
Multi port network ethernet performance improvement techniquesMulti port network ethernet performance improvement techniques
Multi port network ethernet performance improvement techniques
 
For your final step, you will synthesize the previous steps and la
For your final step, you will synthesize the previous steps and laFor your final step, you will synthesize the previous steps and la
For your final step, you will synthesize the previous steps and la
 
Internet census 2012
Internet census 2012Internet census 2012
Internet census 2012
 
Network Analysis Mini Project 2.pptx
Network Analysis Mini Project 2.pptxNetwork Analysis Mini Project 2.pptx
Network Analysis Mini Project 2.pptx
 
Network Analysis Mini Project 2.pdf
Network Analysis Mini Project 2.pdfNetwork Analysis Mini Project 2.pdf
Network Analysis Mini Project 2.pdf
 
Network traffic analysis with cyber security
Network traffic analysis with cyber securityNetwork traffic analysis with cyber security
Network traffic analysis with cyber security
 
Lecture notes -001
Lecture notes -001Lecture notes -001
Lecture notes -001
 
Forensic tools
Forensic toolsForensic tools
Forensic tools
 
Chapter 1 organizing data vantage domain action and validity
Chapter 1  organizing data  vantage domain action and validityChapter 1  organizing data  vantage domain action and validity
Chapter 1 organizing data vantage domain action and validity
 
Security Delivery Platform: Best practices
Security Delivery Platform: Best practicesSecurity Delivery Platform: Best practices
Security Delivery Platform: Best practices
 
Basic Foundation For Cybersecurity
Basic Foundation For CybersecurityBasic Foundation For Cybersecurity
Basic Foundation For Cybersecurity
 
Dist sniffing & scanning project
Dist sniffing & scanning projectDist sniffing & scanning project
Dist sniffing & scanning project
 
Introduction to cyber forensics
Introduction to cyber forensicsIntroduction to cyber forensics
Introduction to cyber forensics
 
Protecting location privacy in sensor networks against a global eavesdropper
Protecting location privacy in sensor networks against a global eavesdropperProtecting location privacy in sensor networks against a global eavesdropper
Protecting location privacy in sensor networks against a global eavesdropper
 
Protecting location privacy in sensor networks against a global eavesdropper
Protecting location privacy in sensor networks against a global eavesdropperProtecting location privacy in sensor networks against a global eavesdropper
Protecting location privacy in sensor networks against a global eavesdropper
 
Database techniques for resilient network monitoring and inspection
Database techniques for resilient network monitoring and inspectionDatabase techniques for resilient network monitoring and inspection
Database techniques for resilient network monitoring and inspection
 
Cyber_Threat_Intelligent_Cyber_Operation_Contest
Cyber_Threat_Intelligent_Cyber_Operation_ContestCyber_Threat_Intelligent_Cyber_Operation_Contest
Cyber_Threat_Intelligent_Cyber_Operation_Contest
 

More from Adam Gibson

End to end MLworkflows
End to end MLworkflowsEnd to end MLworkflows
End to end MLworkflowsAdam Gibson
 
World Artificial Intelligence Conference Shanghai 2018
World Artificial Intelligence Conference Shanghai 2018World Artificial Intelligence Conference Shanghai 2018
World Artificial Intelligence Conference Shanghai 2018Adam Gibson
 
Boolan machine learning summit
Boolan machine learning summitBoolan machine learning summit
Boolan machine learning summitAdam Gibson
 
Strata Beijing - Deep Learning in Production on Spark
Strata Beijing - Deep Learning in Production on SparkStrata Beijing - Deep Learning in Production on Spark
Strata Beijing - Deep Learning in Production on SparkAdam Gibson
 
Skymind - Udacity China presentation
Skymind - Udacity China presentationSkymind - Udacity China presentation
Skymind - Udacity China presentationAdam Gibson
 
Anomaly Detection in Deep Learning (Updated)
Anomaly Detection in Deep Learning (Updated)Anomaly Detection in Deep Learning (Updated)
Anomaly Detection in Deep Learning (Updated)Adam Gibson
 
Anomaly detection in deep learning
Anomaly detection in deep learningAnomaly detection in deep learning
Anomaly detection in deep learningAdam Gibson
 
Advanced spark deep learning
Advanced spark deep learningAdvanced spark deep learning
Advanced spark deep learningAdam Gibson
 
Nd4 j slides.pptx
Nd4 j slides.pptxNd4 j slides.pptx
Nd4 j slides.pptxAdam Gibson
 
Deep learning on Hadoop/Spark -NextML
Deep learning on Hadoop/Spark -NextMLDeep learning on Hadoop/Spark -NextML
Deep learning on Hadoop/Spark -NextMLAdam Gibson
 
Skymind & Deeplearning4j: Deep Learning for the Enterprise
Skymind & Deeplearning4j: Deep Learning for the EnterpriseSkymind & Deeplearning4j: Deep Learning for the Enterprise
Skymind & Deeplearning4j: Deep Learning for the EnterpriseAdam Gibson
 
Sf data mining_meetup
Sf data mining_meetupSf data mining_meetup
Sf data mining_meetupAdam Gibson
 

More from Adam Gibson (12)

End to end MLworkflows
End to end MLworkflowsEnd to end MLworkflows
End to end MLworkflows
 
World Artificial Intelligence Conference Shanghai 2018
World Artificial Intelligence Conference Shanghai 2018World Artificial Intelligence Conference Shanghai 2018
World Artificial Intelligence Conference Shanghai 2018
 
Boolan machine learning summit
Boolan machine learning summitBoolan machine learning summit
Boolan machine learning summit
 
Strata Beijing - Deep Learning in Production on Spark
Strata Beijing - Deep Learning in Production on SparkStrata Beijing - Deep Learning in Production on Spark
Strata Beijing - Deep Learning in Production on Spark
 
Skymind - Udacity China presentation
Skymind - Udacity China presentationSkymind - Udacity China presentation
Skymind - Udacity China presentation
 
Anomaly Detection in Deep Learning (Updated)
Anomaly Detection in Deep Learning (Updated)Anomaly Detection in Deep Learning (Updated)
Anomaly Detection in Deep Learning (Updated)
 
Anomaly detection in deep learning
Anomaly detection in deep learningAnomaly detection in deep learning
Anomaly detection in deep learning
 
Advanced spark deep learning
Advanced spark deep learningAdvanced spark deep learning
Advanced spark deep learning
 
Nd4 j slides.pptx
Nd4 j slides.pptxNd4 j slides.pptx
Nd4 j slides.pptx
 
Deep learning on Hadoop/Spark -NextML
Deep learning on Hadoop/Spark -NextMLDeep learning on Hadoop/Spark -NextML
Deep learning on Hadoop/Spark -NextML
 
Skymind & Deeplearning4j: Deep Learning for the Enterprise
Skymind & Deeplearning4j: Deep Learning for the EnterpriseSkymind & Deeplearning4j: Deep Learning for the Enterprise
Skymind & Deeplearning4j: Deep Learning for the Enterprise
 
Sf data mining_meetup
Sf data mining_meetupSf data mining_meetup
Sf data mining_meetup
 

Recently uploaded

DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
While-For-loop in python used in college
While-For-loop in python used in collegeWhile-For-loop in python used in college
While-For-loop in python used in collegessuser7a7cd61
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxBoston Institute of Analytics
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 

Recently uploaded (20)

DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
While-For-loop in python used in college
While-For-loop in python used in collegeWhile-For-loop in python used in college
While-For-loop in python used in college
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 

Big Data Analytics Tokyo

  • 2. ● Brief overview of Network Intrusion ● Anomaly Detection in general ● Alternative example as case study ● Why Deep Learning for this? Isn’t it overkill? ● Example walk through ● Demo ● End Network Intrusion with Deep Learning
  • 3. ● Network Intrusion: Detect the hackers. Finding malicious traffic. ● Malicious traffic attempting to compromise servers for financial gain or other reasons ● Traffic could be internal or external ● Bad actors exists in and outside your network Brief overview of Network Intrusion
  • 4. ● Anomaly Detection: Finding a rare pattern among normal patterns ● These “patterns” are defined as “any activity over time” ● Other examples include credit card fraud, identity theft, or broken hardware Anomaly Detection in General
  • 5. Simbox fraud for telco ● Costs telco over 3 billion yearly ● Route calls for free over a carrier network ● Need to mine raw call detail records to find ● Find and cluster fraudulent CDRs with autoencoders (unsupervised) ● Beats current rules and supervised based approaches
  • 6. ● Deep Learning is good at surfacing patterns in behavior ● We look at the billions of dollars of R&D being spent by adtech to surface *your* behavior to target you with ads ● Your behavior can just be “tends to watch cat pictures on youtube at 2pm” - google wants to know that ● We typically think of Deep Learning for media Why Deep Learning for this? Isn’t it overkill?
  • 7.
  • 8. We start with a real-time data source: logs streaming in from Wifi, Ethernet, the Network... Data Source
  • 9. An NGINX web server is recording HTTP requests and sending them straight to a stream such as streamsets or kafka Data Source Data Spout
  • 10. The data lands in a data lake, a unified file system like MapR's NFS or HDFS, where it is available for analytics. Data Source Data Spout
  • 11. Data Source At this step, we can begin preprocessing the raw data. Up to and including this step, everything has run on CPUs, because they involve a constant stream of small operations. Data Spout
  • 12. Data Source Preprocessing the data includes steps like normalization; standardizing diverse data to a uniform format; shuffling log columns; running find and replace; and translating log elements to a binary format. Data Spout
  • 13. Data Source Data Spout From there, the vectorized logs are fed into a neural net in batches, and each batch is dispatched to a different neural net model on a distributed cluster. In this case, we've trained a neural net to detect network intrusion anomalies as part of a project with Canonical.
  • 14. Data Source Data Spout The last three stages, and above all training, run on GPUs. Training Inference DataVizLake ETL
  • 15. Data Source Data Spout From there, the vectorized logs are fed into a neural net in batches, and each batch is dispatched to a different neural net model on a distributed cluster. In this case, we've trained a neural net to detect network intrusion anomalies as part of a project with Canonical.
  • 16. New data can be fed to the trained models from multiple sources via Kafka/MAPR Streams or via a REST API. All connected to TensorRT. Data Source Data Spout Training Inference DataVizLake ETL TensorRT
  • 17. For example, web logs record the user's browser: Internet Explorer, Google Chrome or Mozilla Firefox. These three browsers have to be represented with numbers in a vector. The method of representation is called one-hot encoding, where a data scientist will use three digits and call IE 100, Chrome 010 and Firefox 001. Which is how browsers are engineered to become numerical features that can be monitored for significance. They will be added to a vector that contains other information about users, such as the region in which their IP address is located. Geographic regions, too, are converted into binary and stacked on top of other log elements. Over the course of training, neural nets learn how significant or irrelevant those features are for the detection of anomalies and assign them weights.