2. ● Brief overview of Network Intrusion
● Anomaly Detection in general
● Alternative example as case study
● Why Deep Learning for this? Isn’t it overkill?
● Example walk through
● Demo
● End
Network Intrusion with Deep Learning
3. ● Network Intrusion: detecting hackers and finding
malicious traffic
● Malicious traffic attempting to compromise servers
for financial gain or other reasons
● Traffic could be internal or external
● Bad actors exist inside and outside your network
Brief overview of Network Intrusion
4. ● Anomaly Detection: Finding a rare pattern among
normal patterns
● These “patterns” are defined as “any activity over
time”
● Other examples include credit card fraud, identity
theft, or broken hardware
Anomaly Detection in General
5. Simbox fraud for telco
● Costs telcos over $3 billion yearly
● Fraudsters route calls for free over a carrier's network
● Detecting it requires mining raw call detail records (CDRs)
● Find and cluster fraudulent CDRs with autoencoders (unsupervised; sketched below)
● Beats current rule-based and supervised approaches
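A minimal sketch of the idea, assuming Python with Keras (the deck does not name a framework; the feature count, architecture, and threshold are illustrative):

import numpy as np
from tensorflow import keras

# Train only on normal CDR feature vectors; fraud is flagged later by
# reconstruction error. Random stand-in data; real input would be mined
# CDR features.
n_features = 16
x_normal = np.random.rand(10000, n_features).astype("float32")

autoencoder = keras.Sequential([
    keras.layers.Dense(8, activation="relu", input_shape=(n_features,)),
    keras.layers.Dense(4, activation="relu"),   # compressed bottleneck
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(n_features, activation="sigmoid"),
])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(x_normal, x_normal, epochs=10, batch_size=128, verbose=0)

def anomaly_scores(x):
    # High reconstruction error means the record does not look "normal".
    recon = autoencoder.predict(x, verbose=0)
    return np.mean((x - recon) ** 2, axis=1)

# Records scoring above, say, the 99th percentile are candidate fraud.
threshold = np.percentile(anomaly_scores(x_normal), 99)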
6. ● Deep Learning is good at surfacing patterns in
behavior
● Consider the billions of dollars of R&D spent by
adtech to surface *your* behavior and target you
with ads
● Your behavior can be as simple as "tends to watch cat
pictures on YouTube at 2pm" - Google wants to know
that
● We typically think of Deep Learning as a tool for media
Why Deep Learning for this? Isn’t it overkill?
7. [Pipeline diagram: Data Source → Data Spout → Data Lake → ETL → Training → Inference → DataViz]
8. We start with a real-time data source: logs streaming in from Wi-Fi, Ethernet,
the network...
9. An NGINX web server records HTTP requests and sends them
straight to a streaming system such as StreamSets or Kafka.
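A minimal sketch of that hand-off, assuming the kafka-python client, a broker at localhost:9092, and a hypothetical topic name:

import time
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="localhost:9092")

with open("/var/log/nginx/access.log") as log:
    log.seek(0, 2)                     # start at the end, like `tail -f`
    while True:
        line = log.readline()
        if not line:
            time.sleep(0.5)            # wait for NGINX to append more
            continue
        producer.send("nginx-logs", line.rstrip("\n").encode("utf-8"))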
10. The data lands in a data lake, a unified file system like MapR's NFS or HDFS,
where it is available for analytics.
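A minimal sketch of the landing step, assuming the lake is NFS-mounted at a hypothetical path and reusing the topic from the sketch above:

from kafka import KafkaConsumer

consumer = KafkaConsumer("nginx-logs", bootstrap_servers="localhost:9092")
with open("/mapr/cluster/logs/access-landed.log", "ab") as sink:
    for record in consumer:            # drain the stream into the lake
        sink.write(record.value + b"\n")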
11. At this step, we can begin preprocessing the raw data. Up to and including
this step, everything has run on CPUs, because these stages involve a constant
stream of small operations.
12. Preprocessing the data includes steps like normalization; standardizing
diverse data to a uniform format; shuffling log columns; running
find-and-replace; and translating log elements to a binary format.
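A minimal sketch of that cleanup, assuming NGINX "combined"-style log lines; the regex and replacements are illustrative, not the actual ETL job:

import re

LOG_RE = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) \S+" (\d+) (\d+)')

def standardize(line):
    # Parse one raw log line into a uniform dict of named, typed fields.
    m = LOG_RE.match(line)
    if m is None:
        return None                    # drop malformed lines
    ip, ts, method, path, status, size = m.groups()
    path = path.replace("%20", " ")    # simple find-and-replace cleanup
    return {"ip": ip, "timestamp": ts, "method": method.upper(),
            "path": path, "status": int(status), "bytes": int(size)}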
13.
From there, the vectorized logs are fed into a neural net in batches, and
each batch is dispatched to a different neural net model on a distributed
cluster.
In this case, we've trained a neural net to detect network intrusion
anomalies as part of a project with Canonical.
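A single-node sketch of that hand-off (the distributed dispatch is elided); it assumes the standardized dicts from the previous sketch, plus the anomaly_scores helper and threshold from the autoencoder sketch, retrained on these 6-dimensional vectors:

import numpy as np

METHODS = ["GET", "POST", "PUT", "DELETE"]      # assumed vocabulary

def vectorize(entry):
    # One-hot the HTTP method, add a binary error flag and normalized size.
    onehot = [1.0 if entry["method"] == m else 0.0 for m in METHODS]
    err = 1.0 if entry["status"] >= 400 else 0.0
    size = min(entry["bytes"] / 1e6, 1.0)
    return np.array(onehot + [err, size], dtype="float32")

BATCH = 256
buffer = []

def on_log_entry(entry):
    buffer.append(vectorize(entry))
    if len(buffer) >= BATCH:
        batch = np.stack(buffer)
        buffer.clear()
        scores = anomaly_scores(batch)          # per-record reconstruction error
        print(batch[scores > threshold])        # surface candidate anomalies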
14.
The last three stages, and above all training, run on GPUs.
16. New data can be fed to the trained models from multiple sources, via
Kafka/MapR Streams or via a REST API, all connected to TensorRT for inference.
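A minimal sketch of the REST path, using only the Python standard library; the endpoint URL and payload schema are hypothetical, not a documented API:

import json
import urllib.request

sample = [1.0, 0.0, 0.0, 0.0, 0.0, 0.005]      # one vectorized log entry
payload = json.dumps({"instances": [sample]}).encode("utf-8")
req = urllib.request.Request(
    "http://model-server:8080/v1/models/intrusion:predict",
    data=payload, headers={"Content-Type": "application/json"})
with urllib.request.urlopen(req) as resp:
    scores = json.loads(resp.read())           # anomaly score(s) per record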
17. For example, web logs record the user's browser: Internet Explorer, Google Chrome or
Mozilla Firefox. These three browsers have to be represented with numbers in a vector.
The method of representation is called one-hot encoding: a data scientist will use
three digits, calling IE 100, Chrome 010 and Firefox 001. This is how browsers are
engineered into numerical features that can be monitored for significance.
They will be added to a vector that contains other information about users, such as the
region in which their IP address is located. Geographic regions, too, are converted into
binary and stacked on top of other log elements.
Over the course of training, neural nets learn how significant or irrelevant those features
are for the detection of anomalies and assign them weights.
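A direct sketch of that scheme (the region vocabulary here is illustrative):

BROWSERS = ["IE", "Chrome", "Firefox"]
REGIONS = ["NA", "EU", "APAC"]                 # assumed region codes

def one_hot(value, vocabulary):
    return [1.0 if value == v else 0.0 for v in vocabulary]

# A Chrome user with an EU address: [0,1,0] stacked with [0,1,0].
feature_vector = one_hot("Chrome", BROWSERS) + one_hot("EU", REGIONS)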