Using Data Science for Cybersecurity
Anirudh Kondaveeti, Principal Data Scientist, Pivotal
Jeff Kelly, Principal Product Marketing Manager, Pivotal
Today’s Speakers
Anirudh Kondaveeti
Principal Data Scientist, Pivotal
Jeff Kelly
Product Marketing, Pivotal
Moderator Presenter
Cost of Cybercrime on the Rise
●  Cybercrime costs the average US enterprise $17m per year*
●  Costs grew at a 15% CAGR over the last three years
●  A single cybercrime incident can cost significantly more
●  Target’s 2013 data breach cost the company approximately $162m
●  Costs are not just financial but also reputational
*Source: 2016 Cost of Cyber Crime Study & the Risk of Business Innovation, Ponemon Institute
Hackers Growing More Sophisticated
●  Amateur hackers are giving way to professionals
●  Professionals are developing new, more sophisticated methods
●  Professional hackers make their services available for a fee
●  The cost of committing cybercrime is dropping
●  The average subscription fee for a one-hour/month DDoS package is roughly $38*
*Source: Q2 2015 Global DDoS Threat Landscape, Incapsula
Perimeter Defense Inadequate
●  Defending the perimeter is no longer enough
●  There is no 100% foolproof way to keep bad actors out
●  Some threats come from within
●  Mobile, cloud, and IoT are making the idea of a perimeter obsolete
●  Better methods are needed for threat detection inside the network
Data Science for Cybersecurity
Security must move beyond signature-based matching
•  Necessary defense direction: Find the Unknown
•  Need an advanced platform: Security is a Big Data problem
•  Multiple decentralized sources of traditional or unconventional data
•  Need a platform for better BI, reporting, and cross-source correlation
•  Develop intelligence: Security is an Advanced Analytics problem
Figure: approaches to security analytics: BI and compliance-driven, investigation-driven, behavior metrics, and data-science driven
Background
Lateral Movement Detection
Advanced Persistent Threat (APT)
APT Kill Chain
1. Phishing and Zero Day Attack: A handful of users are targeted by two phishing attacks; one user opens a zero-day payload (CVE-2011-0609).
2. Back Door: The user's machine is accessed remotely by the Poison Ivy tool.
3. Lateral Movement: The attacker elevates access to important user, service, and admin accounts, and to specific systems.
4. Data Gathering: Data is acquired from target servers and staged for exfiltration.
5. Exfiltrate: Data is exfiltrated as encrypted files over FTP to an external, compromised machine at a hosting provider.
Lateral Movement Detection
What: Identify anomalous user-level access to hosts
How: Look at people and machines
•  Users (User Behavior Models)
•  Network, servers (User Peer Models)
Scenarios:
•  Network reconnaissance by a remote adversary on a hijacked device
•  Ill-intentioned activities by a legitimate employee
•  Access policy abuse
Business value:
•  Immediate security alert generation
•  Enhanced SIEM alert queue prioritization
•  Focused monitoring
•  Future integration with other analytic models for a 360° view of attacks
Lateral Movement Detection (LMD) – Flow Diagram
Figure: structured and semi-structured sources (logs, Active Directory activity, Active Directory metadata, LDAP activity, server information) are loaded through external tables into Greenplum (Data Computing Appliance, DIA), where regression-based, cluster-based, and recommendation-system-based user behavioral models identify anomalous users.
Regression-Based Model
A model to identify users with unusual variation in the number of servers accessed over time:
•  Build a regression model for each user (Y = aX + b): number of servers accessed each week (Y) ~ week index (X)
•  Find the slope (a) of the regression line for each user
•  Identify users with a high positive or negative slope to find users with unusual activity
Figure: regression plot of the number of servers accessed by a user (y-axis: number of servers; x-axis: week of the year)
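The per-user slope test can be sketched in Python as below. This is a minimal illustration, not the deck's implementation: it assumes weekly server counts are already aggregated per user, and `flag_users` and its `threshold` parameter are hypothetical names chosen here.

```python
from statistics import mean


def slope(weeks, counts):
    """Ordinary least-squares slope a in counts = a * weeks + b."""
    x_bar, y_bar = mean(weeks), mean(counts)
    sxx = sum((x - x_bar) ** 2 for x in weeks)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(weeks, counts))
    return sxy / sxx


def flag_users(history, threshold):
    """history maps user -> [servers accessed in week 1, week 2, ...].

    Returns {user: slope} for users whose |slope| is at least `threshold`
    (a hypothetical tuning parameter).
    """
    flagged = {}
    for user, counts in history.items():
        if len(counts) < 2:
            continue  # a slope needs at least two weeks of history
        weeks = list(range(1, len(counts) + 1))
        a = slope(weeks, counts)
        if abs(a) >= threshold:
            flagged[user] = a
    return flagged


history = {"alice": [3, 3, 4, 3, 3], "bob": [2, 4, 6, 8, 10]}
print(flag_users(history, threshold=1.0))  # {'bob': 2.0}
```

Flagging on the absolute slope catches both a sudden expansion in server access and an abrupt drop, matching the "high positive or negative slope" criterion.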
User Behavior Models (UBM)
Build a historical behavioral profile for each user based on the following features:
•  Servers accessed
•  IP addresses logged in from
•  Geographical information of login
Models emphasize log-in frequency for each individual user/job
Several feature-generation steps reduce false alarms:
•  Aggregate servers into their respective server groups
•  Incorporate server criticality
•  Assign more weight to less popular servers and IP addresses (e.g., print servers are low-weighted)
•  Use a recommendation engine to suggest servers to users based on job roles and peers
Figure: servers s1 to s10 accessed by a user over time; the user typically uses only a few servers, then begins logging into a lot of new servers
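One way to assign more weight to less popular servers is an inverse-document-frequency style weight. The sketch below is an assumption, not the deck's formula: it weights each server by log(total users / users who accessed it), so a print server used by everyone gets weight 0. `server_weights` and `novelty_score` are hypothetical helpers.

```python
import math
from collections import Counter


def server_weights(access):
    """access maps user -> set of servers that user logged into.

    IDF-style weight: log(total users / number of users who accessed the server).
    Servers used by everyone (e.g. print servers) get weight 0.
    """
    n_users = len(access)
    popularity = Counter(s for servers in access.values() for s in servers)
    return {s: math.log(n_users / count) for s, count in popularity.items()}


def novelty_score(past, current, weights):
    """Sum the weights of servers accessed this week but never before.

    Servers missing from `weights` are treated as at least as rare as the
    rarest known server.
    """
    unseen = max(weights.values(), default=0.0)
    return sum(weights.get(s, unseen) for s in current - past)


access = {
    "u1": {"print01", "db01"},
    "u2": {"print01"},
    "u3": {"print01", "web01"},
}
w = server_weights(access)
print(novelty_score({"db01"}, {"db01", "print01", "web01"}, w))
```

Here the newly accessed print server adds nothing to the score, while the rarely used web server dominates it.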
Principal Component Analysis (PCA) Scoring
A user behavior matrix is created from x weeks of a user's history; the current week is used as test data.
Server    Week 1  Week 2  ...  Week 10 | Week 11  ...  Week 15
server1   2       3       ...  1       | 0        ...  0
server2   4       7       ...  1       | 3        ...  7
server3   0       2       ...  0       | 0        ...  0
...       ...     ...     ...  ...     | ...      ...  ...
server25  1       3       ...  5       | 8        ...  1
(Weeks 1 to 10: training data, from which a PCA model is built per user; Weeks 11 to 15: testing data)
PCA is a dimensionality reduction technique that finds the set of components of a multidimensional vector that account for most of its variance.
Principal dimensions are calculated from the training data.
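Building the user behavior matrix from raw logon records might look like the sketch below. It assumes logons arrive as hypothetical `(user, server, week)` tuples, counts logons per server per week, and uses plain Python lists; `behavior_matrix` and `split_train_test` are illustrative names.

```python
from collections import Counter


def behavior_matrix(events, user, servers, weeks):
    """events: iterable of (user, server, week) logon records.

    Returns one row per server and one column per week; each cell counts
    the user's logons to that server in that week.
    """
    counts = Counter((s, w) for u, s, w in events if u == user)
    return [[counts[(s, w)] for w in weeks] for s in servers]


def split_train_test(matrix):
    """All but the last week form the training history; the last (current)
    week is the test vector."""
    train = [row[:-1] for row in matrix]
    test = [row[-1] for row in matrix]
    return train, test


events = [
    ("alice", "server1", 1), ("alice", "server1", 1),
    ("alice", "server2", 2), ("bob", "server1", 1),
]
m = behavior_matrix(events, "alice", ["server1", "server2"], [1, 2])
train, test = split_train_test(m)
print(m, train, test)
```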
Principal Component Analysis (PCA) Scoring: Reconstruction Error
Figure: run PCA on the training data (user behavior matrix) to obtain the principal dimensions; project the test vector (user data for the new week) onto the principal dimensions and reconstruct it; the difference between the test vector and the reconstructed test vector is the anomaly score.
Ref: A. Lakhina, M. Crovella, C. Diot, Diagnosing network-wide traffic anomalies
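The reconstruction-error scoring can be sketched with NumPy's SVD. This is an illustration under assumptions: each row of `train` is one week's vector of per-server values (the transpose of the server-by-week table), `k`, the number of principal dimensions kept, is a hypothetical parameter, and the anomaly score is taken as the Euclidean norm of the difference.

```python
import numpy as np


def fit_pca(train, k):
    """train: (n_weeks, n_servers) array. Returns the column means and the
    top-k principal directions (rows of a (k, n_servers) array)."""
    mu = train.mean(axis=0)
    # The right singular vectors of the centered data are the principal directions.
    _, _, vt = np.linalg.svd(train - mu, full_matrices=False)
    return mu, vt[:k]


def reconstruction_error(x, mu, components):
    """Project x onto the principal dimensions, reconstruct it, and return
    the norm of the difference as the anomaly score."""
    coords = components @ (x - mu)
    reconstructed = components.T @ coords + mu
    return float(np.linalg.norm(x - reconstructed))


train = np.array([[1, 1, 0], [2, 2, 0], [3, 3, 0], [4, 4, 0]], dtype=float)
mu, comps = fit_pca(train, k=1)
print(reconstruction_error(np.array([5.0, 5.0, 0.0]), mu, comps))  # near 0: on the learned subspace
print(reconstruction_error(np.array([2.5, 2.5, 3.0]), mu, comps))  # about 3: off the subspace
```

A week that follows the user's usual pattern is reconstructed almost exactly; activity on a server the history never exercised cannot be reconstructed and shows up as a large residual.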
Principal Component Analysis (PCA) Scoring: Oversampling PCA
Figure: run PCA on the training data (user behavior matrix) to obtain the first principal vector; add the oversampled test data to the training data and run PCA again to obtain the first principal vector after oversampling; the difference in angle between the two vectors is the anomaly score.
Reference and image source: Y.-R. Yeh, Z.-Y. Lee, Y.-J. Lee, Anomaly Detection via Over-sampling Principal Component Analysis
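A simplified take on over-sampling PCA is sketched below with NumPy. It follows the flow in the figure (duplicate the test vector, rerun PCA, compare first principal vectors) but recomputes a full SVD rather than using the paper's online update; the oversampling `ratio` and the score 1 - |cos θ| between the two unit vectors are assumptions for illustration.

```python
import numpy as np


def first_pc(data):
    """First principal direction (unit vector) of the rows of `data`."""
    _, _, vt = np.linalg.svd(data - data.mean(axis=0), full_matrices=False)
    return vt[0]


def ospca_score(train, x, ratio=0.1):
    """Append enough copies of x to make up about `ratio` of the training
    size, rerun PCA, and score the change in the first principal vector."""
    copies = max(1, round(ratio * len(train)))
    augmented = np.vstack([train, np.tile(x, (copies, 1))])
    u, u_over = first_pc(train), first_pc(augmented)
    # SVD sign is arbitrary, so compare directions with |cos(angle)|.
    return 1.0 - abs(float(u @ u_over))


train = np.array([[i, 0.0] for i in range(-5, 6)])
print(ospca_score(train, np.array([3.0, 0.0])))   # near 0: direction unchanged
print(ospca_score(train, np.array([0.0, 20.0])))  # near 1: direction flips
```

Oversampling amplifies the influence of a single test point, so an outlier rotates the dominant direction noticeably while a typical point leaves it in place.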
Parallelized PCA using PL/R
R code finds the principal components (using SVD). A PL/R wrapper over the R code, combining SQL & R, runs it in parallel so that a separate model is built from each user's data.
Figure: User1 Data → User1 Model, User2 Data → User2 Model, ..., User5 Data → User5 Model
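PL/R runs the R code inside the database, one model per user. As a rough Python analogue (not PL/R, and not how Greenplum distributes work), the sketch below fits an independent SVD-based model per user with a thread pool; a thread pool keeps the example self-contained, whereas a process pool or the database itself would be the usual choice for heavy CPU-bound work. `fit_all_users` and `max_workers` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def fit_user_model(item):
    """Fit one user's model: column means plus principal directions via SVD."""
    user, matrix = item
    mu = matrix.mean(axis=0)
    _, _, vt = np.linalg.svd(matrix - mu, full_matrices=False)
    return user, (mu, vt)


def fit_all_users(user_matrices, max_workers=4):
    """Fit each user's model independently and in parallel; returns {user: model}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(fit_user_model, user_matrices.items()))


models = fit_all_users({
    "User1": np.ones((3, 2)),
    "User2": np.arange(6.0).reshape(3, 2),
})
print(sorted(models))
```

Because each user's model depends only on that user's data, the work is embarrassingly parallel, which is what makes the per-user wrapper approach scale.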
Recommendation System-Based Model
Users rate items. To recommend items to a particular user A:
•  Find other users U similar to A
•  Identify the set of items I accessed by U
•  Recommend these items I to A
Here, users = employees and items = servers accessed.
Image Source: http://dataconomy.com/2015/03/an-introduction-to-recommendation-engines/
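The user-based recommendation steps above map onto a short sketch. Similarity here is Jaccard overlap between sets of accessed servers, which is an assumption (the deck does not specify a similarity measure), and `recommend_servers` with its `k` neighbor count is hypothetical.

```python
def jaccard(a, b):
    """Overlap of two server sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_servers(target, access, k=2):
    """access maps employee -> set of servers accessed.

    1. Find the k employees most similar to `target`.
    2. Collect the servers they accessed.
    3. Recommend the ones `target` has not accessed yet.
    """
    mine = access[target]
    sims = {u: jaccard(mine, s) for u, s in access.items() if u != target}
    peers = sorted((u for u in sims if sims[u] > 0), key=sims.get, reverse=True)[:k]
    recommended = set()
    for u in peers:
        recommended |= access[u]
    return recommended - mine


access = {
    "A": {"g1", "g2"},
    "B": {"g1", "g2", "g3"},
    "C": {"g1", "g2", "g4"},
    "D": {"g5"},
}
print(sorted(recommend_servers("A", access)))  # ['g3', 'g4']
```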
Recommendation System-Based Model
•  Historical profile for each user, based on the number of days per week a particular server is accessed, weighted by recommendations
•  Data: AD logs, LDAP data (job title, department, etc.)
•  Heat map (top figure)
   –  X-axis: week index
   –  Y-axis: server (g1 to g5)
   –  Value: number of days per week, weighted by recommendations
•  Outlier plot (bottom figure)
   –  X-axis: week index
   –  Y-axis: outlier score
Figure panels: heat map before recommendations; heat map after recommendations
Servers g3 and g4 are recommended, so their weight is decreased. The outlier score in the test week decreases because the new servers the user accesses are recommended for the user's job profile.
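The effect of recommendation weighting on the outlier score can be illustrated with a toy example. The multiplicative `discount` factor and the simple score (sum of weighted days on servers absent from the history window) are assumptions for illustration, not the deck's model; `weighted_profile` and `new_server_score` are hypothetical names.

```python
def weighted_profile(days_per_week, recommended, discount=0.5):
    """days_per_week maps server -> [days accessed in week 1, week 2, ...].

    Servers recommended for the user's job profile are down-weighted by
    `discount` (a hypothetical factor); others keep weight 1.
    """
    return {
        server: [d * (discount if server in recommended else 1.0) for d in days]
        for server, days in days_per_week.items()
    }


def new_server_score(profile, week, history_weeks):
    """Sum the weighted days in `week` on servers unused during the first
    `history_weeks` weeks (a simple stand-in for an outlier score)."""
    return sum(days[week] for days in profile.values() if not any(days[:history_weeks]))


days = {"g1": [5, 5, 5], "g3": [0, 0, 4], "g4": [0, 0, 3]}
before = new_server_score(weighted_profile(days, set()), week=2, history_weeks=2)
after = new_server_score(weighted_profile(days, {"g3", "g4"}), week=2, history_weeks=2)
print(before, after)  # 7.0 3.5
```

As on the slide, the user's move to servers g3 and g4 still registers, but with a lower score once those servers are known to fit the job profile.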
Graph Model
Use historical Windows event data to build graphs* of typical user behavior:
•  Which machines does the user log into?
•  Which machines does the user log in from?
•  How often?
•  In which order?
Ask whether this behavior is typical:
•  Is it typical for this user?
•  Is it typical for someone in a particular department?
•  Is it typical for someone in the user’s job role?
Graph models are sensitive to direction, order, and frequency.
Figure: typical vs. anomalous behavior graphs over hosts 34.23.123.4, 34.23.123.51, 34.23.1.1, 34.23.0.1, and 34.23.2.8, including a DB with financial information
*Reference: Alexander D. Kent, Lorie M. Liebrock, Joshua C. Neil. Authentication graphs: Analyzing user behavior within an enterprise network.
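A minimal directed authentication graph with edge counts can be built from logon pairs as sketched below. It captures direction and frequency but not order; the `(source, destination)` input format and the idea of flagging never-before-seen edges are assumptions for illustration rather than the referenced paper's method.

```python
from collections import Counter


def auth_graph(logons):
    """logons: iterable of (source_host, destination_host) pairs.

    Returns a Counter mapping each directed edge to its logon count, so
    (a, b) and (b, a) are distinct edges.
    """
    return Counter(logons)


def novel_edges(history, current):
    """Edges in the current graph that never appear in the historical graph."""
    return {edge: n for edge, n in current.items() if edge not in history}


history = auth_graph(
    [("34.23.123.4", "34.23.1.1")] * 3 + [("34.23.1.1", "34.23.0.1")]
)
current = auth_graph([
    ("34.23.123.4", "34.23.1.1"),
    ("34.23.0.1", "34.23.1.1"),     # reverse of a known edge
    ("34.23.2.8", "34.23.123.51"),  # brand-new edge
])
print(novel_edges(history, current))
```

The reversed edge is flagged even though both hosts are familiar, which is the direction sensitivity the slide describes.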
Fortune 100 Companies Leverage Pivotal to Tackle Enterprise-wide Security Risks with Analytics
Challenge:
•  Cybersecurity threats, data privacy and data protection issues, and fraudulent behavior were going undetected, leaving the customer vulnerable to security risks and financial loss
•  Needed timely insight into unusual or suspicious internal behavior to allow for proper action
•  Tools in place could not be customized to leverage historical security data and allow for predictive analytics
Solution:
•  Used data science to demonstrate use cases analyzing the customer's Active Directory data, identifying fraud, unapproved file sharing, etc.
•  Used the Big Data Suite, specifically Greenplum + MADlib + R, to store and analyze data, with the potential to build out a Hadoop data lake with HDB (aka HAWQ)
Pivotal Solution includes: Pivotal Greenplum, Pivotal HDB, Apache MADlib
Why Pivotal for Security Analytics
•  Pivotal Data Science expertise and partnership with customers to identify high-value use cases and build a data science center of excellence for security analytics
•  Tight integration with analytical tools that run in-database and across all of the data, to cover as many use cases as possible
•  Scalable solution that grows with data needs, leveraging commodity hardware to keep costs low as data volume increases
•  Join key Pivotal customers in the Security Advisory Council for collaboration and knowledge sharing
Additional Resources & Next Steps
Read: Pivotal Data Science Blog
https://blog.pivotal.io/channels/data-science-pivotal
Strategic: Pivotal Data Science Analytics Roadmapping Engagement
https://pivotal.io/contact
Tune in: Next data science webinar, “Using Data Science to Detect Healthcare Fraud, Waste, and Abuse,” March 14, 2017
https://pivotal.io/resources/1/webinars
Hands on:
Pivotal Greenplum Sandbox
https://network.pivotal.io/products/pivotal-gpdb
Apache MADlib (incubating)
http://madlib.incubator.apache.org/
Questions?
Platforms, Platform Engineering, & Platform as a Product
 
Building Cloud Ready Apps
Building Cloud Ready AppsBuilding Cloud Ready Apps
Building Cloud Ready Apps
 
Spring Boot 3 And Beyond
Spring Boot 3 And BeyondSpring Boot 3 And Beyond
Spring Boot 3 And Beyond
 
Spring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdf
Spring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdfSpring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdf
Spring Cloud Gateway - SpringOne Tour 2023 Charles Schwab.pdf
 
Simplify and Scale Enterprise Apps in the Cloud | Boston 2023
Simplify and Scale Enterprise Apps in the Cloud | Boston 2023Simplify and Scale Enterprise Apps in the Cloud | Boston 2023
Simplify and Scale Enterprise Apps in the Cloud | Boston 2023
 
Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023
Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023
Simplify and Scale Enterprise Apps in the Cloud | Seattle 2023
 
tanzu_developer_connect.pptx
tanzu_developer_connect.pptxtanzu_developer_connect.pptx
tanzu_developer_connect.pptx
 
Tanzu Virtual Developer Connect Workshop - French
Tanzu Virtual Developer Connect Workshop - FrenchTanzu Virtual Developer Connect Workshop - French
Tanzu Virtual Developer Connect Workshop - French
 
Tanzu Developer Connect Workshop - English
Tanzu Developer Connect Workshop - EnglishTanzu Developer Connect Workshop - English
Tanzu Developer Connect Workshop - English
 
Virtual Developer Connect Workshop - English
Virtual Developer Connect Workshop - EnglishVirtual Developer Connect Workshop - English
Virtual Developer Connect Workshop - English
 
Tanzu Developer Connect - French
Tanzu Developer Connect - FrenchTanzu Developer Connect - French
Tanzu Developer Connect - French
 
Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023
Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023
Simplify and Scale Enterprise Apps in the Cloud | Dallas 2023
 
SpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring Boot
SpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring BootSpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring Boot
SpringOne Tour: Deliver 15-Factor Applications on Kubernetes with Spring Boot
 
SpringOne Tour: The Influential Software Engineer
SpringOne Tour: The Influential Software EngineerSpringOne Tour: The Influential Software Engineer
SpringOne Tour: The Influential Software Engineer
 
SpringOne Tour: Domain-Driven Design: Theory vs Practice
SpringOne Tour: Domain-Driven Design: Theory vs PracticeSpringOne Tour: Domain-Driven Design: Theory vs Practice
SpringOne Tour: Domain-Driven Design: Theory vs Practice
 
SpringOne Tour: Spring Recipes: A Collection of Common-Sense Solutions
SpringOne Tour: Spring Recipes: A Collection of Common-Sense SolutionsSpringOne Tour: Spring Recipes: A Collection of Common-Sense Solutions
SpringOne Tour: Spring Recipes: A Collection of Common-Sense Solutions
 

Recently uploaded

Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 

Recently uploaded (20)

Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 

Using Data Science for Cybersecurity

  • 1. Using Data Science for Cybersecurity. Anirudh Kondaveeti, Principal Data Scientist, Pivotal; Jeff Kelly, Principal Product Marketing Manager, Pivotal
  • 2. Today’s Speakers Anirudh Kondaveeti Principal Data Scientist, Pivotal Jeff Kelly Product Marketing, Pivotal Moderator Presenter
  • 3. Cost of Cybercrime on the Rise ●  Cybercrime costs average US enterprise $17m per year* ●  Cost grew at 15% CAGR over last three years ●  Any given cybercrime can cost significantly more ●  Target’s 2014 hack cost company approximately $162m ●  Costs not just financial, also reputational *Source: 2016 Cost of Cyber Crime Study & the Risk of Business Innovation, Ponemon Institute
  • 4. Hackers Growing More Sophisticated ●  Amateur hackers giving way to professionals ●  Developing new, more sophisticated methods ●  Professional hackers make their services available for a fee ●  Costs to commit cybercrime dropping ●  Average subscription fee for a one-hour/month DDoS package is roughly $38* *Source: Q2 2015 Global DDoS Threat Landscape, Incapsula
  • 5. Perimeter Defense Inadequate ●  Defending the perimeter no longer enough ●  No 100%, fool-proof way to keep bad actors out ●  Some threats come from within ●  The idea of a perimeter becoming obsolete with mobile, cloud, IoT ●  Need better methods for threat detection inside the network
  • 6. Data Science for Cybersecurity
  • 7. Security must move beyond signature-based matching •  Necessary defense direction: Find the Unknown •  Need an advanced platform: Security is a Big Data problem •  Multiple decentralized sources of traditional or unconventional data •  Need a platform for better BI, reporting, and cross-source correlation •  Develop intelligence: Security is an Advanced Analytics problem [Figure: security analytics approaches: BI- and compliance-driven; investigation-driven; behavior-metrics, investigation-driven; data-science-driven] Background
  • 9. Advanced Persistent Threat (APT) Kill Chain: 1. Phishing and Zero Day Attack: a handful of users are targeted by two phishing attacks; one user opens a zero-day payload (CVE-2011-0609). 2. Back Door: the user’s machine is accessed remotely by the Poison Ivy tool. 3. Lateral Movement: the attacker elevates access to important user, service and admin accounts, and to specific systems. 4. Data Gathering: data is acquired from target servers and staged for exfiltration. 5. Exfiltrate: data is exfiltrated via encrypted files over FTP to an external, compromised machine at a hosting provider.
  • 10. Lateral Movement Detection. What: Identify anomalous user-level access to hosts. How: Look at people and machines •  Users (User Behavior Models) •  Network, servers (User Peer Models). Scenarios: •  Network reconnaissance by a remote adversary on a hijacked device •  Ill-intentioned activities by a legitimate employee •  Access policy abuse. Business value: •  Immediate security alert generation •  Enhanced SIEM alert queue prioritization •  Focused monitoring •  Future integration with other analytic models for a 360° attack view
  • 11. Lateral Movement Detection (LMD) – Flow Diagram [Figure: structured and semi-structured sources (logs, Active Directory activity, Active Directory metadata, LDAP activity, server information) are loaded through external tables into Greenplum (Data Computing Appliance, DIA); regression-based, cluster-based, and recommendation-system-based models feed a user behavioral model that outputs anomalous users]
  • 12. Regression-Based Model. Model to identify users with unusual variation in the number of servers accessed over time: •  Build a regression model for each user (Y = aX + b), with the number of servers accessed each week (Y) regressed on the week index (X) •  Find the slope of the regression line for each user (a) •  Identify users with a high positive or negative slope to find users with unusual activity [Figure: regression plot of number of servers for a user; x-axis: week of the year; y-axis: number of servers]
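The per-user slope test on slide 12 can be sketched in a few lines. This is a minimal illustration, not the presenters' code: it assumes each user's history is a list of weekly distinct-server counts, and the function names and the `threshold` value are hypothetical.

```python
def weekly_slope(counts):
    """Least-squares slope a of Y = aX + b, where X is the week index (0, 1, ...)
    and Y is the number of servers the user accessed in that week."""
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two weeks of history")
    mean_x = (n - 1) / 2
    mean_y = sum(counts) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(counts))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    return sxy / sxx


def flag_unusual_users(weekly_counts, threshold):
    """Return {user: slope} for users whose slope is strongly positive or negative.
    `threshold` is a hypothetical tuning parameter, not a value from the deck."""
    slopes = {user: weekly_slope(c) for user, c in weekly_counts.items()}
    return {u: s for u, s in slopes.items() if abs(s) > threshold}


history = {
    "alice": [3, 3, 3, 3],    # stable usage
    "bob": [1, 4, 9, 16],     # rapidly growing server count
    "carol": [10, 7, 4, 1],   # sharply shrinking server count
}
print(flag_unusual_users(history, threshold=2.0))
```

Because the absolute slope is compared, both sudden growth and sudden drops in server usage are surfaced, matching the slide's "high positive or negative slope".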
  • 13. User Behavior Models (UBM). Build a historical behavioral profile for each user based on the following features: •  Servers accessed •  IP addresses logged in from •  Geographical information of login. Models stress individual user/job log-in frequency. Multiple feature-generation steps reduce false alarms: •  Aggregate servers into their respective server groups •  Incorporate server criticality •  Assign more weight to less popular servers and IP addresses (e.g., print servers are low-weighted) •  Use a recommendation engine to suggest servers to users based on job roles and peers [Figure: server access over time for servers s1–s10: a user who typically uses only a few servers begins logging into many new servers]
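One way to "assign more weight to less popular servers" is an IDF-style weight. The deck does not say which formula it used, so this sketch swaps in log(N / n_s) as an assumption. Here N is the number of distinct users and n_s is the number of distinct users who touched server s. A server everyone uses, such as a print server, gets weight 0. The optional `group_of` mapping, which illustrates aggregating servers into server groups, and all names are hypothetical.

```python
import math
from collections import defaultdict


def server_weights(access_log, group_of=None):
    """access_log: iterable of (user, server) pairs.
    group_of: optional dict mapping server -> server group (hypothetical), so that
    weights are computed per group rather than per individual host.
    Returns {server_or_group: log(N / n_s)}; rarely used servers get larger weights."""
    group_of = group_of or {}
    users_per_server = defaultdict(set)
    all_users = set()
    for user, server in access_log:
        key = group_of.get(server, server)  # aggregate to group when a mapping exists
        users_per_server[key].add(user)
        all_users.add(user)
    n_users = len(all_users)
    return {s: math.log(n_users / len(u)) for s, u in users_per_server.items()}


log = [("a", "print01"), ("b", "print01"), ("c", "print01"), ("a", "db07")]
print(server_weights(log))  # print01 is used by everyone, so its weight is 0
```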
  • 14. Principal Component Analysis (PCA) Scoring [Figure: example user behavior matrix with rows server1 … server25 and columns Week1 … Week15; a PCA model is built per user from the training weeks, and the remaining week is the testing data] A user behavior matrix is created using ‘x’ weeks of history for a user, and the current week is used as test data. PCA is a dimensionality reduction technique that finds the set of components of a multidimensional vector that account for most of the variance. The principal dimensions are calculated from the training data.
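The behavior matrix on slide 14 can be built from raw access records as follows. This is a sketch under stated assumptions. The `(user, server, week_index)` record schema is hypothetical, and each cell is assumed to be an access count. The matrix is stored with weeks as rows and servers as columns, which is the transpose of the slide's layout. That way each week is one observation vector, and the last week can be held out as the test vector.

```python
from collections import Counter


def behavior_matrix(events, user, servers, n_weeks):
    """events: iterable of (user, server, week_index) records (hypothetical schema).
    Returns n_weeks rows; row w holds, for each server in `servers`, the number of
    accesses by `user` in week w (weeks as rows, servers as columns)."""
    counts = Counter((s, w) for u, s, w in events if u == user)
    return [[counts[(s, w)] for s in servers] for w in range(n_weeks)]


def train_test_split(matrix):
    """All weeks except the most recent are training data; the last week is the test vector."""
    return matrix[:-1], matrix[-1]


events = [("u1", "server1", 0), ("u1", "server1", 0), ("u1", "server2", 1),
          ("u2", "server1", 1), ("u1", "server3", 2)]
m = behavior_matrix(events, "u1", ["server1", "server2", "server3"], n_weeks=3)
train, test = train_test_split(m)
print(train, test)
```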
  • 15. Principal Component Analysis (PCA) Scoring: Reconstruction Error. Run PCA on the training data (user behavior matrix) to obtain the principal dimensions; project the test vector (user data for the new week) onto the principal dimensions and reconstruct it; the difference between the test vector and the reconstructed test vector is the anomaly score. Ref: A. Lakhina, M. Crovella, C. Diot, Diagnosing network-wide traffic anomalies
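Slide 15's reconstruction-error score can be sketched with NumPy's SVD. This is illustrative only: it assumes rows of `train` are weeks and columns are servers. The number of retained components `k` and the use of the Euclidean norm for "difference between two vectors" are assumptions, not details from the deck.

```python
import numpy as np


def reconstruction_error_score(train, test, k):
    """Fit PCA on the training weeks, project the test week onto the top-k principal
    dimensions, reconstruct it, and return the Euclidean distance between the test
    vector and its reconstruction (larger means more anomalous)."""
    X = np.asarray(train, dtype=float)
    x = np.asarray(test, dtype=float)
    mean = X.mean(axis=0)
    # Rows of vt are the principal directions of the centered training data.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    components = vt[:k]                     # shape (k, n_servers)
    coords = (x - mean) @ components.T      # coordinates in the principal subspace
    reconstructed = mean + coords @ components
    return float(np.linalg.norm(x - reconstructed))


# Training weeks where usage of two servers always moves together.
train = [[1, 1, 0], [2, 2, 0], [3, 3, 0], [4, 4, 0]]
print(reconstruction_error_score(train, [5, 5, 0], k=1))      # fits the usual pattern
print(reconstruction_error_score(train, [2.5, 2.5, 3], k=1))  # touches a new server
```

A test week that follows the usual pattern is reconstructed almost exactly. A week that touches a server outside the learned subspace leaves a large residual.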
  • 16. Principal Component Analysis (PCA) Scoring: Oversampling PCA. Run PCA on the training data (user behavior matrix) to obtain the first principal vector; oversample the test data, add it to the training data, and run PCA again to obtain the first principal vector after oversampling; the difference in angle between the two vectors is the anomaly score. Reference and image source: Y.R. Yeh, Z.Y. Lee, Y.J. Lee, Anomaly Detection via Over-sampling Principal Component Analysis
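The oversampling idea on slide 16 can be sketched as follows. The sketch makes several assumptions. The test vector is duplicated so that it makes up about `ratio` of the training size (a hypothetical parameter). The first principal vector is simply recomputed from scratch with SVD rather than updated incrementally. The angle difference is scored as 1 − |cos θ|, which is 0 when the direction is unchanged and approaches 1 when it rotates by 90°.

```python
import numpy as np


def first_principal_vector(X):
    """Leading principal direction (unit vector) of the rows of X."""
    centered = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]


def ospca_score(train, test, ratio=0.1):
    """Oversampling-PCA style anomaly score: compare the first principal vector
    before and after adding copies of the test vector to the training data."""
    X = np.asarray(train, dtype=float)
    x = np.asarray(test, dtype=float)
    u = first_principal_vector(X)
    copies = max(1, int(round(ratio * len(X))))
    X_over = np.vstack([X, np.tile(x, (copies, 1))])
    u_over = first_principal_vector(X_over)
    cos = abs(float(u @ u_over))  # both are unit vectors; abs() ignores sign flips
    return 1.0 - cos


train = [[-2, 0.1], [-1, -0.1], [0, 0.1], [1, -0.1], [2, 0.0]]
print(ospca_score(train, [3, 0]))  # along the usual direction: score near 0
print(ospca_score(train, [0, 5]))  # off-pattern point rotates the direction: score near 1
```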
  • 17. Parallelized PCA using PL/R [Figure: SQL & R: each user’s data (User1 Data … User5 Data) is passed to R code that finds the principal components (using SVD), producing one model per user (User1 Model … User5 Model); a PL/R wrapper over the R code runs the per-user fits in parallel]
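Slide 17's pattern works because each user's model is fit independently. Partition the rows by user, then fit one model per partition concurrently. The Python sketch below only mirrors that idea. It is not PL/R, and in Greenplum the parallelism comes from the database rather than from application threads. Threads are used here just to keep the sketch self-contained. CPU-bound fits would need a process pool or in-database execution for a real speedup. All names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition_by_user(rows):
    """rows: iterable of (user, week_vector) records. Groups vectors per user,
    analogous to grouping by user id before calling the per-user R function."""
    groups = defaultdict(list)
    for user, vector in rows:
        groups[user].append(vector)
    return dict(groups)


def fit_all_users(rows, fit_model, max_workers=4):
    """Apply fit_model to each user's data concurrently; the fits share no state."""
    groups = partition_by_user(rows)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        models = pool.map(fit_model, groups.values())
        return dict(zip(groups.keys(), models))


def column_means(vectors):
    """Stand-in 'model' for the example: the mean vector of a user's weeks."""
    return [sum(col) / len(col) for col in zip(*vectors)]


rows = [("u1", [1, 2]), ("u2", [3, 4]), ("u1", [5, 6])]
print(fit_all_users(rows, column_means))
```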
  • 18. Recommendation System-Based Model. Users rate items. To recommend items to a particular user A: •  Find other users U similar to A •  Identify the set of items I accessed by U •  Recommend these items I to A. Here, users = employees and items = servers accessed. Image source: http://dataconomy.com/2015/03/an-introduction-to-recommendation-engines/
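The three steps on slide 18 map directly onto user-based collaborative filtering. This sketch assumes Jaccard similarity between users' server sets, which is a swapped-in choice. The deck also mentions job roles and peers, which a real system might use instead. The function names and `k` are hypothetical.

```python
def jaccard(a, b):
    """Similarity between two users' sets of accessed servers."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_servers(access, target, k=2):
    """access: dict user -> set of servers accessed.
    1. Find the k users most similar to `target`.
    2. Collect the servers those peers accessed.
    3. Recommend the ones `target` has not used yet."""
    peers = sorted(
        (u for u in access if u != target),
        key=lambda u: jaccard(access[target], access[u]),
        reverse=True,
    )[:k]
    recommended = set().union(*(access[u] for u in peers))
    return recommended - access[target]


access = {
    "alice": {"g1", "g2"},
    "bob": {"g1", "g2", "g3"},
    "carol": {"g1", "g2", "g4"},
    "dave": {"g5"},
}
print(sorted(recommend_servers(access, "alice")))
```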
  • 19. Recommendation System-Based Model •  Historical profile for each user based on the number of days per week the user accessed a particular server, weighted by recommendations •  AD logs, LDAP data (job title, department, etc.) •  Heat map (top figure) –  X-axis: week index –  Y-axis: server –  Value: number of days per week weighted by recommendations •  Outlier plot (bottom figure) –  X-axis: week index –  Y-axis: outlier score [Figure: heat maps for server groups g1–g5 before and after recommendations; servers g3 and g4 are recommended, hence their weight is decreased] The outlier score in the test week decreases because the new servers the user accesses are recommended for the user’s job profile.
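The weighting on slide 19 can be illustrated by down-weighting the days-per-week counts for recommended servers. The deck does not give the weighting formula. The multiplicative `discount` factor below is a hypothetical choice made only to show the effect: accesses to recommended servers, such as g3 and g4 in the slide's example, contribute less to the profile that feeds the outlier score.

```python
def weighted_profile(days_per_week, recommended, discount=0.5):
    """days_per_week: dict server -> list of days-per-week counts (one per week).
    Servers in `recommended` are multiplied by `discount` (hypothetical factor),
    all other servers keep their original counts."""
    return {
        server: [d * (discount if server in recommended else 1.0) for d in days]
        for server, days in days_per_week.items()
    }


profile = {"g1": [5, 5, 5], "g3": [0, 0, 4], "g4": [0, 0, 3]}
print(weighted_profile(profile, recommended={"g3", "g4"}))
```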
  • 20. Graph Model. Use historical Windows event data to build graphs* of typical user behavior: •  Which machines does the user log into? •  Which machines does the user log in from? •  How often? •  In which order? Ask whether this behavior is typical: •  Is it typical for this user? •  Is it typical for someone in a particular department? •  Is it typical for someone in the user’s job role? Graph models are sensitive to direction, order, and frequency. [Figure: typical and anomalous login graphs over hosts 34.23.123.4, 34.23.123.51, 34.23.1.1, 34.23.0.1 and 34.23.2.8; the anomalous graph reaches a DB with financial information] *Reference: Alexander D. Kent, Lorie M. Liebrock, Joshua C. Neil. Authentication graphs: Analyzing user behavior within an enterprise network.
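A minimal version of the authentication-graph idea stores each user's historical logins as directed edges with counts. New logins along edges that were rare or never seen are then flagged. This sketch captures direction and frequency but not login order. The host names, `min_count`, and the function names are hypothetical, and this is not the method of the cited paper.

```python
from collections import Counter


def build_auth_graph(logins):
    """logins: iterable of (source_host, destination_host) pairs for one user.
    Returns a Counter mapping each directed edge to how often it was used."""
    return Counter(logins)


def unusual_edges(history, new_logins, min_count=1):
    """Edges in the new activity that were used fewer than `min_count` times historically.
    Direction matters: (a, b) and (b, a) are different edges."""
    graph = build_auth_graph(history)
    return [edge for edge in new_logins if graph[edge] < min_count]


history = [("laptop-a", "mail-1"), ("laptop-a", "fileserver-1")] * 3
new = [("laptop-a", "mail-1"), ("fileserver-1", "laptop-a"), ("laptop-a", "db-finance")]
print(unusual_edges(history, new))
```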
  • 21. Fortune 100 Companies Leverage Pivotal to Tackle Enterprise-wide Security Risks with Analytics. Challenge: •  Cybersecurity threats, data privacy and data protection issues, and fraudulent behavior going undetected, leaving customers vulnerable to security risks and loss of money •  Need to gain timely insight into unusual or suspicious internal behavior to allow for proper action •  Tools in place cannot be customized to leverage historical security data and allow for predictive analytics. Solution: •  Leveraged data science to demonstrate use cases analyzing their Active Directory data, identifying fraud, unapproved file sharing, etc. •  Utilized Big Data Suite, specifically Greenplum + MADlib + R, to store and analyze data, with potential to build out a Hadoop data lake with HDB (aka HAWQ). Pivotal solution includes: Pivotal Greenplum, Pivotal HDB, Apache MADlib
  • 22. Why Pivotal for Security Analytics •  Pivotal Data Science expertise and partnership with customers to identify high-value use cases and build a data science center of excellence for security analytics •  Tight integration with analytical tools that run in-database and across all of the data, to cover the most possible use cases •  Scalable solution that can grow as data needs grow, leveraging commodity hardware to keep costs low as data volume increases •  Join key Pivotal customers in the Security Advisory Council for collaboration and knowledge sharing
  • 23. Additional Resources & Next Steps. Read: Pivotal Data Science Blog https://blog.pivotal.io/channels/data-science-pivotal. Strategic: Pivotal Data Science Analytics Roadmapping Engagement https://pivotal.io/contact. Tune in: next data science webinar, “Using Data Science to Detect Healthcare Fraud, Waste, and Abuse,” March 14, 2017 https://pivotal.io/resources/1/webinars. Hands on: Pivotal Greenplum Sandbox https://network.pivotal.io/products/pivotal-gpdb; Apache MADlib (incubating) http://madlib.incubator.apache.org/
  • 24. Questions? Using Data Science for Cybersecurity