EAGLE: User Profile-based Anomaly
Detection for Securing Hadoop Clusters
01 NOV, 2015
CHAITALI GUPTA, RANJAN SINHA, YONG ZHANG
Outline
Why EAGLE?
Architecture of EAGLE
User Profiles in EAGLE
Experiments
Performance Results
Future Work
Big Data @ eBay
800M Listings*
159M Global Active Buyers*
7 Hadoop Clusters*
800M HDFS operations (single cluster)*
120 PB Data*
*Q3 2015 data
Motivation
Who is accessing the data?
What data are they accessing?
Is someone trying to access data that they don’t have access to?
Are there any anomalous access patterns?
Is there a security threat?
How can we monitor and get notified before or during an anomalous event?
ARCHITECTURE
[Architecture diagram. Component labels: Stream Processing Engine; DataCollector; Kafka; HDFS, Audit, Security; Metadata Manager; Datastores; Remediation Engine; Apache Ranger; Machine Learning Module; Custom module; Alerts; Activities; PolicyThresholds; User properties; MLThresholds; Real Time Alert Dashboard; HDFS Archive; Security Analyst; Admin Console; Security Engineer; Insights; Metadata Management; Machine Learning Training Module]
USER PROFILE ALGORITHMS
Density Estimation
• Compute mean and standard deviation
• Compute probability density estimation
• Flag an anomaly if the probability density falls below the minimum probability density seen in the training set
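$$\mu = \frac{1}{N}\sum_{i=1}^{N} x^{(i)}, \qquad \sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x^{(i)} - \mu\right)^{2}}$$
$$p(x) = \prod_{j=1}^{m} p(x_j; \mu_j, \sigma_j) = \prod_{j=1}^{m} \frac{1}{\sigma_j\sqrt{2\pi}}\, e^{-(x_j - \mu_j)^{2} / (2\sigma_j^{2})}$$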
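The density-estimation steps can be sketched in Python roughly as follows. This is a minimal illustration, not Eagle's implementation: the function names (`fit_profile`, `density`, `is_anomaly`), the use of NumPy, and the small floor on zero standard deviations are all assumptions made for the example.

```python
import numpy as np

def density(x, mu, sigma):
    """Product of independent per-feature Gaussian densities, one value per row of x."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    per_feature = np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))
    return per_feature.prod(axis=1)

def fit_profile(train):
    """Estimate per-feature mean and standard deviation from training rows,
    and record the minimum density seen on the training set as the threshold."""
    train = np.asarray(train, dtype=float)
    mu = train.mean(axis=0)
    sigma = train.std(axis=0)  # divides by N, matching the formula
    # Assumption for this sketch: avoid division by zero for constant features.
    sigma = np.where(sigma == 0, 1e-9, sigma)
    threshold = density(train, mu, sigma).min()
    return mu, sigma, threshold

def is_anomaly(x, mu, sigma, threshold):
    """True where the density is below the minimum training density."""
    return density(x, mu, sigma) < threshold
```

With many features the product of densities can underflow, so a practical version would likely compare log-densities instead.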
USER PROFILE ALGORITHMS…
Eigenvalue Decomposition
• Compute mean and variance
• Compute eigenvectors and determine principal components
• Normal data points lie near the first few principal components
• Abnormal data points lie further from the first few principal components and closer to the later components
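One common way to turn this idea into a score is to measure how far a point lies from the subspace spanned by the first few principal components. The sketch below does that with NumPy; the function names, the choice of `k`, and the use of the residual norm as the score are assumptions for illustration, and the slides do not say which scoring rule Eagle uses.

```python
import numpy as np

def fit_pca(train, k):
    """Return the training mean and the top-k principal components (as columns)."""
    train = np.asarray(train, dtype=float)
    mu = train.mean(axis=0)
    cov = np.cov(train - mu, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]       # re-sort so the largest variance comes first
    return mu, eigvecs[:, order[:k]]

def residual_norm(x, mu, components):
    """Distance from each row of x to the subspace of the top-k components."""
    centered = np.atleast_2d(np.asarray(x, dtype=float)) - mu
    projected = centered @ components @ components.T
    return np.linalg.norm(centered - projected, axis=1)
```

A point with a small residual lies near the first few components; a large residual means most of its deviation falls along the later components. A threshold on the residual (for example, the largest residual seen in training) would be one possible decision rule.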
USER PROFILE ARCHITECTURE
EXPERIMENTAL METHODOLOGY
User Population
• 1,500 eBay users accessing Hadoop clusters
Features
• HDFS operation frequencies aggregated over one-minute intervals
• Examples
• Command frequencies
• Time of the job
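As a rough illustration of the command-frequency features, the sketch below buckets audit events into one-minute windows per user and counts each command. The event tuple layout, the `COMMANDS` list, and the function name are hypothetical; the slides do not specify the exact feature set or log format, and the time-of-job feature is not modeled here.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical feature order; the actual command set used is not given in the slides.
COMMANDS = ["open", "delete", "rename", "listStatus"]

def minute_features(events):
    """Build one count vector per (user, minute).

    events: iterable of (iso_timestamp, user, command) tuples (assumed layout).
    """
    counts = defaultdict(Counter)
    for ts, user, cmd in events:
        # Truncate the timestamp to the start of its minute.
        minute = datetime.fromisoformat(ts).replace(second=0, microsecond=0)
        counts[(user, minute)][cmd] += 1
    return {key: [c[cmd] for cmd in COMMANDS] for key, c in counts.items()}
```

Each resulting vector is one training row for a user profile.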
EXPERIMENTAL METHODOLOGY…
Determine users who are behaviorally different
• Compute the Mahalanobis distance between users' data, where $\mu$ and $\Sigma$ are the mean and covariance matrix
$$D_M(x) = \sqrt{(x - \mu)^{T}\, \Sigma^{-1}\, (x - \mu)}$$
• Compute clusters
• For each user, use the behaviorally different users as the cross-validation set
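A minimal sketch of the Mahalanobis distance in Python, using a covariance matrix estimated from one user's data. How the authors actually compared two users is not stated in the slides; measuring the distance from one user's mean vector to another user's distribution, as `user_distance` does, is only one plausible reading, and both function names are hypothetical.

```python
import numpy as np

def mahalanobis(x, mu, cov):
    """Mahalanobis distance of point x from a distribution with mean mu and covariance cov."""
    diff = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    # Solve cov @ y = diff rather than forming the explicit inverse.
    return float(np.sqrt(diff @ np.linalg.solve(np.asarray(cov, dtype=float), diff)))

def user_distance(user_a, user_b):
    """Assumed comparison: distance of user_b's mean vector from user_a's data."""
    a = np.asarray(user_a, dtype=float)
    b = np.asarray(user_b, dtype=float)
    return mahalanobis(b.mean(axis=0), a.mean(axis=0), np.cov(a, rowvar=False))
```

Pairwise distances of this kind could then be fed to a clustering step to pick out users whose behavior differs from a given user's.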
PERFORMANCE RESULTS
Sensitivity
FUTURE WORK
• Apache incubation releases
• Twitter feed: https://twitter.com/theapacheeagle
• Extend to Hive, HBase, Pig and other Big Data technologies
• Explore alternative algorithms
• Consider more features
APACHE EAGLE - OPEN SOURCE
Eagle Site:
http://goeagle.io
Tech Blog:
http://www.ebaytechblog.com
GitHub Repo:
https://github.com/eBay/Eagle
Apache Incubator Project:
Oct 26, 2015
Thank You!

Apache Eagle @ IEEE International Conference
