Apache Eagle is an open source monitoring solution, contributed by eBay Inc., that instantly identifies access to sensitive data, recognizes attacks and malicious activities in Hadoop, and takes action in real time.
3. Big Data @ eBay
800M Listings*
159M Global Active Buyers*
7 Hadoop Clusters*
800M HDFS operations (single cluster)*
120 PB Data*
*Q3 2015 data
4. Motivation
Who is accessing the data?
What data are they accessing?
Is someone trying to access data that they don’t have access to?
Are there any anomalous access patterns?
Is there a security threat?
How can we monitor and get notified during, or before, an anomalous event?
6. USER PROFILE ALGORITHMS
Density Estimation
• Compute mean and standard deviation
• Compute probability density estimation
• Flag an anomaly if the probability density falls below the minimum probability density seen so far in the training set
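$$\mu_j = \frac{1}{N}\sum_{i=1}^{N} x_j^{(i)}, \qquad \sigma_j = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_j^{(i)} - \mu_j\right)^2}$$

$$p(x) = \prod_{j=1}^{m} p(x_j; \mu_j, \sigma_j) = \prod_{j=1}^{m} \frac{1}{\sigma_j\sqrt{2\pi}}\, e^{-(x_j - \mu_j)^2 / 2\sigma_j^2}$$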
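The density estimation steps above can be sketched in a few lines of Python. This is a minimal illustration, not Eagle's implementation: the training rows, the function names (`fit_gaussian`, `density`, `is_anomaly`) and the choice of two features are all assumptions made for the example.

```python
import math

def fit_gaussian(train):
    """Per-feature mean and standard deviation (population form, 1/N)."""
    n = len(train)
    dims = len(train[0])
    mu = [sum(row[j] for row in train) / n for j in range(dims)]
    sigma = [
        math.sqrt(sum((row[j] - mu[j]) ** 2 for row in train) / n)
        for j in range(dims)
    ]
    return mu, sigma

def density(x, mu, sigma):
    """Product of independent univariate Gaussian densities.

    Assumes every sigma_j is non-zero; a constant feature would need
    special handling before this is called.
    """
    p = 1.0
    for xj, mj, sj in zip(x, mu, sigma):
        p *= math.exp(-((xj - mj) ** 2) / (2 * sj ** 2)) / (sj * math.sqrt(2 * math.pi))
    return p

# Hypothetical training data: each row is a one-minute window,
# each column the count of one HDFS operation type.
train = [[10, 2], [12, 3], [11, 2], [9, 1], [13, 2]]
mu, sigma = fit_gaussian(train)

# Threshold = minimum density observed on the training set.
threshold = min(density(row, mu, sigma) for row in train)

def is_anomaly(x):
    """True when x is less likely than anything seen in training."""
    return density(x, mu, sigma) < threshold
```

With this data, a window such as `[50, 20]` falls far below the threshold, while a window that matches a training row does not.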
7. USER PROFILE ALGORITHMS…
Eigenvalue Decomposition
• Compute mean and variance
• Compute eigenvectors and determine principal components
• Normal data points lie near the first few principal components
• Abnormal data points lie further from the first few principal components and closer to later components
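One way to turn the principal-component idea into a score is to measure how far a point lies from the subspace spanned by the first few components. The sketch below assumes NumPy, centers the data by its mean only, and uses made-up two-feature data; `fit_pca` and `residual_distance` are hypothetical names, and the slides do not say which scoring rule Eagle uses.

```python
import numpy as np

def fit_pca(X, k):
    """Return the mean and the top-k principal components of X (rows = samples)."""
    mu = X.mean(axis=0)
    cov = np.cov(X - mu, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]           # largest variance first
    components = eigvecs[:, order[:k]]          # columns are principal directions
    return mu, components

def residual_distance(x, mu, components):
    """Distance from x to the subspace spanned by the retained components."""
    centered = np.asarray(x, dtype=float) - mu
    projected = components @ (components.T @ centered)
    return float(np.linalg.norm(centered - projected))

# Hypothetical data lying close to a line, so one component captures most variance.
X = np.array([[1, 2.1], [2, 3.9], [3, 6.2], [4, 7.8], [5, 10.1]])
mu, components = fit_pca(X, k=1)

normal_score = residual_distance([2, 4], mu, components)    # near the first component
abnormal_score = residual_distance([3, 0], mu, components)  # far from it
```

A small residual means the point is well explained by the first components; a large residual means it leans on the later ones, which matches the normal/abnormal distinction in the bullets above.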
9. EXPERIMENTAL METHODOLOGY
User Population
• 1,500 eBay users accessing Hadoop clusters
Features
• HDFS operation frequencies aggregated over one-minute intervals
• Examples
• Command frequencies
• Time of the job
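As an illustration of the feature extraction described above, the sketch below counts commands per user per one-minute window. The event layout `(timestamp_seconds, user, command)`, the sample events and the function name `minute_features` are assumptions for the example; the slides do not describe the actual audit-log schema.

```python
from collections import Counter, defaultdict

def minute_features(events):
    """Count HDFS commands per (user, minute) window.

    events: iterable of (timestamp_seconds, user, command) tuples.
    Returns a dict mapping (user, minute_index) to a Counter of commands.
    """
    windows = defaultdict(Counter)
    for ts, user, cmd in events:
        windows[(user, ts // 60)][cmd] += 1
    return windows

# Hypothetical audit events.
events = [
    (0, "alice", "open"),
    (15, "alice", "open"),
    (42, "alice", "delete"),
    (61, "alice", "open"),
    (70, "bob", "rename"),
]
features = minute_features(events)
```

Each `Counter` can then be turned into a fixed-length vector, one entry per command type, to feed the profile algorithms.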
10. EXPERIMENTAL METHODOLOGY…
Determine users who are behaviorally different
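• Compute the Mahalanobis distance between users' data:
$$D_M(x) = \sqrt{(x - \mu)^T \Sigma^{-1} (x - \mu)}$$
where μ is the mean and Σ is the covariance matrix of the data (the multivariate counterpart of the standard deviation)
• Compute clusters
• For each user, use the behaviorally different users as the cross-validation set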
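The distance computation can be sketched as follows. This assumes NumPy, a sample covariance matrix estimated from one user's feature vectors, and invented data; the user names and the `mahalanobis` helper are hypothetical, and the clustering step from the slide is not shown.

```python
import numpy as np

def mahalanobis(x, mu, cov_inv):
    """Mahalanobis distance of x from a distribution with mean mu."""
    d = np.asarray(x, dtype=float) - mu
    return float(np.sqrt(d @ cov_inv @ d))

# Hypothetical feature vectors for the user being profiled.
target = np.array([[10, 2], [12, 3], [11, 2], [9, 1], [13, 2]], dtype=float)
mu = target.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(target, rowvar=False))

# Hypothetical summary vectors for other users.
others = {"similar_user": [11, 2], "different_user": [40, 15]}
distances = {name: mahalanobis(vec, mu, cov_inv) for name, vec in others.items()}

# Users with the largest distances are candidates for the cross-validation set.
most_different = max(distances, key=distances.get)
```

Unlike plain Euclidean distance, this accounts for the scale and correlation of the features, so a deviation in a normally stable feature counts for more than the same deviation in a noisy one.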
12. FUTURE WORK
• Apache incubation releases
• Twitter feed: https://twitter.com/theapacheeagle
• Extend to Hive, HBase, Pig and other big data technologies
• Explore alternative algorithms
• Consider more features