2. Agenda
• About Eagle
• Front End
– Evolution
– Modularization
– Features
• Back End
– Architecture
– Tech Highlights
– Integration
• Q & A
3. About Eagle
Apache Eagle is a distributed real-time monitoring and alerting engine for Hadoop, from eBay.
Open sourced as an Apache Incubator project on Oct 26th, 2015.
See http://eagle.incubator.apache.org or http://goeagle.io
11. Architecture
[Architecture diagram. Components shown: Hadoop and JMX data sources; Data Collection (Kafka, YARN API, …); Stream Processing Engine, containing user-profile-based anomaly detection and the policy evaluation framework; Data Sink (email, Kafka, …); other remediation systems; Eagle Storage (metadata, metrics, alerts, …); user profile training; Eagle Query.]
12. Tech Highlights
• Data Collection
• Stream Processing DSL
• Distributed Policy Engine
• ML-based anomaly detection
• Query Framework
NOTE: An identifier of the form {NAME}-{NUMBER}, such as HDFS-6914, is the ticket ID of an open source project issue that we contributed.
13. Apache Eagle – Data Collection
Decoupled via Apache Kafka
• High-throughput distributed messaging
• Easy to plug in various kinds of data sources
• Python/Java/C++ Kafka clients
Currently supported data sources
• Hadoop data
– HDFS and HBase audit logs
– GC logs
– JMX metrics
– History/running MR job data
• …
• Generic format data
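As a rough illustration of feeding a Hadoop data source into the collection layer, the sketch below parses one HDFS audit log line into key/value fields that a collector could then publish to a Kafka topic. The class and method names are hypothetical and not part of Eagle, the sample line follows the usual key=value layout of HDFS audit logs, and the Kafka publishing step is left out so the example has no external dependencies.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Eagle: splits the key=value part of an
// HDFS audit log line into a map that could be serialized and sent to Kafka.
class AuditLogParser {

    static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        // Tokens are separated by whitespace; tokens without '=' (timestamp,
        // log level, logger name, "(auth:SIMPLE)") are skipped.
        for (String token : line.trim().split("\\s+")) {
            int eq = token.indexOf('=');
            if (eq > 0) {
                fields.put(token.substring(0, eq), token.substring(eq + 1));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "2015-10-26 12:51:31,798 INFO FSNamesystem.audit: allowed=true "
                + "ugi=hdfs (auth:SIMPLE) ip=/10.0.2.15 cmd=getfileinfo "
                + "src=/tmp/private dst=null perm=null";
        System.out.println(parse(line));
    }
}
```

This is a simplification: paths that contain spaces would need a stricter parser than whitespace splitting.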
17. Apache Eagle – Distributed Real-time Policy Engine
[Diagram: Distributed Real-time Policy Engine, containing a Siddhi CEP Policy Evaluator and a Machine Learning Policy Evaluator.]
Extensibility
• Default is WSO2 Siddhi CEP
• Powerful SQL-like event stream processing
• Open to other customized policy engines
Extensible Policy Evaluator
import java.util.List;

public interface PolicyEvaluatorServiceProvider {
    public String getPolicyType(); // literal string that identifies one type of policy
    public Class<? extends PolicyEvaluator> getPolicyEvaluator(); // policy evaluator implementation class
    public List getBindingModules(); // modules that map JSON-format policy text to objects
}

public interface PolicyEvaluator {
    public void evaluate(ValuesArray input) throws Exception; // evaluate one input event
    public void onPolicyUpdate(AlertDefinitionAPIEntity newAlertDef); // invoked when the policy is updated
    public void onPolicyDelete(); // invoked when the policy is deleted
}
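To make the extension point more concrete, here is a minimal sketch of a custom evaluator written against the PolicyEvaluator shape shown above. ValuesArray and AlertDefinitionAPIEntity are replaced by tiny stand-ins so the example compiles on its own; their fields, the event layout (name in slot 0, value in slot 1), and ThresholdPolicyEvaluator itself are assumptions for illustration, not Eagle's actual classes.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-ins so the sketch compiles on its own; the real Eagle types
// carry more fields and behavior than shown here.
class ValuesArray extends ArrayList<Object> { }

class AlertDefinitionAPIEntity {
    String policyDef; // assumed here to hold the threshold as plain text
    AlertDefinitionAPIEntity(String policyDef) { this.policyDef = policyDef; }
}

interface PolicyEvaluator {
    void evaluate(ValuesArray input) throws Exception;
    void onPolicyUpdate(AlertDefinitionAPIEntity newAlertDef);
    void onPolicyDelete();
}

// Hypothetical custom evaluator: records an alert when the numeric value in
// slot 1 of the event exceeds a threshold taken from the policy definition.
class ThresholdPolicyEvaluator implements PolicyEvaluator {
    private volatile double threshold;
    private volatile boolean active = true;
    final List<String> alerts = new ArrayList<>();

    ThresholdPolicyEvaluator(AlertDefinitionAPIEntity alertDef) {
        this.threshold = Double.parseDouble(alertDef.policyDef);
    }

    @Override
    public void evaluate(ValuesArray input) {
        if (!active) return; // a deleted policy no longer fires
        String name = (String) input.get(0);
        double value = ((Number) input.get(1)).doubleValue();
        if (value > threshold) {
            alerts.add(name + "=" + value + " exceeds " + threshold);
        }
    }

    @Override
    public void onPolicyUpdate(AlertDefinitionAPIEntity newAlertDef) {
        threshold = Double.parseDouble(newAlertDef.policyDef);
    }

    @Override
    public void onPolicyDelete() { active = false; }
}

class CustomEvaluatorDemo {
    static ValuesArray event(String name, double value) {
        ValuesArray v = new ValuesArray();
        v.add(name);
        v.add(value);
        return v;
    }

    public static void main(String[] args) {
        ThresholdPolicyEvaluator evaluator =
                new ThresholdPolicyEvaluator(new AlertDefinitionAPIEntity("1000"));
        evaluator.evaluate(event("ReplLag", 1500)); // above 1000: alert recorded
        evaluator.onPolicyUpdate(new AlertDefinitionAPIEntity("2000"));
        evaluator.evaluate(event("ReplLag", 1500)); // below 2000: no new alert
        System.out.println(evaluator.alerts);
    }
}
```

A matching service provider would return the evaluator class together with a policy type string, so the engine can pick the evaluator from the policy's declared type.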
[Diagram label: Metadata Manager (policy/metadata)]
18. Apache Eagle – Distributed Real-time Policy Engine
[Diagram: Dynamic Policy Deployment. Labels shown: Metadata Manager; Distributed Streaming Cluster Environment; Policy Management; Policy; Real-Time Alerts; Alerts.]
Usability
• Powerful SQL-like CEP CQL for policy definition
• Dynamic policy lifecycle management (deployment/update)
• Easy-to-use policy management and alert analytics UI
from metricStream[(name == 'ReplLag') and (value > 1000)]
select *
insert into outputStream;
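The idea of deploying and updating policies while the stream keeps running can be sketched in plain Java. The example below is not Siddhi and not Eagle code: Metric, PolicyRegistry, and the policy id "replLag" are hypothetical names, and the predicate is a hand-written equivalent of the filter in the query above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Hypothetical event type mirroring the metricStream fields used above.
class Metric {
    final String name;
    final double value;
    Metric(String name, double value) { this.name = name; this.value = value; }
}

// Hypothetical registry: policies can be deployed, updated, or deleted while
// events keep flowing, without restarting the processing loop.
class PolicyRegistry {
    private final Map<String, Predicate<Metric>> policies = new ConcurrentHashMap<>();

    void deployOrUpdate(String policyId, Predicate<Metric> condition) {
        policies.put(policyId, condition);
    }

    void delete(String policyId) {
        policies.remove(policyId);
    }

    // Returns the ids of all policies whose condition matches the event.
    List<String> evaluate(Metric event) {
        List<String> matched = new ArrayList<>();
        policies.forEach((id, condition) -> {
            if (condition.test(event)) matched.add(id);
        });
        return matched;
    }
}

class PolicyLifecycleDemo {
    public static void main(String[] args) {
        PolicyRegistry registry = new PolicyRegistry();
        // Plain-Java equivalent of the filter in the query above.
        registry.deployOrUpdate("replLag",
                m -> m.name.equals("ReplLag") && m.value > 1000);
        System.out.println(registry.evaluate(new Metric("ReplLag", 1500)));

        // Raise the threshold at runtime, then remove the policy.
        registry.deployOrUpdate("replLag",
                m -> m.name.equals("ReplLag") && m.value > 2000);
        System.out.println(registry.evaluate(new Metric("ReplLag", 1500)));
        registry.delete("replLag");
        System.out.println(registry.evaluate(new Metric("ReplLag", 5000)));
    }
}
```

Keeping policies in a concurrent map means an update takes effect on the next event, which is the behavior the lifecycle bullet describes.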
22. Apache Eagle – Distributed Real-time Policy Engine
[Diagram: Distributed Streaming Cluster Environment. Labels shown: Stream Processing; Stream_{1} … Stream_{*}; AlertExecutor_{1}, AlertExecutor_{2}, …, AlertExecutor_{N}.]
Scalability
• Policy scalability: policy partitioning
• Event scalability: grouping
• Example: with N users grouped into 3 partitions and M policies split into 2 partitions, there are 3 × 2 = 6 physical tasks
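The 3 × 2 example can be sketched as a mapping from a (user, policy) pair to a task index. The hash-based assignment and all names below are assumptions chosen for illustration; the slides do not say which grouping function Eagle uses.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: events are grouped by user into USER_PARTITIONS groups,
// policies are spread over POLICY_PARTITIONS groups, and each
// (user partition, policy partition) pair maps to one physical task.
class PartitioningSketch {
    static final int USER_PARTITIONS = 3;
    static final int POLICY_PARTITIONS = 2;

    static int userPartition(String user) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(user.hashCode(), USER_PARTITIONS);
    }

    static int policyPartition(String policyId) {
        return Math.floorMod(policyId.hashCode(), POLICY_PARTITIONS);
    }

    // Task index in [0, USER_PARTITIONS * POLICY_PARTITIONS).
    static int taskFor(String user, String policyId) {
        return userPartition(user) * POLICY_PARTITIONS + policyPartition(policyId);
    }

    public static void main(String[] args) {
        Set<Integer> tasks = new TreeSet<>();
        for (int u = 0; u < 100; u++) {
            for (int p = 0; p < 20; p++) {
                tasks.add(taskFor("user" + u, "policy" + p));
            }
        }
        // However many users and policies exist, at most 3 * 2 = 6 tasks are used.
        System.out.println(tasks.size() + " tasks in use: " + tasks);
    }
}
```

The number of tasks depends only on the two partition counts, not on N or M, which is what lets users and policies grow independently of the topology size.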
23. Apache Eagle – Query Framework
Query Syntax
• Full-function SQL-like REST query (aggregation, sorting, …)
Eagle Storage
• NoSQL storage such as HBase
• RDBMS
• Other storage systems
24. Apache Eagle – ML-based Anomaly Detection
User Activity Anomaly Detection
• User profile feature selection
• Offline user profile generation
• Online anomaly detection
Useful link
• Eagle: User profile-based anomaly detection for securing Hadoop clusters
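The offline/online split can be illustrated with a deliberately simple stand-in model. The sketch below uses a per-feature z-score check, which is a swapped-in technique chosen for brevity and not the model Eagle uses; the feature meaning (operation counts per interval), the class names, and the threshold of 3 standard deviations are all assumptions.

```java
// Hypothetical per-user profile: mean and standard deviation of each feature
// (for example, counts of read, delete, and rename operations per interval).
class UserProfile {
    final double[] mean;
    final double[] std;

    // "Offline" step: build the profile from historical feature vectors.
    UserProfile(double[][] history) {
        int d = history[0].length;
        mean = new double[d];
        std = new double[d];
        for (double[] row : history)
            for (int j = 0; j < d; j++) mean[j] += row[j] / history.length;
        for (double[] row : history)
            for (int j = 0; j < d; j++)
                std[j] += (row[j] - mean[j]) * (row[j] - mean[j]) / history.length;
        for (int j = 0; j < d; j++) std[j] = Math.sqrt(std[j]);
    }

    // "Online" step: flag the observation if any feature deviates from the
    // profile mean by more than maxZ standard deviations.
    boolean isAnomalous(double[] observed, double maxZ) {
        for (int j = 0; j < mean.length; j++) {
            double spread = Math.max(std[j], 1e-9); // avoid division by zero
            if (Math.abs(observed[j] - mean[j]) / spread > maxZ) return true;
        }
        return false;
    }
}

class AnomalyDemo {
    public static void main(String[] args) {
        double[][] history = { {10, 0, 1}, {12, 1, 0}, {11, 0, 1}, {9, 1, 0} };
        UserProfile profile = new UserProfile(history);
        // Close to the historical behavior.
        System.out.println(profile.isAnomalous(new double[] {11, 1, 1}, 3.0));
        // A burst in the second feature, far outside the profile.
        System.out.println(profile.isAnomalous(new double[] {10, 40, 0}, 3.0));
    }
}
```

In a streaming setup the profile would be trained in batch and loaded by the online evaluator, so only the cheap comparison runs per event.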
25. Apache Eagle – Integration I
• Eagle in Apache Ambari
– natively part of the Hadoop ecosystem
– http://eagle.incubator.apache.org/docs/ambari-plugin-install.html
• Eagle in Docker
– runs natively on cloud and container platforms
– https://github.com/apache/incubator-eagle
26. Apache Eagle – Integration II
• Apache Ranger
– remediation engine
– Eagle data source
• Splunk
– Eagle alert consumer
– Eagle alert output is the first level of analytics abstraction; Splunk is the second
• Dataguise, Apache Knox
– Eagle data source
27. Learn more about Apache Eagle
• EAGLE: USER PROFILE-BASED ANOMALY DETECTION IN HADOOP CLUSTER (IEEE)
• EAGLE: DISTRIBUTED REALTIME MONITORING FRAMEWORK FOR HADOOP CLUSTER
We use WSO2 Siddhi as the first-class policy engine, but a CEP engine cannot cover everything. One example is node anomaly detection, where we compare all the nodes in the cluster within some time window.
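By default, the Eagle policy evaluation engine uses Siddhi as its underlying real-time event processor.
CEP
1) A real-time complex event stream processing engine
2) Provides powerful SQL-like definitions of event stream processing logic
Besides the Siddhi-based policy evaluator, users can also easily define their own policy evaluator, such as the policy evaluator based on an ML training model mentioned later.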