Apache Eagle: A Distributed Real-time Hadoop Data Security Engine from eBay
蒋吉麟 | 赵晴雯
eBay
Agenda
•About Eagle
•Front End
– Evolution
– Modularization
– Features
•Back End
– Architecture
– Tech Highlights
– Integration
•Q & A
Apache Eagle is a distributed real-time monitoring and alerting engine for Hadoop from eBay.
It was open sourced as an Apache Incubator project on Oct 26th, 2015.
See http://eagle.incubator.apache.org or http://goeagle.io
Hadoop @eBay
• 2007: 1-10 nodes
• 2009: 10+ nodes
• 2010: 100+ nodes, 1,000+ cores, 1 PB
• 2011: 1,000+ nodes, 10,000+ cores, 10+ PB
• 2013: 4,000+ nodes, 40,000+ cores, 50+ PB
• 2015: 10,000+ nodes, 150,000+ cores, 150+ PB
•swf
•exe
Features
•common
•metadata
•classification
•metrics
common
•Policies
•Alerts
metadata
classification
•Tree View
•Table View
metrics
Architecture
[Architecture diagram; components shown:]
• Data collection (Kafka, Yarn API…) from Hadoop and JMX
• Stream processing engine: user-profile-based anomaly detection and the policy-evaluation-based framework
• Eagle Storage (metadata, metrics, alerts…), user profile training, Eagle Query
• Data sink (email, Kafka…) and other remediation systems
Tech Highlights
•Data Collection
•Stream Processing DSL
•Distributed Policy Engine
•ML-based anomaly detection
•Query Framework
Note: an identifier of the form {NAME}-{NUMBER}, such as HDFS-6914, is the ticket ID of an open source project issue contributed by us.
Apache Eagle – Data Collection
Decoupled through Apache Kafka
• High-throughput distributed messaging
• Easy to plug in various kinds of data sources
• Python/Java/C++ Kafka clients
Currently supported data sources
• Hadoop data
– HDFS, HBase audit log
– GC logs
– JMX metrics
– History/Running MR job data
• …
• Generic format data
Apache Eagle – Stream Processing DSL
Easy to use
– Easily assemble data transformation, filtering, join…
Flexibility
– Physical execution platform independent
val env = ExecutionEnvironment.getStorm()
env.fromKafka(KafkaConfig)
   .flatMap(AuditLogTransformer)
   .groupBy(_.user)
   .flatMap(UserProfileAggregator)
   .alert.persistAndEmail
env.execute()
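The "physical execution platform independent" idea can be illustrated with a small Java sketch. Every name in it (LogicalPipeline, ExecutionBackend, LocalBackend) is hypothetical and is not part of Eagle's API. It only shows the general pattern of recording logical stages first and handing them to a pluggable backend later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical logical plan: an ordered list of flatMap stages.
// Nothing runs until a backend executes the plan.
final class LogicalPipeline {
    private final List<Function<Object, List<Object>>> stages = new ArrayList<>();

    LogicalPipeline flatMap(Function<Object, List<Object>> stage) {
        stages.add(stage);
        return this;
    }

    List<Function<Object, List<Object>>> stages() {
        return stages;
    }
}

// Hypothetical backend contract. A Storm-, Flink- or Spark-based
// implementation would translate the same plan into its own topology.
interface ExecutionBackend {
    List<Object> execute(List<Object> source, LogicalPipeline pipeline);
}

// In-memory backend used only for illustration: applies each stage in order.
final class LocalBackend implements ExecutionBackend {
    @Override
    public List<Object> execute(List<Object> source, LogicalPipeline pipeline) {
        List<Object> current = source;
        for (Function<Object, List<Object>> stage : pipeline.stages()) {
            List<Object> next = new ArrayList<>();
            for (Object item : current) {
                next.addAll(stage.apply(item));
            }
            current = next;
        }
        return current;
    }
}
```

Because the plan is plain data, swapping the backend leaves the pipeline definition unchanged, which is the property the slide claims for the DSL.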
Apache Eagle – Stream Processing DSL
[Diagram: the DSL code from the previous slide mapped to its logical stream-processing graph in a distributed streaming cluster environment. Components shown: real-time event stream, stream processing (Stream_{1} … Stream_{*}), AlertExecutor_{1} … AlertExecutor_{N}, alerts.]
Apache Eagle – Distributed Real-time Policy Engine
Features
• Extensibility
• Usability
• Real-time
• Scalability
• Metadata-driven
[Diagram: Metadata Manager (policy management, dynamic policy deployment, dynamic stream schema) alongside a distributed streaming cluster environment in which a real-time event stream flows through stream processing (Stream_{1} … Stream_{*}) into AlertExecutor_{1} … AlertExecutor_{N}, producing real-time alerts.]
Apache Eagle – Distributed Real-time Policy Engine
[Diagram: the Distributed Real-time Policy Engine hosting a Siddhi CEP policy evaluator, a machine learning policy evaluator, and an extensible policy evaluator.]
Extensibility
• Default is WSO2 Siddhi CEP
• Powerful SQL-like event stream processing
• Open to other customized policy engines
public interface PolicyEvaluatorServiceProvider {
    public String getPolicyType(); // literal string to identify one type of policy
    public Class<? extends PolicyEvaluator<T>> getPolicyEvaluator(); // get policy evaluator implementation
    public List getBindingModules(); // policy text with json format to object mapping
}
public interface PolicyEvaluator {
    public void evaluate(ValuesArray input) throws Exception; // evaluate input event
    public void onPolicyUpdate(AlertDefinitionAPIEntity newAlertDef); // policy update
    public void onPolicyDelete(); // invoked when policy is deleted
}
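As an illustration of what a custom evaluator might look like, here is a minimal, self-contained Java sketch of the three lifecycle callbacks. It is written against a simplified stand-in interface, not Eagle's real types. SimplePolicyEvaluator, SensitivePathEvaluator, the map-based event, the "src" field name and the use of a path prefix as the policy text are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Simplified stand-in for the PolicyEvaluator interface on the slide.
// Assumption: events are plain maps and policy definitions are plain strings;
// Eagle's ValuesArray and AlertDefinitionAPIEntity are not reproduced here.
interface SimplePolicyEvaluator {
    void evaluate(Map<String, Object> event) throws Exception;
    void onPolicyUpdate(String newPolicyDef);
    void onPolicyDelete();
}

// Hypothetical evaluator: raises an alert when an audit event touches a path
// under a sensitive prefix. The policy text is just that prefix.
final class SensitivePathEvaluator implements SimplePolicyEvaluator {
    private String sensitivePrefix;
    private boolean active;
    private final List<Map<String, Object>> alerts = new ArrayList<>();

    SensitivePathEvaluator(String policyDef) {
        onPolicyUpdate(policyDef);
    }

    @Override
    public void evaluate(Map<String, Object> event) {
        if (!active) {
            return; // a deleted policy no longer produces alerts
        }
        Object src = event.get("src"); // "src" is an assumed field name
        if (src instanceof String && ((String) src).startsWith(sensitivePrefix)) {
            alerts.add(event);
        }
    }

    @Override
    public void onPolicyUpdate(String newPolicyDef) {
        // Swap the policy in place so later events see the new definition.
        sensitivePrefix = newPolicyDef.trim();
        active = true;
    }

    @Override
    public void onPolicyDelete() {
        active = false;
    }

    List<Map<String, Object>> alerts() {
        return alerts;
    }
}
```

The update and delete callbacks let a running evaluator pick up a changed policy without being restarted, which is the behaviour the deck describes as dynamic policy deployment.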
Apache Eagle – Distributed Real-time Policy Engine
[Diagram: policy management deploying policies dynamically through the Metadata Manager into the distributed streaming cluster environment, which produces real-time alerts.]
Usability
• Powerful SQL-like CEP CQL for policy definition
• Dynamic policy lifecycle management (deployment/update)
• Easy-to-use policy management and alert analytics UI
from metricStream[(name == 'ReplLag') and (value > 1000)]
select * insert into outputStream;
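For readers unfamiliar with Siddhi's filter syntax, the policy above keeps only events named ReplLag whose value is strictly greater than 1000. A rough plain-Java equivalent of that predicate is sketched below. MetricEvent and ReplLagPolicy are hypothetical names, and the sketch filters a finite list rather than a live stream.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical event type carrying the two attributes the policy reads.
final class MetricEvent {
    final String name;
    final double value;

    MetricEvent(String name, double value) {
        this.name = name;
        this.value = value;
    }
}

final class ReplLagPolicy {
    private ReplLagPolicy() {}

    // Same condition as the filter: name == 'ReplLag' and value > 1000.
    static boolean matches(MetricEvent e) {
        return "ReplLag".equals(e.name) && e.value > 1000;
    }

    // "select * insert into outputStream": pass matching events through unchanged.
    static List<MetricEvent> apply(List<MetricEvent> metricStream) {
        return metricStream.stream()
                .filter(ReplLagPolicy::matches)
                .collect(Collectors.toList());
    }
}
```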
Apache Eagle – Distributed Real-time Policy Engine
Real-time
• Stream events are processed and alerts are evaluated during streaming
[Diagram: real-time event stream, stream processing (Stream_{1} … Stream_{*}), AlertExecutor_{1} … AlertExecutor_{N}, real-time alerts.]
Apache Eagle – Distributed Real-time Policy Engine
Metadata-Driven
• Stream schema: AlertStreamSchemaEntity
• Policy definition: AlertDefinitionAPIEntity
@Table("alertdef")
@ColumnFamily("f")
@Prefix("alertdef")
@Service(AlertConstants.ALERT_DEFINITION_SERVICE_ENDPOINT_NAME)
@JsonIgnoreProperties(ignoreUnknown = true)
@TimeSeries(false)
@Tags({"site", "dataSource", "alertExecutorId", "policyId", "policyType"})
@Indexes({
    @Index(name="Index_1_alertExecutorId", columns = { "alertExecutorID" }, unique = true),
})
public class AlertDefinitionAPIEntity extends TaggedLogAPIEntity {
    @Column("a")
    private String desc;
    @Column("b")
    private String policyDef;
    @Column("c")
    private String dedupeDef;
    // … (the slide cuts the class off here)
}
[Diagram: the Distributed Real-time Policy Engine dynamically loading metadata from the Metadata Manager.]
Apache Eagle – Distributed Real-time Policy Engine
[Diagram: distributed streaming cluster environment with stream processing (Stream_{1} … Stream_{*}) feeding AlertExecutor_{1} … AlertExecutor_{N}.]
Scalability
• Policy scalability: policy partitioning
• Event scalability: grouping
• Example: with N users split into 3 partitions and M policies split into 2 partitions, there are 3*2 = 6 physical tasks
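The arithmetic in the example can be sketched in Java as follows. PartitionRouter is a hypothetical helper, and hash-based bucketing is an assumption chosen for illustration. The deck does not say how Eagle assigns users or policies to partitions.

```java
// Hypothetical router: maps a (user, policy) pair onto one of
// userPartitions * policyPartitions physical tasks.
final class PartitionRouter {
    private final int userPartitions;
    private final int policyPartitions;

    PartitionRouter(int userPartitions, int policyPartitions) {
        if (userPartitions <= 0 || policyPartitions <= 0) {
            throw new IllegalArgumentException("partition counts must be positive");
        }
        this.userPartitions = userPartitions;
        this.policyPartitions = policyPartitions;
    }

    // 3 user partitions and 2 policy partitions give 3 * 2 = 6 tasks.
    int physicalTasks() {
        return userPartitions * policyPartitions;
    }

    // Math.floorMod keeps each bucket non-negative even for negative hash codes.
    int taskFor(String user, String policyId) {
        int userBucket = Math.floorMod(user.hashCode(), userPartitions);
        int policyBucket = Math.floorMod(policyId.hashCode(), policyPartitions);
        return userBucket * policyPartitions + policyBucket;
    }
}
```

Each task then sees only the events of its user bucket and evaluates only the policies of its policy bucket, so the per-task load no longer grows with the total N and M.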
Apache Eagle – Query Framework
Query Syntax
• Full-featured SQL-like REST query (aggregation, sorting…)
Eagle Storage
• NoSQL storage such as HBase
• RDBMS
• Other storage systems
Apache Eagle – ML-based Anomaly Detection
User Activity Anomaly Detection
• User profile feature selection
• Offline user profile generation
• Online anomaly detection
Useful link
• Eagle: User profile-based anomaly detection for securing Hadoop clusters
Apache Eagle – Integration I
• Eagle in Apache Ambari
– A native part of the Hadoop ecosystem
– http://eagle.incubator.apache.org/docs/ambari-plugin-install.html
• Eagle in Docker
– Runs natively in cloud/container environments
– https://github.com/apache/incubator-eagle
Apache Eagle – Integration II
• Apache Ranger
– Remediation engine
– Eagle data source
• Splunk
– Eagle alert consumer
– Eagle alert output is the first layer of analytics abstraction; Splunk is the second
• Dataguise, Apache Knox
– Eagle data source
Learn more about Apache Eagle
• EAGLE: USER PROFILE-BASED ANOMALY DETECTION IN HADOOP CLUSTER (IEEE)
• EAGLE: DISTRIBUTED REALTIME MONITORING FRAMEWORK FOR HADOOP CLUSTER
Q&A
apache/incubator-eagle
@TheApacheEagle
@ApacheEagle
http://eagle.incubator.apache.org

Editor's Notes

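  1. Three parts: data collection + real-time stream processing + metric/alerts sink. The two key parts of Eagle's real-time stream processing framework are the alerting framework based on policy evaluation and the ML-based automatic anomaly detection module.
  2. Data ingestion is decoupled through Apache Kafka. Kafka serves as the data source interface: 1) it provides distributed, high-throughput message delivery; 2) clients exist for many languages, which makes importing data easy. Eagle can currently process various Hadoop logs (GC, audit log) and JMX metrics and raise alerts on them. It also supports processing data in user-defined formats.
  3. Eagle provides a stream processing DSL with which users can easily define the data processing logic graph while staying independent of the underlying execution platform. Users are free to choose the underlying stream processing platform; Eagle currently uses Storm by default and can easily be switched to stream processing engines such as Flink or Spark.
  4. This is the mapping from Eagle code to the logical stream processing graph.
  5. The Policy Engine sits in AlertExecutor, the last node of Eagle's data stream processing, and is responsible for dynamic policy loading, evaluating policies against the event stream, and so on. Its properties: extensibility, usability, real-time, scalability, metadata-driven.
  6. We use WSO2 Siddhi as the first-class policy engine, but a CEP engine can't cover everything. One example is node anomaly detection, where we compare all the nodes in the cluster within some time window. The Eagle policy evaluation engine uses Siddhi by default as its underlying real-time complex event processor (CEP): 1) it is a real-time complex data stream processing engine; 2) it offers powerful SQL-like definitions of event stream processing logic. Besides the Siddhi-based policy evaluator, users can easily define their own policy evaluator, such as the evaluator based on an ML-trained model mentioned later.
  7. 1) The Siddhi CEP that Eagle supports by default has strong complex-logic processing capability and can support all kinds of complex policy definitions. 2) Eagle has dynamic policy lifecycle management and can update policies in real time. 3) A friendly policy definition UI means users do not need to care about the underlying complex policy definition syntax.
  8. Metadata comes from two sources: the schema definition of the data stream itself and the metadata of the policy definition. The benefit of a metadata-based design is that the Policy Engine keeps working no matter how the data stream changes or how complex the policy definition is.
  9. The basic idea of ML-based anomaly detection: 1) select features that describe the basic characteristics of user behavior; 2) train a model/policy of anomalous user behavior in offline mode; 3) perform real-time anomaly detection with the ML policy evaluator.
  10. As a product of the Hadoop ecosystem, Eagle provides an Ambari plugin so that it fits better into that ecosystem. Also, because Eagle depends on several underlying components (HBase, Kafka, Storm, HDFS), we provide a Docker-based deployment so that users can set up Eagle easily.
  11. Eagle provides a comprehensive solution to secure sensitive data stored in Hadoop.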