Apache Eagle: Secure Hadoop in Real Time
http://eagle.incubator.apache.org/
Hao Chen (@haozch) | Ralph Su (@raphaelsu)
About Us
Hao Chen / 陈浩
Apache Eagle Co-creator, PMC & Committer
hao@apache.org
Sr. Software Engineer at eBay
hchen9@ebay.com
Ralph Su / 苏良飞
Apache Eagle PMC & Committer
ralphsu@apache.org
MTS Software Engineer at eBay
liasu@ebay.com
Agenda
• Introduction
• Architecture
• Case Study
• Q & A
Apache Eagle
Apache Eagle is a distributed real-time monitoring and alerting engine for Hadoop from eBay.
Open sourced as an Apache Incubator project on Oct 26th, 2015.
Secure Hadoop in Real Time: a data activity monitoring solution to instantly identify access to sensitive data, recognize attacks and malicious activity, and block access in real time.
See http://eagle.incubator.apache.org or http://github.com/apache/incubator-eagle
Eagle was initiated at the end of 2013 for Hadoop ecosystem monitoring, because existing tools such as Zabbix and Ganglia could not handle the huge volume of metrics and logs generated by the Hadoop systems at eBay.
Hadoop @ eBay Inc
2007: 1-10 nodes
2009: 50+ nodes
2010: 100+ nodes, 1000+ cores, 1 PB
2011: 1000+ nodes, 10,000+ cores, 10+ PB
2012: 3000+ nodes, 10,000+ cores, 50+ PB
2014/2015: 10,000 nodes, 150,000+ cores, 200 PB, 2000+ users
Hadoop Data
• Security
• Activity
Hadoop Platform
• Health
• Availability
• Performance
Initiative – Why build Eagle?
7+ CLUSTERS
10,000+ NODES
200+ PB DATA
10 B+ EVENTS / DAY
500+ METRIC TYPES
50,000+ JOBS / DAY
50,000,000+ TASKS / DAY
Initiative – Core Challenges
1. Large Scale Processing and Storage
• Scale IO Complexity (Stream)
• Scale Computation Complexity (Policy)
• Scale Storage & Query (Event, Log, Metric)
2. Real-Time Alerting
• Real-time Data Collection
• Real-time Stream Processing
• Real-time Alerting
3. Expressive Correlation Model
• Complex policy model
• Stream GroupBy, Join, Window
• Machine Learning
4. Hadoop Ecosystem Integration
• Data Source Integration
• Anomaly Detection Model
Eagle Ecosystem
1. Eagle Framework
Distributed real-time framework for efficiently developing highly scalable monitoring applications
2. Eagle Apps
Security / Hadoop / Operational Intelligence / …
3. Eagle Interface
REST Service / Management UI / Customizable Analytics Visualization
4. Eagle Integration
Ambari / Docker / Ranger / Dataguise
5. Open Source
Community-driven and cross-community cooperation
[Diagram: the Eagle Framework at the center, surrounded by]
Apps
→ Security
→ Hadoop
→ Cloud
→ Database
Interface
→ Web Portal
→ REST Services
→ Analytics Visualization
Integration
→ Ambari
→ Docker
→ Ranger
→ Dataguise
Eagle @ eBay Inc.
Eagle Deployment at eBay Production
• 100+ security policies
• 8 nodes
• 30 worker processes
• 64 Kafka partitions
Eagle Performance
• Avg latency: ~50 ms
• Max throughput per cluster: 300 k/s
Eagle Architecture
Distributed Policy Engine
[Diagram: real-time event streams (Stream_{1} … Stream_{*}, with dynamic stream schema) → Stream Processing → AlertExecutor_{1} … AlertExecutor_{N} → Real-Time Alerts; the Metadata Manager provides policy management and dynamic policy deployment]
• Real-time Streaming: Apache Storm (Execution Engine) + Kafka (Message Bus)
• Declarative Policy: SQL (CEP) on Streaming + Hot Deploy
• Linear Scalability: Data volume scale + Computation scale
• Metadata-Driven: Schema Management and Dynamic Policy Lifecycle
Declarative Policy
• Filter
• Join
• Aggregation: Avg, Sum, Min, Max, etc.
• Group by
• Having
• Stream handlers for windows: Time Window, Batch Window, Length Window
• Conditions and expressions: and, or, not, ==, !=, >=, >, <=, <, and arithmetic operations
• Pattern processing
• Sequence processing
• Event tables: integrate historical data into real-time processing
• SQL-like query: query, stream definition, and query plan compilation
Distributed SQL on Streaming: Siddhi CEP + Storm by default
from MetricStream[(name == 'ReplLag') and (value > 1000)]
select *
insert into outputStream;
Declarative Policy - Examples
from hadoopJmxMetricEventStream
[metric == "hadoop.namenode.fsnamesystemstate.capacityused" and value > 0.9]
select metric, host, value, timestamp, component, site insert into alertStream;
Example 1: Alert if Hadoop NameNode capacity usage exceeds 90 percent
from every a = hadoopJmxMetricEventStream[metric == "hadoop.namenode.fsnamesystem.hastate"]
  -> b = hadoopJmxMetricEventStream[metric == a.metric and b.host == a.host and a.value != value]
  within 10 min
select a.host, a.value as oldHaState, b.value as newHaState, b.timestamp as timestamp,
  b.metric as metric, b.component as component, b.site as site
insert into alertStream;
Example 2: Alert if the Hadoop NameNode HA state switches
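The pattern in Example 2 can be approximated in plain Java to show what the policy checks. This is only a sketch under assumptions: the class `HaSwitchSketch` and its method are hypothetical, it compares consecutive events per host rather than reproducing Siddhi's full `every a -> b within 10 min` matching, and it is not how Eagle or Siddhi evaluate the policy internally.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical illustration of the HA-switch policy; not Eagle or Siddhi code. */
final class HaSwitchSketch {
    // Mirrors the "within 10 min" clause of the policy.
    private static final long WINDOW_MS = 10L * 60L * 1000L;

    private final Map<String, String> lastState = new HashMap<>();
    private final Map<String, Long> lastSeenMs = new HashMap<>();

    /** Returns an alert text when the HA state of a host changed within the window. */
    Optional<String> onEvent(String host, String haState, long timestampMs) {
        String previous = lastState.put(host, haState);
        Long previousMs = lastSeenMs.put(host, timestampMs);
        boolean switched = previous != null && !previous.equals(haState);
        if (switched && timestampMs - previousMs <= WINDOW_MS) {
            return Optional.of(host + ": " + previous + " -> " + haState);
        }
        return Optional.empty();
    }
}
```

Feeding the events for one NameNode host in timestamp order yields an alert only on the event whose state differs from the previous one for that host.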
Distributed Policy Engine - Scalability
[Diagram: Distributed Streaming Cluster Environment; Stream_{1} … Stream_{*} → Stream Processing → AlertExecutor_{1} … AlertExecutor_{N}]
Linear Scalability Principle
Dynamic policy partition by {event} * {policy}
• N users with 3 partitions and M policies with 2 partitions yield 3*2 physical tasks
• Physical partition + policy-level partition
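The {event} * {policy} partitioning can be sketched as a mapping from an event key and a policy id to one of the physical tasks. The class and method names below are hypothetical, and hashing is only an assumed way of choosing partitions; the slide does not say how Eagle assigns them.

```java
/** Hypothetical sketch of {event} * {policy} partitioning; not Eagle code. */
final class PolicyPartitionSketch {
    private final int eventPartitions;
    private final int policyPartitions;

    PolicyPartitionSketch(int eventPartitions, int policyPartitions) {
        if (eventPartitions <= 0 || policyPartitions <= 0) {
            throw new IllegalArgumentException("partition counts must be positive");
        }
        this.eventPartitions = eventPartitions;
        this.policyPartitions = policyPartitions;
    }

    /** Total number of physical tasks, e.g. 3 * 2 = 6. */
    int physicalTasks() {
        return eventPartitions * policyPartitions;
    }

    /** Task that evaluates one policy for one event key (for example a user name). */
    int taskFor(String eventKey, String policyId) {
        int e = Math.floorMod(eventKey.hashCode(), eventPartitions);
        int p = Math.floorMod(policyId.hashCode(), policyPartitions);
        return e * policyPartitions + p;
    }

    /** All tasks that must see an event so that every policy partition evaluates it. */
    int[] tasksForEvent(String eventKey) {
        int e = Math.floorMod(eventKey.hashCode(), eventPartitions);
        int[] tasks = new int[policyPartitions];
        for (int p = 0; p < policyPartitions; p++) {
            tasks[p] = e * policyPartitions + p;
        }
        return tasks;
    }
}
```

With 3 event partitions and 2 policy partitions, each event is routed to 2 of the 6 tasks, so adding partitions on either axis adds tasks without changing the policies themselves.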
Distributed Policy Engine – Optimization
Stream Partition Skew (15:1)
Weights of executors when partitioning by user:
Algorithm | Executor weights
Random | 0.0484 0.152 0.3535 0.105 0.203 0.072 0.042 0.024
Greedy | 0.0837 0.0837 0.0837 0.0837 0.0737 0.0637 0.0437 0.0837
Stream Partition Problem: https://en.wikipedia.org/wiki/Partition_problem
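A greedy balancing heuristic for the partition problem can be sketched as follows. The slide only names the algorithm "Greedy", so the specific variant here (sort keys by weight, heaviest first, and give each to the currently lightest executor) is an assumption, and `GreedyPartitionSketch` is a hypothetical name.

```java
import java.util.Arrays;

/** Hypothetical greedy balancing sketch; the exact Eagle algorithm is not shown on the slide. */
final class GreedyPartitionSketch {

    /** Assigns key weights to executors, heaviest first, always to the lightest executor. */
    static double[] assign(double[] keyWeights, int executors) {
        if (executors <= 0) {
            throw new IllegalArgumentException("executors must be positive");
        }
        double[] sorted = keyWeights.clone();
        Arrays.sort(sorted); // ascending, so iterate from the end for heaviest first
        double[] load = new double[executors];
        for (int i = sorted.length - 1; i >= 0; i--) {
            int lightest = 0;
            for (int e = 1; e < executors; e++) {
                if (load[e] < load[lightest]) {
                    lightest = e;
                }
            }
            load[lightest] += sorted[i];
        }
        return load;
    }

    /** Ratio of the heaviest to the lightest executor load (infinite if one is empty). */
    static double skew(double[] load) {
        double max = Arrays.stream(load).max().orElse(0.0);
        double min = Arrays.stream(load).min().orElse(0.0);
        return max / min;
    }
}
```

Comparing `skew` for a random assignment and for this greedy assignment gives the kind of ratio the slide reports as partition skew.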
Distributed Policy Engine – Extensibility
Policy Engine Extensibility
[Diagram: Distributed Real-time Policy Engine with a Siddhi CEP Policy Evaluator, a Machine Learning Policy Evaluator, and an Extensible Policy Evaluator; policies and metadata come from the Metadata Manager]
• Support WSO2 Siddhi CEP as first class
• Extensible Policy Engine Implementation
• Extensible Policy Lifecycle Management
• Metadata-based Module Management
import java.util.List;

public interface PolicyEvaluatorServiceProvider {
    public String getPolicyType();      // literal string to identify one type of policy
    public Class getPolicyEvaluator();  // get policy evaluator implementation
    public List getBindingModules();    // policy text with json format to object mapping
}
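To illustrate how such a provider interface allows new policy types to be plugged in, here is a minimal sketch. The interface repeats the three methods from the slide (with wildcard generics added to avoid raw-type warnings); everything else, including `ThresholdPolicyEvaluator`, `ThresholdPolicyProvider`, and `PolicyProviderRegistry`, is hypothetical and not part of Eagle.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Same three methods as on the slide; wildcards are an addition of this sketch.
interface PolicyEvaluatorServiceProvider {
    String getPolicyType();
    Class<?> getPolicyEvaluator();
    List<?> getBindingModules();
}

// Hypothetical evaluator used only for illustration.
final class ThresholdPolicyEvaluator {
    boolean evaluate(double value, double threshold) {
        return value > threshold;
    }
}

// Hypothetical provider that announces the evaluator under a policy type name.
final class ThresholdPolicyProvider implements PolicyEvaluatorServiceProvider {
    public String getPolicyType() { return "threshold"; }
    public Class<?> getPolicyEvaluator() { return ThresholdPolicyEvaluator.class; }
    public List<?> getBindingModules() { return Collections.emptyList(); }
}

// Hypothetical registry: looks up the evaluator class by the policy type string.
final class PolicyProviderRegistry {
    private final Map<String, PolicyEvaluatorServiceProvider> providers = new HashMap<>();

    void register(PolicyEvaluatorServiceProvider provider) {
        providers.put(provider.getPolicyType(), provider);
    }

    Class<?> evaluatorFor(String policyType) {
        PolicyEvaluatorServiceProvider provider = providers.get(policyType);
        if (provider == null) {
            throw new IllegalArgumentException("unknown policy type: " + policyType);
        }
        return provider.getPolicyEvaluator();
    }
}
```

The design choice shown is that the engine depends only on the policy type string, so a machine learning evaluator could be registered next to a CEP evaluator without touching the engine.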
Stream Processing Framework
Optimizer
1. Development  2. Optimization  3. Compile to native app
Use the Eagle alert framework as a library
• Lightweight ORM framework for HBase/RDBMS
• Full-function SQL-like REST query
• Optimized rowkey design for time-series data
• Native HBase coprocessor
• Secondary index support
@Table("alertdef")
@ColumnFamily("f")
@Prefix("alertdef")
@Service(AlertConstants.ALERT_DEFINITION_SERVICE_ENDPOINT_NAME)
@JsonIgnoreProperties(ignoreUnknown = true)
@TimeSeries(false)
@Tags({"site", "dataSource", "alertExecutorId", "policyId", "policyType"})
@Indexes({
    @Index(name = "Index_1_alertExecutorId", columns = {"alertExecutorID"}, unique = true),
})
public class AlertDefinitionAPIEntity extends TaggedLogAPIEntity {
    @Column("a")
    private String desc;
    @Column("b")
    private String policyDef;
    @Column("c")
    private String dedupeDef;
    // remaining fields and accessors are not shown on the slide
}
Query=AlertDefinitionService[@dataSource="hiveQueryLog"]{@policyDef}
Large Scale Storage and Query
Uniform HBase rowkey design
Rowkey ::= Prefix | Partition Keys | timestamp | tagName | tagValue | …
• Metric: Rowkey ::= Metric Name | Partition Keys | timestamp | tagName | tagValue | …
• Entity: Rowkey ::= Default Prefix | Partition Keys | timestamp | tagName | tagValue | …
• Log: Rowkey ::= Log Type | Partition Keys | timestamp | tagName | tagValue | …
  Rowvalue ::= Log Content
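The rowkey layout above can be sketched as a byte-array builder. The field order follows the grammar on the slide; the encoding is an assumption made for illustration (each string hashed to a 4-byte int, the timestamp as an 8-byte long, tags sorted by name), and `RowkeySketch` is a hypothetical name, not Eagle's actual implementation.

```java
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical rowkey builder: Prefix | Partition Keys | timestamp | tagName | tagValue | ... */
final class RowkeySketch {

    static byte[] build(String prefix, String[] partitionKeys, long timestamp,
                        Map<String, String> tags) {
        // Sort tags by name so the same tag set always yields the same rowkey.
        TreeMap<String, String> sortedTags = new TreeMap<>(tags);
        int size = 4 + 4 * partitionKeys.length + 8 + 8 * sortedTags.size();
        ByteBuffer buffer = ByteBuffer.allocate(size);
        buffer.putInt(prefix.hashCode());          // Prefix (metric name, log type, ...)
        for (String key : partitionKeys) {
            buffer.putInt(key.hashCode());         // Partition Keys
        }
        buffer.putLong(timestamp);                 // timestamp
        for (Map.Entry<String, String> tag : sortedTags.entrySet()) {
            buffer.putInt(tag.getKey().hashCode());   // tagName
            buffer.putInt(tag.getValue().hashCode()); // tagValue
        }
        return buffer.array();
    }
}
```

Placing the prefix and partition keys before the timestamp keeps rows of one metric or entity type adjacent, which is what makes time-range scans within a type efficient in HBase.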
http://opentsdb.net
Multi-Tenants – Topology Scheduler
• Dynamic Topology Management
• No-downtime Topology Maintenance
• Topology High Availability & Balance
• Resource Scheduling & Isolation: Runtime, Worker, Topology or Cluster
Multi-Tenants - Dynamic Correlation
• Dynamic correlation at runtime:
  • Sort, GroupBy, Join, Window
  • Hot-deploy logic
  • Policy management
• Multiple correlations on a single stream
  • Group by different fields of the same stream
  • Re-sort the same stream in a different order
  • Join a given stream in different ways
• Multiple correlations on multiple streams
  • Cross-stream join
  • Real-time & historical stream join
Eagle Use Cases
Data Activity Monitoring
Secure Hadoop in real time: a data activity monitoring solution to instantly identify access to sensitive data, recognize attacks and malicious activity, and block access in real time.
Job Performance Monitoring
Hadoop and Spark job profiling and performance monitoring, cluster health anomaly detection.
eBay Unified Monitoring Platform
Provide unified monitoring-as-a-service for everything around infrastructure or business.
Eagle Data Activity Monitoring
Data Loss Prevention
Get alerted and stop a malicious user trying to copy, delete, or move sensitive data from the Hadoop cluster.
Malicious Logins
Detect logins in which a malicious user tries to guess a password. Eagle creates user profiles using a machine learning algorithm to detect anomalies.
Unauthorized Access
Detect and stop a malicious user trying to access classified data without privilege.
Malicious User Operation
Detect and stop a malicious user trying to delete a large amount of data. Operation type is one parameter of Eagle user profiles. Eagle supports multiple native operation types.
Eagle Data Activity Monitoring
Hadoop Data Security: Detect anomalies in accessing HDFS and Hive
[Diagram labels: User, Privileges, Common Data Sets, Patterns, Commands, Zones, Query, Columns]
HDFS Policies
• Access to sensitive files
• HDFS commands used (read, write, update…)
• Client host
• Destination
• Security zones
Hive Policies
• Access to tables with PII data
• SQL query profiles
• Client host
• Security zone
Eagle Machine Learning User Profile
Offline: Determine the bandwidth and the kernel density function parameters from the training dataset (KDE)
Online: If a test data point lies outside the trained bandwidth, it is an anomaly (Policy)
[Figure panels: PCs (Principal Components) in EVD (Eigenvalue Decomposition); Kernel Density Function]
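The offline/online split can be illustrated with a one-dimensional Gaussian kernel density estimate. This is a simplified sketch under assumptions: Eagle's user profiles are multi-dimensional, the bandwidth here is passed in rather than learned, and the threshold rule (lowest density seen among training points) is chosen for illustration only. `KdeAnomalySketch` is a hypothetical name.

```java
/** Hypothetical 1-D Gaussian KDE anomaly check; a simplification of the approach on the slide. */
final class KdeAnomalySketch {
    private final double[] training;
    private final double bandwidth;
    private final double threshold;

    /** Offline step: keep the training data and derive a density threshold from it. */
    KdeAnomalySketch(double[] training, double bandwidth) {
        if (training.length == 0 || bandwidth <= 0) {
            throw new IllegalArgumentException("need training data and a positive bandwidth");
        }
        this.training = training.clone();
        this.bandwidth = bandwidth;
        double lowest = Double.POSITIVE_INFINITY;
        for (double point : this.training) {
            lowest = Math.min(lowest, density(point));
        }
        this.threshold = lowest; // assumed rule: lowest density among training points
    }

    /** Gaussian kernel density estimate at x. */
    double density(double x) {
        double sum = 0.0;
        for (double t : training) {
            double u = (x - t) / bandwidth;
            sum += Math.exp(-0.5 * u * u);
        }
        return sum / (training.length * bandwidth * Math.sqrt(2.0 * Math.PI));
    }

    /** Online step: a point with lower density than any training point is flagged. */
    boolean isAnomaly(double x) {
        return density(x) < threshold;
    }
}
```

A point near the bulk of the training data has high estimated density and passes, while a point far from all training points has density close to zero and is flagged.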
Eagle Job Performance Monitoring
Task Failure based Anomaly Host Detection
Use Case: Detect node anomalies by analyzing the task failure ratio across all nodes
Assumption: The task failure ratio should be approximately equal for every node
Algorithm: Node-by-node comparison (symmetry violation) and per-node trend
Alerting: Anomaly detection alerting
Insight: Task failure drill-down
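The node-by-node comparison can be sketched as an outlier test on per-host failure ratios. The rule used here (flag a host whose ratio exceeds the mean by more than k standard deviations) is an assumption, since the slide does not give the exact test, and the per-node trend part is not covered. `TaskFailureSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical symmetry-violation check on task failure ratios; not Eagle code. */
final class TaskFailureSketch {

    /**
     * countsByHost maps a host to {failedTasks, totalTasks}.
     * Returns hosts whose failure ratio exceeds mean + k * standard deviation.
     */
    static List<String> anomalousHosts(Map<String, long[]> countsByHost, double k) {
        Map<String, Double> ratios = new TreeMap<>();
        for (Map.Entry<String, long[]> entry : countsByHost.entrySet()) {
            long failed = entry.getValue()[0];
            long total = entry.getValue()[1];
            if (total > 0) {
                ratios.put(entry.getKey(), (double) failed / total);
            }
        }
        List<String> anomalies = new ArrayList<>();
        if (ratios.isEmpty()) {
            return anomalies;
        }
        double mean = 0.0;
        for (double ratio : ratios.values()) {
            mean += ratio;
        }
        mean /= ratios.size();
        double variance = 0.0;
        for (double ratio : ratios.values()) {
            variance += (ratio - mean) * (ratio - mean);
        }
        double stdDev = Math.sqrt(variance / ratios.size());
        if (stdDev < 1e-12) {
            return anomalies; // all hosts behave alike, nothing violates the symmetry
        }
        for (Map.Entry<String, Double> entry : ratios.entrySet()) {
            if (entry.getValue() > mean + k * stdDev) {
                anomalies.add(entry.getKey());
            }
        }
        return anomalies;
    }
}
```

With few hosts a single extreme host inflates the standard deviation, so a robust statistic such as the median may be a better choice in practice.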
  
Hadoop Job Data Skew Detection
Use Case: Detect data skew by statistics and distributions for attempt execution durations and counters
Assumption: Duration and counters should follow a normal distribution
Counters & Features
• mapDuration
• reduceDuration
• mapInputRecords
• reduceInputRecords
• combineInputRecords
• mapSpilledRecords
• reduceShuffleRecords
• mapLocalFileBytesRead
• reduceLocalFileBytesRead
• mapHDFSBytesRead
• reduceHDFSBytesRead
Modeling & Statistics
• Avg
• Min
• Max
• Distributions
• Max z-score
• Top-N
• Correlation
Threshold & Detection
• Counters: Correlation > 0.9 & Max(Z-Score) > 90%
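The two statistics used for detection, correlation and maximum z-score, can be sketched as below. The correlation threshold of 0.9 comes from the slide; how "Max(Z-Score) > 90%" maps to a numeric cut-off is not stated, so the z-score threshold is left as a parameter. `DataSkewSketch` is a hypothetical name, and pairing task durations with input record counts is an assumed choice of features.

```java
/** Hypothetical data skew check based on correlation and max z-score; not Eagle code. */
final class DataSkewSketch {

    static double mean(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("values must not be empty");
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** Largest z-score in the sample (0 if all values are identical). */
    static double maxZScore(double[] values) {
        double m = mean(values);
        double squares = 0.0;
        for (double v : values) {
            squares += (v - m) * (v - m);
        }
        double stdDev = Math.sqrt(squares / values.length);
        if (stdDev == 0.0) {
            return 0.0;
        }
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            max = Math.max(max, (v - m) / stdDev);
        }
        return max;
    }

    /** Pearson correlation coefficient (0 if either series is constant). */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("series must have equal length");
        }
        double mx = mean(x);
        double my = mean(y);
        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (int i = 0; i < x.length; i++) {
            sxy += (x[i] - mx) * (y[i] - my);
            sxx += (x[i] - mx) * (x[i] - mx);
            syy += (y[i] - my) * (y[i] - my);
        }
        double denominator = Math.sqrt(sxx * syy);
        return denominator == 0.0 ? 0.0 : sxy / denominator;
    }

    /** Skew if duration tracks input size closely and one attempt is a strong outlier. */
    static boolean isSkewed(double[] durations, double[] inputRecords, double zThreshold) {
        return pearson(durations, inputRecords) > 0.9 && maxZScore(durations) > zThreshold;
    }
}
```

The intuition is that when one attempt runs far longer than the rest and its duration is explained by its input size, the slowness is caused by uneven data rather than by a bad node.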
Learn More about Apache Eagle
Community
• Website: http://eagle.incubator.apache.org
• GitHub: http://github.com/apache/incubator-eagle
• Mailing list: dev@eagle.incubator.apache.org
Resources
• Documentation: http://eagle.incubator.apache.org/docs/
• Docker images: https://hub.docker.com/r/apacheeagle/sandbox/
Publications & Patents
• EAGLE: USER PROFILE-BASED ANOMALY DETECTION IN HADOOP CLUSTER (IEEE)
• EAGLE: DISTRIBUTED REALTIME MONITORING FRAMEWORK FOR HADOOP CLUSTER
Q & A
http://eagle.incubator.apache.org
The slides are licensed under the Creative Commons Attribution 4.0 International license.

More Related Content

What's hot

Apache solr performance and scalability effort update palo alto 2017%2 f7
Apache solr performance and scalability effort update palo alto 2017%2 f7Apache solr performance and scalability effort update palo alto 2017%2 f7
Apache solr performance and scalability effort update palo alto 2017%2 f7Cloudera, Inc.
 
(BDT210) Building Scalable Big Data Solutions: Intel & AOL
(BDT210) Building Scalable Big Data Solutions: Intel & AOL(BDT210) Building Scalable Big Data Solutions: Intel & AOL
(BDT210) Building Scalable Big Data Solutions: Intel & AOLAmazon Web Services
 
Opal: Simple Web Services Wrappers for Scientific Applications
Opal: Simple Web Services Wrappers for Scientific ApplicationsOpal: Simple Web Services Wrappers for Scientific Applications
Opal: Simple Web Services Wrappers for Scientific ApplicationsSriram Krishnan
 
Integrating Apache Phoenix with Distributed Query Engines
Integrating Apache Phoenix with Distributed Query EnginesIntegrating Apache Phoenix with Distributed Query Engines
Integrating Apache Phoenix with Distributed Query EnginesDataWorks Summit
 
Real Time Machine Learning Visualization With Spark
Real Time Machine Learning Visualization With SparkReal Time Machine Learning Visualization With Spark
Real Time Machine Learning Visualization With SparkChester Chen
 
myHadoop - Hadoop-on-Demand on Traditional HPC Resources
myHadoop - Hadoop-on-Demand on Traditional HPC ResourcesmyHadoop - Hadoop-on-Demand on Traditional HPC Resources
myHadoop - Hadoop-on-Demand on Traditional HPC ResourcesSriram Krishnan
 
Security From The Big Data and Analytics Perspective
Security From The Big Data and Analytics PerspectiveSecurity From The Big Data and Analytics Perspective
Security From The Big Data and Analytics PerspectiveAll Things Open
 
AWS Sydney Summit 2013 - Optimizing AWS Applications and Usage to Reduce Costs
AWS Sydney Summit 2013 - Optimizing AWS Applications and Usage to Reduce CostsAWS Sydney Summit 2013 - Optimizing AWS Applications and Usage to Reduce Costs
AWS Sydney Summit 2013 - Optimizing AWS Applications and Usage to Reduce CostsAmazon Web Services
 
[SmartNews] Globally Scalable Web Document Classification Using Word2Vec
[SmartNews] Globally Scalable Web Document Classification Using Word2Vec[SmartNews] Globally Scalable Web Document Classification Using Word2Vec
[SmartNews] Globally Scalable Web Document Classification Using Word2VecKouhei Nakaji
 
A streaming architecture for Cyber Security - Apache Metron
A streaming architecture for Cyber Security - Apache MetronA streaming architecture for Cyber Security - Apache Metron
A streaming architecture for Cyber Security - Apache MetronSimon Elliston Ball
 
Storm-on-YARN: Convergence of Low-Latency and Big-Data
Storm-on-YARN: Convergence of Low-Latency and Big-DataStorm-on-YARN: Convergence of Low-Latency and Big-Data
Storm-on-YARN: Convergence of Low-Latency and Big-DataDataWorks Summit
 
(BDT316) Offloading ETL to Amazon Elastic MapReduce
(BDT316) Offloading ETL to Amazon Elastic MapReduce(BDT316) Offloading ETL to Amazon Elastic MapReduce
(BDT316) Offloading ETL to Amazon Elastic MapReduceAmazon Web Services
 
Setting Up Sumo Logic - Sep 2017
Setting Up Sumo Logic -  Sep 2017Setting Up Sumo Logic -  Sep 2017
Setting Up Sumo Logic - Sep 2017mariosany
 
Architectures for HPC and HTC Workloads on AWS | AWS Public Sector Summit 2017
Architectures for HPC and HTC Workloads on AWS | AWS Public Sector Summit 2017Architectures for HPC and HTC Workloads on AWS | AWS Public Sector Summit 2017
Architectures for HPC and HTC Workloads on AWS | AWS Public Sector Summit 2017Amazon Web Services
 
Degrading Performance? You Might be Suffering From the Small Files Syndrome
Degrading Performance? You Might be Suffering From the Small Files SyndromeDegrading Performance? You Might be Suffering From the Small Files Syndrome
Degrading Performance? You Might be Suffering From the Small Files SyndromeDatabricks
 
August 2016 HUG: Recent development in Apache Oozie
August 2016 HUG: Recent development in Apache OozieAugust 2016 HUG: Recent development in Apache Oozie
August 2016 HUG: Recent development in Apache OozieYahoo Developer Network
 
Rethinking Streaming Analytics For Scale
Rethinking Streaming Analytics For ScaleRethinking Streaming Analytics For Scale
Rethinking Streaming Analytics For ScaleHelena Edelson
 
BDA305 NEW LAUNCH! Intro to Amazon Redshift Spectrum: Now query exabytes of d...
BDA305 NEW LAUNCH! Intro to Amazon Redshift Spectrum: Now query exabytes of d...BDA305 NEW LAUNCH! Intro to Amazon Redshift Spectrum: Now query exabytes of d...
BDA305 NEW LAUNCH! Intro to Amazon Redshift Spectrum: Now query exabytes of d...Amazon Web Services
 

What's hot (20)

Mhug apache storm
Mhug apache stormMhug apache storm
Mhug apache storm
 
Apache solr performance and scalability effort update palo alto 2017%2 f7
Apache solr performance and scalability effort update palo alto 2017%2 f7Apache solr performance and scalability effort update palo alto 2017%2 f7
Apache solr performance and scalability effort update palo alto 2017%2 f7
 
(BDT210) Building Scalable Big Data Solutions: Intel & AOL
(BDT210) Building Scalable Big Data Solutions: Intel & AOL(BDT210) Building Scalable Big Data Solutions: Intel & AOL
(BDT210) Building Scalable Big Data Solutions: Intel & AOL
 
Opal: Simple Web Services Wrappers for Scientific Applications
Opal: Simple Web Services Wrappers for Scientific ApplicationsOpal: Simple Web Services Wrappers for Scientific Applications
Opal: Simple Web Services Wrappers for Scientific Applications
 
Integrating Apache Phoenix with Distributed Query Engines
Integrating Apache Phoenix with Distributed Query EnginesIntegrating Apache Phoenix with Distributed Query Engines
Integrating Apache Phoenix with Distributed Query Engines
 
Real Time Machine Learning Visualization With Spark
Real Time Machine Learning Visualization With SparkReal Time Machine Learning Visualization With Spark
Real Time Machine Learning Visualization With Spark
 
myHadoop - Hadoop-on-Demand on Traditional HPC Resources
myHadoop - Hadoop-on-Demand on Traditional HPC ResourcesmyHadoop - Hadoop-on-Demand on Traditional HPC Resources
myHadoop - Hadoop-on-Demand on Traditional HPC Resources
 
Security From The Big Data and Analytics Perspective
Security From The Big Data and Analytics PerspectiveSecurity From The Big Data and Analytics Perspective
Security From The Big Data and Analytics Perspective
 
AWS Sydney Summit 2013 - Optimizing AWS Applications and Usage to Reduce Costs
AWS Sydney Summit 2013 - Optimizing AWS Applications and Usage to Reduce CostsAWS Sydney Summit 2013 - Optimizing AWS Applications and Usage to Reduce Costs
AWS Sydney Summit 2013 - Optimizing AWS Applications and Usage to Reduce Costs
 
[SmartNews] Globally Scalable Web Document Classification Using Word2Vec
[SmartNews] Globally Scalable Web Document Classification Using Word2Vec[SmartNews] Globally Scalable Web Document Classification Using Word2Vec
[SmartNews] Globally Scalable Web Document Classification Using Word2Vec
 
A streaming architecture for Cyber Security - Apache Metron
A streaming architecture for Cyber Security - Apache MetronA streaming architecture for Cyber Security - Apache Metron
A streaming architecture for Cyber Security - Apache Metron
 
Storm-on-YARN: Convergence of Low-Latency and Big-Data
Storm-on-YARN: Convergence of Low-Latency and Big-DataStorm-on-YARN: Convergence of Low-Latency and Big-Data
Storm-on-YARN: Convergence of Low-Latency and Big-Data
 
(BDT316) Offloading ETL to Amazon Elastic MapReduce
(BDT316) Offloading ETL to Amazon Elastic MapReduce(BDT316) Offloading ETL to Amazon Elastic MapReduce
(BDT316) Offloading ETL to Amazon Elastic MapReduce
 
Setting Up Sumo Logic - Sep 2017
Setting Up Sumo Logic -  Sep 2017Setting Up Sumo Logic -  Sep 2017
Setting Up Sumo Logic - Sep 2017
 
Architectures for HPC and HTC Workloads on AWS | AWS Public Sector Summit 2017
Architectures for HPC and HTC Workloads on AWS | AWS Public Sector Summit 2017Architectures for HPC and HTC Workloads on AWS | AWS Public Sector Summit 2017
Architectures for HPC and HTC Workloads on AWS | AWS Public Sector Summit 2017
 
PostgreSQL
PostgreSQL PostgreSQL
PostgreSQL
 
Degrading Performance? You Might be Suffering From the Small Files Syndrome
Degrading Performance? You Might be Suffering From the Small Files SyndromeDegrading Performance? You Might be Suffering From the Small Files Syndrome
Degrading Performance? You Might be Suffering From the Small Files Syndrome
 
August 2016 HUG: Recent development in Apache Oozie
August 2016 HUG: Recent development in Apache OozieAugust 2016 HUG: Recent development in Apache Oozie
August 2016 HUG: Recent development in Apache Oozie
 
Rethinking Streaming Analytics For Scale
Rethinking Streaming Analytics For ScaleRethinking Streaming Analytics For Scale
Rethinking Streaming Analytics For Scale
 
BDA305 NEW LAUNCH! Intro to Amazon Redshift Spectrum: Now query exabytes of d...
BDA305 NEW LAUNCH! Intro to Amazon Redshift Spectrum: Now query exabytes of d...BDA305 NEW LAUNCH! Intro to Amazon Redshift Spectrum: Now query exabytes of d...
BDA305 NEW LAUNCH! Intro to Amazon Redshift Spectrum: Now query exabytes of d...
 

Viewers also liked

Turner / THE BOX - TURNER NEXT
Turner / THE BOX - TURNER NEXTTurner / THE BOX - TURNER NEXT
Turner / THE BOX - TURNER NEXTWilliam Trevisan
 
Shravya_Uppalanchi_Experienced
 Shravya_Uppalanchi_Experienced Shravya_Uppalanchi_Experienced
Shravya_Uppalanchi_ExperiencedShravya Uppalanchi
 
Practica fisio.-exploración-de-los-sentidos
Practica fisio.-exploración-de-los-sentidosPractica fisio.-exploración-de-los-sentidos
Practica fisio.-exploración-de-los-sentidosYorse Zam Rodriguezz
 
스타트업 선배의 브랜드 마케팅 이야기
스타트업 선배의 브랜드 마케팅 이야기스타트업 선배의 브랜드 마케팅 이야기
스타트업 선배의 브랜드 마케팅 이야기(주)배달통
 
Utilizing Predictive Modeling to Meet Business SLAs
Utilizing Predictive Modeling to Meet Business SLAs Utilizing Predictive Modeling to Meet Business SLAs
Utilizing Predictive Modeling to Meet Business SLAs Pyramid Solutions, Inc.
 
Qu'est-ce que le Jugement dernier ?
Qu'est-ce que le Jugement dernier ?Qu'est-ce que le Jugement dernier ?
Qu'est-ce que le Jugement dernier ?Pierrot Caron
 
Chapitre 31 - II. Aller avec le Christ
Chapitre 31 - II. Aller avec le Christ Chapitre 31 - II. Aller avec le Christ
Chapitre 31 - II. Aller avec le Christ Pierrot Caron
 
Herramientas Digitales en Educacion e Investigacion Cientifica
Herramientas Digitales en Educacion e Investigacion CientificaHerramientas Digitales en Educacion e Investigacion Cientifica
Herramientas Digitales en Educacion e Investigacion CientificaJorge Francisco Vera Mosquera
 
Unidade 6. Estado e nación na Restauración borbónica
Unidade 6. Estado e nación na Restauración borbónicaUnidade 6. Estado e nación na Restauración borbónica
Unidade 6. Estado e nación na Restauración borbónicaAgrela Elvixeo
 
Medio de enseñanza
Medio de enseñanzaMedio de enseñanza
Medio de enseñanzaYrkatania
 

Viewers also liked (16)

Nila Mendoza tarea 3
Nila Mendoza tarea 3Nila Mendoza tarea 3
Nila Mendoza tarea 3
 
Turner / THE BOX - TURNER NEXT
Turner / THE BOX - TURNER NEXTTurner / THE BOX - TURNER NEXT
Turner / THE BOX - TURNER NEXT
 
Teoria acces
Teoria accesTeoria acces
Teoria acces
 
Elt 455
Elt 455Elt 455
Elt 455
 
Shravya_Uppalanchi_Experienced
 Shravya_Uppalanchi_Experienced Shravya_Uppalanchi_Experienced
Shravya_Uppalanchi_Experienced
 
Biografia y discografia
Biografia y discografiaBiografia y discografia
Biografia y discografia
 
Practica fisio.-exploración-de-los-sentidos
Practica fisio.-exploración-de-los-sentidosPractica fisio.-exploración-de-los-sentidos
Practica fisio.-exploración-de-los-sentidos
 
스타트업 선배의 브랜드 마케팅 이야기
스타트업 선배의 브랜드 마케팅 이야기스타트업 선배의 브랜드 마케팅 이야기
스타트업 선배의 브랜드 마케팅 이야기
 
Utilizing Predictive Modeling to Meet Business SLAs
Utilizing Predictive Modeling to Meet Business SLAs Utilizing Predictive Modeling to Meet Business SLAs
Utilizing Predictive Modeling to Meet Business SLAs
 
Qu'est-ce que le Jugement dernier ?
Qu'est-ce que le Jugement dernier ?Qu'est-ce que le Jugement dernier ?
Qu'est-ce que le Jugement dernier ?
 
Chapitre 31 - II. Aller avec le Christ
Chapitre 31 - II. Aller avec le Christ Chapitre 31 - II. Aller avec le Christ
Chapitre 31 - II. Aller avec le Christ
 
Tata group
Tata groupTata group
Tata group
 
Herramientas Digitales en Educacion e Investigacion Cientifica
Herramientas Digitales en Educacion e Investigacion CientificaHerramientas Digitales en Educacion e Investigacion Cientifica
Herramientas Digitales en Educacion e Investigacion Cientifica
 
Unidade 6. Estado e nación na Restauración borbónica
Unidade 6. Estado e nación na Restauración borbónicaUnidade 6. Estado e nación na Restauración borbónica
Unidade 6. Estado e nación na Restauración borbónica
 
17. economía
17. economía17. economía
17. economía
 
Medio de enseñanza
Medio de enseñanzaMedio de enseñanza
Medio de enseñanza
 

Similar to Apache Eagle at Hadoop Summit 2016 San Jose

Apache Eagle in Action
Apache Eagle in ActionApache Eagle in Action
Apache Eagle in ActionHao Chen
 
Apache Eagle Dublin Hadoop Summit 2016
Apache Eagle   Dublin Hadoop Summit 2016Apache Eagle   Dublin Hadoop Summit 2016
Apache Eagle Dublin Hadoop Summit 2016Edward Zhang
 
Apache Eagle: 来自eBay的分布式实时Hadoop数据安全引擎
Apache Eagle: 来自eBay的分布式实时Hadoop数据安全引擎Apache Eagle: 来自eBay的分布式实时Hadoop数据安全引擎
Apache Eagle: 来自eBay的分布式实时Hadoop数据安全引擎Qingwen zhao
 
Apache Eagle Architecture Evolvement
Apache Eagle Architecture EvolvementApache Eagle Architecture Evolvement
Apache Eagle Architecture EvolvementHao Chen
 
Achieve big data analytic platform with lambda architecture on cloud
Achieve big data analytic platform with lambda architecture on cloudAchieve big data analytic platform with lambda architecture on cloud
Achieve big data analytic platform with lambda architecture on cloudScott Miao
 
Webinar: What's new in CDAP 3.5?
Webinar: What's new in CDAP 3.5?Webinar: What's new in CDAP 3.5?
Webinar: What's new in CDAP 3.5?Cask Data
 
Visualizing Big Data in Realtime
Visualizing Big Data in RealtimeVisualizing Big Data in Realtime
Visualizing Big Data in RealtimeDataWorks Summit
 
Enterprise guide to building a Data Mesh
Enterprise guide to building a Data MeshEnterprise guide to building a Data Mesh
Enterprise guide to building a Data MeshSion Smith
 
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
Apache Eagle at Hadoop Summit 2016 San Jose

  • 1. Apache Eagle: Secure Hadoop in Real Time http://eagle.incubator.apache.org/ Hao Chen (@haozch) | Ralph Su (@raphaelsu)
  • 2. About Us. Hao Chen / 陈浩: Apache Eagle Co-creator, PMC & Committer (hao@apache.org); Sr. Software Engineer at eBay (hchen9@ebay.com). Ralph Su / 苏良飞: Apache Eagle PMC & Committer (ralphsu@apache.org); MTS Software Engineer at eBay (liasu@ebay.com).
  • 4. Apache Eagle is a distributed real-time monitoring and alerting engine for Hadoop from eBay. Open sourced as an Apache Incubator project on Oct 26th, 2015. Secure Hadoop in real time: a data activity monitoring solution to instantly identify access to sensitive data, recognize attacks and malicious activity, and block access in real time. See http://eagle.incubator.apache.org or http://github.com/apache/incubator-eagle
  • 5. Initiative: Why build Eagle? Eagle was initiated at the end of 2013 for Hadoop ecosystem monitoring, because existing tools such as Zabbix and Ganglia could not handle the huge volume of metrics and logs generated by the Hadoop systems at eBay. Hadoop @ eBay Inc. timeline: 2007: 1-10 nodes; 2009: 50+ nodes; 2010: 100+ nodes, 1000+ cores, 1 PB; 2011: 1000+ nodes, 10,000+ cores, 10+ PB; 2012: 3000+ nodes, 10,000+ cores, 50+ PB; 2014/2015: 10,000 nodes, 150,000+ cores, 200 PB, 2000+ users. Monitoring scope: Hadoop Data (Security, Activity) and Hadoop Platform (Health, Availability, Performance). Scale: 7+ clusters; 10,000+ nodes; 200+ PB data; 10 B+ events/day; 500+ metric types; 50,000+ jobs/day; 50,000,000+ tasks/day.
  • 6. Initiative: Core Challenges. (1) Large-Scale Processing and Storage: scale IO complexity (stream); scale computation complexity (policy); scale storage and query (event, log, metric). (2) Real-Time Alerting: real-time data collection; real-time stream processing; real-time alerting. (3) Expressive Correlation Model: complex policy model; stream GroupBy, Join, Window; machine learning. (4) Hadoop Ecosystem Integration: data source integration; anomaly detection model.
  • 7. Eagle Ecosystem. (1) Eagle Framework: distributed real-time framework for efficiently developing highly scalable monitoring applications. (2) Eagle Apps: Security / Hadoop / Operational Intelligence / … (3) Eagle Interface: REST Service / Management UI / Customizable Analytics Visualization. (4) Eagle Integration: Ambari / Docker / Ranger / Dataguise. (5) Open Source: community-driven and cross-community cooperation. Diagram: Apps (Security, Hadoop, Cloud, Database); Interface (Web Portal, REST Services, Analytics Visualization); Integration (Ambari, Docker, Ranger, Dataguise); Eagle Framework.
  • 8. Eagle @ eBay Inc. Scale: 7+ clusters; 10,000+ nodes; 200+ PB data; 10 B+ events/day; 500+ metric types; 50,000+ jobs/day; 50,000,000+ tasks/day. Eagle deployment in eBay production: 100+ security policies; 8 nodes; 30 worker processes; 64 Kafka partitions. Eagle performance: average latency ~50 ms; max throughput per cluster 300 k/s.
  • 10. Distributed Policy Engine. Diagram labels: Metadata Manager; Policy Management; Dynamic Policy Deployment; Real-time Event Stream (Stream_{1} … Stream_{*}); Dynamic Stream Schema; Stream Processing; AlertExecutor_{1}, AlertExecutor_{2}, …, AlertExecutor_{N}; Real-Time Alerts. Key points: Real-time Streaming: Apache Storm (execution engine) + Kafka (message bus). Declarative Policy: SQL (CEP) on streaming + hot deploy. Linear Scalability: data volume scale + computation scale. Metadata-Driven: schema management and dynamic policy lifecycle.
  • 11. Distributed Policy Engine (same diagram and key points as slide 10), with an example policy: from MetricStream[(name == 'ReplLag') and (value > 1000)] select * insert into outputStream;
  • 12. Declarative Policy. Distributed SQL on streaming: Siddhi CEP + Storm by default. Supported features: Filter; Join; Aggregation (Avg, Sum, Min, Max, etc.); Group by; Having; stream handlers for windows (TimeWindow, Batch Window, Length Window); conditions and expressions (and, or, not, ==, !=, >=, >, <=, <, and arithmetic operations); pattern processing; sequence processing; event tables (integrate historical data into real-time processing); SQL-like query (query, stream definition, and query plan compilation). Example: from MetricStream[(name == 'ReplLag') and (value > 1000)] select * insert into outputStream;
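To make the filter policy above concrete, here is a minimal plain-Java sketch of what the condition `(name == 'ReplLag') and (value > 1000)` selects. It is not Eagle or Siddhi code: `MetricEvent` and `ReplLagFilter` are hypothetical names introduced only for illustration, and a real deployment would leave this evaluation to the CEP engine.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical event type standing in for one row of MetricStream. */
class MetricEvent {
    final String name;
    final double value;

    MetricEvent(String name, double value) {
        this.name = name;
        this.value = value;
    }
}

/** Hypothetical helper that mimics the slide's filter policy in plain Java. */
class ReplLagFilter {
    /** Mirrors the condition (name == 'ReplLag') and (value > 1000). */
    static boolean matches(MetricEvent e) {
        return "ReplLag".equals(e.name) && e.value > 1000;
    }

    /** Plays the role of "select * insert into outputStream". */
    static List<MetricEvent> apply(List<MetricEvent> input) {
        List<MetricEvent> output = new ArrayList<>();
        for (MetricEvent e : input) {
            if (matches(e)) {
                output.add(e);
            }
        }
        return output;
    }
}
```

Calling `ReplLagFilter.apply(events)` returns only the `ReplLag` events whose value is above 1000; every other event is dropped, which is all the declarative policy expresses.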
  • 13. Declarative Policy: Examples.
    Example 1: Alert if Hadoop NameNode capacity usage exceeds 90 percent.
      from hadoopJmxMetricEventStream[metric == "hadoop.namenode.fsnamesystemstate.capacityused" and value > 0.9]
      select metric, host, value, timestamp, component, site
      insert into alertStream;
    Example 2: Alert if the Hadoop NameNode HA state switches.
      from every a = hadoopJmxMetricEventStream[metric == "hadoop.namenode.fsnamesystem.hastate"]
        -> b = hadoopJmxMetricEventStream[metric == a.metric and b.host == a.host and a.value != value]
        within 10 min
      select a.host, a.value as oldHaState, b.value as newHaState, b.timestamp as timestamp, b.metric as metric, b.component as component, b.site as site
      insert into alertStream;
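The HA-switch pattern in Example 2 can be approximated in plain Java as follows. This is a simplified sketch, not Siddhi's pattern semantics: it compares each event only with the previous event from the same host, and `HaStateWatcher` and its method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-host tracker that flags an HA state change within 10 minutes. */
class HaStateWatcher {
    static final long WINDOW_MS = 10 * 60 * 1000L; // corresponds to "within 10 min"

    // Last observed state and timestamp per host (in-memory, illustration only).
    private final Map<String, String> lastState = new HashMap<>();
    private final Map<String, Long> lastSeen = new HashMap<>();

    /**
     * Feeds one HA-state event. Returns an alert message if this host's state
     * differs from its previous event and both fall within the window;
     * otherwise returns null.
     */
    String onEvent(String host, String haState, long timestampMs) {
        String previous = lastState.get(host);
        Long previousTs = lastSeen.get(host);
        lastState.put(host, haState);
        lastSeen.put(host, timestampMs);
        if (previous != null
                && !previous.equals(haState)
                && timestampMs - previousTs <= WINDOW_MS) {
            return host + ": " + previous + " -> " + haState;
        }
        return null;
    }
}
```

The old and new states in the returned message correspond to `oldHaState` and `newHaState` in the policy's select clause.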
  • 14. Distributed Policy Engine: Scalability. Principle: dynamic policy partition by {event} * {policy}. N users with 3 partitions and M policies with 2 partitions give 3*2 physical tasks; physical partition + policy-level partition. Linear scalability in a distributed streaming cluster environment. Diagram labels: Stream_{1} … Stream_{*}; Stream Processing; AlertExecutor_{1}, AlertExecutor_{2}, …, AlertExecutor_{N}.
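The {event} * {policy} partitioning can be sketched as a mapping from an (event key, policy) pair to one physical task. The hashing scheme and the names `PolicyPartitioner` and `taskFor` are assumptions made for illustration; the slides do not specify how Eagle computes the mapping.

```java
/** Hypothetical mapper from (user, policy) to one of eventPartitions * policyPartitions tasks. */
class PolicyPartitioner {
    final int eventPartitions;  // e.g. 3 partitions over users
    final int policyPartitions; // e.g. 2 partitions over policies

    PolicyPartitioner(int eventPartitions, int policyPartitions) {
        this.eventPartitions = eventPartitions;
        this.policyPartitions = policyPartitions;
    }

    /** Number of physical tasks, e.g. 3 * 2 = 6. */
    int taskCount() {
        return eventPartitions * policyPartitions;
    }

    /** Index of the task that evaluates the given policy for the given user's events. */
    int taskFor(String user, String policyId) {
        int e = Math.floorMod(user.hashCode(), eventPartitions);
        int p = Math.floorMod(policyId.hashCode(), policyPartitions);
        return e * policyPartitions + p;
    }
}
```

With 3 event partitions and 2 policy partitions, `taskCount()` is 6, matching the 3*2 physical tasks on the slide; adding partitions on either axis adds tasks without changing the policies themselves.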
  • 15. Distributed Policy Engine: Optimization. Stream partition problem (https://en.wikipedia.org/wiki/Partition_problem); stream partition skew (15:1). Weights of executors when partitioning by user, per algorithm: Random: 0.0484, 0.152, 0.3535, 0.105, 0.203, 0.072, 0.042, 0.024. Greedy: 0.0837, 0.0837, 0.0837, 0.0837, 0.0737, 0.0637, 0.0437, 0.0837.
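A common greedy heuristic for this kind of partition problem is to take keys heaviest first and always assign the next key to the currently lightest executor. The sketch below illustrates that heuristic only; the slides do not describe Eagle's actual greedy algorithm, and `GreedyPartitioner` is a hypothetical name.

```java
import java.util.Arrays;

/** Hypothetical greedy balancer: heaviest key first, always onto the lightest executor. */
class GreedyPartitioner {
    /** Returns the total weight carried by each executor after greedy assignment. */
    static double[] balance(double[] keyWeights, int executors) {
        double[] sorted = keyWeights.clone();
        Arrays.sort(sorted); // ascending, so iterate from the end for heaviest first
        double[] load = new double[executors];
        for (int i = sorted.length - 1; i >= 0; i--) {
            int lightest = 0;
            for (int j = 1; j < executors; j++) {
                if (load[j] < load[lightest]) {
                    lightest = j;
                }
            }
            load[lightest] += sorted[i];
        }
        return load;
    }
}
```

Compared with random assignment, this keeps the per-executor weights much closer together, which is the effect the Random and Greedy rows on the slide contrast.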
  • 16. Distributed Policy Engine: Extensibility (Distributed Real-time Policy Engine). Policy engine extensibility: WSO2 Siddhi CEP supported as first class; extensible policy engine implementation; extensible policy lifecycle management; metadata-based module management. Extensible policy evaluators: Siddhi CEP Policy Evaluator; Machine Learning Policy Evaluator. Diagram labels: Metadata Manager; Policy/Metadata.
      public interface PolicyEvaluatorServiceProvider {
          public String getPolicyType();      // literal string to identify one type of policy
          public Class getPolicyEvaluator();  // get policy evaluator implementation
          public List getBindingModules();    // policy text with json format to object mapping
      }
  • 17. Stream Processing Framework. 1. Development; 2. Optimization (optimizer); 3. Compile to native app. Use the Eagle alert framework as a library.
  • 18. Large Scale Storage and Query. Light-weight ORM framework for HBase/RDBMS; full-function SQL-like REST query; optimized rowkey design for time-series data; native HBase coprocessor; secondary index support.
    Entity example (excerpt):
      @Table("alertdef")
      @ColumnFamily("f")
      @Prefix("alertdef")
      @Service(AlertConstants.ALERT_DEFINITION_SERVICE_ENDPOINT_NAME)
      @JsonIgnoreProperties(ignoreUnknown = true)
      @TimeSeries(false)
      @Tags({"site", "dataSource", "alertExecutorId", "policyId", "policyType"})
      @Indexes({ @Index(name="Index_1_alertExecutorId", columns = { "alertExecutorID" }, unique = true), })
      public class AlertDefinitionAPIEntity extends TaggedLogAPIEntity{
          @Column("a") private String desc;
          @Column("b") private String policyDef;
          @Column("c") private String dedupeDef;
    Query example:
      Query=AlertDefinitionService[@dataSource="hiveQueryLog"]{@policyDef}
  • 19. Large Scale Storage and Query: uniform HBase rowkey design (see http://opentsdb.net).
    General: Rowkey ::= Prefix | Partition Keys | timestamp | tagName | tagValue | …
    Metric:  Rowkey ::= Metric Name | Partition Keys | timestamp | tagName | tagValue | …
    Entity:  Rowkey ::= Default Prefix | Partition Keys | timestamp | tagName | tagValue | …
    Log:     Rowkey ::= Log Type | Partition Keys | timestamp | tagName | tagValue | …
             Rowvalue ::= Log Content
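The rowkey layout above can be sketched as a byte array built field by field. The encoding here is an assumption for illustration (4-byte `hashCode` values for names and tags, an 8-byte timestamp, tags sorted by name); the slides give only the field order, not Eagle's actual byte encoding, and `RowkeySketch` is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical builder for Prefix | Partition Keys | timestamp | tagName | tagValue | ... */
class RowkeySketch {
    static byte[] build(String prefix, String[] partitionKeys,
                        long timestampMs, Map<String, String> tags) {
        // Sort tags by name so the same tag set always yields the same key.
        Map<String, String> sorted = new TreeMap<>(tags);
        int size = 4 + 4 * partitionKeys.length + 8 + 8 * sorted.size();
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.putInt(prefix.hashCode());          // Prefix (assumed: 4-byte hash)
        for (String key : partitionKeys) {
            buf.putInt(key.hashCode());         // Partition Keys
        }
        buf.putLong(timestampMs);               // timestamp
        for (Map.Entry<String, String> tag : sorted.entrySet()) {
            buf.putInt(tag.getKey().hashCode());   // tagName
            buf.putInt(tag.getValue().hashCode()); // tagValue
        }
        return buf.array();
    }
}
```

Putting the prefix and partition keys before the timestamp keeps rows of one metric or entity type contiguous and time-ordered, which is what makes time-range scans cheap in HBase.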
  • 20. Multi-Tenants: Topology Scheduler. Dynamic topology management; no-downtime topology maintenance; topology high availability and balance; resource scheduling and isolation: Runtime, Worker, Topology or Cluster.
  • 21. Multi-Tenants: Dynamic Correlation. Dynamic correlation at runtime: Sort, GroupBy, Join, Window; hot deploy logic; policy management. Multi-correlation on a single stream: group by different fields of the same stream; re-sort the same stream in a different order; join a given stream in different ways. Multi-correlation on multiple streams: cross-stream join; real-time and historical stream join.
  • 23. Eagle Use Cases. Data Activity Monitoring: Secure Hadoop in real time; a data activity monitoring solution to instantly identify access to sensitive data, recognize attacks and malicious activity, and block access in real time. Job Performance Monitoring: Hadoop and Spark job profiling and performance monitoring; cluster health anomaly detection. eBay Unified Monitoring Platform: provide unified monitoring-as-a-service for everything around infrastructure or business.
  • 24. Eagle Data Activity Monitoring. Data Loss Prevention: get alerted and stop a malicious user trying to copy, delete, or move sensitive data from the Hadoop cluster. Malicious Logins: detect logins when a malicious user tries to guess a password; Eagle creates user profiles using a machine learning algorithm to detect anomalies. Unauthorized Access: detect and stop a malicious user trying to access classified data without privilege. Malicious User Operation: detect and stop a malicious user trying to delete a large amount of data; operation type is one parameter of Eagle user profiles, and Eagle supports multiple native operation types. Diagram labels: User; Privileges; Common Data Sets; Patterns; Commands; Zones; Query; Columns.
  • 25. Eagle Data Activity Monitoring. Hadoop data security: detect anomalies in accessing HDFS and Hive. HDFS policies: access to sensitive files; HDFS commands used (read, write, update…); client host; destination; security zones. Hive policies: access to tables with PII data; SQL query profiles; client host; security zone.
  • 26. Eagle Machine Learning User Profile. Kernel density function: Offline: determine the bandwidth, i.e. the kernel density function (KDE) parameters, from the training dataset. Online: if a test data point lies outside the trained bandwidth, it is an anomaly (policy). PCs (principal components) in EVD (eigenvalue decomposition).
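As a rough illustration of the offline/online split for the kernel density approach, the sketch below fits a one-dimensional Gaussian KDE and flags points whose estimated density falls below a threshold. Several choices are assumptions not stated on the slide: the Gaussian kernel, Silverman's rule of thumb for the bandwidth, the density threshold as the anomaly test, and the class name `KdeSketch`. It also assumes at least two training points with non-zero spread.

```java
/** Hypothetical 1-D Gaussian KDE used as an anomaly score. */
class KdeSketch {
    final double[] train;
    final double bandwidth;

    /** Offline step: derive the bandwidth from the training data. */
    KdeSketch(double[] train) {
        this.train = train.clone();
        double mean = 0;
        for (double x : train) {
            mean += x;
        }
        mean /= train.length;
        double var = 0;
        for (double x : train) {
            var += (x - mean) * (x - mean);
        }
        double std = Math.sqrt(var / (train.length - 1));
        // Silverman's rule of thumb for a Gaussian kernel (an assumed choice).
        this.bandwidth = 1.06 * std * Math.pow(train.length, -0.2);
    }

    /** Estimated probability density at x. */
    double density(double x) {
        double sum = 0;
        for (double xi : train) {
            double u = (x - xi) / bandwidth;
            sum += Math.exp(-0.5 * u * u) / Math.sqrt(2 * Math.PI);
        }
        return sum / (train.length * bandwidth);
    }

    /** Online step: a point in a very low-density region is treated as an anomaly. */
    boolean isAnomaly(double x, double minDensity) {
        return density(x) < minDensity;
    }
}
```

A profile trained on a user's typical per-interval operation counts would then score each new interval online, which fits the slide's description of a policy evaluated against a trained model.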
  • 27. Eagle Job Performance Monitoring. Use case: detect node anomalies by analyzing the task failure ratio across all nodes. Assumption: the task failure ratio for every node should be approximately equal. Algorithm: node-by-node comparison (symmetry violation) and per-node trend.
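The node-by-node comparison can be sketched as follows: compute each node's task failure ratio and flag nodes that sit well above the cluster median. Using the median and a fixed tolerance is an assumption for illustration; the slide does not specify the exact test, and `NodeFailureSketch` is a hypothetical name. The per-node trend part of the algorithm is not covered here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical symmetry check over per-node task failure ratios. */
class NodeFailureSketch {
    /** Returns indexes of nodes whose failure ratio exceeds the median by more than tolerance. */
    static List<Integer> anomalousNodes(long[] failed, long[] total, double tolerance) {
        List<Integer> result = new ArrayList<>();
        if (failed.length == 0) {
            return result;
        }
        double[] ratio = new double[failed.length];
        for (int i = 0; i < failed.length; i++) {
            ratio[i] = total[i] == 0 ? 0.0 : (double) failed[i] / total[i];
        }
        double[] sorted = ratio.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        double median = sorted.length % 2 == 1
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
        for (int i = 0; i < ratio.length; i++) {
            if (ratio[i] - median > tolerance) {
                result.add(i);
            }
        }
        return result;
    }
}
```

The median is used rather than the mean so that a single failing node does not shift the baseline it is compared against.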
  • 28. Task Failure based Anomaly Host Detection. Alerting: anomaly detection alerting. Insight: task failure drill-down.
  • 29. Hadoop Job Data Skew Detection. Use case: detect data skew by statistics and distributions for attempt execution durations and counters. Assumption: duration and counters should be in normal distribution. Counters & features: mapDuration, reduceDuration, mapInputRecords, reduceInputRecords, combineInputRecords, mapSpilledRecords, reduceShuffleRecords, mapLocalFileBytesRead, reduceLocalFileBytesRead, mapHDFSBytesRead, reduceHDFSBytesRead. Modeling & statistics: Avg, Min, Max, Distributions, Max z-score, Top-N, Correlation. Threshold & detection: Counters Correlation > 0.9 & Max(Z-Score) > 90%.
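The two statistics named in the detection rule, max z-score and correlation, can be computed as in the sketch below. It assumes the correlation is Pearson's coefficient between attempt durations and one counter, and it takes both thresholds as parameters because the slide's "Max(Z-Score) > 90%" does not say how the percentage maps to a z value. `SkewSketch` and its method names are hypothetical.

```java
/** Hypothetical skew check: high duration/counter correlation plus one extreme attempt. */
class SkewSketch {
    static double mean(double[] x) {
        double sum = 0;
        for (double v : x) {
            sum += v;
        }
        return sum / x.length;
    }

    /** Largest z-score in x, using the population standard deviation. */
    static double maxZScore(double[] x) {
        double m = mean(x);
        double var = 0;
        for (double v : x) {
            var += (v - m) * (v - m);
        }
        double std = Math.sqrt(var / x.length);
        if (std == 0) {
            return 0;
        }
        double max = 0;
        for (double v : x) {
            max = Math.max(max, (v - m) / std);
        }
        return max;
    }

    /** Pearson correlation coefficient between x and y (same length). */
    static double correlation(double[] x, double[] y) {
        double mx = mean(x);
        double my = mean(y);
        double cov = 0, vx = 0, vy = 0;
        for (int i = 0; i < x.length; i++) {
            cov += (x[i] - mx) * (y[i] - my);
            vx += (x[i] - mx) * (x[i] - mx);
            vy += (y[i] - my) * (y[i] - my);
        }
        return cov / Math.sqrt(vx * vy);
    }

    static boolean isSkewed(double[] durations, double[] counter,
                            double minCorrelation, double minMaxZ) {
        return correlation(durations, counter) > minCorrelation
                && maxZScore(durations) > minMaxZ;
    }
}
```

For example, `isSkewed(mapDuration, mapInputRecords, 0.9, 2.5)` would flag a job in which one attempt runs far longer than the rest and its duration tracks its input size; the 2.5 cut-off is an illustrative value, not one taken from the slides.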
  • 31. Learn More about Apache Eagle. Community: Website: http://eagle.incubator.apache.org; GitHub: http://github.com/apache/incubator-eagle; Mailing list: dev@eagle.incubator.apache.org. Resources: Documentation: http://eagle.incubator.apache.org/docs/; Docker images: https://hub.docker.com/r/apacheeagle/sandbox/. Publications & Patents: EAGLE: USER PROFILE-BASED ANOMALY DETECTION IN HADOOP CLUSTER (IEEE); EAGLE: DISTRIBUTED REALTIME MONITORING FRAMEWORK FOR HADOOP CLUSTER.
  • 32. Q & A. http://eagle.incubator.apache.org. The slides are licensed under the Creative Commons Attribution 4.0 International license.