SlideShare a Scribd company logo
1 of 45
Real Time
Fraud Detection
Patterns and reference architectures
Ted Malaska // PSA Gwen Shapira // Software
Engineer
2
• Intro
• Review Problem
• Quick overview of key technology
• High level architecture
• Deep Dive into NRT Processing
• Completing the Puzzle – Micro-batch, Ingest and Batch
Overview
©2014 Cloudera, Inc. All rights reserved.
3©2014 Cloudera, Inc. All rights reserved.
• 15 years of moving data
• Formerly consultant
• Now Cloudera Engineer:
– Sqoop Committer
– Kafka
– Flume
• @gwenshap
Gwen Shapira
4
• Ted Malaska (PSA at Cloudera)
• Hadoop for ~5 years
• Contributed to
– HDFS, MapReduce, Yarn, HBase, Spark, Avro,
– Kite, Pig, Navigator, Cloudera Manager, Flume, Kafke, Sqoop, Accumulo
– And working on a Sentry Patch
• Co-Author to O’Reilly Hadoop Application Architectures
• Worked with about 70 companies in 8 countries
• Marvel Fan Boy
• Runner
Hello
©2014 Cloudera, Inc. All rights reserved.
5
The Problem
©2014 Cloudera, Inc. All rights reserved.
6
Credit Card Transaction Fraud
©2014 Cloudera, Inc. All rights reserved.
7
Ikea Meat Balls
©2014 Cloudera, Inc. All rights reserved.
8
Coupon Fraud
©2014 Cloudera, Inc. All rights reserved.
9
Video Game Strategy
©2014 Cloudera, Inc. All rights reserved.
10
Health Insurance Fraud
©2014 Cloudera, Inc. All rights reserved.
11
• Typical Atomic Card Fraud Detection
• Ikea Meat Ball
• Multi Coupons Combinations
• OP or Negative Video Games Strategies
• Ad Serving
• Health Insurance Fraud
• Kid Coming Home From School
Review of the Problem
©2014 Cloudera, Inc. All rights reserved.
12
How do we React
• Human Brain at Tennis
– Muscle Memory
– Reaction Thought
– Reflective Meditation
©2014 Cloudera, Inc. All rights reserved.
13
Overview of
Key Technologies
©2014 Cloudera, Inc. All rights reserved.
14
Kafka
©2014 Cloudera, Inc. All Rights Reserved.
15©2014 Cloudera, Inc. All rights reserved.
• Messages are organized into topics
• Producers push messages
• Consumers pull messages
• Kafka runs in a cluster. Nodes are called
brokers
The Basics
16©2014 Cloudera, Inc. All rights reserved.
Topics, Partitions and Logs
17©2014 Cloudera, Inc. All rights reserved.
Each partition is a log
18©2014 Cloudera, Inc. All rights reserved.
Each Broker has many partitions
Partition 0 Partition 0
Partition 1 Partition 1
Partition 2
Partition 1
Partition 0
Partition 2 Partion 2
19©2014 Cloudera, Inc. All rights reserved.
Producers load balance between partitions
Partition 0
Partition 1
Partition 2
Partition 1
Partition 0
Partition 2
Partition 0
Partition 1
Partion 2
Client
20©2014 Cloudera, Inc. All rights reserved.
Producers load balance between partitions
Partition 0
Partition 1
Partition 2
Partition 1
Partition 0
Partition 2
Partition 0
Partition 1
Partion 2
Client
21©2014 Cloudera, Inc. All rights reserved.
Consumers
Consumer Group Y
Consumer Group X
Consumer
Kafka Cluster
Topic
Partition A (File)
Partition B (File)
Partition C (File)
Consumer
Consumer
Consumer
Order retained with in
partition
Order retained with in
partition but not over
partitionsOffSetX
OffSetX
OffSetX
OffSetYOffSetYOffSetY
Off sets are kept per
consumer group
22
Flume
23
Sources Interceptors Selectors Channels Sinks
Flume Agent
Short Intro to Flume
Twitter, logs, JMS,
webserver, Kafka
Mask, re-format,
validate…
DR, critical
Memory, file,
Kafka
HDFS, HBase,
Solr
24
Flume and/or Kafka
©2014 Cloudera, Inc. All rights reserved.
Flume
UpStream
Flume Source
Interceptor
Flume Channel
Flume Sink
Down Stream
Selector
Can Be KafkaCan Be KafkaCan Be Kafka
25
Interceptors
• Mask fields
• Validate information
against external source
• Extract fields
• Modify data format
• Filter or split events
©2014 Cloudera, Inc. All rights reserved.
26
SparkStreaming
27
Spark Streaming Example
©2014 Cloudera, Inc. All rights reserved.
1. val conf = new SparkConf().setMaster("local[2]”)
2. val ssc = new StreamingContext(conf, Seconds(1))
3. val lines = ssc.socketTextStream("localhost", 9999)
4. val words = lines.flatMap(_.split(" "))
5. val pairs = words.map(word => (word, 1))
6. val wordCounts = pairs.reduceByKey(_ + _)
7. wordCounts.print()
8. SSC.start()
28
Spark Streaming Example
©2014 Cloudera, Inc. All rights reserved.
1. val conf = new SparkConf().setMaster("local[2]”)
2. val sc = new SparkContext(conf)
3. val lines = sc.textFile(path, 2)
4. val words = lines.flatMap(_.split(" "))
5. val pairs = words.map(word => (word, 1))
6. val wordCounts = pairs.reduceByKey(_ + _)
7. wordCounts.print()
29
DStream
DStream
DStream
Spark Streaming
Confidentiality Information Goes Here
Single Pass
Source Receiver RDD
Source Receiver RDD
RDD
Filter Count Print
Source Receiver RDD
RDD
RDD
Single Pass
Filter Count Print
Pre-first
Batch
First
Batch
Second
Batch
30
DStream
DStream
DStreamSpark Streaming
Confidentiality Information Goes Here
Single Pass
Source Receiver RDD
Source Receiver RDD
RDD
Filter Count
Print
Source Receiver RDD
RDD
RDD
Single Pass
Filter Count
Pre-first
Batch
First
Batch
Second
Batch
Stateful RDD 1
Print
Stateful RDD 2
Stateful RDD 1
31
Spark Streaming and HBase
©2014 Cloudera, Inc. All rights reserved.
Driver
Walker Node
Configs
Executor
Static Space
Configs
HConnection
Tasks Tasks
Walker Node
Executor
Static Space
Configs
HConnection
Tasks Tasks
32
High Level
Architecture
©2014 Cloudera, Inc. All rights reserved.
33
Real-Time Event Processing Approach
©2014 Cloudera, Inc. All rights reserved.
Hadoop Cluster II
Storage Processing
SolR
Hadoop Cluster I
ClientClient
Flume Agents
Hbase /
Memory
Spark
Streaming
HDFS
Hive/Im
pala
Map/Re
duce
Spark
Search
Automated &
Manual
Analytical
Adjustments
and Pattern
detection
Fetching &
Updating Profiles
Adjusting NRT Stats
HDFSEventSink
SolR Sink
Batch Time Adjustments
Automated &
Manual
Review of
NRT Changes
and Counters
Local Cache
Kafka
Clients:
(Swipe
here!)
Web App
34
NRT Processing
©2014 Cloudera, Inc. All rights reserved.
35
Focus on NRT First
©2014 Cloudera, Inc. All rights reserved.
Hadoop Cluster II
Storage Processing
SolR
Hadoop Cluster I
ClientClient
Flume Agents
Hbase /
Memory
Spark
Streaming
HDFS
Hive/Im
pala
Map/Re
duce
Spark
Search
Automated &
Manual
Analytical
Adjustments
and Pattern
detection
Fetching &
Updating Profiles
Adjusting NRT Stats
HDFSEventSink
SolR Sink
Batch Time Adjustments
Automated &
Manual
Review of
NRT Changes
and Counters
Local Cache
Kafka
Clients:
(Swipe
here!)
Web App
NRT Event Processing with Context
36
Streaming Architecture – NRT Event Processing
©2014 Cloudera, Inc. All rights reserved.
Flume Source
Flume Source
Kafka
Initial Events Topic
Flume Source
Flume Interceptor
Event Processing Logic
Local
Memory
HBase
Client
Kafka
Answer Topic
HBase
KafkaConsumer
KafkaProducer
Able to respond with
in 10s of
milliseconds
37
Partitioned NRT Event Processing
©2014 Cloudera, Inc. All rights reserved.
Flume Source
Flume Source
Kafka
Initial Events Topic
Flume Source
Flume Interceptor
Event Processing Logic
Local
Memory
HBase
Client
Kafka
Answer Topic
HBase
KafkaConsumer
KafkaProducer
Topic
Partition A
Partition B
Partition C
Producer
Partitione
r
Producer
Partitione
r
Producer
Partitione
r
Custom Partitioner
Better use of local
memory
38
Completing the
Puzzle
©2014 Cloudera, Inc. All rights reserved.
39
Micro Batching
©2014 Cloudera, Inc. All rights reserved.
Hadoop Cluster II
Storage Processing
SolR
Hadoop Cluster I
ClientClient
Flume Agents
Hbase /
Memory
Spark
Streaming
HDFS
Hive/Im
pala
Map/Re
duce
Spark
Search
Automated &
Manual
Analytical
Adjustments
and Pattern
detection
Fetching &
Updating Profiles
Adjusting NRT Stats
HDFSEventSink
SolR Sink
Batch Time Adjustments
Automated &
Manual
Review of
NRT Changes
and Counters
Local Cache
Kafka
Clients:
(Swipe
here!)
Web App
Micro Batching
Micro Batching
Micro Batching
40
Complex Topologies
©2014 Cloudera, Inc. All rights reserved.
Kafka
Initial Events Topic
Spark Streaming
KafkaDirect
Connection
Dag Topologies
Kafka
Initial Events Topic
Spark Streaming
Kafka Receivers Dag Topologies
Kafka Receivers
Kafka Receivers
• Manages Offset
• Stores Offset is RDD
• No longer needs HDFS for initial RDD check
pointing
• Lets Kafka Manage Offsets
• Uses HDFS for initial RDD recovery
1.3
1.2
41
MicroBatch Bad-Input Handling
©2014 Cloudera, Inc. All rights reserved.
0 1 2 3 4 5 6 7 8 9
1
0
1
1
1
2
1
3
Kafka – incoming events topic
Dag Topologies
0 1 2 3 4 5 6 7 8 9
1
0
1
1
1
2
1
3
Kafka – bad events topic
0 1 2 3 4 5 6 7 8 9
1
0
1
1
1
2
1
3
Kafka – resolved events topic
0 1 2 3 4 5 6 7 8 9
1
0
1
1
1
2
1
3
Kafka – results topic
42
Ingestion
©2014 Cloudera, Inc. All rights reserved.
Hadoop Cluster II
Storage Processing
SolR
Hadoop Cluster I
ClientClient
Flume Agents
Hbase /
Memory
Spark
Streaming
HDFS
Hive/Im
pala
Map/Re
duce
Spark
Search
Automated &
Manual
Analytical
Adjustments
and Pattern
detection
Fetching &
Updating Profiles
Adjusting NRT Stats
HDFSEventSink
SolR Sink
Batch Time Adjustments
Automated &
Manual
Review of
NRT Changes
and Counters
Local Cache
Kafka
Clients:
(Swipe
here!)
Web App
Ingestion
Ingestion
43
Ingestion
©2014 Cloudera, Inc. All rights reserved.
Flume HDFS Sink
Kafka Cluster
Topic
Partition A
Partition B
Partition C
Sink
Sink
Sink
HDFS
Flume SolR Sink
Sink
Sink
Sink
SolR
Flume Hbase Sink
Sink
Sink
Sink
HBase
44
Reflective Thoughts
©2014 Cloudera, Inc. All rights reserved.
Hadoop Cluster II
Storage Processing
SolR
Hadoop Cluster I
ClientClient
Flume Agents
Hbase /
Memory
Spark
Streaming
HDFS
Hive/Im
pala
Map/Re
duce
Spark
Search
Automated &
Manual
Analytical
Adjustments
and Pattern
detection
Fetching &
Updating Profiles
Adjusting NRT Stats
HDFSEventSink
SolR Sink
Batch Time Adjustments
Automated &
Manual
Review of
NRT Changes
and Counters
Local Cache
Kafka
Clients:
(Swipe
here!)
Web App
Research and Searching
©2014 Cloudera, Inc. All rights reserved.

More Related Content

What's hot

Data Loss Prevention
Data Loss PreventionData Loss Prevention
Data Loss PreventionReza Kopaee
 
Network traffic analysis with cyber security
Network traffic analysis with cyber securityNetwork traffic analysis with cyber security
Network traffic analysis with cyber securityKAMALI PRIYA P
 
Knowledge for the masses: Storytelling with ATT&CK
Knowledge for the masses: Storytelling with ATT&CKKnowledge for the masses: Storytelling with ATT&CK
Knowledge for the masses: Storytelling with ATT&CKMITRE ATT&CK
 
When Insiders ATT&CK!
When Insiders ATT&CK!When Insiders ATT&CK!
When Insiders ATT&CK!MITRE ATT&CK
 
DDoS Attack Preparation and Mitigation
DDoS Attack Preparation and MitigationDDoS Attack Preparation and Mitigation
DDoS Attack Preparation and MitigationJerod Brennen
 
Career in Ethical Hacking
Career in Ethical Hacking Career in Ethical Hacking
Career in Ethical Hacking neosphere
 
Zero Trust Model
Zero Trust ModelZero Trust Model
Zero Trust ModelYash
 
Cyber Table Top Exercise -- Model Roadmap
Cyber Table Top Exercise -- Model RoadmapCyber Table Top Exercise -- Model Roadmap
Cyber Table Top Exercise -- Model RoadmapDavid Sweigert
 
Malware Classification and Analysis
Malware Classification and AnalysisMalware Classification and Analysis
Malware Classification and AnalysisPrashant Chopra
 
The ATT&CK Latin American APT Playbook
The ATT&CK Latin American APT PlaybookThe ATT&CK Latin American APT Playbook
The ATT&CK Latin American APT PlaybookMITRE ATT&CK
 
Apply MLOps at Scale
Apply MLOps at ScaleApply MLOps at Scale
Apply MLOps at ScaleDatabricks
 
AIOps, IT Analytics, and Business Performance: What’s Needed and What Works
AIOps, IT Analytics, and Business Performance: What’s Needed and What Works AIOps, IT Analytics, and Business Performance: What’s Needed and What Works
AIOps, IT Analytics, and Business Performance: What’s Needed and What Works Enterprise Management Associates
 
Tracking Noisy Behavior and Risk-Based Alerting with ATT&CK
Tracking Noisy Behavior and Risk-Based Alerting with ATT&CKTracking Noisy Behavior and Risk-Based Alerting with ATT&CK
Tracking Noisy Behavior and Risk-Based Alerting with ATT&CKMITRE ATT&CK
 
Malware detection-using-machine-learning
Malware detection-using-machine-learningMalware detection-using-machine-learning
Malware detection-using-machine-learningSecurity Bootcamp
 
MITRE ATT&CKcon 2018: Playing Devil’s Advocate to Security Initiatives with A...
MITRE ATT&CKcon 2018: Playing Devil’s Advocate to Security Initiatives with A...MITRE ATT&CKcon 2018: Playing Devil’s Advocate to Security Initiatives with A...
MITRE ATT&CKcon 2018: Playing Devil’s Advocate to Security Initiatives with A...MITRE - ATT&CKcon
 
Leveraging MITRE ATT&CK - Speaking the Common Language
Leveraging MITRE ATT&CK - Speaking the Common LanguageLeveraging MITRE ATT&CK - Speaking the Common Language
Leveraging MITRE ATT&CK - Speaking the Common LanguageErik Van Buggenhout
 
Cyber security and demonstration of security tools
Cyber security and demonstration of security toolsCyber security and demonstration of security tools
Cyber security and demonstration of security toolsVicky Fernandes
 

What's hot (20)

Data Loss Prevention
Data Loss PreventionData Loss Prevention
Data Loss Prevention
 
Network traffic analysis with cyber security
Network traffic analysis with cyber securityNetwork traffic analysis with cyber security
Network traffic analysis with cyber security
 
Knowledge for the masses: Storytelling with ATT&CK
Knowledge for the masses: Storytelling with ATT&CKKnowledge for the masses: Storytelling with ATT&CK
Knowledge for the masses: Storytelling with ATT&CK
 
Malware Detection using Machine Learning
Malware Detection using Machine Learning	Malware Detection using Machine Learning
Malware Detection using Machine Learning
 
When Insiders ATT&CK!
When Insiders ATT&CK!When Insiders ATT&CK!
When Insiders ATT&CK!
 
DDoS Attack Preparation and Mitigation
DDoS Attack Preparation and MitigationDDoS Attack Preparation and Mitigation
DDoS Attack Preparation and Mitigation
 
Threat Modeling Using STRIDE
Threat Modeling Using STRIDEThreat Modeling Using STRIDE
Threat Modeling Using STRIDE
 
Career in Ethical Hacking
Career in Ethical Hacking Career in Ethical Hacking
Career in Ethical Hacking
 
Zero Trust Model
Zero Trust ModelZero Trust Model
Zero Trust Model
 
Cyber Table Top Exercise -- Model Roadmap
Cyber Table Top Exercise -- Model RoadmapCyber Table Top Exercise -- Model Roadmap
Cyber Table Top Exercise -- Model Roadmap
 
Malware Classification and Analysis
Malware Classification and AnalysisMalware Classification and Analysis
Malware Classification and Analysis
 
The ATT&CK Latin American APT Playbook
The ATT&CK Latin American APT PlaybookThe ATT&CK Latin American APT Playbook
The ATT&CK Latin American APT Playbook
 
Apply MLOps at Scale
Apply MLOps at ScaleApply MLOps at Scale
Apply MLOps at Scale
 
AIOps, IT Analytics, and Business Performance: What’s Needed and What Works
AIOps, IT Analytics, and Business Performance: What’s Needed and What Works AIOps, IT Analytics, and Business Performance: What’s Needed and What Works
AIOps, IT Analytics, and Business Performance: What’s Needed and What Works
 
Tracking Noisy Behavior and Risk-Based Alerting with ATT&CK
Tracking Noisy Behavior and Risk-Based Alerting with ATT&CKTracking Noisy Behavior and Risk-Based Alerting with ATT&CK
Tracking Noisy Behavior and Risk-Based Alerting with ATT&CK
 
Malware detection-using-machine-learning
Malware detection-using-machine-learningMalware detection-using-machine-learning
Malware detection-using-machine-learning
 
MITRE ATT&CKcon 2018: Playing Devil’s Advocate to Security Initiatives with A...
MITRE ATT&CKcon 2018: Playing Devil’s Advocate to Security Initiatives with A...MITRE ATT&CKcon 2018: Playing Devil’s Advocate to Security Initiatives with A...
MITRE ATT&CKcon 2018: Playing Devil’s Advocate to Security Initiatives with A...
 
Leveraging MITRE ATT&CK - Speaking the Common Language
Leveraging MITRE ATT&CK - Speaking the Common LanguageLeveraging MITRE ATT&CK - Speaking the Common Language
Leveraging MITRE ATT&CK - Speaking the Common Language
 
Cloud security
Cloud securityCloud security
Cloud security
 
Cyber security and demonstration of security tools
Cyber security and demonstration of security toolsCyber security and demonstration of security tools
Cyber security and demonstration of security tools
 

Similar to Fraud Detection Architecture

Fraud Detection for Israel BigThings Meetup
Fraud Detection  for Israel BigThings MeetupFraud Detection  for Israel BigThings Meetup
Fraud Detection for Israel BigThings MeetupGwen (Chen) Shapira
 
Real Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingReal Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingHari Shreedharan
 
Spark Streaming& Kafka-The Future of Stream Processing by Hari Shreedharan of...
Spark Streaming& Kafka-The Future of Stream Processing by Hari Shreedharan of...Spark Streaming& Kafka-The Future of Stream Processing by Hari Shreedharan of...
Spark Streaming& Kafka-The Future of Stream Processing by Hari Shreedharan of...Data Con LA
 
Spark Streaming & Kafka-The Future of Stream Processing
Spark Streaming & Kafka-The Future of Stream ProcessingSpark Streaming & Kafka-The Future of Stream Processing
Spark Streaming & Kafka-The Future of Stream ProcessingJack Gudenkauf
 
Hive on spark berlin buzzwords
Hive on spark berlin buzzwordsHive on spark berlin buzzwords
Hive on spark berlin buzzwordsSzehon Ho
 
Event Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache KafkaEvent Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache KafkaDataWorks Summit
 
Avoiding Common Pitfalls: Spark Structured Streaming with Kafka
Avoiding Common Pitfalls: Spark Structured Streaming with KafkaAvoiding Common Pitfalls: Spark Structured Streaming with Kafka
Avoiding Common Pitfalls: Spark Structured Streaming with KafkaHostedbyConfluent
 
Real Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingReal Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingHari Shreedharan
 
Real Time Data Processing using Spark Streaming | Data Day Texas 2015
Real Time Data Processing using Spark Streaming | Data Day Texas 2015Real Time Data Processing using Spark Streaming | Data Day Texas 2015
Real Time Data Processing using Spark Streaming | Data Day Texas 2015Cloudera, Inc.
 
PCAP Graphs for Cybersecurity and System Tuning
PCAP Graphs for Cybersecurity and System TuningPCAP Graphs for Cybersecurity and System Tuning
PCAP Graphs for Cybersecurity and System TuningDr. Mirko Kämpf
 
Spark One Platform Webinar
Spark One Platform WebinarSpark One Platform Webinar
Spark One Platform WebinarCloudera, Inc.
 
Kafka & Hadoop - for NYC Kafka Meetup
Kafka & Hadoop - for NYC Kafka MeetupKafka & Hadoop - for NYC Kafka Meetup
Kafka & Hadoop - for NYC Kafka MeetupGwen (Chen) Shapira
 
Decoupling Decisions with Apache Kafka
Decoupling Decisions with Apache KafkaDecoupling Decisions with Apache Kafka
Decoupling Decisions with Apache KafkaGrant Henke
 
GSJUG: Mastering Data Streaming Pipelines 09May2023
GSJUG: Mastering Data Streaming Pipelines 09May2023GSJUG: Mastering Data Streaming Pipelines 09May2023
GSJUG: Mastering Data Streaming Pipelines 09May2023Timothy Spann
 
ApacheCon-Flume-Kafka-2016
ApacheCon-Flume-Kafka-2016ApacheCon-Flume-Kafka-2016
ApacheCon-Flume-Kafka-2016Jayesh Thakrar
 

Similar to Fraud Detection Architecture (20)

Fraud Detection for Israel BigThings Meetup
Fraud Detection  for Israel BigThings MeetupFraud Detection  for Israel BigThings Meetup
Fraud Detection for Israel BigThings Meetup
 
Real Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingReal Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark Streaming
 
Spark Streaming& Kafka-The Future of Stream Processing by Hari Shreedharan of...
Spark Streaming& Kafka-The Future of Stream Processing by Hari Shreedharan of...Spark Streaming& Kafka-The Future of Stream Processing by Hari Shreedharan of...
Spark Streaming& Kafka-The Future of Stream Processing by Hari Shreedharan of...
 
Spark Streaming & Kafka-The Future of Stream Processing
Spark Streaming & Kafka-The Future of Stream ProcessingSpark Streaming & Kafka-The Future of Stream Processing
Spark Streaming & Kafka-The Future of Stream Processing
 
Hive on spark berlin buzzwords
Hive on spark berlin buzzwordsHive on spark berlin buzzwords
Hive on spark berlin buzzwords
 
Event Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache KafkaEvent Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache Kafka
 
Avoiding Common Pitfalls: Spark Structured Streaming with Kafka
Avoiding Common Pitfalls: Spark Structured Streaming with KafkaAvoiding Common Pitfalls: Spark Structured Streaming with Kafka
Avoiding Common Pitfalls: Spark Structured Streaming with Kafka
 
Real Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingReal Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark Streaming
 
Real Time Data Processing using Spark Streaming | Data Day Texas 2015
Real Time Data Processing using Spark Streaming | Data Day Texas 2015Real Time Data Processing using Spark Streaming | Data Day Texas 2015
Real Time Data Processing using Spark Streaming | Data Day Texas 2015
 
PCAP Graphs for Cybersecurity and System Tuning
PCAP Graphs for Cybersecurity and System TuningPCAP Graphs for Cybersecurity and System Tuning
PCAP Graphs for Cybersecurity and System Tuning
 
Kafka for DBAs
Kafka for DBAsKafka for DBAs
Kafka for DBAs
 
Spark One Platform Webinar
Spark One Platform WebinarSpark One Platform Webinar
Spark One Platform Webinar
 
Kafka & Hadoop - for NYC Kafka Meetup
Kafka & Hadoop - for NYC Kafka MeetupKafka & Hadoop - for NYC Kafka Meetup
Kafka & Hadoop - for NYC Kafka Meetup
 
Empower Hive with Spark
Empower Hive with SparkEmpower Hive with Spark
Empower Hive with Spark
 
Spark+flume seattle
Spark+flume seattleSpark+flume seattle
Spark+flume seattle
 
Yarns About Yarn
Yarns About YarnYarns About Yarn
Yarns About Yarn
 
Decoupling Decisions with Apache Kafka
Decoupling Decisions with Apache KafkaDecoupling Decisions with Apache Kafka
Decoupling Decisions with Apache Kafka
 
GSJUG: Mastering Data Streaming Pipelines 09May2023
GSJUG: Mastering Data Streaming Pipelines 09May2023GSJUG: Mastering Data Streaming Pipelines 09May2023
GSJUG: Mastering Data Streaming Pipelines 09May2023
 
ApacheCon-Flume-Kafka-2016
ApacheCon-Flume-Kafka-2016ApacheCon-Flume-Kafka-2016
ApacheCon-Flume-Kafka-2016
 
Kafka talk
Kafka talkKafka talk
Kafka talk
 

More from Gwen (Chen) Shapira

Velocity 2019 - Kafka Operations Deep Dive
Velocity 2019  - Kafka Operations Deep DiveVelocity 2019  - Kafka Operations Deep Dive
Velocity 2019 - Kafka Operations Deep DiveGwen (Chen) Shapira
 
Lies Enterprise Architects Tell - Data Day Texas 2018 Keynote
Lies Enterprise Architects Tell - Data Day Texas 2018  Keynote Lies Enterprise Architects Tell - Data Day Texas 2018  Keynote
Lies Enterprise Architects Tell - Data Day Texas 2018 Keynote Gwen (Chen) Shapira
 
Gluecon - Kafka and the service mesh
Gluecon - Kafka and the service meshGluecon - Kafka and the service mesh
Gluecon - Kafka and the service meshGwen (Chen) Shapira
 
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17Gwen (Chen) Shapira
 
Papers we love realtime at facebook
Papers we love   realtime at facebookPapers we love   realtime at facebook
Papers we love realtime at facebookGwen (Chen) Shapira
 
Multi-Datacenter Kafka - Strata San Jose 2017
Multi-Datacenter Kafka - Strata San Jose 2017Multi-Datacenter Kafka - Strata San Jose 2017
Multi-Datacenter Kafka - Strata San Jose 2017Gwen (Chen) Shapira
 
Streaming Data Integration - For Women in Big Data Meetup
Streaming Data Integration - For Women in Big Data MeetupStreaming Data Integration - For Women in Big Data Meetup
Streaming Data Integration - For Women in Big Data MeetupGwen (Chen) Shapira
 
Kafka connect-london-meetup-2016
Kafka connect-london-meetup-2016Kafka connect-london-meetup-2016
Kafka connect-london-meetup-2016Gwen (Chen) Shapira
 
Kafka Reliability - When it absolutely, positively has to be there
Kafka Reliability - When it absolutely, positively has to be thereKafka Reliability - When it absolutely, positively has to be there
Kafka Reliability - When it absolutely, positively has to be thereGwen (Chen) Shapira
 
Nyc kafka meetup 2015 - when bad things happen to good kafka clusters
Nyc kafka meetup 2015 - when bad things happen to good kafka clustersNyc kafka meetup 2015 - when bad things happen to good kafka clusters
Nyc kafka meetup 2015 - when bad things happen to good kafka clustersGwen (Chen) Shapira
 
Data Architectures for Robust Decision Making
Data Architectures for Robust Decision MakingData Architectures for Robust Decision Making
Data Architectures for Robust Decision MakingGwen (Chen) Shapira
 
Kafka and Hadoop at LinkedIn Meetup
Kafka and Hadoop at LinkedIn MeetupKafka and Hadoop at LinkedIn Meetup
Kafka and Hadoop at LinkedIn MeetupGwen (Chen) Shapira
 
Scaling ETL with Hadoop - Avoiding Failure
Scaling ETL with Hadoop - Avoiding FailureScaling ETL with Hadoop - Avoiding Failure
Scaling ETL with Hadoop - Avoiding FailureGwen (Chen) Shapira
 
Intro to Spark - for Denver Big Data Meetup
Intro to Spark - for Denver Big Data MeetupIntro to Spark - for Denver Big Data Meetup
Intro to Spark - for Denver Big Data MeetupGwen (Chen) Shapira
 

More from Gwen (Chen) Shapira (20)

Velocity 2019 - Kafka Operations Deep Dive
Velocity 2019  - Kafka Operations Deep DiveVelocity 2019  - Kafka Operations Deep Dive
Velocity 2019 - Kafka Operations Deep Dive
 
Lies Enterprise Architects Tell - Data Day Texas 2018 Keynote
Lies Enterprise Architects Tell - Data Day Texas 2018  Keynote Lies Enterprise Architects Tell - Data Day Texas 2018  Keynote
Lies Enterprise Architects Tell - Data Day Texas 2018 Keynote
 
Gluecon - Kafka and the service mesh
Gluecon - Kafka and the service meshGluecon - Kafka and the service mesh
Gluecon - Kafka and the service mesh
 
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
Multi-Cluster and Failover for Apache Kafka - Kafka Summit SF 17
 
Papers we love realtime at facebook
Papers we love   realtime at facebookPapers we love   realtime at facebook
Papers we love realtime at facebook
 
Kafka reliability velocity 17
Kafka reliability   velocity 17Kafka reliability   velocity 17
Kafka reliability velocity 17
 
Multi-Datacenter Kafka - Strata San Jose 2017
Multi-Datacenter Kafka - Strata San Jose 2017Multi-Datacenter Kafka - Strata San Jose 2017
Multi-Datacenter Kafka - Strata San Jose 2017
 
Streaming Data Integration - For Women in Big Data Meetup
Streaming Data Integration - For Women in Big Data MeetupStreaming Data Integration - For Women in Big Data Meetup
Streaming Data Integration - For Women in Big Data Meetup
 
Kafka at scale facebook israel
Kafka at scale   facebook israelKafka at scale   facebook israel
Kafka at scale facebook israel
 
Kafka connect-london-meetup-2016
Kafka connect-london-meetup-2016Kafka connect-london-meetup-2016
Kafka connect-london-meetup-2016
 
Kafka Reliability - When it absolutely, positively has to be there
Kafka Reliability - When it absolutely, positively has to be thereKafka Reliability - When it absolutely, positively has to be there
Kafka Reliability - When it absolutely, positively has to be there
 
Nyc kafka meetup 2015 - when bad things happen to good kafka clusters
Nyc kafka meetup 2015 - when bad things happen to good kafka clustersNyc kafka meetup 2015 - when bad things happen to good kafka clusters
Nyc kafka meetup 2015 - when bad things happen to good kafka clusters
 
Have your cake and eat it too
Have your cake and eat it tooHave your cake and eat it too
Have your cake and eat it too
 
Data Architectures for Robust Decision Making
Data Architectures for Robust Decision MakingData Architectures for Robust Decision Making
Data Architectures for Robust Decision Making
 
Kafka and Hadoop at LinkedIn Meetup
Kafka and Hadoop at LinkedIn MeetupKafka and Hadoop at LinkedIn Meetup
Kafka and Hadoop at LinkedIn Meetup
 
Twitter with hadoop for oow
Twitter with hadoop for oowTwitter with hadoop for oow
Twitter with hadoop for oow
 
R for hadoopers
R for hadoopersR for hadoopers
R for hadoopers
 
Scaling ETL with Hadoop - Avoiding Failure
Scaling ETL with Hadoop - Avoiding FailureScaling ETL with Hadoop - Avoiding Failure
Scaling ETL with Hadoop - Avoiding Failure
 
Intro to Spark - for Denver Big Data Meetup
Intro to Spark - for Denver Big Data MeetupIntro to Spark - for Denver Big Data Meetup
Intro to Spark - for Denver Big Data Meetup
 
Incredible Impala
Incredible Impala Incredible Impala
Incredible Impala
 

Recently uploaded

20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdfkhraisr
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...Elaine Werffeli
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...gajnagarg
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangeThinkInnovation
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...Health
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1ranjankumarbehera14
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样wsppdmt
 
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...gajnagarg
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...gajnagarg
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.pptibrahimabdi22
 
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxRESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxronsairoathenadugay
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...HyderabadDolls
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRajesh Mondal
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowgargpaaro
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubaikojalkojal131
 
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...HyderabadDolls
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制vexqp
 
Computer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdfComputer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdfSayantanBiswas37
 

Recently uploaded (20)

20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt
 
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxRESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
Computer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdfComputer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdf
 

Fraud Detection Architecture

  • 1. Real Time Fraud Detection Patterns and reference architectures Ted Malaska // PSA Gwen Shapira // Software Engineer
  • 2. 2 • Intro • Review Problem • Quick overview of key technology • High level architecture • Deep Dive into NRT Processing • Completing the Puzzle – Micro-batch, Ingest and Batch Overview ©2014 Cloudera, Inc. All rights reserved.
  • 3. 3©2014 Cloudera, Inc. All rights reserved. • 15 years of moving data • Formerly consultant • Now Cloudera Engineer: – Sqoop Committer – Kafka – Flume • @gwenshap Gwen Shapira
  • 4. 4 • Ted Malaska (PSA at Cloudera) • Hadoop for ~5 years • Contributed to – HDFS, MapReduce, Yarn, HBase, Spark, Avro, – Kite, Pig, Navigator, Cloudera Manager, Flume, Kafke, Sqoop, Accumulo – And working on a Sentry Patch • Co-Author to O’Reilly Hadoop Application Architectures • Worked with about 70 companies in 8 countries • Marvel Fan Boy • Runner Hello ©2014 Cloudera, Inc. All rights reserved.
  • 5. 5 The Problem ©2014 Cloudera, Inc. All rights reserved.
  • 6. 6 Credit Card Transaction Fraud ©2014 Cloudera, Inc. All rights reserved.
  • 7. 7 Ikea Meat Balls ©2014 Cloudera, Inc. All rights reserved.
  • 8. 8 Coupon Fraud ©2014 Cloudera, Inc. All rights reserved.
  • 9. 9 Video Game Strategy ©2014 Cloudera, Inc. All rights reserved.
  • 10. 10 Health Insurance Fraud ©2014 Cloudera, Inc. All rights reserved.
  • 11. 11 • Typical Atomic Card Fraud Detection • Ikea Meat Ball • Multi Coupons Combinations • OP or Negative Video Games Strategies • Ad Serving • Health Insurance Fraud • Kid Coming Home From School Review of the Problem ©2014 Cloudera, Inc. All rights reserved.
  • 12. 12 How do we React • Human Brain at Tennis – Muscle Memory – Reaction Thought – Reflective Meditation ©2014 Cloudera, Inc. All rights reserved.
  • 13. 13 Overview of Key Technologies ©2014 Cloudera, Inc. All rights reserved.
  • 14. 14 Kafka ©2014 Cloudera, Inc. All Rights Reserved.
  • 15. 15©2014 Cloudera, Inc. All rights reserved. • Messages are organized into topics • Producers push messages • Consumers pull messages • Kafka runs in a cluster. Nodes are called brokers The Basics
  • 16. 16©2014 Cloudera, Inc. All rights reserved. Topics, Partitions and Logs
  • 17. 17©2014 Cloudera, Inc. All rights reserved. Each partition is a log
  • 18. 18©2014 Cloudera, Inc. All rights reserved. Each Broker has many partitions Partition 0 Partition 0 Partition 1 Partition 1 Partition 2 Partition 1 Partition 0 Partition 2 Partion 2
  • 19. 19©2014 Cloudera, Inc. All rights reserved. Producers load balance between partitions Partition 0 Partition 1 Partition 2 Partition 1 Partition 0 Partition 2 Partition 0 Partition 1 Partion 2 Client
  • 20. 20©2014 Cloudera, Inc. All rights reserved. Producers load balance between partitions Partition 0 Partition 1 Partition 2 Partition 1 Partition 0 Partition 2 Partition 0 Partition 1 Partion 2 Client
  • 21. 21©2014 Cloudera, Inc. All rights reserved. Consumers Consumer Group Y Consumer Group X Consumer Kafka Cluster Topic Partition A (File) Partition B (File) Partition C (File) Consumer Consumer Consumer Order retained with in partition Order retained with in partition but not over partitionsOffSetX OffSetX OffSetX OffSetYOffSetYOffSetY Off sets are kept per consumer group
  • 23. 23 Sources Interceptors Selectors Channels Sinks Flume Agent Short Intro to Flume Twitter, logs, JMS, webserver, Kafka Mask, re-format, validate… DR, critical Memory, file, Kafka HDFS, HBase, Solr
  • 24. 24 Flume and/or Kafka ©2014 Cloudera, Inc. All rights reserved. Flume UpStream Flume Source Interceptor Flume Channel Flume Sink Down Stream Selector Can Be KafkaCan Be KafkaCan Be Kafka
  • 25. 25 Interceptors • Mask fields • Validate information against external source • Extract fields • Modify data format • Filter or split events ©2014 Cloudera, Inc. All rights reserved.
  • 27. 27 Spark Streaming Example ©2014 Cloudera, Inc. All rights reserved. 1. val conf = new SparkConf().setMaster("local[2]”) 2. val ssc = new StreamingContext(conf, Seconds(1)) 3. val lines = ssc.socketTextStream("localhost", 9999) 4. val words = lines.flatMap(_.split(" ")) 5. val pairs = words.map(word => (word, 1)) 6. val wordCounts = pairs.reduceByKey(_ + _) 7. wordCounts.print() 8. SSC.start()
  • 28. 28 Spark Streaming Example ©2014 Cloudera, Inc. All rights reserved. 1. val conf = new SparkConf().setMaster("local[2]”) 2. val sc = new SparkContext(conf) 3. val lines = sc.textFile(path, 2) 4. val words = lines.flatMap(_.split(" ")) 5. val pairs = words.map(word => (word, 1)) 6. val wordCounts = pairs.reduceByKey(_ + _) 7. wordCounts.print()
  • 29. 29 DStream DStream DStream Spark Streaming Confidentiality Information Goes Here Single Pass Source Receiver RDD Source Receiver RDD RDD Filter Count Print Source Receiver RDD RDD RDD Single Pass Filter Count Print Pre-first Batch First Batch Second Batch
  • 30. 30 DStream DStream DStreamSpark Streaming Confidentiality Information Goes Here Single Pass Source Receiver RDD Source Receiver RDD RDD Filter Count Print Source Receiver RDD RDD RDD Single Pass Filter Count Pre-first Batch First Batch Second Batch Stateful RDD 1 Print Stateful RDD 2 Stateful RDD 1
  • 31. 31 Spark Streaming and HBase ©2014 Cloudera, Inc. All rights reserved. Driver Walker Node Configs Executor Static Space Configs HConnection Tasks Tasks Walker Node Executor Static Space Configs HConnection Tasks Tasks
  • 32. 32 High Level Architecture ©2014 Cloudera, Inc. All rights reserved.
  • 33. 33 Real-Time Event Processing Approach ©2014 Cloudera, Inc. All rights reserved. Hadoop Cluster II Storage Processing SolR Hadoop Cluster I ClientClient Flume Agents Hbase / Memory Spark Streaming HDFS Hive/Im pala Map/Re duce Spark Search Automated & Manual Analytical Adjustments and Pattern detection Fetching & Updating Profiles Adjusting NRT Stats HDFSEventSink SolR Sink Batch Time Adjustments Automated & Manual Review of NRT Changes and Counters Local Cache Kafka Clients: (Swipe here!) Web App
  • 34. 34 NRT Processing ©2014 Cloudera, Inc. All rights reserved.
  • 35. 35 Focus on NRT First ©2014 Cloudera, Inc. All rights reserved. Hadoop Cluster II Storage Processing SolR Hadoop Cluster I ClientClient Flume Agents Hbase / Memory Spark Streaming HDFS Hive/Im pala Map/Re duce Spark Search Automated & Manual Analytical Adjustments and Pattern detection Fetching & Updating Profiles Adjusting NRT Stats HDFSEventSink SolR Sink Batch Time Adjustments Automated & Manual Review of NRT Changes and Counters Local Cache Kafka Clients: (Swipe here!) Web App NRT Event Processing with Context
  • 36. 36 Streaming Architecture – NRT Event Processing ©2014 Cloudera, Inc. All rights reserved. Flume Source Flume Source Kafka Initial Events Topic Flume Source Flume Interceptor Event Processing Logic Local Memory HBase Client Kafka Answer Topic HBase KafkaConsumer KafkaProducer Able to respond with in 10s of milliseconds
  • 37. 37 Partitioned NRT Event Processing ©2014 Cloudera, Inc. All rights reserved. Flume Source Flume Source Kafka Initial Events Topic Flume Source Flume Interceptor Event Processing Logic Local Memory HBase Client Kafka Answer Topic HBase KafkaConsumer KafkaProducer Topic Partition A Partition B Partition C Producer Partitione r Producer Partitione r Producer Partitione r Custom Partitioner Better use of local memory
  • 38. 38 Completing the Puzzle ©2014 Cloudera, Inc. All rights reserved.
  • 39. 39 Micro Batching ©2014 Cloudera, Inc. All rights reserved. Hadoop Cluster II Storage Processing SolR Hadoop Cluster I ClientClient Flume Agents Hbase / Memory Spark Streaming HDFS Hive/Im pala Map/Re duce Spark Search Automated & Manual Analytical Adjustments and Pattern detection Fetching & Updating Profiles Adjusting NRT Stats HDFSEventSink SolR Sink Batch Time Adjustments Automated & Manual Review of NRT Changes and Counters Local Cache Kafka Clients: (Swipe here!) Web App Micro Batching Micro Batching Micro Batching
  • 40. 40 Complex Topologies ©2014 Cloudera, Inc. All rights reserved. Kafka Initial Events Topic Spark Streaming KafkaDirect Connection Dag Topologies Kafka Initial Events Topic Spark Streaming Kafka Receivers Dag Topologies Kafka Receivers Kafka Receivers • Manages Offset • Stores Offset is RDD • No longer needs HDFS for initial RDD check pointing • Lets Kafka Manage Offsets • Uses HDFS for initial RDD recovery 1.3 1.2
  • 41. 41 MicroBatch Bad-Input Handling ©2014 Cloudera, Inc. All rights reserved. 0 1 2 3 4 5 6 7 8 9 1 0 1 1 1 2 1 3 Kafka – incoming events topic Dag Topologies 0 1 2 3 4 5 6 7 8 9 1 0 1 1 1 2 1 3 Kafka – bad events topic 0 1 2 3 4 5 6 7 8 9 1 0 1 1 1 2 1 3 Kafka – resolved events topic 0 1 2 3 4 5 6 7 8 9 1 0 1 1 1 2 1 3 Kafka – results topic
  • 42. 42 Ingestion ©2014 Cloudera, Inc. All rights reserved. Hadoop Cluster II Storage Processing SolR Hadoop Cluster I ClientClient Flume Agents Hbase / Memory Spark Streaming HDFS Hive/Im pala Map/Re duce Spark Search Automated & Manual Analytical Adjustments and Pattern detection Fetching & Updating Profiles Adjusting NRT Stats HDFSEventSink SolR Sink Batch Time Adjustments Automated & Manual Review of NRT Changes and Counters Local Cache Kafka Clients: (Swipe here!) Web App Ingestion Ingestion
  • 43. 43 Ingestion ©2014 Cloudera, Inc. All rights reserved. Flume HDFS Sink Kafka Cluster Topic Partition A Partition B Partition C Sink Sink Sink HDFS Flume SolR Sink Sink Sink Sink SolR Flume Hbase Sink Sink Sink Sink HBase
  • 44. 44 Reflective Thoughts ©2014 Cloudera, Inc. All rights reserved. Hadoop Cluster II Storage Processing SolR Hadoop Cluster I ClientClient Flume Agents Hbase / Memory Spark Streaming HDFS Hive/Im pala Map/Re duce Spark Search Automated & Manual Analytical Adjustments and Pattern detection Fetching & Updating Profiles Adjusting NRT Stats HDFSEventSink SolR Sink Batch Time Adjustments Automated & Manual Review of NRT Changes and Counters Local Cache Kafka Clients: (Swipe here!) Web App Research and Searching
  • 45. ©2014 Cloudera, Inc. All rights reserved.

Editor's Notes

  1. This gives me a lot of perspective regarding the use of Hadoop
  2. Topics are partitioned, each partition ordered and immutable. Messages in a partition have an ID, called Offset. Offset uniquely identifies a message within a partition
  3. Kafka retains all messages for fixed amount of time. Not waiting for acks from consumers. The only metadata retained per consumer is the position in the log – the offset So adding many consumers is cheap On the other hand, consumers have more responsibility and are more challenging to implement correctly And “batching” consumers is not a problem
  4. 3 partitions, each replicated 3 times.
  5. The choose how many replicas must ACK a message before its considered committed. This is the tradeoff between speed and reliability
  6. The choose how many replicas must ACK a message before its considered committed. This is the tradeoff between speed and reliability
  7. can read from one or more partition leader. You can’t have two consumers in same group reading the same partition. Leaders obviously do more work – but they are balanced between nodes We reviewed the basic components on the system, and it may seem complex. In the next section we’ll see how simple it actually is to get started with Kafka.
  8. Does not require programming.