EVENT-DRIVEN MESSAGING AND ACTIONS USING APACHE FLINK AND APACHE NIFI
Dave Torok
Distinguished Architect
Comcast Corporation
23 May, 2019
DataWorks Summit – Washington, DC – 2019
COMCAST CUSTOMER RELATIONSHIPS
30.7 MILLION OVERALL CUSTOMER RELATIONSHIPS AS OF Q1 2019, INCLUDING:
• 27.6 MILLION HIGH-SPEED INTERNET
• 21.9 MILLION VIDEO
• 11.4 MILLION VOICE
ONE MILLION CUSTOMER NET ADDITIONS IN 2018
DELIVER THE ULTIMATE CUSTOMER EXPERIENCE
IS THE CUSTOMER HAVING A GOOD EXPERIENCE WITH OUR PRODUCTS AND SERVICE?
IF THE CUSTOMER ENGAGES US DIGITALLY, CAN WE OFFER A SELF-SERVICE EXPERIENCE?
GUIDE THE CUSTOMER THROUGH A JOURNEY WITH DIGITAL COMMUNICATIONS
KEEP THE CUSTOMER INFORMED WITH THE RIGHT MESSAGE TO THE RIGHT PERSON AT THE RIGHT TIME
REDUCE TIME AND COST TO THE BUSINESS AND THE CUSTOMER
How do we personalize the conversation?
Comcast collects, stores, and uses all data in accordance with our privacy disclosures to users and applicable laws.
EXAMPLE ONE-TIME MESSAGE
EXAMPLE – NEW SERVICE INSTALL
EXAMPLE - APPOINTMENT REMINDERS
FOLLOW UP SATISFACTION AND SURVEY
EXAMPLE WITH SMS RESPONSES
FOLLOWING UP ON THE INTERACTION:
Is the problem resolved?
If so, great!
If not, offer to talk with an agent.
APACHE NIFI
Apache®, Apache NiFi®, and the NiFi logo are either registered
trademarks or trademarks of the Apache Software Foundation in the
United States and/or other countries.
WHAT IS APACHE NIFI?
ENTERPRISE DATA FLOW: GET STUFF FROM SOMEWHERE TO SOMEWHERE ELSE
Source Systems: FTP, HTTP, SFTP, Kafka, RabbitMQ, JDBC, Kinesis, S3, …
Do Stuff!: Transform, Validate, Enrich, Protocol Conversion, …
Destination Systems: FTP, HTTP, SFTP, Kafka, RabbitMQ, JDBC, Kinesis, S3, …
350+ Processors, Controllers, and Reporting Tasks
EXAMPLE NIFI FLOW
WHAT IS NIFI GOOD FOR?
ASYNCHRONOUS AND STATELESS STREAM PROCESSING
PROTOCOL CONVERSION
FORMAT CONVERSION AND TRANSFORMATION
PUSH AND PULL SCENARIOS E.G. FTP
LOTS OF DIFFERENT SOURCE AND SINK TYPES
MILD CONTENT ENRICHMENT
SERVICE CALLS / REST CALLS
JDBC / CACHE LOOKUP
RAPIDLY CHANGING BUSINESS LOGIC***
RAPID PROTOTYPING***
CONFIGURE RATHER THAN CODE ***
EXTENSIBILITY (SCRIPTING PROCESSORS, CUSTOM (JAVA) PROCESSORS)
OUR TEAM’S HISTORY WITH NIFI
FIRST PRODUCTION WORKFLOW MAY 2016
RECENT SNAPSHOT:
• 65+ USE CASES
• 900+ PROCESS GROUPS
• 7400+ PROCESSORS
• 44000+ THREADS
• 12 NODE PRIMARY PRODUCTION CLUSTER (16VCPU/32GB)
NIFI – TOP LEVEL
TOP PROCESSORS IN OUR NIFI CLUSTER
PROCESSING
1114 UpdateAttribute
923 RouteOnAttribute
732 JSON-related (incl. 240 JOLTTransformJson)
729 ReplaceText
527 ExecuteScript (many for HTTP Retry Logic)
516 LogAttribute
162 ControlRate
98 AVRO-related
87 ExtractText
COMMUNICATION
207 InvokeHTTP
128 PutSql / ExecuteSql
39 ConsumeKafka
10 PublishKafka/PutKafka
41 GetKinesisStream
6 PutKinesisStream
2 PutSFTP
2 ConsumeAMQP
APACHE FLINK
Apache®, Apache Flink®, and the squirrel logo are either registered
trademarks or trademarks of the Apache Software Foundation in the
United States and/or other countries.
WHAT IS APACHE FLINK?
REAL-TIME STREAM PROCESSING FRAMEWORK
DISTRIBUTED PARALLEL COMPUTE ENGINE
SIMILAR API STYLE TO APACHE SPARK
LOW LATENCY, HIGH PERFORMANCE
STATEFUL
[Diagram labels: SOURCE (×2); Filter; Map; Join; Reduce; Sum; SINK]
FLINK STREAMING API STYLES
TABLE / SQL API
SQL PROVIDED BY APACHE CALCITE
SELECTS, JOINS, GROUP-BY, AGGREGATIONS
WINDOWS
TIME AND COUNT
WINDOW-BASED JOINS
WINDOW-BASED AGGREGATIONS
TEMPORAL TABLES
UDF (USER-DEFINED FUNCTIONS)
DATASTREAM API
MAP / REDUCE / FOLD
FILTER
AGGREGATIONS (SUM, MIN, MAX)
WINDOWS
TIME AND COUNT
TUMBLING, SLIDING
STREAM UNION, JOIN, CO-MAP
ITERATIONS
NOTE: THERE IS ALSO A BATCH API
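EXAMPLE “WORD COUNT” CODE
// Required imports (Flink DataStream API):
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

// WordWithCount is a POJO with public fields "word" (String) and "count" (long).
DataStream<WordWithCount> windowCounts = textInputStream
    // split each line on whitespace and emit (word, 1)
    .flatMap(new FlatMapFunction<String, WordWithCount>() {
        @Override
        public void flatMap(String value, Collector<WordWithCount> out) {
            for (String word : value.split("\\s")) {
                out.collect(new WordWithCount(word, 1L));
            }
        }
    })
    .keyBy("word")                  // group by the "word" field
    .timeWindow(Time.seconds(5))    // 5-second tumbling window
    .reduce(new ReduceFunction<WordWithCount>() {
        @Override
        public WordWithCount reduce(WordWithCount a, WordWithCount b) {
            return new WordWithCount(a.word, a.count + b.count);
        }
    });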
WHAT IS FLINK GOOD FOR?
HIGH THROUGHPUT STREAM PROCESSING
“MAP / REDUCE” STYLE PARALLEL COMPUTING
STATEFUL PROCESSING
AGGREGATIONS AND TIME WINDOWS
MULTIPLE-STREAM OPERATIONS
SQL-ON-STREAM
HOWEVER…
LIMITED “ORCHESTRATION”
LIMITED SOURCE / SINK TYPES
FLINK CONNECTORS
FLINK PROJECT:
• APACHE KAFKA (SOURCE/SINK)
• AMAZON KINESIS STREAMS (SOURCE/SINK)
• RABBITMQ (SOURCE/SINK)
• APACHE NIFI (SOURCE/SINK)
• APACHE CASSANDRA (SINK)
• ELASTICSEARCH (SINK)
• HADOOP FILESYSTEM – HDFS (SINK)
• TWITTER STREAMING API (SOURCE)
ALSO VIA APACHE BAHIR:
• APACHE ACTIVEMQ (SOURCE/SINK)
• APACHE FLUME (SINK)
• REDIS (SINK)
• AKKA (SINK)
• NETTY (SOURCE)
OUR TEAM’S HISTORY WITH FLINK
USED FOR 4+ DIFFERENT KINDS OF USE CASES
FIRST DEV – NOV 2016
FIRST PRODUCTION – MAY 2018
CUSTOMER EXPERIENCE USE CASE:
• 7 BILLION DATA POINTS PER DAY
PRODUCTION SIZE FOR ABOVE:
• 14 FLINK APPLICATION CLUSTERS
• 150 VMS
• 1100 VCPU
• 5.8 TB RAM
NIFI / FLINK MAJOR DIFFERENCES
NiFi                         | Flink
Distributed-capable          | Distributed by nature
Lineage, queues, buffering   | Straight-through processing
Hundreds of processor types  | Stream-oriented operators
Limited state processing     | Natively stateful if desired
UI-driven visual development | Code / compiled / deployed
“CONFIGURE NOT CODE”
Scratch Website - http://scratch.mit.edu/
MESSAGING USE CASE
START SIMPLE (EVENT, CONDITION, ACTION)
[Diagram labels: Event Producers; Trigger; Action; Notification Services]
NEED MORE INFORMATION
STATELESS USE CASE
[Diagram labels: Event Producers; Trigger; Enrich; Filter; Action; Notification Services; Enterprise Services (REST)]
EXAMPLE: VIDEO ON DEMAND
EVENT:
RECEIVE “VIDEO ON DEMAND” MESSAGE
TRIGGER:
IF (PRICE > 5) AND (TYPE = ‘RENTAL’)
ENRICH:
PREFERRED COMMUNICATION (EMAIL OR SMS)
ACTION:
SEND CONFIRMATION EMAIL OR SMS
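The event, trigger, enrich, and action steps above can be sketched in plain Java. This is only an illustration, not the production flow: the VodEvent fields, the preferences map standing in for the customer preference service, and the EMAIL fallback are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Event-condition-action sketch for the "video on demand" example. */
class VodRuleSketch {

    /** Hypothetical event shape; the field names are assumptions. */
    static final class VodEvent {
        final String customerId;
        final double price;
        final String type;

        VodEvent(String customerId, double price, String type) {
            this.customerId = customerId;
            this.price = price;
            this.type = type;
        }
    }

    /** TRIGGER: IF (PRICE > 5) AND (TYPE = 'RENTAL'). */
    static boolean trigger(VodEvent e) {
        return e.price > 5 && "RENTAL".equalsIgnoreCase(e.type);
    }

    /**
     * ENRICH + ACTION: look up the preferred channel and describe the action.
     * The map stands in for the preference service; EMAIL is an assumed default.
     */
    static Optional<String> handle(VodEvent e, Map<String, String> preferences) {
        if (!trigger(e)) {
            return Optional.empty(); // condition not met: no message is sent
        }
        String channel = preferences.getOrDefault(e.customerId, "EMAIL");
        return Optional.of("SEND_" + channel + " confirmation to " + e.customerId);
    }

    public static void main(String[] args) {
        Map<String, String> prefs = new HashMap<>();
        prefs.put("c1", "SMS");
        System.out.println(handle(new VodEvent("c1", 5.99, "RENTAL"), prefs));
        System.out.println(handle(new VodEvent("c2", 3.99, "RENTAL"), prefs));
    }
}
```

Keeping the trigger as a separate predicate mirrors the slide's split between condition and action, so the condition can change without touching the enrichment or delivery code.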
NIFI VERSION
[Flow labels: Consume Events; Extract Attributes; Call Customer Pref Service; Set SMS Parameters; Set Email Parameters; Logging; Metrics; Send to Communication Handlers]
TRIGGERS
SQL ON STREAM – APACHE CALCITE
FLINK APPROACH - SQL
// SQL query with an inlined (unregistered) table
Table table = tableEnv.fromDataStream(ds, "user, product, amount");
Table result = tableEnv.sqlQuery(
"SELECT SUM(amount) FROM " + table + " WHERE product LIKE '%Rubber%'");
NIFI APPROACH – TRADITIONAL
• EVALUATEJSONPATH / EXTRACTTEXT
• NIFI EXPRESSION LANGUAGE + ROUTEONATTRIBUTE
NIFI APPROACH - CALCITE
• QUERYRECORD PROCESSOR
• RECORDREADER / RECORDWRITER PATTERN
ENRICHMENT AND ACTIONS
ACTIONS
[Diagram labels: Action Request; Action Handler; Communication Preferences; Send SMS; Send Email; Other Notification Methods]
ENRICHMENT DATA PLANE
[Diagram labels: Streaming Compute Pipeline; Stream Data; Streaming State (Sum, Avg, Time Buckets); MODEL; QUERY; Enterprise Services; Databases; Data Sets at Rest; Data File Abstraction (AWS S3, HDFS)]
CALLING SERVICES - NIFI
INVOKEHTTP PROCESSOR
NIFI GOOD FOR
• REQUEST PREPARATION
• RESULT TRANSFORMATION
• HTTP ATTRIBUTE HANDLING
• FAILURE AND RETRY LOGIC
NIFI - RETRY LOGIC
FLINK METHOD FOR CALLING SERVICES
ASYNC I/O OPERATOR
WORKS WITH ASYNC-CAPABLE POOLS
• HTTP
• JDBC
CODE-YOUR-OWN
NO BUILT-IN RETRY CAPABILITY
TIMEOUTS CAN LEAD TO FLOW FAILURE
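Because retries have to be coded by hand in this approach, a small wrapper around an asynchronous call is one way to do it. The sketch below uses only java.util.concurrent; the class and method names are hypothetical, and it retries immediately with no backoff or per-attempt timeout, both of which a real service client would probably want.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Bounded retry around an async call (hypothetical helper, not a Flink API). */
class AsyncRetrySketch {

    /** Invokes the call; if it fails, tries again until maxAttempts is used up. */
    static <T> CompletableFuture<T> withRetry(Supplier<CompletableFuture<T>> call, int maxAttempts) {
        CompletableFuture<T> result = new CompletableFuture<>();
        attempt(call, maxAttempts, result);
        return result;
    }

    private static <T> void attempt(Supplier<CompletableFuture<T>> call,
                                    int attemptsLeft,
                                    CompletableFuture<T> result) {
        CompletableFuture<T> future;
        try {
            future = call.get();
        } catch (RuntimeException e) {
            // Treat a synchronous failure like a failed future
            future = new CompletableFuture<>();
            future.completeExceptionally(e);
        }
        future.whenComplete((value, error) -> {
            if (error == null) {
                result.complete(value);
            } else if (attemptsLeft > 1) {
                attempt(call, attemptsLeft - 1, result); // retry
            } else {
                result.completeExceptionally(error);     // give up
            }
        });
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Simulated service that fails twice before succeeding
        Supplier<CompletableFuture<String>> flaky = () -> {
            CompletableFuture<String> f = new CompletableFuture<>();
            if (calls.incrementAndGet() < 3) {
                f.completeExceptionally(new RuntimeException("timeout"));
            } else {
                f.complete("ok");
            }
            return f;
        };
        System.out.println(withRetry(flaky, 3).join() + " after " + calls.get() + " calls");
    }
}
```

Inside a Flink AsyncFunction, the returned future could be bridged to the ResultFuture passed to asyncInvoke, so that a failure only reaches the operator after the retries are exhausted.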
FLINK CONNECTED STREAM PATTERN
REST Service
Connected Stream
Operator
5 Minute Global Window
Enrichment Handler
STATEFUL FLOWS
WHAT IS “STATE”?
[Diagram labels: STATE 1; STATE 2; STATE 3; ACTION ON ENTRY; ACTION ON EXIT; TRANSITION CONDITION (×2); STATE TIMEOUT]
EXAMPLE STATEFUL JOURNEY
[Diagram labels: states ORDER PLACED, IN TRANSIT, OUT FOR DELIVERY; transitions SHIPPED, PLACED ON LOCAL TRUCK, 11PM EXPIRE; messages “YOUR ORDER IS ON ITS WAY”, “SORRY WE MISSED YOU”]
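A journey like this can be modelled as a small state machine. The sketch below is one possible reading of the diagram: which transition emits which message, and the extra EXPIRED state, are assumptions, since the slide's arrows are not preserved in the transcript.

```java
import java.util.Optional;

/** Minimal state machine for the delivery journey (assumed wiring). */
class JourneySketch {

    enum State { ORDER_PLACED, IN_TRANSIT, OUT_FOR_DELIVERY, EXPIRED }

    enum Event { SHIPPED, PLACED_ON_LOCAL_TRUCK, TIMEOUT_11PM }

    /** Outcome of one transition: the next state and an optional customer message. */
    static final class Step {
        final State next;
        final Optional<String> message;

        Step(State next, String message) {
            this.next = next;
            this.message = Optional.ofNullable(message);
        }
    }

    static Step transition(State current, Event event) {
        if (current == State.ORDER_PLACED && event == Event.SHIPPED) {
            return new Step(State.IN_TRANSIT, null);
        }
        if (current == State.IN_TRANSIT && event == Event.PLACED_ON_LOCAL_TRUCK) {
            // Assumed: the "on its way" message is the action on entering this state
            return new Step(State.OUT_FOR_DELIVERY, "YOUR ORDER IS ON ITS WAY");
        }
        if (current == State.OUT_FOR_DELIVERY && event == Event.TIMEOUT_11PM) {
            // Assumed: the state timeout triggers the "missed you" message
            return new Step(State.EXPIRED, "SORRY WE MISSED YOU");
        }
        return new Step(current, null); // events that do not apply are ignored
    }

    public static void main(String[] args) {
        State s = State.ORDER_PLACED;
        for (Event e : Event.values()) {
            Step step = transition(s, e);
            System.out.println(s + " --" + e + "--> " + step.next + " " + step.message.orElse(""));
            s = step.next;
        }
    }
}
```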
NIFI STATE
PROCESSOR STATE (LOCAL AND CLUSTERED)
• BACKED BY ZOOKEEPER
• PROCESSORS: UPDATEATTRIBUTE (LOCAL ONLY), ATTRIBUTEROLLINGWINDOW
“DISTRIBUTED” MAP CACHE
• IN-MEMORY OR REDIS-BACKED (NEW IN 1.8)
• NODE-LOCAL OR “SINGLE NODE” CENTRAL CACHE
• PROCESSORS: PUTDISTRIBUTEDMAPCACHE, FETCHDISTRIBUTEDMAPCACHE
BEFORE NIFI 1.8: NO EASY PARTITIONING / SHARDING
1.8 AND LATER: LOAD-BALANCED CONNECTIONS, PARTITION BY ATTRIBUTE
CACHE != STATE (but you can store state in a cache)
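The idea behind partitioning by attribute is that every record with the same attribute value lands on the same node, so node-local state or cache entries for that value stay together. A minimal sketch of that idea follows; it is not NiFi's actual hashing scheme, and the class and method names are hypothetical.

```java
/** Hash partitioning sketch: same attribute value, same node. */
class PartitionSketch {

    /** Maps an attribute value to a node index in the range [0, nodeCount). */
    static int nodeFor(String attributeValue, int nodeCount) {
        // floorMod keeps the result non-negative even for negative hash codes
        return Math.floorMod(attributeValue.hashCode(), nodeCount);
    }

    public static void main(String[] args) {
        int nodes = 12; // cluster size used only for illustration
        for (String account : new String[] {"account-1", "account-2", "account-1"}) {
            System.out.println(account + " -> node " + nodeFor(account, nodes));
        }
    }
}
```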
USING EXTERNAL STATE WITH NIFI
USE EXTERNAL DATABASE (E.G. MYSQL)
PERIODIC QUERY TO FIND EXPIRED TIMERS
BEWARE OF RACE CONDITIONS / FREQUENT UPDATES
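The race to watch for is a timer being updated between the moment the periodic query reads it and the moment it is acted on. One common guard is a conditional delete that only succeeds if the row is unchanged. The sketch below models this with an in-memory map standing in for the database table; the class, its methods, and the map are assumptions for illustration, not the SQL used in the talk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** In-memory stand-in for a timers table with a race-safe expiry scan. */
class TimerStoreSketch {

    // journeyId -> expiry time in epoch milliseconds
    private final ConcurrentHashMap<String, Long> timers = new ConcurrentHashMap<>();

    /** State update: create or move a timer. */
    void upsert(String journeyId, long expiresAtMillis) {
        timers.put(journeyId, expiresAtMillis);
    }

    /** Periodic scan: claim every timer whose expiry is at or before nowMillis. */
    List<String> claimExpired(long nowMillis) {
        List<String> claimed = new ArrayList<>();
        for (Map.Entry<String, Long> entry : timers.entrySet()) {
            Long expiresAt = entry.getValue();
            // Conditional remove: fails if the timer was changed after it was read,
            // similar to DELETE ... WHERE id = ? AND expires_at = ?
            if (expiresAt <= nowMillis && timers.remove(entry.getKey(), expiresAt)) {
                claimed.add(entry.getKey());
            }
        }
        return claimed;
    }

    public static void main(String[] args) {
        TimerStoreSketch store = new TimerStoreSketch();
        store.upsert("journey-a", 100L);
        store.upsert("journey-b", 200L);
        System.out.println(store.claimExpired(150L));
        System.out.println(store.claimExpired(250L));
    }
}
```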
NIFI – SQL BASED STATE (STATE UPDATE)
NIFI – SQL BASED STATE (TIMER EXPIRATION)
FLINK APPROACH TO STATE
KEYED (NODE LOCAL) STATE
WINDOWED OPERATIONS (E.G. 10 MINUTE WINDOW SLIDING BY 1 MINUTE)
EVERY OPERATOR CAN HAVE ITS OWN STATE
QUERYABLE STATE
ROCKSDB (IN-MEMORY + DISK STORAGE)
CHECKPOINTS AND SAVEPOINTS TO DURABLE FILESYSTEM (HDFS, S3)
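For the windowed example above (a 10 minute window sliding by 1 minute), each event belongs to ten overlapping windows. The sketch below works out which ones, using plain arithmetic rather than Flink's API; it assumes window starts are aligned to multiples of the slide, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Which sliding windows contain a given event timestamp? */
class SlidingWindowSketch {

    /** Returns the start times (ms) of all windows [start, start + size) containing the timestamp. */
    static List<Long> windowStarts(long timestampMs, long sizeMs, long slideMs) {
        List<Long> starts = new ArrayList<>();
        // Latest window start at or before the timestamp, aligned to the slide
        long lastStart = timestampMs - Math.floorMod(timestampMs, slideMs);
        for (long start = lastStart; start > timestampMs - sizeMs; start -= slideMs) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        long minute = 60_000L;
        // Event at 12 minutes 30 seconds, 10 minute window sliding by 1 minute
        List<Long> starts = windowStarts(12 * minute + 30_000L, 10 * minute, minute);
        System.out.println(starts.size() + " windows, starting at minutes:");
        for (long s : starts) {
            System.out.println(s / minute);
        }
    }
}
```

Every event is kept in each window it belongs to, so a small slide relative to the window size multiplies the state an operator holds.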
DISTRIBUTED FLINK STATE
[Diagram: Kafka brokers (partitions 1 to 6) feed six FlinkKafkaConsumer instances spread over NODE 1, NODE 2, and NODE 3. Records leave over the network (NETWORK OUT), pass through the keyBy() SHUFFLE/SORT step, and arrive (NETWORK IN) at six KeyedStreamOperator instances, each holding its own local STATE. Remaining labels: P1, P2, P3, P4.]
WORKING WITH FLINK STATE
// These members belong in a RichMapFunction<String, String> applied to a keyed stream.

// DECLARE VARIABLE
private transient MapState<String, String> myState;

@Override
public void open(Configuration config) {
    // DESCRIPTOR WITH TYPE INFORMATION
    MapStateDescriptor<String, String> descriptor =
        new MapStateDescriptor<>(
            "myStateName",                // the state name
            String.class, String.class);  // K/V types
    // INITIALIZE STATE: get the MapState for the current key
    myState = getRuntimeContext().getMapState(descriptor);
}

@Override
public String map(String myField) throws Exception {
    // READ/WRITE STATE (myValue is null the first time a field is seen)
    String myValue = myState.get(myField);
    myState.put(myField, myValue + " another one");
    return myValue;  // the slide omitted a return; the previous value is returned here
}
INTEGRATING NIFI AND FLINK
OPTION 1: BUILT-IN FLINK-NIFI CONNECTOR
USES NIFI “SITE TO SITE” PROTOCOL
ENABLES PASSING “FLOWFILE ATTRIBUTE” AND “FLOWFILE CONTENT” INTACT
public interface NiFiDataPacket {
byte[] getContent();
Map<String, String> getAttributes();
}
FLINK BACKPRESSURE CONCERNS
https://github.com/apache/flink/tree/master/flink-connectors/flink-connector-nifi
OPTION 2: KAFKA TOPIC
LOOSER COUPLING
ALLOWS MANY “ACTION HANDLERS” (NOT JUST NIFI)
MORE BUFFERING / REDUCE BACKPRESSURE RISK
JSON AS STANDARD PAYLOAD
NIFI + FLINK SOLUTION APPROACH
SOLUTION APPROACH
FLINK AS THE HIGH VOLUME EVENT PROCESSOR
• MANY USE CASES WITH ONE STREAM
• SQL ON STREAM
FLINK-BASED TRIGGER, FILTER, ENRICHMENT REQUEST, AND ACTION REQUEST
FLINK MANAGES CUSTOMER JOURNEY STATE
NIFI FOR:
• NAMED “PROFILES” FOR ENRICHMENT SERVICES
• NAMED “PROFILES” FOR NOTIFICATIONS AND ACTIONS
Configuration-based use cases in Flink
Library of handlers in NiFi
HIGH LEVEL SOLUTION
[Diagram labels: Event Producers; Trigger; Filter; Enrichment Orchestration; Enrich; Enterprise Services (REST); Action Orchestration; Action; Notification Services; EVENT AND MESSAGE ORCHESTRATION]
USE CASE CONFIGURATION (SIMPLIFIED)
{
  "source": {
    "type": "kafka",
    "name": "vod_event_stream"
  },
  "triggerSql": "data.price > 5 AND data.order_type = 'Rental'",
  "enrichment": { "profileName": "communicationprefs" },
  "actions": [
    { "profileName": "email",
      "templateId": "1234",
      "fieldMapping": [ { "field": "cost", "source": "data.price" } ] },
    { "profileName": "sms",
      "templateId": "5678",
      "fieldMapping": [ { "field": "cost", "source": "data.price" } ] }
  ]
}
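The fieldMapping entries pair a template field ("cost") with a dotted source path into the event ("data.price"). A minimal sketch of how such a mapping might be resolved is shown below. It assumes the event has already been parsed into nested maps; the class and method names are hypothetical, and the real implementation is not shown in the talk.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Resolves "fieldMapping" source paths against a parsed event. */
class FieldMappingSketch {

    /** Follows a dotted path such as "data.price" through nested maps; null if absent. */
    static Object resolve(Map<String, Object> event, String path) {
        Object current = event;
        for (String part : path.split("\\.")) {
            if (!(current instanceof Map)) {
                return null; // path goes deeper than the event does
            }
            current = ((Map<?, ?>) current).get(part);
        }
        return current;
    }

    /** Builds template fields from a map of template field name to source path. */
    static Map<String, Object> applyMapping(Map<String, Object> event,
                                            Map<String, String> fieldToSource) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> mapping : fieldToSource.entrySet()) {
            out.put(mapping.getKey(), resolve(event, mapping.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> data = new HashMap<>();
        data.put("price", 5.99);
        data.put("order_type", "Rental");
        Map<String, Object> event = new HashMap<>();
        event.put("data", data);

        Map<String, String> mapping = new HashMap<>();
        mapping.put("cost", "data.price");
        System.out.println(applyMapping(event, mapping));
    }
}
```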
HIGH LEVEL SOLUTION (WITH STATE)
[Diagram labels: Event Producers; Trigger; Filter; Enrichment Orchestration; Enrich; Enterprise Services (REST); Action Orchestration; Action; Notification Services; EVENT AND MESSAGE ORCHESTRATION; Journey State Management; FLINK LOCAL STATE; FLINK STATE]
NIFI + FLINK SOLUTION SUMMARY
NIFI FOR SERVICES, DATAFLOW, AND TEXT HANDLING
FLINK FOR HIGH-PERFORMANCE STREAM PROCESSING
FLINK FOR COMMON PATTERNS – CONFIG DRIVEN
FLINK FOR STATE MANAGEMENT
DECOUPLED LIBRARY OF ENRICHMENT HANDLERS AND ACTION HANDLERS
FUTURE WORK
• FLINK + NIFI SELF-SERVICE USE CASE PORTAL
• INCREASE CATALOG OF ACTIONS AND ENRICHMENT PROFILES
• MOVE MORE COMMON CAPABILITIES TO FLINK
WE’RE HIRING!
PHILADELPHIA
WASHINGTON, D.C.
SUNNYVALE
DENVER
THANK YOU!
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi

More Related Content

What's hot

Can Apache Kafka Replace a Database?
Can Apache Kafka Replace a Database?Can Apache Kafka Replace a Database?
Can Apache Kafka Replace a Database?
Kai Wähner
 

What's hot (20)

Kafka Retry and DLQ
Kafka Retry and DLQKafka Retry and DLQ
Kafka Retry and DLQ
 
Introduction to Apache NiFi dws19 DWS - DC 2019
Introduction to Apache NiFi   dws19 DWS - DC 2019Introduction to Apache NiFi   dws19 DWS - DC 2019
Introduction to Apache NiFi dws19 DWS - DC 2019
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
 
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Cap...
 
Hardening Kafka Replication
Hardening Kafka Replication Hardening Kafka Replication
Hardening Kafka Replication
 
ksqlDB: A Stream-Relational Database System
ksqlDB: A Stream-Relational Database SystemksqlDB: A Stream-Relational Database System
ksqlDB: A Stream-Relational Database System
 
Improving fault tolerance and scaling out in Kafka Streams with Bill Bejeck |...
Improving fault tolerance and scaling out in Kafka Streams with Bill Bejeck |...Improving fault tolerance and scaling out in Kafka Streams with Bill Bejeck |...
Improving fault tolerance and scaling out in Kafka Streams with Bill Bejeck |...
 
Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
 
Best practices and lessons learnt from Running Apache NiFi at Renault
Best practices and lessons learnt from Running Apache NiFi at RenaultBest practices and lessons learnt from Running Apache NiFi at Renault
Best practices and lessons learnt from Running Apache NiFi at Renault
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
 
Apache Kafka Best Practices
Apache Kafka Best PracticesApache Kafka Best Practices
Apache Kafka Best Practices
 
Flink SQL & TableAPI in Large Scale Production at Alibaba
Flink SQL & TableAPI in Large Scale Production at AlibabaFlink SQL & TableAPI in Large Scale Production at Alibaba
Flink SQL & TableAPI in Large Scale Production at Alibaba
 
Apache Kafka’s Transactions in the Wild! Developing an exactly-once KafkaSink...
Apache Kafka’s Transactions in the Wild! Developing an exactly-once KafkaSink...Apache Kafka’s Transactions in the Wild! Developing an exactly-once KafkaSink...
Apache Kafka’s Transactions in the Wild! Developing an exactly-once KafkaSink...
 
Can Apache Kafka Replace a Database?
Can Apache Kafka Replace a Database?Can Apache Kafka Replace a Database?
Can Apache Kafka Replace a Database?
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
 
Log management with ELK
Log management with ELKLog management with ELK
Log management with ELK
 
Shipping Data from Postgres to Clickhouse, by Murat Kabilov, Adjust
Shipping Data from Postgres to Clickhouse, by Murat Kabilov, AdjustShipping Data from Postgres to Clickhouse, by Murat Kabilov, Adjust
Shipping Data from Postgres to Clickhouse, by Murat Kabilov, Adjust
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
 
Reliability Guarantees for Apache Kafka
Reliability Guarantees for Apache KafkaReliability Guarantees for Apache Kafka
Reliability Guarantees for Apache Kafka
 
Hadoop Strata Talk - Uber, your hadoop has arrived
Hadoop Strata Talk - Uber, your hadoop has arrived Hadoop Strata Talk - Uber, your hadoop has arrived
Hadoop Strata Talk - Uber, your hadoop has arrived
 

Similar to Event-Driven Messaging and Actions using Apache Flink and Apache NiFi

Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice MachineSpark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Data Con LA
 
Introducing Events and Stream Processing into Nationwide Building Society
Introducing Events and Stream Processing into Nationwide Building SocietyIntroducing Events and Stream Processing into Nationwide Building Society
Introducing Events and Stream Processing into Nationwide Building Society
confluent
 

Similar to Event-Driven Messaging and Actions using Apache Flink and Apache NiFi (20)

How Netflix Directs 1/3rd of Internet Traffic
How Netflix Directs 1/3rd of Internet TrafficHow Netflix Directs 1/3rd of Internet Traffic
How Netflix Directs 1/3rd of Internet Traffic
 
Introducing Events and Stream Processing into Nationwide Building Society (Ro...
Introducing Events and Stream Processing into Nationwide Building Society (Ro...Introducing Events and Stream Processing into Nationwide Building Society (Ro...
Introducing Events and Stream Processing into Nationwide Building Society (Ro...
 
Delivering High Performance Websites with NGINX
Delivering High Performance Websites with NGINXDelivering High Performance Websites with NGINX
Delivering High Performance Websites with NGINX
 
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice MachineSpark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
 
Scaling Push Messaging for Millions of Devices @Netflix
Scaling Push Messaging for Millions of Devices @NetflixScaling Push Messaging for Millions of Devices @Netflix
Scaling Push Messaging for Millions of Devices @Netflix
 
Event-Driven Applications Done Right - Pulsar Summit SF 2022
Event-Driven Applications Done Right - Pulsar Summit SF 2022Event-Driven Applications Done Right - Pulsar Summit SF 2022
Event-Driven Applications Done Right - Pulsar Summit SF 2022
 
Service Discovery and Registration in a Microservices Architecture
Service Discovery and Registration in a Microservices ArchitectureService Discovery and Registration in a Microservices Architecture
Service Discovery and Registration in a Microservices Architecture
 
Introducing Events and Stream Processing into Nationwide Building Society
Introducing Events and Stream Processing into Nationwide Building SocietyIntroducing Events and Stream Processing into Nationwide Building Society
Introducing Events and Stream Processing into Nationwide Building Society
 
RESTful APIs and SBCs
RESTful APIs and SBCsRESTful APIs and SBCs
RESTful APIs and SBCs
 
Spring and Pivotal Application Service - SpringOne Tour - Boston
Spring and Pivotal Application Service - SpringOne Tour - BostonSpring and Pivotal Application Service - SpringOne Tour - Boston
Spring and Pivotal Application Service - SpringOne Tour - Boston
 
Flink Forward San Francisco 2018: Dave Torok & Sameer Wadkar - "Embedding Fl...
Flink Forward San Francisco 2018:  Dave Torok & Sameer Wadkar - "Embedding Fl...Flink Forward San Francisco 2018:  Dave Torok & Sameer Wadkar - "Embedding Fl...
Flink Forward San Francisco 2018: Dave Torok & Sameer Wadkar - "Embedding Fl...
 
Using Databases and Containers From Development to Deployment
Using Databases and Containers  From Development to DeploymentUsing Databases and Containers  From Development to Deployment
Using Databases and Containers From Development to Deployment
 
SD Times - Docker v2
SD Times - Docker v2SD Times - Docker v2
SD Times - Docker v2
 
Scale Your Load Balancer from 0 to 1 million TPS on Azure
Scale Your Load Balancer from 0 to 1 million TPS on AzureScale Your Load Balancer from 0 to 1 million TPS on Azure
Scale Your Load Balancer from 0 to 1 million TPS on Azure
 
01282016 Aerospike-Docker webinar
01282016 Aerospike-Docker webinar01282016 Aerospike-Docker webinar
01282016 Aerospike-Docker webinar
 
RESTful APIs and SBCs
RESTful APIs and SBCsRESTful APIs and SBCs
RESTful APIs and SBCs
 
Flink SQL in Action
Flink SQL in ActionFlink SQL in Action
Flink SQL in Action
 
DEVNET-1166 Open SDN Controller APIs
DEVNET-1166	Open SDN Controller APIsDEVNET-1166	Open SDN Controller APIs
DEVNET-1166 Open SDN Controller APIs
 
The role of NoSQL in the Next Generation of Financial Informatics
The role of NoSQL in the Next Generation of Financial InformaticsThe role of NoSQL in the Next Generation of Financial Informatics
The role of NoSQL in the Next Generation of Financial Informatics
 
Splunk Conf2010: Corporate Express presents Splunk with SAP
Splunk Conf2010: Corporate Express presents Splunk with SAPSplunk Conf2010: Corporate Express presents Splunk with SAP
Splunk Conf2010: Corporate Express presents Splunk with SAP
 

More from DataWorks Summit

HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
 

Recently uploaded

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Recently uploaded (20)

HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf

Event-Driven Messaging and Actions using Apache Flink and Apache NiFi

  • 1. EVENT-DRIVEN MESSAGING AND ACTIONS USING APACHE FLINK AND APACHE NIFI Dave Torok Distinguished Architect Comcast Corporation 23 May, 2019 DataWorks Summit – Washington, DC – 2019
  • 2.
  • 3. COMCAST CUSTOMER RELATIONSHIPS: 30.7 MILLION OVERALL CUSTOMER RELATIONSHIPS AS OF Q1 2019, INCLUDING 27.6 MILLION HIGH-SPEED INTERNET, 21.9 MILLION VIDEO, 11.4 MILLION VOICE; ONE MILLION CUSTOMER NET ADDITIONS IN 2018
  • 4. DELIVER THE ULTIMATE CUSTOMER EXPERIENCE: IS THE CUSTOMER HAVING A GOOD EXPERIENCE WITH OUR PRODUCTS AND SERVICE? IF THE CUSTOMER ENGAGES US DIGITALLY, CAN WE OFFER A SELF-SERVICE EXPERIENCE? GUIDE THE CUSTOMER THROUGH A JOURNEY WITH DIGITAL COMMUNICATIONS; KEEP THE CUSTOMER INFORMED WITH THE RIGHT MESSAGE TO THE RIGHT PERSON AT THE RIGHT TIME; REDUCE TIME AND COST TO THE BUSINESS AND THE CUSTOMER
  • 5. How do we personalize the conversation? Comcast collects, stores, and uses all data in accordance with our privacy disclosures to users and applicable laws.
  • 7. EXAMPLE – NEW SERVICE INSTALL
  • 10. EXAMPLE WITH SMS RESPONSES. FOLLOWING UP ON THE INTERACTION: Is the problem resolved? If so, great! If not, offer to talk with an agent.
  • 11. APACHE Apache®, Apache NiFi®, and the NiFi logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.
  • 12. WHAT IS APACHE NIFI? ENTERPRISE DATA FLOW: GET STUFF FROM SOMEWHERE TO SOMEWHERE ELSE. Source Systems: FTP, HTTP, SFTP, Kafka, RabbitMQ, JDBC, Kinesis, S3, … Destination Systems: FTP, HTTP, SFTP, Kafka, RabbitMQ, JDBC, Kinesis, S3, … Do Stuff!: Transform, Validate, Enrich, Protocol Conversion, … 350+ Processors, Controllers, and Reporting Tasks
  • 14. WHAT IS NIFI GOOD FOR? ASYNCHRONOUS AND STATELESS STREAM PROCESSING; PROTOCOL CONVERSION; FORMAT CONVERSION AND TRANSFORMATION; PUSH AND PULL SCENARIOS E.G. FTP; LOTS OF DIFFERENT SOURCE AND SINK TYPES; MILD CONTENT ENRICHMENT; SERVICE CALLS / REST CALLS; JDBC / CACHE LOOKUP; RAPIDLY CHANGING BUSINESS LOGIC***; RAPID PROTOTYPING***; CONFIGURE RATHER THAN CODE***; EXTENSIBILITY (SCRIPTING PROCESSORS, CUSTOM (JAVA) PROCESSORS)
  • 15. OUR TEAM’S HISTORY WITH NIFI: FIRST PRODUCTION WORKFLOW MAY 2016. RECENT SNAPSHOT: • 65+ USE CASES • 900+ PROCESS GROUPS • 7400+ PROCESSORS • 44000+ THREADS • 12 NODE PRIMARY PRODUCTION CLUSTER (16 VCPU / 32 GB)
  • 17. TOP PROCESSORS IN OUR NIFI CLUSTER
        PROCESSING: 1114 UpdateAttribute; 923 RouteOnAttribute; 732 JSON-related (incl. 240 JOLTTransformJson); 729 ReplaceText; 527 ExecuteScript (many for HTTP Retry Logic); 516 LogAttribute; 162 ControlRate; 98 AVRO-related; 87 ExtractText
        COMMUNICATION: 207 InvokeHTTP; 128 PutSql / ExecuteSql; 39 ConsumeKafka; 10 PublishKafka / PutKafka; 41 GetKinesisStream; 6 PutKinesisStream; 2 PutSFTP; 2 ConsumeAMQP
  • 18. APACHE FLINK Apache®, Apache Flink®, and the squirrel logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.
  • 19. WHAT IS APACHE FLINK? REAL-TIME STREAM PROCESSING FRAMEWORK; DISTRIBUTED PARALLEL COMPUTE ENGINE; SIMILAR API STYLE TO APACHE SPARK; LOW LATENCY, HIGH PERFORMANCE; STATEFUL [Diagram labels: SOURCE, Reduce, Filter, Join, SOURCE, Map, Sum, SINK]
  • 20. FLINK STREAMING API STYLES. TABLE / SQL API: SQL PROVIDED BY APACHE CALCITE; SELECTS, JOINS, GROUP-BY, AGGREGATIONS; WINDOWS; TIME AND COUNT; WINDOW-BASED JOINS; WINDOW-BASED AGGREGATIONS; TEMPORAL TABLES; UDF (USER-DEFINED FUNCTIONS). DATASTREAM API: MAP / REDUCE / FOLD; FILTER; AGGREGATIONS (SUM, MIN, MAX); WINDOWS; TIME AND COUNT; TUMBLING, SLIDING; STREAM UNION, JOIN, CO-MAP; ITERATIONS. NOTE: THERE IS ALSO A BATCH API
  • 21. EXAMPLE “WORD COUNT” CODE
        DataStream<WordWithCount> windowCounts = textInputStream
            .flatMap(new FlatMapFunction<String, WordWithCount>() {
                public void flatMap(String value, Collector<WordWithCount> out) {
                    // split each input line on whitespace and emit (word, 1)
                    for (String word : value.split("\\s")) {
                        out.collect(new WordWithCount(word, 1L));
                    }
                }
            })
            .keyBy("word")
            .timeWindow(Time.seconds(5))
            // sum the counts per word within each 5-second window
            .reduce(new ReduceFunction<WordWithCount>() {
                public WordWithCount reduce(WordWithCount a, WordWithCount b) {
                    return new WordWithCount(a.word, a.count + b.count);
                }
            });
  • 22. WHAT IS FLINK GOOD FOR? HIGH THROUGHPUT STREAM PROCESSING; “MAP / REDUCE” STYLE PARALLEL COMPUTING; STATEFUL PROCESSING; AGGREGATIONS AND TIME WINDOWS; MULTIPLE-STREAM OPERATIONS; SQL-ON-STREAM. HOWEVER… LIMITED “ORCHESTRATION”; LIMITED SOURCE / SINK TYPES
  • 23. FLINK CONNECTORS
        FLINK PROJECT: APACHE KAFKA (SOURCE/SINK); AMAZON KINESIS STREAMS (SOURCE/SINK); RABBITMQ (SOURCE/SINK); APACHE NIFI (SOURCE/SINK); APACHE CASSANDRA (SINK); ELASTICSEARCH (SINK); HADOOP FILESYSTEM – HDFS (SINK); TWITTER STREAMING API (SOURCE)
        ALSO VIA APACHE BAHIR: APACHE ACTIVEMQ (SOURCE/SINK); APACHE FLUME (SINK); REDIS (SINK); AKKA (SINK); NETTY (SOURCE)
  • 24. OUR TEAM’S HISTORY WITH FLINK: USED FOR 4+ DIFFERENT KINDS OF USE CASES; FIRST DEV – NOV 2016; FIRST PRODUCTION – MAY 2018. CUSTOMER EXPERIENCE USE CASE: • 7 BILLION DATA POINTS PER DAY. PRODUCTION SIZE FOR ABOVE: • 14 FLINK APPLICATION CLUSTERS • 150 VMS • 1100 VCPU • 5.8 TB RAM
  • 25. NIFI / FLINK MAJOR DIFFERENCES
        NiFi                            | Flink
        Distributed-capable             | Distributed by nature
        Lineage, queues, buffering      | Straight-through processing
        100’s of processor types        | Stream-oriented operators
        Limited state processing        | Natively stateful if desired
        UI-driven visual development    | Code / compiled / deployed
  • 26. “CONFIGURE NOT CODE” Scratch Website - http://scratch.mit.edu/
  • 28. START SIMPLE (EVENT, CONDITION, ACTION) [Diagram labels: Trigger, Event Producers, Notification Services, Action]
  • 29. START SIMPLE (EVENT, CONDITION, ACTION) [Diagram labels: Trigger, Event Producers, Notification Services, Action] NEED MORE INFORMATION
  • 30. STATELESS USE CASE [Diagram labels: Trigger, Enrich, Filter, Enterprise Services (REST), Event Producers, Notification Services, Action]
  • 31. EXAMPLE: VIDEO ON DEMAND. EVENT: RECEIVE “VIDEO ON DEMAND” MESSAGE; TRIGGER: IF (PRICE > 5) AND (TYPE = ‘RENTAL’); ENRICH: PREFERRED COMMUNICATION (EMAIL OR SMS); ACTION: SEND CONFIRMATION EMAIL OR SMS
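The event / trigger / enrich / action steps on this slide can be sketched in plain Java. This is only an illustration, not the presenter's implementation: the class `VodNotificationSketch`, the `VodEvent` fields, the in-memory preference map standing in for the customer preference service, and the EMAIL default are all assumptions.

```java
import java.util.Map;
import java.util.Optional;

/** Sketch of the event / trigger / enrich / action steps; every name here is hypothetical. */
class VodNotificationSketch {

    /** Minimal stand-in for a "video on demand" event. */
    static final class VodEvent {
        final String accountId;
        final double price;
        final String type;

        VodEvent(String accountId, double price, String type) {
            this.accountId = accountId;
            this.price = price;
            this.type = type;
        }
    }

    /** TRIGGER: IF (PRICE > 5) AND (TYPE = 'RENTAL'). */
    static boolean triggers(VodEvent e) {
        return e.price > 5 && "RENTAL".equalsIgnoreCase(e.type);
    }

    /** ENRICH + ACTION: look up the preferred channel, then describe the action to take. */
    static Optional<String> process(VodEvent e, Map<String, String> preferredChannel) {
        if (!triggers(e)) {
            return Optional.empty(); // filtered out: no notification
        }
        // Assumption: fall back to EMAIL when no preference is on record.
        String channel = preferredChannel.getOrDefault(e.accountId, "EMAIL");
        return Optional.of("SEND_" + channel + " to " + e.accountId);
    }

    public static void main(String[] args) {
        Map<String, String> prefs = Map.of("acct-1", "SMS");
        System.out.println(process(new VodEvent("acct-1", 5.99, "Rental"), prefs));
        System.out.println(process(new VodEvent("acct-2", 3.99, "Rental"), prefs));
    }
}
```

In a real flow the preference lookup would be a service call rather than a map, which is where the enrichment step becomes the expensive part.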
  • 32. NIFI VERSION [Flow labels: Consume Events, Extract Attributes, Call Customer Pref Service, Set SMS Parameters, Set Email Parameters, Logging, Metrics, Send to Communication Handlers]
  • 34. SQL ON STREAM – APACHE CALCITE
        FLINK APPROACH - SQL
            // SQL query with an inlined (unregistered) table
            Table table = tableEnv.fromDataStream(ds, "user, product, amount");
            Table result = tableEnv.sqlQuery(
                "SELECT SUM(amount) FROM " + table + " WHERE product LIKE '%Rubber%'");
        NIFI APPROACH – TRADITIONAL • EVALUATEJSONPATH / EXTRACTTEXT • NIFI EXPRESSION LANGUAGE + ROUTEONATTRIBUTE
        NIFI APPROACH - CALCITE • QUERYRECORD PROCESSOR • RECORDREADER / RECORDWRITER PATTERN
  • 36. ACTIONS [Diagram labels: Send SMS, Action Request, Action Handler, Send Email, Other Notification Methods, Communication Preferences]
  • 37. ENRICHMENT DATA PLANE [Diagram labels: Streaming Compute Pipeline; AWS S3; HDFS; Data File Abstraction; Databases; MODEL; Streaming State; Sum; Avg; Time Buckets; Stream Data; QUERY; Enterprise Services; Data Sets at Rest]
  • 38. CALLING SERVICES - NIFI: INVOKEHTTP PROCESSOR. NIFI GOOD FOR • REQUEST PREPARATION • RESULT TRANSFORMATION • HTTP ATTRIBUTE HANDLING • FAILURE AND RETRY LOGIC
  • 39. NIFI - RETRY LOGIC
  • 40. FLINK METHOD FOR CALLING SERVICES: ASYNC I/O OPERATOR; WORKS WITH ASYNC-CAPABLE POOLS • HTTP • JDBC; CODE-YOUR-OWN; NO BUILT-IN RETRY CAPABILITY; TIMEOUTS CAN LEAD TO FLOW FAILURE
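Because the slide notes that retry has to be hand-coded, here is one possible shape for it: a small bounded-retry wrapper around any asynchronous call, written against the plain JDK `CompletableFuture` API. It is a sketch under assumptions, not code from the talk: `AsyncRetrySketch` and `withRetry` are hypothetical names, there is no backoff, and wiring it into a Flink async function is left out.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

/** Hypothetical bounded-retry helper for async service calls (no backoff, for brevity). */
class AsyncRetrySketch {

    /** Invokes the call up to maxAttempts times; fails with the last error if all attempts fail. */
    static <T> CompletableFuture<T> withRetry(Supplier<CompletableFuture<T>> call, int maxAttempts) {
        CompletableFuture<T> result = new CompletableFuture<>();
        attempt(call, maxAttempts, result);
        return result;
    }

    private static <T> void attempt(Supplier<CompletableFuture<T>> call,
                                    int remaining,
                                    CompletableFuture<T> result) {
        call.get().whenComplete((value, error) -> {
            if (error == null) {
                result.complete(value);               // success: hand the value downstream
            } else if (remaining > 1) {
                attempt(call, remaining - 1, result); // failure: issue the call again
            } else {
                result.completeExceptionally(error);  // attempts exhausted: surface the error
            }
        });
    }
}
```

Completing the outer future with a fallback value instead of an exception is one way to keep a timeout from failing the whole flow, at the cost of passing unenriched events downstream.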
  • 41. FLINK CONNECTED STREAM PATTERN [Diagram labels: REST Service, Connected Stream Operator, 5 Minute Global Window, Enrichment Handler]
  • 43. WHAT IS “STATE”? [Diagram labels: STATE 1, STATE 2, STATE 3; ACTION ON ENTRY; ACTION ON EXIT; TRANSITION CONDITION; TRANSITION CONDITION; STATE TIMEOUT]
  • 44. EXAMPLE STATEFUL JOURNEY [Diagram labels: ORDER PLACED, IN TRANSIT, OUT FOR DELIVERY; “YOUR ORDER IS ON ITS WAY”; “SORRY WE MISSED YOU”; SHIPPED; PLACED ON LOCAL TRUCK; 11PM EXPIRE]
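One reading of the journey diagram is a small state machine in which events move an order between states and some transitions send a message. The sketch below assumes that reading: which event triggers which message, the `EXPIRED` state, and the event names `SHIPPED`, `PLACED_ON_LOCAL_TRUCK`, and `TIMEOUT_11PM` are interpretations of the slide labels, not something the slide text confirms.

```java
import java.util.Optional;

/** Hypothetical state machine for the order journey; transitions are an interpretation. */
class OrderJourneySketch {

    enum State { ORDER_PLACED, IN_TRANSIT, OUT_FOR_DELIVERY, EXPIRED }

    private State state = State.ORDER_PLACED;

    State state() {
        return state;
    }

    /** Applies an event and returns the message to send, if the transition has one. */
    Optional<String> onEvent(String event) {
        if (state == State.ORDER_PLACED && event.equals("SHIPPED")) {
            state = State.IN_TRANSIT;
            return Optional.of("Your order is on its way");
        }
        if (state == State.IN_TRANSIT && event.equals("PLACED_ON_LOCAL_TRUCK")) {
            state = State.OUT_FOR_DELIVERY;
            return Optional.empty();
        }
        if (state == State.OUT_FOR_DELIVERY && event.equals("TIMEOUT_11PM")) {
            state = State.EXPIRED; // state timeout fired before a delivery event arrived
            return Optional.of("Sorry we missed you");
        }
        return Optional.empty(); // event is not valid in the current state: ignore it
    }
}
```

The timeout is modelled here as just another event, which is the part that needs a timer or a periodic query in a real system.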
  • 45. NIFI STATE. PROCESSOR STATE (LOCAL AND CLUSTERED): BACKED BY ZOOKEEPER; PROCESSORS: UPDATEATTRIBUTE (LOCAL ONLY), ATTRIBUTEROLLINGWINDOW. “DISTRIBUTED” MAP CACHE: IN-MEMORY OR REDIS-BACKED (NEW IN 1.8); NODE-LOCAL OR “SINGLE NODE” CENTRAL CACHE; PROCESSORS: PUTDISTRIBUTEDMAPCACHE, FETCHDISTRIBUTEDMAPCACHE. BEFORE NIFI 1.8: NO EASY PARTITIONING / SHARDING. 1.8 AND LATER: NODE BALANCED CONNECTIONS, PARTITION BY ATTRIBUTE. CACHE != STATE (but you can store state in a cache)
  • 46. USING EXTERNAL STATE WITH NIFI: USE EXTERNAL DATABASE (E.G. MYSQL); PERIODIC QUERY TO FIND EXPIRED TIMERS; BEWARE OF RACE CONDITIONS / FREQUENT UPDATES
  • 47. NIFI – SQL BASED STATE (STATE UPDATE)
  • 48. NIFI – SQL BASED STATE (TIMER EXPIRATION)
  • 49. FLINK APPROACH TO STATE: KEYED (NODE LOCAL) STATE; WINDOWED OPERATIONS (E.G. 10 MINUTE WINDOW SLIDING BY 1 MINUTE); EVERY OPERATOR CAN HAVE ITS OWN STATE; QUERYABLE STATE; ROCKSDB (IN-MEMORY + DISK STORAGE); CHECKPOINTS AND SAVEPOINTS TO DURABLE FILESYSTEM (HDFS, S3)
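To make the "10 minute window sliding by 1 minute" example concrete: with that configuration every event belongs to ten overlapping windows, so windowed state is held once per window rather than once per event. The sketch below computes the window start times for one timestamp. It is an illustration in plain Java, not Flink's code; `SlidingWindowSketch` and `windowStarts` are hypothetical names, and it assumes non-negative timestamps and windows aligned to the epoch with no offset.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: which sliding windows contain a given event timestamp? */
class SlidingWindowSketch {

    /** Returns the start time (ms) of every window [start, start + size) containing the timestamp. */
    static List<Long> windowStarts(long timestampMs, long sizeMs, long slideMs) {
        List<Long> starts = new ArrayList<>();
        // Latest window start at or before the timestamp, aligned to the slide interval.
        long lastStart = timestampMs - (timestampMs % slideMs);
        // Walk backwards one slide at a time while the window still covers the timestamp.
        for (long start = lastStart; start > timestampMs - sizeMs; start -= slideMs) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        long tenMinutes = 10 * 60_000L;
        long oneMinute = 60_000L;
        // An event at 12 min 30 s falls into the windows starting at minutes 12, 11, ..., 3.
        System.out.println(windowStarts(750_000L, tenMinutes, oneMinute));
    }
}
```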
  • 50. NETWORK IN DISTRIBUTED FLINK STATE [Diagram labels: KAFKA BROKERS with PARTITION 1 through PARTITION 6; six FlinkKafkaConsumer instances across NODE 1, NODE 2, NODE 3; NETWORK OUT, keyBy() SHUFFLE/SORT (P1 P2 P3 P4); six KeyedStreamOperator instances, each with Local STATE]
  • 51. WORKING WITH FLINK STATE
        // DECLARE VARIABLE
        private transient MapState<String, String> myState;

        public void open(Configuration config) {
            // DESCRIPTOR WITH TYPE INFORMATION
            MapStateDescriptor<String, String> descriptor =
                new MapStateDescriptor<String, String>(
                    "myStateName",               // the state name
                    String.class, String.class); // K/V types
            // INITIALIZE STATE: get the MapState for the current key
            myState = getRuntimeContext().getMapState(descriptor);
        }

        public String map(String myField) throws Exception {
            // READ/WRITE STATE (get returns null if the entry is not set yet)
            String myValue = myState.get(myField);
            String updated = myValue + " another one";
            myState.put(myField, updated);
            return updated;
        }
  • 53. OPTION 1: BUILT-IN FLINK-NIFI CONNECTOR. USES NIFI “SITE TO SITE” PROTOCOL; ENABLES PASSING “FLOWFILE ATTRIBUTE” AND “FLOWFILE CONTENT” INTACT
        public interface NiFiDataPacket {
            byte[] getContent();
            Map<String, String> getAttributes();
        }
        FLINK BACKPRESSURE CONCERNS
        https://github.com/apache/flink/tree/master/flink-connectors/flink-connector-nifi
  • 54. OPTION 2: KAFKA TOPIC. LOOSER COUPLING; ALLOWS MANY “ACTION HANDLERS” (NOT JUST NIFI); MORE BUFFERING / REDUCE BACKPRESSURE RISK; JSON AS STANDARD PAYLOAD
  • 56. SOLUTION APPROACH. FLINK AS THE HIGH VOLUME EVENT PROCESSOR • MANY USE CASES WITH ONE STREAM • SQL ON STREAM; FLINK-BASED TRIGGER, FILTER, ENRICHMENT REQUEST, AND ACTION REQUEST; FLINK MANAGES CUSTOMER JOURNEY STATE. NIFI FOR: NAMED “PROFILES” FOR ENRICHMENT SERVICES; NAMED “PROFILES” FOR NOTIFICATIONS AND ACTIONS. Configuration-based use cases in Flink; library of handlers in NiFi
  • 57. HIGH LEVEL SOLUTION [Diagram labels: Trigger, Enrichment Orchestration, Filter, Enterprise Services (REST), Event Producers, Notification Services, Action Orchestration, Enrich, Action, EVENT AND MESSAGE ORCHESTRATION]
  • 58. USE CASE CONFIGURATION (SIMPLIFIED)
        {
          "source": { "type": "kafka", "name": "vod_event_stream" },
          "triggerSql": "data.price > 5 AND data.order_type = 'Rental'",
          "enrichment": { "profileName": "communicationprefs" },
          "actions": [
            { "profileName": "email", "templateId": "1234",
              "fieldMapping": [ { "field": "cost", "source": "data.price" } ] },
            { "profileName": "sms", "templateId": "5678",
              "fieldMapping": [ { "field": "cost", "source": "data.price" } ] }
          ]
        }
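The `fieldMapping` entries in this configuration pair a template field (`cost`) with a dotted path into the event (`data.price`). How the mapping is applied is not shown on the slide; the sketch below is one guess at it, treating the event as nested maps. `FieldMappingSketch`, `resolve`, and `applyMappings` are hypothetical names, and JSON parsing is left out to keep the example dependency-free.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical resolver for "fieldMapping" entries; the event is modelled as nested maps. */
class FieldMappingSketch {

    /** Resolves a dotted path such as "data.price"; returns null if any segment is missing. */
    static Object resolve(Map<String, Object> event, String path) {
        Object current = event;
        for (String part : path.split("\\.")) {
            if (!(current instanceof Map)) {
                return null; // path goes deeper than the event does
            }
            current = ((Map<?, ?>) current).get(part);
        }
        return current;
    }

    /** Builds the template parameters for one action from its fieldMapping list. */
    static Map<String, Object> applyMappings(Map<String, Object> event,
                                             List<Map<String, String>> fieldMapping) {
        Map<String, Object> params = new LinkedHashMap<>();
        for (Map<String, String> mapping : fieldMapping) {
            params.put(mapping.get("field"), resolve(event, mapping.get("source")));
        }
        return params;
    }

    public static void main(String[] args) {
        Map<String, Object> event = Map.of("data", Map.of("price", 5.99, "order_type", "Rental"));
        List<Map<String, String>> mapping = List.of(Map.of("field", "cost", "source", "data.price"));
        System.out.println(applyMappings(event, mapping));
    }
}
```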
  • 59. HIGH LEVEL SOLUTION (WITH STATE) [Diagram labels: Trigger, Enrichment Orchestration, Filter, Enterprise Services (REST), Event Producers, Notification Services, Action Orchestration, Enrich, Action, EVENT AND MESSAGE ORCHESTRATION, FLINK LOCAL STATE, Journey State Management, FLINK STATE]
  • 60. NIFI + FLINK SOLUTION SUMMARY: NIFI FOR SERVICES, DATAFLOW, AND TEXT HANDLING; FLINK FOR HIGH-PERFORMANCE STREAM PROCESSING; FLINK FOR COMMON PATTERNS – CONFIG DRIVEN; FLINK FOR STATE MANAGEMENT; DECOUPLED LIBRARY OF ENRICHMENT HANDLERS AND ACTION HANDLERS
  • 61. FUTURE WORK: FLINK + NIFI SELF-SERVICE USE CASE PORTAL; INCREASE CATALOG OF ACTIONS AND ENRICHMENT PROFILES; MOVE MORE COMMON CAPABILITIES TO FLINK