A Practical Guide to Selecting a
Stream Processing Technology
Michael G. Noll
Product Manager, Confluent
Kafka Talk Series
Date    Title
Sep 27  Introduction To Streaming Data and Stream Processing with Apache Kafka
Oct 06  Deep Dive into Apache Kafka
Oct 27  Data Integration with Apache Kafka
Nov 17  Demystifying Stream Processing with Apache Kafka
Dec 01  A Practical Guide to Selecting a Stream Processing Technology
Dec 15  Streaming in Practice: Putting Apache Kafka in Production
https://www.confluent.io/apache-kafka-talk-series
Agenda
• Recap: What is Stream Processing?
• The Three Pillars of Stream Processing in Practice
• Key Selection Criteria
  • Organizational/Non-Technical Dimensions
  • Technical Dimensions
• Summary
Powered by Kafka (thousands more)
Spark Streaming API (2.0)
Kafka's Streams API (0.10)
Example: Streams and Tables in Kafka
Word Count
hello 2
kafka 1
world 1
… …
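The word-count table above can be read as the latest result of aggregating a stream. The following is a minimal sketch in plain Python, not Kafka's Streams API: the function name `word_count_changelog` and the two input lines are made up for illustration. It consumes a stream of text lines, emits one update per word, and folds those updates into a table where the latest value per key wins.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def word_count_changelog(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Consume a stream of text lines and emit one table update per word."""
    counts: Counter = Counter()
    for line in lines:
        for word in line.lower().split():
            counts[word] += 1
            # Each update is itself a stream record: (key, new count).
            yield word, counts[word]


# Hypothetical input stream, chosen to reproduce the table on the slide.
stream = ["hello kafka", "hello world"]

updates = list(word_count_changelog(stream))  # the stream of changes
table = dict(updates)                         # the table: latest value per key
print(table)
```

The list `updates` is the stream view of the result and `table` is the table view of the same data, which is why a technology that supports only one of the two leaves the other as do-it-yourself work.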
Streams & Databases
• A stream processing technology must have first-class support for Streams and Tables
  • With scalability, fault tolerance, …
• Why? Because most use cases require not just one, but both!
• Support, or lack thereof, strongly impacts the resulting technical architecture and development efforts
• No support means:
  • Painful Do-It-Yourself
  • Increased complexity, more moving pieces to juggle
Organizational/Non-Tech Dimensions
• Can your org understand and leverage the technology?
  • Familiarity with languages; intuitive concepts and APIs; training
• Are you permitted to use it in your organization?
  • Security features, licensing, open source vs. proprietary
• Can you continue to use it in the future?
  • Longevity of technology, licensing, vendor strength
Organizational/Non-Tech Dimensions
• Do you believe in the long-term vision?
  • Switching technologies in an organization is often expensive/slow: legacy migration, re-training, resistance to change, etc.
• What is the path and time to success?
  • Can you move smoothly and quickly from proof-of-concept to production?
• Areas and range of applicability in your organization
  • General-purpose vs. niche technology
  • Viable for S/M/L/XL use cases vs. for XL use cases only
  • Building core business apps vs. doing backend analytics
Organizational/Non-Tech Dimensions
[Figure: grid of dimensions: Licensing; Vision/Roadmap; ROI; Impact on Organization; Broad vs. Niche Applicability; Time to Market; Professional Services; Documentation; Examples; User Community; Learning Curve; Impact on Tools, Infrastructure, …]
Technical Dimensions
[Figure: grid of dimensions: Reprocessing; Scalability & Elasticity; Fault Tolerance; API; Dev/Ops Lifecycle; Security; Processing Model; Out of Order Data; Abstractions; Time Model; Windowing; State]
State
• Stateful processing of any kind requires…state
• Many (most?) use cases for stream processing are stateful
  • Joins, aggregations, windowing, counting, ...
• Is state performant? Local vs. remote state?
• Is state fault-tolerant? How fast is recovery/failover?
• Is state interactively queryable?
  • Kafka: ready for use (GA)
  • Spark, Flink: under development (alpha)
  • Storm, Samza, and others: not available
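The three questions about state (local, fault-tolerant, queryable) can be illustrated with a toy model. The sketch below is an assumption-laden simplification and not any product's implementation: `CountingProcessor`, its in-memory `store`, and the Python list standing in for a durable changelog are all hypothetical names chosen for this example.

```python
from typing import Dict, List, Tuple


class CountingProcessor:
    """Keeps per-key counts in a local store and logs every change."""

    def __init__(self, changelog: List[Tuple[str, int]]) -> None:
        self.changelog = changelog        # stand-in for a durable, replicated log
        self.store: Dict[str, int] = {}   # local state: no network hop per access
        for key, value in changelog:      # recovery: replay the log into the store
            self.store[key] = value

    def process(self, key: str) -> None:
        """Stateful operation: count events per key."""
        self.store[key] = self.store.get(key, 0) + 1
        self.changelog.append((key, self.store[key]))

    def query(self, key: str) -> int:
        """Interactive query: read current state directly from the processor."""
        return self.store.get(key, 0)


durable_log: List[Tuple[str, int]] = []
first = CountingProcessor(durable_log)
for user in ["alice", "bob", "alice"]:
    first.process(user)

# Simulate a crash: the local store is lost, but the changelog survives.
recovered = CountingProcessor(durable_log)
print(recovered.query("alice"))
```

In this model, recovery time grows with the length of the changelog, which is one reason the "how fast is recovery/failover?" question is worth asking of a real system.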
Abstractions
• What are the data model and the available abstractions?
• Most common abstraction: stream of records, events
  • Kafka, Spark, Storm, Samza, Flink, Apex, ...
• New, very powerful: table of records
  • Currently unique to Kafka
  • Represents latest state and materialized views
  • State must have a first-class abstraction because, as we just saw in the previous section, state is crucial for stream processing!
Time Model
• Different use cases require different time semantics
• Great majority of use cases require event-time semantics
• Other use cases may require processing-time (e.g. real-time monitoring) or special variants like ingestion-time
• A stream processing technology should, at a minimum, support event-time to cover most use cases in practice
  • Examples: Kafka, Beam, Flink
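To make the difference between event-time and processing-time concrete, here is a small sketch. The `Event` fields, the 5-minute window size, and the sample timestamps are assumptions made for illustration. It assigns the same event to a window twice, once by each notion of time.

```python
from dataclasses import dataclass

WINDOW_SECONDS = 300  # 5-minute windows, an illustrative choice


@dataclass(frozen=True)
class Event:
    user: str
    event_time: int       # when the event happened (seconds)
    processing_time: int  # when the processor saw it (seconds)


def window_start(timestamp: int, size: int = WINDOW_SECONDS) -> int:
    """Return the start of the fixed-size window that contains the timestamp."""
    return timestamp - (timestamp % size)


# The event happened at t=290 but reached the processor 20 seconds later.
e = Event(user="alice", event_time=290, processing_time=310)

by_event_time = window_start(e.event_time)
by_processing_time = window_start(e.processing_time)
print(by_event_time, by_processing_time)
```

A 20-second delay is enough to move the event into a different window under processing-time semantics, while event-time semantics keep it in the window in which it actually occurred.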
Windowing
• Windowing is an operation that groups events
[Figure: windowing, plotted by event-time vs. processing-time. Input data for users alice, bob, and dave, where colors represent different users' events; rectangles denote different event-time windows.]
• Most commonly needed: time windows, session windows
• Examples:
  • Real-time monitoring: 5-minute averages
  • Reader behavior on a website: user browsing sessions
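Both window types named above can be sketched in a few lines. This is a simplified batch-style illustration and not a streaming implementation: the function names, the 30-minute inactivity gap for sessions, and the sample data are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def tumbling_averages(readings: List[Tuple[int, float]], size: int = 300) -> Dict[int, float]:
    """Group (timestamp, value) readings into fixed windows and average each."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for ts, value in readings:
        buckets[ts - ts % size].append(value)
    return {start: sum(vs) / len(vs) for start, vs in sorted(buckets.items())}


def session_windows(timestamps: List[int], gap: int = 1800) -> List[Tuple[int, int]]:
    """Merge events into sessions; a pause longer than `gap` starts a new one."""
    sessions: List[Tuple[int, int]] = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][1] <= gap:
            sessions[-1] = (sessions[-1][0], ts)  # extend the current session
        else:
            sessions.append((ts, ts))             # open a new session
    return sessions


# Monitoring example: 5-minute averages over three readings.
averages = tumbling_averages([(10, 1.0), (200, 3.0), (310, 10.0)])

# Browsing example: one user's page views, in seconds.
sessions = session_windows([0, 600, 1200, 7200, 7500])
print(averages, sessions)
```

Time windows have boundaries fixed in advance, whereas session windows are defined by the data itself, so their number and length are only known once the events have been seen.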
Out-of-order and late-arriving data
• Is very common in practice, not a rare corner case
• Related to time model discussion
[Figure: users with mobile phones enter an airplane and lose Internet connectivity; emails are written during the 10h flight; once connectivity is restored, the phones send the queued emails.]
• We want control over how out-of-order data is handled
• Example:
  • We process data in 5-minute windows, e.g. compute statistics
  • When an event arrives 1 minute late: update the original result!
  • When an event arrives 2 hours late: discard it!
• Handling must be efficient because it happens so often
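The update-or-discard policy from the example can be sketched as follows. This is a toy model under stated assumptions: the class `WindowedCounter`, the 10-minute lateness allowance `GRACE`, and the use of the highest event time seen so far as the notion of "now" are all choices made for this illustration, not the behavior of any specific product.

```python
from typing import Dict

WINDOW = 300  # 5-minute windows, in seconds
GRACE = 600   # accept events up to 10 minutes after a window ends (assumed value)


class WindowedCounter:
    """Counts events per event-time window and tolerates bounded lateness."""

    def __init__(self) -> None:
        self.counts: Dict[int, int] = {}
        self.stream_time = 0  # highest event time observed so far
        self.dropped = 0

    def on_event(self, event_time: int) -> None:
        self.stream_time = max(self.stream_time, event_time)
        start = event_time - event_time % WINDOW
        if start + WINDOW + GRACE <= self.stream_time:
            self.dropped += 1  # too late: the window is closed, discard
            return
        # On time or slightly late: update the original window's result.
        self.counts[start] = self.counts.get(start, 0) + 1


counter = WindowedCounter()
# 250 arrives after 400 (slightly late); 30 arrives after 8000 (over 2 hours late).
for t in [10, 20, 400, 250, 8000, 30]:
    counter.on_event(t)
print(counter.counts, counter.dropped)
```

Each event costs one comparison and one dictionary update here, which reflects the point that late-data handling has to be cheap because it runs on the common path.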
Reprocessing
• Re-process data by rewinding a stream back in time
• Use cases in practice include
  • Correcting output data after fixing a bug
  • Facilitating iterative and explorative development
  • A/B testing
  • Processing historical data
  • Walking through "What If?" scenarios
• Also: often used behind-the-scenes for fault tolerance
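Rewinding works when the input stream is retained and addressable by position. A minimal sketch, assuming a Python list as the retained log and list indexes as offsets; the `run`, `buggy`, and `fixed` names and the sample records are hypothetical.

```python
from typing import Callable, List

# Retained input stream; the list index plays the role of an offset.
log: List[int] = [3, 5, 8, 13]


def run(transform: Callable[[int], int], from_offset: int = 0) -> List[int]:
    """Process the retained log, starting at the given offset."""
    return [transform(record) for record in log[from_offset:]]


def buggy(x: int) -> int:
    return x * 10   # version 1 of the job, with a bug in the factor


def fixed(x: int) -> int:
    return x * 100  # version 2, after the bug fix


first_output = run(buggy)
corrected_output = run(fixed, from_offset=0)  # rewind to the start and recompute
tail_only = run(fixed, from_offset=2)         # or rewind only part of the way
print(corrected_output, tail_only)
```

The same mechanism covers the other use cases in the list: an A/B test runs two transforms over the same offsets, and historical processing simply starts from an old offset.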
Scalability, Elasticity, Fault Tolerance
• Can the technology scale according to your needs?
  • Desired latency, throughput?
  • Able to process millions of messages per second?
  • What is the minimum footprint?
• Expand/shrink capacity dynamically during operations?
  • Helps with resource utilization because most stream apps run continuously
• Resilience and fault tolerance
  • Which guarantees for data delivery and for state? "At-least-once", "exactly-once", "effectively-once", etc.
  • Failover behavior and recovery time? Automated or manual?
  • Any negative impact of fault tolerance features on performance?
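The difference between delivery guarantees shows up in the state they produce. The sketch below is one common way to think about it, not a description of how any listed technology implements its guarantees: it assumes each record carries a unique offset and that the processor remembers which offsets it has applied. All names and values are illustrative.

```python
from typing import Dict, List, Set, Tuple

Record = Tuple[int, str, int]  # (offset, account, amount)


def apply_all(records: List[Record], dedupe: bool) -> Dict[str, int]:
    """Sum amounts per account, optionally skipping redelivered offsets."""
    balances: Dict[str, int] = {}
    seen: Set[int] = set()
    for offset, account, amount in records:
        if dedupe:
            if offset in seen:
                continue  # redelivered record: already applied, skip it
            seen.add(offset)
        balances[account] = balances.get(account, 0) + amount
    return balances


# At-least-once delivery: offset 1 is delivered twice after a simulated failure.
delivered = [(0, "alice", 10), (1, "bob", 5), (1, "bob", 5), (2, "alice", 7)]

naive = apply_all(delivered, dedupe=False)            # duplicate is counted twice
effectively_once = apply_all(delivered, dedupe=True)  # duplicate has no effect
print(naive, effectively_once)
```

The deduplication set is itself state that must survive failures, which ties the delivery guarantee back to the questions about fault-tolerant state and its performance cost.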
Security
• To meet internal security policies, legal compliance, etc.
• Typical base requirements for stream processing applications:
  • Encrypt data-in-transit (e.g. from/to Kafka)
  • Authentication: "only some applications may talk to production"
  • Authorization: "access to sensitive data such as PII is restricted"
• The easier it is to use security features, the more likely they are actually being used in practice
Processing Model
• True stream processing is record-at-a-time processing
  • Benefits include low latency (millisecs), dealing efficiently with out-of-order data
  • Can provide both low latency and high throughput via internal optimizations
  • Examples: Kafka, Storm, Samza, Flink, Beam
• Some processing technologies opt for (micro)batching
  • Micro-batching has no true benefits: consider it a technical workaround to shoehorn stream-like functionality into a tool
  • Suffers from significant overhead when dealing with e.g. out-of-order/late-arriving data, when performing windowed analyses (e.g. session windows)
  • Typically a strong blocker for use cases such as fraud detection or anything where "a few seconds" of latency is prohibitive
  • Examples: Spark, Storm (Trident), Hadoop*
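The latency argument can be illustrated with a simplified model of waiting time only. The sketch assumes processing cost is negligible, that a micro-batch is processed at the end of its interval, and that the 2-second interval and arrival times are arbitrary; it is not a benchmark of any of the named systems.

```python
from typing import List


def record_at_a_time_wait(arrivals: List[float]) -> List[float]:
    """Each record is handed to the processor on arrival, so it never waits."""
    return [0.0 for _ in arrivals]


def micro_batch_wait(arrivals: List[float], interval: float) -> List[float]:
    """Each record waits until the end of the batch interval it falls into."""
    waits = []
    for t in arrivals:
        batch_end = (int(t // interval) + 1) * interval
        waits.append(batch_end - t)
    return waits


arrivals = [0.5, 1.0, 3.25]  # arrival times in seconds
streaming = record_at_a_time_wait(arrivals)
batch = micro_batch_wait(arrivals, interval=2.0)
print(streaming, batch)
```

In this model the added wait is bounded by the batch interval, so a "few seconds" interval translates directly into a few seconds of worst-case latency before processing even begins.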
API
• Choice of API is a subjective matter: skills, preference, …
• Typical options
  • Declarative, expressive API: operations like map(), filter()
  • Imperative, lower-level API: callbacks like process(event)
  • Streaming SQL: STREAM SELECT … FROM … WHERE …
• In the best case you get not just one, but all three
  • "Abstractions are great!"
  • "Abstractions considered harmful!"
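The first two API styles can be compared on the same task. The sketch below uses toy classes written for this example: `Stream` and `Processor` are hypothetical and only mimic the shape of a declarative and an imperative API; they are not the classes of any real library.

```python
from typing import Callable, Iterable, List


class Stream:
    """Tiny declarative wrapper: each operation returns a new Stream."""

    def __init__(self, records: Iterable[int]) -> None:
        self._records = records

    def filter(self, predicate: Callable[[int], bool]) -> "Stream":
        return Stream(r for r in self._records if predicate(r))

    def map(self, fn: Callable[[int], int]) -> "Stream":
        return Stream(fn(r) for r in self._records)

    def to_list(self) -> List[int]:
        return list(self._records)


class Processor:
    """Imperative style: one callback per event, output handled explicitly."""

    def __init__(self) -> None:
        self.output: List[int] = []

    def process(self, event: int) -> None:
        if event % 2 == 0:
            self.output.append(event * 10)


events = [1, 2, 3, 4]

# Declarative: say what should happen to the stream.
declarative = Stream(events).filter(lambda e: e % 2 == 0).map(lambda e: e * 10).to_list()

# Imperative: say what to do with each event.
processor = Processor()
for event in events:
    processor.process(event)
imperative = processor.output
print(declarative, imperative)
```

Both versions produce the same result; the declarative one is shorter to read, while the imperative one leaves room for custom logic inside `process`, which is the trade-off behind the two quotes on the slide.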
Developer/Operations Lifecycle
• What should your daily work look and feel like?
  • "I like to do quick, iterative development" (modify/test/repeat)
  • "I want to decouple team roadmaps, project schedules"
• Big difference between App Model <-> Cluster Model
  • Testing, packaging, deployment, monitoring, operations
  • "Do I need to know Java (app) or YARN (cluster) for this?"
  • "I want reactive processing in containers that run on Mesos!"
• Rolling, no-downtime upgrades?
• Integration with existing Ops infra, tools, processes?
Summary
• What we covered is a good starting point
• But, no free lunch!
  • Understand what you need, and weigh criteria appropriately
  • Think end-to-end: idea, development, operations, troubleshooting
  • Think big-picture: future use cases, architecture, security, training, …
• Do your own internal hackathons, proof-of-concepts
• Do your own benchmarks
• If in doubt: simplicity beats complexity
  • Faster to learn, easier to understand, less likely to fail, …
Q&A Session
Coming Up Next
Date    Title                                                      Speaker
Dec 15  Streaming in Practice: Putting Apache Kafka in Production  Roger Hoover
https://www.confluent.io/apache-kafka-talk-series

More Related Content

What's hot

Capital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream ProcessingCapital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream Processingconfluent
 
Kafka Streams for Java enthusiasts
Kafka Streams for Java enthusiastsKafka Streams for Java enthusiasts
Kafka Streams for Java enthusiastsSlim Baltagi
 
Event Driven Architectures with Apache Kafka on Heroku
Event Driven Architectures with Apache Kafka on HerokuEvent Driven Architectures with Apache Kafka on Heroku
Event Driven Architectures with Apache Kafka on HerokuHeroku
 
Hadoop made fast - Why Virtual Reality Needed Stream Processing to Survive
Hadoop made fast - Why Virtual Reality Needed Stream Processing to SurviveHadoop made fast - Why Virtual Reality Needed Stream Processing to Survive
Hadoop made fast - Why Virtual Reality Needed Stream Processing to Surviveconfluent
 
Metrics Are Not Enough: Monitoring Apache Kafka and Streaming Applications
Metrics Are Not Enough: Monitoring Apache Kafka and Streaming ApplicationsMetrics Are Not Enough: Monitoring Apache Kafka and Streaming Applications
Metrics Are Not Enough: Monitoring Apache Kafka and Streaming Applicationsconfluent
 
Five Fabulous Sinks for Your Kafka Data. #3 will surprise you! (Rachel Pedres...
Five Fabulous Sinks for Your Kafka Data. #3 will surprise you! (Rachel Pedres...Five Fabulous Sinks for Your Kafka Data. #3 will surprise you! (Rachel Pedres...
Five Fabulous Sinks for Your Kafka Data. #3 will surprise you! (Rachel Pedres...confluent
 
What's new in Confluent 3.2 and Apache Kafka 0.10.2
What's new in Confluent 3.2 and Apache Kafka 0.10.2 What's new in Confluent 3.2 and Apache Kafka 0.10.2
What's new in Confluent 3.2 and Apache Kafka 0.10.2 confluent
 
Apache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignApache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignMichael Noll
 
How to over-engineer things and have fun? | Oto Brglez, OPALAB
How to over-engineer things and have fun? | Oto Brglez, OPALABHow to over-engineer things and have fun? | Oto Brglez, OPALAB
How to over-engineer things and have fun? | Oto Brglez, OPALABHostedbyConfluent
 
Data Streaming with Apache Kafka & MongoDB
Data Streaming with Apache Kafka & MongoDBData Streaming with Apache Kafka & MongoDB
Data Streaming with Apache Kafka & MongoDBconfluent
 
Monitoring Apache Kafka with Confluent Control Center
Monitoring Apache Kafka with Confluent Control Center   Monitoring Apache Kafka with Confluent Control Center
Monitoring Apache Kafka with Confluent Control Center confluent
 
Confluent building a real-time streaming platform using kafka streams and k...
Confluent   building a real-time streaming platform using kafka streams and k...Confluent   building a real-time streaming platform using kafka streams and k...
Confluent building a real-time streaming platform using kafka streams and k...Thomas Alex
 
Event Sourcing, Stream Processing and Serverless (Benjamin Stopford, Confluen...
Event Sourcing, Stream Processing and Serverless (Benjamin Stopford, Confluen...Event Sourcing, Stream Processing and Serverless (Benjamin Stopford, Confluen...
Event Sourcing, Stream Processing and Serverless (Benjamin Stopford, Confluen...confluent
 
Building Realtim Data Pipelines with Kafka Connect and Spark Streaming
Building Realtim Data Pipelines with Kafka Connect and Spark StreamingBuilding Realtim Data Pipelines with Kafka Connect and Spark Streaming
Building Realtim Data Pipelines with Kafka Connect and Spark StreamingGuozhang Wang
 
Using Apache Kafka to Analyze Session Windows
Using Apache Kafka to Analyze Session WindowsUsing Apache Kafka to Analyze Session Windows
Using Apache Kafka to Analyze Session Windowsconfluent
 
Evolving from Messaging to Event Streaming
Evolving from Messaging to Event StreamingEvolving from Messaging to Event Streaming
Evolving from Messaging to Event Streamingconfluent
 
Introduction to Apache Kafka and Confluent... and why they matter
Introduction to Apache Kafka and Confluent... and why they matterIntroduction to Apache Kafka and Confluent... and why they matter
Introduction to Apache Kafka and Confluent... and why they matterconfluent
 
Streaming Data Integration - For Women in Big Data Meetup
Streaming Data Integration - For Women in Big Data MeetupStreaming Data Integration - For Women in Big Data Meetup
Streaming Data Integration - For Women in Big Data MeetupGwen (Chen) Shapira
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?confluent
 

What's hot (20)

Capital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream ProcessingCapital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream Processing
 
Kafka Streams for Java enthusiasts
Kafka Streams for Java enthusiastsKafka Streams for Java enthusiasts
Kafka Streams for Java enthusiasts
 
Event Driven Architectures with Apache Kafka on Heroku
Event Driven Architectures with Apache Kafka on HerokuEvent Driven Architectures with Apache Kafka on Heroku
Event Driven Architectures with Apache Kafka on Heroku
 
Hadoop made fast - Why Virtual Reality Needed Stream Processing to Survive
Hadoop made fast - Why Virtual Reality Needed Stream Processing to SurviveHadoop made fast - Why Virtual Reality Needed Stream Processing to Survive
Hadoop made fast - Why Virtual Reality Needed Stream Processing to Survive
 
Metrics Are Not Enough: Monitoring Apache Kafka and Streaming Applications
Metrics Are Not Enough: Monitoring Apache Kafka and Streaming ApplicationsMetrics Are Not Enough: Monitoring Apache Kafka and Streaming Applications
Metrics Are Not Enough: Monitoring Apache Kafka and Streaming Applications
 
Five Fabulous Sinks for Your Kafka Data. #3 will surprise you! (Rachel Pedres...
Five Fabulous Sinks for Your Kafka Data. #3 will surprise you! (Rachel Pedres...Five Fabulous Sinks for Your Kafka Data. #3 will surprise you! (Rachel Pedres...
Five Fabulous Sinks for Your Kafka Data. #3 will surprise you! (Rachel Pedres...
 
What's new in Confluent 3.2 and Apache Kafka 0.10.2
What's new in Confluent 3.2 and Apache Kafka 0.10.2 What's new in Confluent 3.2 and Apache Kafka 0.10.2
What's new in Confluent 3.2 and Apache Kafka 0.10.2
 
Apache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignApache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - Verisign
 
How to over-engineer things and have fun? | Oto Brglez, OPALAB
How to over-engineer things and have fun? | Oto Brglez, OPALABHow to over-engineer things and have fun? | Oto Brglez, OPALAB
How to over-engineer things and have fun? | Oto Brglez, OPALAB
 
Data Streaming with Apache Kafka & MongoDB
Data Streaming with Apache Kafka & MongoDBData Streaming with Apache Kafka & MongoDB
Data Streaming with Apache Kafka & MongoDB
 
Monitoring Apache Kafka with Confluent Control Center
Monitoring Apache Kafka with Confluent Control Center   Monitoring Apache Kafka with Confluent Control Center
Monitoring Apache Kafka with Confluent Control Center
 
Confluent building a real-time streaming platform using kafka streams and k...
Confluent   building a real-time streaming platform using kafka streams and k...Confluent   building a real-time streaming platform using kafka streams and k...
Confluent building a real-time streaming platform using kafka streams and k...
 
Event Sourcing, Stream Processing and Serverless (Benjamin Stopford, Confluen...
Event Sourcing, Stream Processing and Serverless (Benjamin Stopford, Confluen...Event Sourcing, Stream Processing and Serverless (Benjamin Stopford, Confluen...
Event Sourcing, Stream Processing and Serverless (Benjamin Stopford, Confluen...
 
Building Realtim Data Pipelines with Kafka Connect and Spark Streaming
Building Realtim Data Pipelines with Kafka Connect and Spark StreamingBuilding Realtim Data Pipelines with Kafka Connect and Spark Streaming
Building Realtim Data Pipelines with Kafka Connect and Spark Streaming
 
Using Apache Kafka to Analyze Session Windows
Using Apache Kafka to Analyze Session WindowsUsing Apache Kafka to Analyze Session Windows
Using Apache Kafka to Analyze Session Windows
 
Evolving from Messaging to Event Streaming
Evolving from Messaging to Event StreamingEvolving from Messaging to Event Streaming
Evolving from Messaging to Event Streaming
 
Introduction to Apache Kafka and Confluent... and why they matter
Introduction to Apache Kafka and Confluent... and why they matterIntroduction to Apache Kafka and Confluent... and why they matter
Introduction to Apache Kafka and Confluent... and why they matter
 
Streaming Data Integration - For Women in Big Data Meetup
Streaming Data Integration - For Women in Big Data MeetupStreaming Data Integration - For Women in Big Data Meetup
Streaming Data Integration - For Women in Big Data Meetup
 
Kafka at scale facebook israel
Kafka at scale   facebook israelKafka at scale   facebook israel
Kafka at scale facebook israel
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?
 

Viewers also liked

Streaming in Practice - Putting Apache Kafka in Production
Streaming in Practice - Putting Apache Kafka in ProductionStreaming in Practice - Putting Apache Kafka in Production
Streaming in Practice - Putting Apache Kafka in Productionconfluent
 
Deep Dive into Apache Kafka
Deep Dive into Apache KafkaDeep Dive into Apache Kafka
Deep Dive into Apache Kafkaconfluent
 
Demystifying Stream Processing with Apache Kafka
Demystifying Stream Processing with Apache KafkaDemystifying Stream Processing with Apache Kafka
Demystifying Stream Processing with Apache Kafkaconfluent
 
Leveraging Mainframe Data for Modern Analytics
Leveraging Mainframe Data for Modern AnalyticsLeveraging Mainframe Data for Modern Analytics
Leveraging Mainframe Data for Modern Analyticsconfluent
 
The Data Dichotomy- Rethinking the Way We Treat Data and Services
The Data Dichotomy- Rethinking the Way We Treat Data and ServicesThe Data Dichotomy- Rethinking the Way We Treat Data and Services
The Data Dichotomy- Rethinking the Way We Treat Data and Servicesconfluent
 
Power of the Log: LSM & Append Only Data Structures
Power of the Log: LSM & Append Only Data StructuresPower of the Log: LSM & Append Only Data Structures
Power of the Log: LSM & Append Only Data Structuresconfluent
 
Strata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache Kafka
Strata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache KafkaStrata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache Kafka
Strata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache Kafkaconfluent
 
Strata+Hadoop 2017 San Jose - The Rise of Real Time: Apache Kafka and the Str...
Strata+Hadoop 2017 San Jose - The Rise of Real Time: Apache Kafka and the Str...Strata+Hadoop 2017 San Jose - The Rise of Real Time: Apache Kafka and the Str...
Strata+Hadoop 2017 San Jose - The Rise of Real Time: Apache Kafka and the Str...confluent
 
Introduction To Streaming Data and Stream Processing with Apache Kafka
Introduction To Streaming Data and Stream Processing with Apache KafkaIntroduction To Streaming Data and Stream Processing with Apache Kafka
Introduction To Streaming Data and Stream Processing with Apache Kafkaconfluent
 
Distributed stream processing with Apache Kafka
Distributed stream processing with Apache KafkaDistributed stream processing with Apache Kafka
Distributed stream processing with Apache Kafkaconfluent
 
Data Pipelines Made Simple with Apache Kafka
Data Pipelines Made Simple with Apache KafkaData Pipelines Made Simple with Apache Kafka
Data Pipelines Made Simple with Apache Kafkaconfluent
 
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...
Building Large-Scale Stream Infrastructures Across Multiple Data Centers with...confluent
 
Protecting your data at rest with Apache Kafka by Confluent and Vormetric
Protecting your data at rest with Apache Kafka by Confluent and VormetricProtecting your data at rest with Apache Kafka by Confluent and Vormetric
Protecting your data at rest with Apache Kafka by Confluent and Vormetricconfluent
 
Stream Processing with Kafka in Uber, Danny Yuan
Stream Processing with Kafka in Uber, Danny Yuan Stream Processing with Kafka in Uber, Danny Yuan
Stream Processing with Kafka in Uber, Danny Yuan confluent
 
user Behavior Analysis with Session Windows and Apache Kafka's Streams API
user Behavior Analysis with Session Windows and Apache Kafka's Streams APIuser Behavior Analysis with Session Windows and Apache Kafka's Streams API
user Behavior Analysis with Session Windows and Apache Kafka's Streams APIconfluent
 
Building Event-Driven Services with Apache Kafka
Building Event-Driven Services with Apache KafkaBuilding Event-Driven Services with Apache Kafka
Building Event-Driven Services with Apache Kafkaconfluent
 
Partner Development Guide for Kafka Connect
Partner Development Guide for Kafka ConnectPartner Development Guide for Kafka Connect
Partner Development Guide for Kafka Connectconfluent
 
Real-Time Analytics Visualized w/ Kafka + Streamliner + MemSQL + ZoomData, An...
Real-Time Analytics Visualized w/ Kafka + Streamliner + MemSQL + ZoomData, An...Real-Time Analytics Visualized w/ Kafka + Streamliner + MemSQL + ZoomData, An...
Real-Time Analytics Visualized w/ Kafka + Streamliner + MemSQL + ZoomData, An...confluent
 
Introducing Kafka Streams: Large-scale Stream Processing with Kafka, Neha Nar...
Introducing Kafka Streams: Large-scale Stream Processing with Kafka, Neha Nar...Introducing Kafka Streams: Large-scale Stream Processing with Kafka, Neha Nar...
Introducing Kafka Streams: Large-scale Stream Processing with Kafka, Neha Nar...confluent
 
Confluent & Attunity: Mainframe Data Modern Analytics
Confluent & Attunity: Mainframe Data Modern AnalyticsConfluent & Attunity: Mainframe Data Modern Analytics
Confluent & Attunity: Mainframe Data Modern Analyticsconfluent
 

Viewers also liked (20)

Streaming in Practice - Putting Apache Kafka in Production
Streaming in Practice - Putting Apache Kafka in ProductionStreaming in Practice - Putting Apache Kafka in Production
Streaming in Practice - Putting Apache Kafka in Production
 
Deep Dive into Apache Kafka
Deep Dive into Apache KafkaDeep Dive into Apache Kafka
Deep Dive into Apache Kafka
 
Demystifying Stream Processing with Apache Kafka
Demystifying Stream Processing with Apache KafkaDemystifying Stream Processing with Apache Kafka
Demystifying Stream Processing with Apache Kafka
 
Leveraging Mainframe Data for Modern Analytics
Leveraging Mainframe Data for Modern AnalyticsLeveraging Mainframe Data for Modern Analytics
Leveraging Mainframe Data for Modern Analytics
 
The Data Dichotomy- Rethinking the Way We Treat Data and Services
The Data Dichotomy- Rethinking the Way We Treat Data and ServicesThe Data Dichotomy- Rethinking the Way We Treat Data and Services
The Data Dichotomy- Rethinking the Way We Treat Data and Services
 
Power of the Log: LSM & Append Only Data Structures
Power of the Log: LSM & Append Only Data StructuresPower of the Log: LSM & Append Only Data Structures
Power of the Log: LSM & Append Only Data Structures
 
Strata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache Kafka
Strata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache KafkaStrata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache Kafka
Strata+Hadoop 2017 San Jose: Lessons from a year of supporting Apache Kafka
 
Strata+Hadoop 2017 San Jose - The Rise of Real Time: Apache Kafka and the Str...
Strata+Hadoop 2017 San Jose - The Rise of Real Time: Apache Kafka and the Str...Strata+Hadoop 2017 San Jose - The Rise of Real Time: Apache Kafka and the Str...

A Practical Guide to Selecting a Stream Processing Technology

  • 1. A Practical Guide to Selecting a Stream Processing Technology
      Michael G. Noll, Product Manager, Confluent
  • 2. Kafka Talk Series
      Date     Title
      Sep 27   Introduction To Streaming Data and Stream Processing with Apache Kafka
      Oct 06   Deep Dive into Apache Kafka
      Oct 27   Data Integration with Apache Kafka
      Nov 17   Demystifying Stream Processing with Apache Kafka
      Dec 01   A Practical Guide to Selecting a Stream Processing Technology
      Dec 15   Streaming in Practice: Putting Apache Kafka in Production
      https://www.confluent.io/apache-kafka-talk-series
  • 3. Agenda
      • Recap: What is Stream Processing?
      • The Three Pillars of Stream Processing in Practice
      • Key Selection Criteria
      • Organizational/Non-Technical Dimensions
      • Technical Dimensions
      • Summary
  • 14. Powered by Kafka (thousands more)
  • 20. Spark Streaming API (2.0)
  • 21. Kafka’s Streams API (0.10)
  • 37. Example: Streams and Tables in Kafka
      Word     Count
      hello    2
      kafka    1
      world    1
      …        …
  • 42. Streams & Databases
      • A stream processing technology must have first-class support for Streams and Tables
      • With scalability, fault tolerance, …
      • Why? Because most use cases require not just one, but both!
      • Support (or lack thereof) strongly impacts the resulting technical architecture and development efforts
      • No support means:
      • Painful Do-It-Yourself
      • Increased complexity, more moving pieces to juggle
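The relationship between streams and tables on slides 37 and 42 can be illustrated with a minimal sketch in plain Python. It does not use Kafka or the Streams API; the function names `word_count_changelog` and `materialize` and the two input lines are hypothetical, chosen only so that the result matches the word counts shown on slide 37. The idea it shows is that aggregating a stream yields a table, and that every update to that table can itself be read as a stream.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def word_count_changelog(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Consume a stream of text lines and emit one (word, count) update per word."""
    counts: Counter = Counter()  # the "table": latest count per word
    for line in lines:
        for word in line.lower().split():
            counts[word] += 1
            # Each table update is emitted as an event, so the table's
            # changelog is itself a stream.
            yield word, counts[word]


def materialize(changelog: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Replay a changelog stream into a table, keeping the latest value per key."""
    table: Dict[str, int] = {}
    for key, value in changelog:
        table[key] = value
    return table


if __name__ == "__main__":
    stream = ["hello world", "hello kafka"]  # hypothetical input, not from the talk
    table = materialize(word_count_changelog(stream))
    for word in sorted(table):
        print(word, table[word])
```

Without first-class support for both abstractions, an application would have to maintain the `counts` state itself and make it scalable and fault tolerant, which is the "painful Do-It-Yourself" case the slide warns about.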
  • 45. Organizational/Non-Tech Dimensions • Can your org understand and leverage the technology? • Familiarity with languages; intuitive concepts and APIs; trainings • Are you permitted to use it in your organization? • Security features, licensing, open source vs. proprietary • Can you continue to use it in the future? • Longevity of technology, licensing, vendor strength
  • 46. Organizational/Non-Tech Dimensions • Do you believe in the long-term vision? • Switching technologies in an organization is often expensive/slow: legacy migration, re-training, resistance to change, etc. • What is the path and time to success? • Can you move smoothly and quickly from proof-of-concept to production? • Areas and range of applicability in your organization • General-purpose vs. niche technology • Viable for S/M/L/XL use cases vs. for XL use cases only • Building core business apps vs. doing backend analytics
  • 47. Organizational/Non-Tech Dimensions • Licensing • Vision/Roadmap • ROI • Impact on Organization • Broad vs. Niche Applicability • Time to Market • Professional Services • Documentation • Examples • User Community • Learning Curve • Impact on Tools, Infrastructure, …
  • 49. Technical Dimensions • Reprocessing • Scalability & Elasticity • Fault Tolerance • API • Dev/Ops Lifecycle • Security • Processing Model • Out of Order Data • Abstractions • Time Model • Windowing • State
  • 55. State • Stateful processing of any kind requires…state • Many (most?) use cases for stream processing are stateful • Joins, aggregations, windowing, counting, ... • Is state performant? Local vs. remote state? • Is state fault-tolerant? How fast is recovery/failover? • Is state interactively queryable? • Kafka: ready for use (GA) • Spark, Flink: under development (alpha) • Storm, Samza, and others: not available
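The three questions on this slide (local, fault-tolerant, queryable) can be made concrete with a small sketch in plain Python. It assumes one common design, a local key-value store backed by an append-only changelog; `LocalStateStore` and its methods are hypothetical names and do not correspond to any particular framework's API.

```python
class LocalStateStore:
    """Local key-value state that is restored by replaying a changelog."""

    def __init__(self, changelog=None):
        # The changelog stands in for a durable, replicated log (assumption).
        self.changelog = changelog if changelog is not None else []
        self._data = {}
        # Recovery: replay every logged update to rebuild the local state.
        for key, value in self.changelog:
            self._data[key] = value

    def put(self, key, value):
        # Log first, then update local state, so no update is lost on a crash.
        self.changelog.append((key, value))
        self._data[key] = value

    def get(self, key, default=None):
        # Reads are served locally; this is also what an interactive query hits.
        return self._data.get(key, default)


store = LocalStateStore()
store.put("alice", 3)
store.put("bob", 1)
store.put("alice", 4)

# Simulate failover: a fresh instance rebuilds its state from the changelog.
recovered = LocalStateStore(changelog=list(store.changelog))
```

In this sketch recovery time grows with the length of the changelog, which is one reason the slide asks how fast recovery and failover are.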
  • 58. Abstractions • What are the data model and the available abstractions? • Most common abstraction: stream of records, events • Kafka, Spark, Storm, Samza, Flink, Apex, ... • New, very powerful: table of records • Currently unique to Kafka • Represents latest state and materialized views • State must have a first-class abstraction because, as we just saw in the previous section, state is crucial for stream processing!
  • 60. Time model • Different use cases require different time semantics • Great majority of use cases require event-time semantics • Other use cases may require processing-time (e.g. real-time monitoring) or special variants like ingestion-time • A stream processing technology should, at a minimum, support event-time to cover most use cases in practice • Examples: Kafka, Beam, Flink
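To see why the choice of time semantics matters, the sketch below counts the same events per minute twice: once by the time the event happened and once by the time it was processed. The events, field names, and the `count_per_minute` helper are all made up for illustration; timestamps are in seconds.

```python
from collections import defaultdict

def count_per_minute(events, time_field):
    """Count events per one-minute bucket, using the given timestamp field."""
    counts = defaultdict(int)
    for event in events:
        counts[event[time_field] // 60] += 1
    return dict(counts)

# Hypothetical events; the second one was delayed in transit.
events = [
    {"event_time": 10, "processing_time": 15},
    {"event_time": 50, "processing_time": 130},
    {"event_time": 70, "processing_time": 75},
]

by_event_time = count_per_minute(events, "event_time")
by_processing_time = count_per_minute(events, "processing_time")
```

With event-time, the delayed event is still counted in the minute in which it occurred; with processing-time it lands in a later bucket, so the two results differ for identical input.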
  • 64. Windowing [Figure: input data plotted by event-time and processing-time; colors represent different users' events (alice, bob, dave); rectangles denote different event-time windows]
  • 65. Windowing • Windowing is an operation that groups events • Most commonly needed: time windows, session windows • Examples: • Real-time monitoring: 5-minute averages • Reader behavior on a website: user browsing sessions
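The two window types named on this slide can be sketched as follows. This is a minimal illustration in plain Python, assuming timestamps in seconds; the function names and the 60-second inactivity gap are assumptions, not taken from the talk.

```python
def tumbling_window_start(timestamp, size):
    """Return the start of the fixed-size time window containing timestamp."""
    return timestamp - timestamp % size

def session_windows(timestamps, gap):
    """Group timestamps into sessions separated by more than `gap` of inactivity."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)   # still within the current session
        else:
            sessions.append([ts])     # inactivity gap exceeded: new session
    return sessions

# 5-minute (300 s) time window, as in the monitoring example.
window = tumbling_window_start(725, size=300)

# One user's page views, grouped into browsing sessions.
sessions = session_windows([0, 10, 500, 505, 20], gap=60)
```

Time windows have fixed boundaries that are known in advance, whereas session boundaries depend on the data itself, which is part of why session windows are harder to support.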
  • 69. Out-of-order and late-arriving data • Users with mobile phones enter an airplane and lose Internet connectivity • Emails are written during the 10h flight • Once Internet connectivity is restored, the phones send the queued emails
  • 70. Out-of-order and late-arriving data • Is very common in practice, not a rare corner case • Related to time model discussion • We want control over how out-of-order data is handled • Example: • We process data in 5-minute windows, e.g. compute statistics • When event arrives 1 minute late: update the original result! • When event arrives 2 hours late: discard it! • Handling must be efficient because it happens so often
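The update-or-discard policy in the example can be sketched with an allowed-lateness threshold. The code below is plain Python under stated assumptions: lateness is measured against the highest event time seen so far, the threshold is 10 minutes, and `WindowedCounter` is a hypothetical name rather than any framework's API.

```python
class WindowedCounter:
    """Count events per event-time window, tolerating bounded lateness."""

    def __init__(self, window_size, allowed_lateness):
        self.window_size = window_size
        self.allowed_lateness = allowed_lateness
        self.stream_time = 0   # highest event time observed so far
        self.counts = {}       # window start -> count
        self.dropped = 0

    def process(self, event_time):
        self.stream_time = max(self.stream_time, event_time)
        window_start = event_time - event_time % self.window_size
        window_end = window_start + self.window_size
        if self.stream_time > window_end + self.allowed_lateness:
            self.dropped += 1  # too late: the window is closed for good
            return None
        # In time or only slightly late: update the window's result.
        self.counts[window_start] = self.counts.get(window_start, 0) + 1
        return window_start, self.counts[window_start]


# 5-minute windows, events accepted up to 10 minutes after the window ends.
counter = WindowedCounter(window_size=300, allowed_lateness=600)
for t in (10, 20, 310):
    counter.process(t)
slightly_late = counter.process(250)   # about 1 minute behind stream time
counter.process(8000)                  # stream time jumps ahead by hours
very_late = counter.process(100)       # about 2 hours behind stream time
```

The threshold is the control knob the slide asks for: a larger value keeps more windows open (more state, more updates), a smaller one discards more late data.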
  • 72. Reprocessing • Re-process data by rewinding a stream back in time • Use cases in practice include • Correcting output data after fixing a bug • Facilitating iterative and explorative development • A/B testing • Processing historical data • Walking through "What If?" scenarios • Also: often used behind-the-scenes for fault tolerance
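Rewinding is possible when the input is kept as a replayable log and the reader tracks its own position. The sketch below illustrates the "fix a bug, then reprocess" case in plain Python; `LogConsumer` is a hypothetical stand-in for a log reader, not a real client library, and the buggy and fixed conversions are invented for the example.

```python
class LogConsumer:
    """Reads an append-only log from a position that can be moved back."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self):
        """Return all records from the current offset and advance past them."""
        records = self.log[self.offset:]
        self.offset = len(self.log)
        return records

    def seek(self, offset):
        """Rewind (or fast-forward) to an absolute position in the log."""
        self.offset = offset


log = [1, 2, 3]                 # stored input records, e.g. amounts in dollars
consumer = LogConsumer(log)

# First run: a buggy conversion to cents (wrong factor).
buggy_output = [record * 10 for record in consumer.poll()]

# After fixing the bug, rewind to the beginning and reprocess the same input.
consumer.seek(0)
fixed_output = [record * 100 for record in consumer.poll()]
```

The same mechanism covers the other use cases on the slide: historical processing and "What If?" runs are just different logic applied to the same replayed input.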
  • 75. Scalability, Elasticity, Fault Tolerance • Can the technology scale according to your needs? • Desired latency, throughput? • Able to process millions of messages per second? • What is the minimum footprint? • Expand/shrink capacity dynamically during operations? • Helps with resource utilization because most stream apps run continuously • Resilience and fault tolerance • Which guarantees for data delivery and for state? "At-least-once", "exactly-once", "effectively-once", etc. • Failover behavior and recovery time? Automated or manual? • Any negative impact of fault tolerance features on performance?
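The difference between the delivery guarantees shows up as soon as a record is redelivered after a failure. The sketch below, in plain Python, assumes every record carries a unique ID and shows one common way to get an "effectively-once" result on top of at-least-once delivery: skip IDs that were already applied. The record IDs, amounts, and function name are hypothetical.

```python
def apply_effectively_once(deliveries):
    """Sum amounts, applying each record ID at most once."""
    seen_ids = set()
    total = 0
    for record_id, amount in deliveries:
        if record_id in seen_ids:
            continue           # duplicate caused by a retry: skip it
        seen_ids.add(record_id)
        total += amount
    return total

# At-least-once delivery: record "a" is delivered twice after a retry.
deliveries = [("a", 5), ("b", 7), ("a", 5)]

naive_total = sum(amount for _, amount in deliveries)   # double-counts "a"
deduplicated_total = apply_effectively_once(deliveries)
```

The set of seen IDs is itself state that must survive failures, which ties the delivery guarantee back to the questions about fault-tolerant state.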
  • 80. Security • To meet internal security policies, legal compliance, etc. • Typical base requirements for stream processing applications: • Encrypt data-in-transit (e.g. from/to Kafka) • Authentication: "only some applications may talk to production" • Authorization: "access to sensitive data such as PII is restricted" • The easier it is to use security features, the more likely they are to actually be used in practice
  • 82. Processing Model • True stream processing is record-at-a-time processing • Benefits include low latency (millisecs), dealing efficiently with out-of-order data • Can provide both low latency and high throughput via internal optimizations • Examples: Kafka, Storm, Samza, Flink, Beam • Some processing technologies opt for (micro)batching • Micro-batching has no true benefits: consider it a technical workaround to shoehorn stream-like functionality into a tool • Suffers from significant overhead when dealing with e.g. out-of-order/late-arriving data, when performing windowed analyses (e.g. session windows) • Typically a strong blocker for use cases such as fraud detection or anything where "a few seconds" of latency is prohibitive • Examples: Spark, Storm (Trident), Hadoop*
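The latency argument can be illustrated with simple arithmetic. The sketch below is a deliberately simplified model in plain Python: it assumes a record processed one at a time is handled on arrival, a micro-batch is handled at the end of its batch interval, and processing itself takes no time. The arrival times and the 2-second interval are made up; all times are in milliseconds.

```python
def micro_batch_emit_time(arrival_ms, interval_ms):
    """Time at which the batch containing this arrival is processed."""
    return (arrival_ms // interval_ms + 1) * interval_ms

arrivals_ms = [200, 1100, 1900]
interval_ms = 2000

# Record-at-a-time: each record is handled when it arrives (zero added wait).
record_at_a_time_wait = [0 for _ in arrivals_ms]

# Micro-batching: each record waits until its batch interval closes.
micro_batch_wait = [
    micro_batch_emit_time(t, interval_ms) - t for t in arrivals_ms
]
```

Under these assumptions the added wait is bounded by the batch interval, so the interval sets a floor on end-to-end latency regardless of how fast the processing logic is.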
  • 84. API • Choice of API is a subjective matter: skills, preference, … • Typical options • Declarative, expressive API: operations like map(), filter() • Imperative, lower-level API: callbacks like process(event) • Streaming SQL: STREAM SELECT … FROM … WHERE … • In the best case you get not just one, but all three • "Abstractions are great!" • "Abstractions considered harmful!"
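The first two API styles can be compared on the same task. The sketch below uses Python's built-in `map()` and `filter()` for the declarative style and a hand-written `process(event)` callback for the imperative style; the input values and the threshold are invented, and neither version represents a specific framework's API.

```python
events = [3, 8, 12, 5]

# Declarative style: describe the transformation as a chain of operations.
declarative_result = list(
    map(lambda value: value * 2, filter(lambda value: value > 4, events))
)

# Imperative style: a callback invoked once per event, managing its own output.
imperative_result = []

def process(event):
    if event > 4:
        imperative_result.append(event * 2)

for event in events:
    process(event)
```

Both produce the same output; the callback version is more verbose but leaves room for arbitrary per-event logic, which is the trade-off behind the two quotes on the slide.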
  • 86. Developer/Operations Lifecycle • What should your daily work look and feel like? • "I like to do quick, iterative development" (modify/test/repeat) • "I want to decouple team roadmaps, project schedules" • Big difference between App Model <-> Cluster Model • Testing, packaging, deployment, monitoring, operations • "Do I need to know Java (app) or YARN (cluster) for this?" • "I want reactive processing in containers that run on Mesos!" • Rolling, no-downtime upgrades? • Integration with existing Ops infra, tools, processes?
  • 88. Summary • What we covered is a good starting point • But, no free lunch! • Understand what you need, and weigh criteria appropriately • Think end-to-end: idea, development, operations, troubleshooting • Think big-picture: future use cases, architecture, security, training, … • Do your own internal hackathons, proof-of-concepts • Do your own benchmarks • If in doubt: simplicity beats complexity • Faster to learn, easier to understand, less likely to fail, …
  • 90. Coming Up Next (Date: Title, Speaker) • Dec 15: Streaming in Practice: Putting Apache Kafka in Production, Roger Hoover • https://www.confluent.io/apache-kafka-talk-series