In this talk we'll look at the relationship between three of the most disruptive software engineering paradigms: event sourcing, stream processing and serverless. We'll debunk some of the myths around event sourcing. We'll look at the inevitability of event-driven programming in the serverless space, and we'll see how stream processing links these two concepts together with a single 'database for events'. As the story unfolds we'll dive into some use cases, examine the practicalities of each approach (particularly the stateful elements) and finally extrapolate how their future relationship is likely to unfold.
Key takeaways include:
• The different flavors of event sourcing and where their value lies.
• The difference between stream processing at the application and infrastructure levels.
• The relationship between stream processors and serverless functions.
• The practical limits of storing data in Kafka and stream processors like KSQL.
2. What we’re going to talk about
• Event Sourcing
• What is it and how does it relate to Event Streaming?
• Stream Processing as a kind of “Database”
• What does this mean?
• Serverless Functions
• How does this relate?
7. Traditional Event Sourcing
(Store immutable events in a database in time order)
[Diagram: apps persist events to a table of events on the streaming platform]
8. Traditional Event Sourcing (Read)
[Diagram: apps read events back from the streaming platform]
• Query by customer ID (+ session?)
• Chronological reduce on read (done inside the app)
• No schema migration
• Similar to ‘schema on read’
12. Analytics
Keep the data needed to
extract trends and behaviors
i.e. non-lossy
(e.g. insight, metrics, ML)
13. Traditional Event Sourcing
• Use a database (any one will do)
• Create a table and insert events as they occur
• Query all the events associated with your problem*
• Reduce them chronologically to get the current state
*Aggregate ID in DDD parlance
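The four steps above can be sketched with any database. The following is a minimal illustration using SQLite; the table layout, the event types (`ItemAdded`, `ItemRemoved`) and the basket example are assumptions made for the sketch, not something the talk prescribes.

```python
import json
import sqlite3

# Hypothetical schema: one append-only table, ordered by a sequence number.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events ("
    " seq INTEGER PRIMARY KEY AUTOINCREMENT,"
    " aggregate_id TEXT NOT NULL,"
    " type TEXT NOT NULL,"
    " payload TEXT NOT NULL)"
)

def append(aggregate_id, event_type, payload):
    """Insert an event as it occurs; existing rows are never updated."""
    conn.execute(
        "INSERT INTO events (aggregate_id, type, payload) VALUES (?, ?, ?)",
        (aggregate_id, event_type, json.dumps(payload)),
    )

def apply(state, event_type, payload):
    """Fold one event into the basket state (item -> quantity)."""
    items = dict(state)
    if event_type == "ItemAdded":
        items[payload["item"]] = items.get(payload["item"], 0) + payload["qty"]
    elif event_type == "ItemRemoved":
        items.pop(payload["item"], None)
    return items

def current_state(aggregate_id):
    """Query every event for one aggregate and reduce them in time order."""
    rows = conn.execute(
        "SELECT type, payload FROM events WHERE aggregate_id = ? ORDER BY seq",
        (aggregate_id,),
    )
    state = {}
    for event_type, payload in rows:
        state = apply(state, event_type, json.loads(payload))
    return state

append("customer-1", "ItemAdded", {"item": "socks", "qty": 2})
append("customer-2", "ItemAdded", {"item": "hat", "qty": 1})
append("customer-1", "ItemAdded", {"item": "shoes", "qty": 1})
append("customer-1", "ItemRemoved", {"item": "socks"})

basket = current_state("customer-1")
print(basket)
```

The current state is never stored; it is recomputed from the events each time it is read.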
14. Traditional Event Sourcing with Kafka
• Use Kafka (in place of a database)
• Create a topic (in place of a table) and insert events as they occur
• Query all the events associated with your problem*
• Reduce them chronologically to get the current state
*Aggregate ID in DDD parlance
15. Confusion: You can’t query Kafka by, say, Customer ID*
*Aggregate ID in DDD parlance
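To see why this matters, here is a small in-memory model of a single-partition topic. It is not the Kafka client API: the `Topic` and `Record` classes are hypothetical stand-ins that only capture the idea that a log is read sequentially by offset and has no index on the key.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    offset: int
    key: str      # e.g. the customer ID
    value: dict

class Topic:
    """In-memory stand-in for one partition: append, and read by offset only."""

    def __init__(self):
        self._log = []

    def append(self, key, value):
        record = Record(len(self._log), key, value)
        self._log.append(record)
        return record.offset

    def read_from(self, offset):
        """The only read path: iterate sequentially from an offset."""
        return iter(self._log[offset:])

def events_for(topic, customer_id):
    """No index on the key, so finding one customer's events is a full scan."""
    scanned = 0
    matches = []
    for record in topic.read_from(0):
        scanned += 1
        if record.key == customer_id:
            matches.append(record.value)
    return matches, scanned

topic = Topic()
topic.append("customer-1", {"type": "ItemAdded", "item": "socks"})
topic.append("customer-2", {"type": "ItemAdded", "item": "hat"})
topic.append("customer-3", {"type": "ItemAdded", "item": "scarf"})
topic.append("customer-1", {"type": "ItemAdded", "item": "shoes"})

matches, scanned = events_for(topic, "customer-1")
print(len(matches), "matching events after scanning", scanned)
```

Every lookup touches the whole log, which is fine for rebuilding state but not for serving queries.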
16. If we can’t query by Customer ID
then what do we do?
17. CQRS is a tonic: Cache the projection in a ‘View’
[Diagram: events/commands are written to the streaming platform; a stream processor projects them into a view that apps query by customer ID]
• Events are the storage model
• View: stream processor plus cache/DB/KTable etc.
• Regenerate the view rather than doing schema migration
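A minimal sketch of that last point, assuming a tiny in-memory event log and two hypothetical projection functions (`items_v1`, `items_v2`). When the shape of the view needs to change, the log is replayed through the new projection instead of migrating the old view.

```python
EVENT_LOG = [
    {"customer_id": "c1", "type": "ItemAdded", "item": "socks", "price": 5},
    {"customer_id": "c2", "type": "ItemAdded", "item": "hat", "price": 12},
    {"customer_id": "c1", "type": "ItemAdded", "item": "shoes", "price": 40},
]

def build_view(log, project):
    """Replay the whole log through a projection into a view keyed by customer."""
    view = {}
    for event in log:
        key = event["customer_id"]
        view[key] = project(view.get(key), event)
    return view

def items_v1(current, event):
    """First version of the view: a list of item names per customer."""
    return (current or []) + [event["item"]]

def items_v2(current, event):
    """Changed shape: items plus a running total per customer."""
    current = current or {"items": [], "total": 0}
    return {
        "items": current["items"] + [event["item"]],
        "total": current["total"] + event["price"],
    }

view_v1 = build_view(EVENT_LOG, items_v1)
view_v2 = build_view(EVENT_LOG, items_v2)  # regenerated, not migrated

print(view_v1["c1"])
print(view_v2["c1"])
```

Queries by customer ID now hit the view, while the log stays the source of truth.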
19. Even with CQRS, Event Sourcing is Hard
CQRS helps, but it’s still quite hard if you’re a CRUD app
20. What’s the problem?
Harder:
• Eventually Consistent
• Multi-model (Complexity ∝ #Schemas in the log)
• More moving parts
[Diagram: a CRUD system compared with a CQRS system built on the streaming platform]
21. New York Times Website
• Source of truth: every article since 1851
• Normalized assets (images, articles, bylines and tags are all separate messages)
• Denormalized into a “Content View”
https://www.confluent.io/blog/publishing-apache-kafka-new-york-times/
22. If CRUD makes sense there are other ways:
audit tables, CDC, etc.
Trigger
Evidentiary
Replayable N/A to web app
Analytics
CDC
26. Online Transaction Processing: e.g. a Flight Booking System
- Flight prices are served about 10,000 times per booking
- Consistency required only at booking time
27. CQRS with event movement
[Diagram: one streaming platform accepts ‘Book Flight’ writes; events move to views on other streaming platforms, each serving ‘Get Flights’ reads]
• Central write: Book Flight
• Global read: Get Flights
29. Event Sourcing for Microservices
Basket Service
Fraud Service
Billing Service
Email Service
Basket Events
30. Event Sourcing for Microservices
Basket Service
Fraud Service
Billing Service
Email Service
Basket Events
Events are the
storage model
Each microservice creates a
view that suits its use case
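As an illustration, the sketch below derives three different views from the same basket events. The event shapes and the rules inside each view are invented for the example; the point is only that each service folds the shared events into whatever it needs.

```python
from collections import defaultdict

BASKET_EVENTS = [
    {"customer_id": "c1", "type": "ItemAdded", "item": "laptop", "price": 1200},
    {"customer_id": "c1", "type": "ItemAdded", "item": "phone", "price": 900},
    {"customer_id": "c2", "type": "ItemAdded", "item": "socks", "price": 5},
    {"customer_id": "c2", "type": "CheckedOut"},
]

def fraud_view(events):
    """Fraud service: running basket value per customer."""
    totals = defaultdict(int)
    for e in events:
        if e["type"] == "ItemAdded":
            totals[e["customer_id"]] += e["price"]
    return dict(totals)

def billing_view(events):
    """Billing service: amount due, only for customers who checked out."""
    totals, due = defaultdict(int), {}
    for e in events:
        cid = e["customer_id"]
        if e["type"] == "ItemAdded":
            totals[cid] += e["price"]
        elif e["type"] == "CheckedOut":
            due[cid] = totals[cid]
    return due

def email_view(events):
    """Email service: customers who need an order confirmation."""
    return [e["customer_id"] for e in events if e["type"] == "CheckedOut"]

print(fraud_view(BASKET_EVENTS))
print(billing_view(BASKET_EVENTS))
print(email_view(BASKET_EVENTS))
```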
31. Event Sourcing “with a DB”
for monoliths.
Event Streaming for
Microservices & Scale.
(Often via CQRS)
33. Event Streaming is a more general form of Event Sourcing/CQRS
Event Streaming
• Events as shared data model
• Many microservices
• Polyglot persistence
• Event-Driven processing
Traditional Event Sourcing
• Events as a storage model
• Single microservice
• Single DB
• Data at rest
34. Event Streaming is about many event sources
(Join, Filter, Transform and Summarize)
[Diagram: the Orders Service, Payment Service and Customer Service feed the Event Log; the Fraud Service uses a projection created in the Kafka Streams API]
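The four operations can be illustrated in plain Python. This is not the Kafka Streams API: the streams are modelled as finite lists, the joins are simple lookups rather than windowed stream joins, and every record shape is an assumption made for the sketch.

```python
from collections import defaultdict

orders = [
    {"order_id": "o1", "customer_id": "c1", "amount": 250},
    {"order_id": "o2", "customer_id": "c2", "amount": 40},
    {"order_id": "o3", "customer_id": "c1", "amount": 900},
]
payments = [
    {"order_id": "o1", "status": "CONFIRMED"},
    {"order_id": "o2", "status": "FAILED"},
    {"order_id": "o3", "status": "CONFIRMED"},
]
customers = {
    "c1": {"name": "Ada"},
    "c2": {"name": "Bob"},
}

def fraud_projection(orders, payments, customers):
    """Build the Fraud Service's view: confirmed spend per customer name."""
    payment_by_order = {p["order_id"]: p for p in payments}
    totals = defaultdict(int)
    for order in orders:
        # Join: match each order with its payment.
        payment = payment_by_order.get(order["order_id"])
        # Filter: keep only orders with a confirmed payment.
        if payment is None or payment["status"] != "CONFIRMED":
            continue
        # Join + transform: enrich with the customer and keep two fields.
        customer = customers[order["customer_id"]]
        enriched = {"name": customer["name"], "amount": order["amount"]}
        # Summarize: aggregate the amount per customer.
        totals[enriched["name"]] += enriched["amount"]
    return dict(totals)

exposure = fraud_projection(orders, payments, customers)
print(exposure)
```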
35. KStreams & KSQL have different positioning
• KStreams is a library for Dataflow programming:
• App Logic & Stream Processor (including state) are combined.
• Apps are stateful.
• JVM only.
• KSQL is a ‘database’ for event preparation:
• App sends SQL to a separate process
• Apps are stateless
• Connect from any language
39. Serverless Functions (FaaS)
• Write a function
• Upload
• Configure a trigger (HTTP, Messaging, Object Store, Database, Timer etc.)
[Diagram: a function invoked either by request/response or by an event source]
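A sketch of what such a function might look like. The `handler(event, context)` signature and the `statusCode`/`body` response loosely follow the convention AWS Lambda uses for HTTP triggers; the order payload is an assumption made for the example, and other platforms use different shapes.

```python
import json

def handler(event, context):
    """Stateless function: everything it needs arrives in the triggering event."""
    order = json.loads(event["body"])
    total = sum(line["qty"] * line["price"] for line in order["lines"])
    return {
        "statusCode": 200,
        "body": json.dumps({"order_id": order["order_id"], "total": total}),
    }

# Local invocation with a hand-built event; on a FaaS platform the
# configured trigger supplies the event and the context object.
sample_event = {
    "body": json.dumps(
        {
            "order_id": "o1",
            "lines": [{"qty": 2, "price": 5}, {"qty": 1, "price": 40}],
        }
    )
}
response = handler(sample_event, None)
print(response["body"])
```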
40. FaaS in a Nutshell
• Fully managed (Runs in a container pool)
• Pay for execution time (not resources used)
• Auto-scales with load
• 0-1000+ concurrent functions
• Stateless
• Short-lived (limit 5-15 mins)
• Weak ordering guarantees
• Cold starts can be (very) slow: 100ms – 45s (AWS 250ms-7s)
41. Where is FaaS useful?
• Spiky workloads and ‘occasional’ use cases
• Use cases that don’t typically warrant massive parallelism, e.g. CI systems
• General purpose programming paradigm?
59. Queries filter out the
events you need
(much like you filter rows in a
database query)
60. KSQL as a “Database” for Event-Driven Infrastructure
[Diagram: Orders, Payments and Customers streams flow into KSQL, which holds a Customers table and feeds a pool of FaaS functions]
• KSQL: prepare the events we need (stateful)
• FaaS: stateless, elastic compute that autoscales with load
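The division of labour can be sketched as follows. The `Preparer` class is a hypothetical in-memory stand-in for the stateful stream processor (it is not KSQL), and the 100-unit threshold and record shapes are invented for the example.

```python
class Preparer:
    """Stateful stand-in for the stream processor: holds the customers table."""

    def __init__(self):
        self.customers = {}

    def on_customer(self, customer):
        """Keep the latest record per customer ID (the 'table')."""
        self.customers[customer["id"]] = customer

    def on_order(self, order):
        """Join the order with its customer; drop events the function doesn't need."""
        customer = self.customers.get(order["customer_id"])
        if customer is None or order["amount"] < 100:
            return None
        return {
            "order_id": order["order_id"],
            "email": customer["email"],
            "amount": order["amount"],
        }

def notify_large_order(event):
    """Stateless function: no lookups, only the fields in the prepared event."""
    return f"To {event['email']}: order {event['order_id']} for {event['amount']} received"

preparer = Preparer()
preparer.on_customer({"id": "c1", "email": "ada@example.com"})

stream = [
    {"order_id": "o1", "customer_id": "c1", "amount": 250},
    {"order_id": "o2", "customer_id": "c1", "amount": 20},
]
prepared = [e for e in map(preparer.on_order, stream) if e is not None]
messages = [notify_large_order(e) for e in prepared]
print(messages)
```

The function is invoked only for the events it cares about and never calls a database itself.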
62. Event-Driven vs. Event Streaming
• Multiple event sources. Event Driven: use database + ETL + code. Event Streaming: handles automatically.
• Efficiency. Event Driven: extract data from the DB in the FaaS (IO). Event Streaming: only the data you need.
• Logic-driven data requests. Event Driven: call the DB from the FaaS (IO). Event Streaming: DB/KStreams, ksqlDB?
65. Summary
• Event Streaming provides the benefits of Event Sourcing to microservices and data pipelines.
• Events are the data model.
• Projections are the serving model, matched to each specific use case.
• The serving layer can be regenerated from the log (CQRS).
• KSQL provides the same benefits for event-driven programs, e.g. preparing the event streams for each FaaS application’s specific needs.
• In serverless architectures this drives efficiency: a ‘database-equivalent’ for event-driven infrastructure.
66. Can I Build This?
[Diagram: KSQL feeding a pool of FaaS functions]
• AWS Lambda / Azure Functions connectors (in preview)
• Hosted KSQL (in preview)
• Confluent Cloud
67. Things I didn’t tell you
• Tools like KSQL provide data provisioning, not state mutation.
• Use single writers. Try KSQL DB?
• Can KSQL handle large state?
• Unintended rebalance can stall processing
• Static membership (KIP-345) – name the list of stream processors
• Increase the timeout for rebalance after node removal (group.max.session.timeout.ms)
• Worst case reload: RocksDB ~GbE speed
• Can Kafka be used for long term storage?
• Log files are immutable once they roll (unless compacted)
• Jun spent a decade working on DB2
• Careful:
• Historical reads can stall real-time requests (cached)
• ZFS has several page cache optimizations
• Tiered storage will help
68. Find out More
• Peeking Behind the Curtains of Serverless Platforms, Wang et al.
• Cloud Programming Simplified: A Berkeley View on Serverless Computing
• Neil Avery’s Journey to Event Driven Part 3. The Affinity Between Events, Streams and Serverless.
• Designing Event Driven Systems, Ben Stopford