8. Shared data in a microservice architecture
[Diagram: three bounded contexts, each pairing a service with its own database (Service A/DB A, Service B/DB B, Service C/DB C); every service publishes its data changes to Apache Kafka™ and consumes the other services' changes to maintain local materialized views of that other data]
11. How do we get a stream of data changes?
Modify the app to write out events? What about the other apps that change data? Dual writes?!
[Diagram: Application, Application 2, and Application 3 each writing to the DB while also producing events to Apache Kafka™ for downstream consumers]
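The dual-write problem in code form, as a sketch (Customer, CustomerRepository, and the topic name are hypothetical stand-ins): the application writes to the database and publishes to Kafka as two separate, uncoordinated operations, so a crash between them leaves the database and the stream disagreeing.

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    // Anti-pattern sketch: two independent writes, no shared transaction.
    public class DualWriteService {
        interface CustomerRepository { void save(Customer c); }   // hypothetical DAO
        static class Customer {                                   // hypothetical entity
            final String id, email;
            Customer(String id, String email) { this.id = id; this.email = email; }
            String toJson() { return "{\"id\":\"" + id + "\",\"email\":\"" + email + "\"}"; }
        }

        private final CustomerRepository repository;
        private final KafkaProducer<String, String> producer;

        DualWriteService(CustomerRepository repository, KafkaProducer<String, String> producer) {
            this.repository = repository;
            this.producer = producer;
        }

        public void updateCustomer(Customer customer) {
            repository.save(customer);  // write #1: the database commit succeeds
            // ...if the process crashes right here, the event below is never sent,
            // and the database and the stream permanently disagree...
            producer.send(new ProducerRecord<>("customer-events", customer.id,
                    customer.toJson()));  // write #2: uncoordinated with write #1
        }
    }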
12. How do we get a stream of data changes?
Or we can watch the database: change data capture! We need a connector to do this. Just install, configure, and run it (a sample configuration is sketched below), and it will adapt. No need to change our apps!
[Diagram: Application writes to the DB; a connector running in Kafka Connect watches the DB and streams changes into Apache Kafka™ for consumers]
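"Configure" here means supplying a small properties document. The JSON below sketches a Debezium MySQL connector configuration; the hostname, credentials, and names are placeholders, and exact property names vary across Debezium versions, so check the connector docs for yours:

    {
      "name": "inventory-connector",
      "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "database.hostname": "mysql",
        "database.port": "3306",
        "database.user": "debezium",
        "database.password": "secret",
        "database.server.id": "184054",
        "database.server.name": "dbserver1",
        "database.history.kafka.bootstrap.servers": "kafka:9092",
        "database.history.kafka.topic": "schema-changes.inventory"
      }
    }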
13. Databases 101
[Diagram: an application committing changes that the DBMS appends to its log — insert row 1, insert row 2, update row 1, insert row 3, delete row 2, insert row 4, update row 2]
• Applications modify rows in transactions
• DBMS records the changes in a log, then updates the tables
• DBMS uses the log for recovery, replication, …
- MySQL binlog
- MongoDB oplog
- PostgreSQL WAL
• We can (try to) use the log for CDC*
*mileage may vary
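As a taste of what "using the log" looks like, here is a minimal sketch that tails a MySQL binlog with the open-source mysql-binlog-connector-java library (which Debezium's MySQL connector builds on); the host and credentials are placeholders, and the server must have binary logging enabled:

    import com.github.shyiko.mysql.binlog.BinaryLogClient;

    public class BinlogTail {
        public static void main(String[] args) throws Exception {
            // Connects like a MySQL replica and streams binlog events as they are written
            BinaryLogClient client =
                    new BinaryLogClient("localhost", 3306, "replicator", "secret");
            client.registerEventListener(event ->
                    System.out.println(event));   // raw events: TABLE_MAP, WRITE_ROWS, ...
            client.connect();                     // blocks, receiving events indefinitely
        }
    }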
14. Change Data Capture (CDC) at work
• Read the changes from the database
- Using the log or API
- This is the hardest part
• Write them in the same order (as sketched below)
• Don’t miss any changes
- Okay, this is hard, too
[Diagram: Table → Stream → Table*]
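The "write them in the same order" requirement maps onto Kafka's ordering guarantee: Kafka preserves order within a partition, so keying every change event by the row's primary key sends all changes for a given row to the same partition, in order. A sketch (topic name and payloads are illustrative):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class OrderedChangeWriter {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "kafka:9092");   // placeholder address
            props.put("key.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Same key ("42") -> same partition -> per-row order is preserved
                producer.send(new ProducerRecord<>("dbserver1.inventory.customers",
                        "42", "{\"op\":\"c\",\"after\":{\"id\":42}}"));
                producer.send(new ProducerRecord<>("dbserver1.inventory.customers",
                        "42", "{\"op\":\"u\",\"after\":{\"id\":42,\"email\":\"x@y.z\"}}"));
            }
        }
    }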
20. What does a change event look like?
• Primary/unique key of the row
• Kind of operation: insert, update, delete
• State of the row after the changes
• State of the row before the changes
• Source-specific provenance metadata
- location in the log
- database name, table name
- transaction ID, source timestamp, …
• Capture timestamp
21. What does a change event look like?
• Key
- Primary/unique key of the row
• Value
- Operation
- State of the row after the changes
- State of the row before the changes (if available)
- Source-specific provenance metadata
- Capture timestamp
• Timestamp
This maps perfectly to a Kafka message!
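For concreteness, here is what such an event can look like on the wire, using Debezium's JSON envelope for an update to a customers row (the row values and source positions are made up, and exact fields vary by connector and version):

    Key:   { "id": 1004 }
    Value:
    {
      "op": "u",
      "before": { "id": 1004, "email": "old@example.com" },
      "after":  { "id": 1004, "email": "new@example.com" },
      "source": { "name": "dbserver1", "db": "inventory", "table": "customers",
                  "file": "mysql-bin.000003", "pos": 805 },
      "ts_ms": 1486500577691
    }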
22. Single Message Transforms
• Simple transformations applied to one message at a time
• Defined as part of Kafka Connect
- Some useful transforms provided out of the box
- Easy to implement your own (see the sketch below)
• Optionally deploy 1+ transforms with each connector
- Modify messages produced by a source connector
- Modify messages sent to sink connectors
• Makes it much easier to mix and match connectors
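A custom SMT is a small class implementing Kafka Connect's Transformation interface. The sketch below is hypothetical (the class and its topic-prefixing behavior are illustrations, not a built-in transform); it reroutes every record to a prefixed topic:

    import java.util.Map;
    import org.apache.kafka.common.config.ConfigDef;
    import org.apache.kafka.connect.connector.ConnectRecord;
    import org.apache.kafka.connect.transforms.Transformation;

    // Hypothetical SMT that reroutes every record to a "cdc."-prefixed topic
    public class PrefixTopic<R extends ConnectRecord<R>> implements Transformation<R> {

        @Override
        public void configure(Map<String, ?> configs) {
            // no settings in this sketch; a real SMT would read its prefix here
        }

        @Override
        public R apply(R record) {
            // newRecord copies the record, letting us change only the topic
            return record.newRecord("cdc." + record.topic(), record.kafkaPartition(),
                    record.keySchema(), record.key(),
                    record.valueSchema(), record.value(),
                    record.timestamp());
        }

        @Override
        public ConfigDef config() {
            return new ConfigDef();
        }

        @Override
        public void close() {
        }
    }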
23. Connectors started long after DBs were created
• Databases don’t keep all past changes
- The logs are not kept indefinitely
• So CDC connectors often start by taking an initial snapshot
- Capture initial state of every row at that time
- Then capture and apply changes committed after initial copy started
- Transition can be tricky, but is easier if changes are idempotent (see the sketch below)
- Must handle failure at any point
• Consumers are eventually consistent with upstream sources
- More sophisticated consumers might process source transactions
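One common way to make applying events idempotent is to upsert on the primary key, so that replaying an event around the snapshot-to-streaming transition (or after a failure) leaves the target row in the same state. A minimal JDBC sketch; the table, columns, and MySQL upsert syntax are illustrative:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class IdempotentApply {
        // Re-applying the same "after" state leaves the row unchanged
        static void upsertCustomer(Connection conn, long id, String email) throws SQLException {
            String sql = "INSERT INTO customers (id, email) VALUES (?, ?) "
                       + "ON DUPLICATE KEY UPDATE email = VALUES(email)";
            try (PreparedStatement stmt = conn.prepareStatement(sql)) {
                stmt.setLong(1, id);
                stmt.setString(2, email);
                stmt.executeUpdate();
            }
        }

        // Deletes are naturally idempotent: deleting an absent row is a no-op
        static void deleteCustomer(Connection conn, long id) throws SQLException {
            try (PreparedStatement stmt =
                     conn.prepareStatement("DELETE FROM customers WHERE id = ?")) {
                stmt.setLong(1, id);
                stmt.executeUpdate();
            }
        }
    }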
24. Debezium connectors
• MySQL connector
- Multiple MySQL topologies
- GTIDs, DDL and DML, table filters, events mirror table structures
• MongoDB connector
- Replica set or sharded cluster
- Only insert events have “after” state; others have patch operation
• PostgreSQL connector
- Provides server-side logical decoding plugin
- Table filters, events mirror table structures
• SQL Server and Oracle connectors coming next
26. Using Debezium + Kafka Connect
• Use existing Kafka cluster
[Diagram: MySQL and an existing Apache Kafka™ cluster]
27. Using Debezium + Kafka Connect
• Use existing Kafka cluster
• Start Kafka Connect cluster
[Diagram: a Kafka Connect cluster added alongside Apache Kafka™ and MySQL]
28. Using Debezium + Kafka Connect
• Use existing Kafka cluster
• Start Kafka Connect cluster
• Deploy Debezium connector(s)
[Diagram: the MySQL connector running inside Kafka Connect, attached to MySQL and Apache Kafka™]
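Deploying a connector is a REST call to the Kafka Connect cluster: POST the connector's JSON configuration (like the sample shown earlier) to Connect's standard /connectors endpoint. For example, with the config saved as mysql-connector.json and Connect listening on its default port 8083:

    curl -X POST -H "Content-Type: application/json" \
         --data @mysql-connector.json http://localhost:8083/connectors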
29. Using Debezium + Kafka Connect
• Use existing Kafka cluster
• Start Kafka Connect cluster
• Deploy Debezium connector(s), begin snapshot
[Diagram: the MySQL connector snapshotting existing MySQL data into Apache Kafka™]
30. Using Debezium + Kafka Connect
• Use existing Kafka cluster
• Start Kafka Connect cluster
• Deploy Debezium connector(s), begin snapshot, capture changes
[Diagram: the MySQL connector streaming ongoing changes into Apache Kafka™]
31. Using Debezium + Kafka Connect
• Use existing Kafka cluster
• Start Kafka Connect cluster
• Deploy Debezium connector(s), begin snapshot, capture changes
• Consume change events
[Diagram: multiple groups of consumers reading the change topics from Apache Kafka™]
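A change-event consumer is an ordinary Kafka consumer. The sketch below (addresses and group id are placeholders; Debezium names topics serverName.databaseName.tableName) reads one table's change stream and maintains a naive in-memory copy of the rows, treating a null value (tombstone) as a delete; a real consumer would persist its state and use proper deserializers:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class CustomerViewConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "kafka:9092");       // placeholder address
            props.put("group.id", "customer-view-builder");     // hypothetical group
            props.put("key.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");

            Map<String, String> view = new HashMap<>();  // key -> latest row state
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("dbserver1.inventory.customers"));
                while (true) {
                    for (ConsumerRecord<String, String> record :
                            consumer.poll(Duration.ofMillis(500))) {
                        if (record.value() == null) {
                            view.remove(record.key());              // tombstone: row deleted
                        } else {
                            view.put(record.key(), record.value()); // upsert latest state
                        }
                    }
                }
            }
        }
    }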
32. Using Debezium + Kafka Connect
• Use existing Kafka cluster
• Start Kafka Connect cluster
• Deploy Debezium connector(s), begin snapshot, capture changes
• Pause, undeploy, or redeploy connector at any time
[Diagram: consumers keep reading from Apache Kafka™ while the MySQL connector is paused or redeployed]
33. Using Debezium + Kafka Connect
• Use existing Kafka cluster
• Start Kafka Connect cluster
• Deploy Debezium connector(s), begin snapshot, capture changes
• Pause, undeploy, or redeploy connector at any time
• Consumers will keep consuming or block until there are more events
[Diagram: consumers continue reading the change topics from Apache Kafka™ regardless of connector state]
34. Using Debezium + Kafka Connect
[Diagram: scaling out — multiple Kafka Connect clusters running several MySQL connectors and a PostgreSQL connector against their databases, all feeding Apache Kafka™, with many groups of consumers reading the change streams]
36. Create data pipelines for data you already have
[Diagram: streaming ETL — a Kafka Connect source connector extracts changes from DB1, Kafka Streams transforms them, and Kafka Connect sink connectors load the results into DB2 and other stores]
37. Create data pipelines for data you already have
[Diagram: the same streaming ETL pipeline — Kafka Connect source connector, Kafka Streams, Kafka Connect sink connectors into DB2 — now also feeding applications & frameworks directly]
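The "transform" step in these pipelines is ordinary Kafka Streams code. A minimal sketch (the application id, topic names, and the toUpperCase stand-in transformation are all placeholders):

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;

    public class TransformStep {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "cdc-transform");  // hypothetical id
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");  // placeholder
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG,
                      Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG,
                      Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            // Read raw change events, transform, write to a topic a sink connector loads
            KStream<String, String> changes = builder.stream("dbserver1.inventory.customers");
            changes.mapValues(value -> value.toUpperCase())   // stand-in transformation
                   .to("customers-transformed");

            new KafkaStreams(builder.build(), props).start();
        }
    }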
38. Summary
• Just configure and deploy connectors - no custom code!
• Continuously captures changes with low latency and without batching
• Fault tolerant
- failures only cause a delay in processing
- events are still processed at least once
- avoids the dual-write problem
• Use stream processing to combine/merge/join multiple low-level events
• CDC is more complex, but the cost is amortized across the many systems that consume the changes
• Works with a limited set of DBMSes (for now) that expose a log or API usable for CDC