8. Shared data in a microservice architecture
[Diagram: three bounded contexts, each pairing a service with its own database (Service A/DB A, Service B/DB B, Service C/DB C); every service publishes its data changes to Apache Kafka™ and consumes the other services' changes to maintain local materialized views of that other data]
11. How do we get a stream of data changes?
Modify the app to write out events? What about the other apps that change data? Dual writes?!
[Diagram: Application, Application 2, and Application 3 each writing to the DB while also producing events to Apache Kafka™ for downstream consumers]
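The dual-write problem in code form, as a sketch (Customer, CustomerRepository, and the topic name are hypothetical stand-ins): the application writes to the database and publishes to Kafka as two separate, uncoordinated operations, so a crash between them leaves the database and the stream disagreeing.

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    // Anti-pattern sketch: two independent writes, no shared transaction.
    public class DualWriteService {
        interface CustomerRepository { void save(Customer c); }   // hypothetical DAO
        static class Customer {                                   // hypothetical entity
            final String id, email;
            Customer(String id, String email) { this.id = id; this.email = email; }
            String toJson() { return "{\"id\":\"" + id + "\",\"email\":\"" + email + "\"}"; }
        }

        private final CustomerRepository repository;
        private final KafkaProducer<String, String> producer;

        DualWriteService(CustomerRepository repository, KafkaProducer<String, String> producer) {
            this.repository = repository;
            this.producer = producer;
        }

        public void updateCustomer(Customer customer) {
            repository.save(customer);  // write #1: the database commit succeeds
            // ...if the process crashes right here, the event below is never sent,
            // and the database and the stream permanently disagree...
            producer.send(new ProducerRecord<>("customer-events", customer.id,
                    customer.toJson()));  // write #2: uncoordinated with write #1
        }
    }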
12. How do we get a stream of data changes?
Or we can watch the database: change data capture! We need a connector to do this. Just install, configure, and run it (a sample configuration is sketched below), and it will adapt. No need to change our apps!
[Diagram: Application writes to the DB; a connector running in Kafka Connect watches the DB and streams changes into Apache Kafka™ for consumers]
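"Configure" here means supplying a small properties document. The JSON below sketches a Debezium MySQL connector configuration; the hostname, credentials, and names are placeholders, and exact property names vary across Debezium versions, so check the connector docs for yours:

    {
      "name": "inventory-connector",
      "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "database.hostname": "mysql",
        "database.port": "3306",
        "database.user": "debezium",
        "database.password": "secret",
        "database.server.id": "184054",
        "database.server.name": "dbserver1",
        "database.history.kafka.bootstrap.servers": "kafka:9092",
        "database.history.kafka.topic": "schema-changes.inventory"
      }
    }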
13. Databases 101
[Diagram: an application committing changes that the DBMS appends to its log — insert row 1, insert row 2, update row 1, insert row 3, delete row 2, insert row 4, update row 2]
• Applications modify rows in transactions
• DBMS records the changes in a log, then updates the tables
• DBMS uses the log for recovery, replication, …
- MySQL binlog
- MongoDB oplog
- PostgreSQL WAL
• We can (try to) use the log for CDC*
*mileage may vary
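As a taste of what "using the log" looks like, here is a minimal sketch that tails a MySQL binlog with the open-source mysql-binlog-connector-java library (which Debezium's MySQL connector builds on); the host and credentials are placeholders, and the server must have binary logging enabled:

    import com.github.shyiko.mysql.binlog.BinaryLogClient;

    public class BinlogTail {
        public static void main(String[] args) throws Exception {
            // Connects like a MySQL replica and streams binlog events as they are written
            BinaryLogClient client =
                    new BinaryLogClient("localhost", 3306, "replicator", "secret");
            client.registerEventListener(event ->
                    System.out.println(event));   // raw events: TABLE_MAP, WRITE_ROWS, ...
            client.connect();                     // blocks, receiving events indefinitely
        }
    }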
14. Change Data Capture (CDC) at work
• Read the changes from the database
- Using the log or API
- This is the hardest part
• Write them in the same order (as sketched below)
• Don’t miss any changes
- Okay, this is hard, too
[Diagram: Table → Stream → Table*]
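The "write them in the same order" requirement maps onto Kafka's ordering guarantee: Kafka preserves order within a partition, so keying every change event by the row's primary key sends all changes for a given row to the same partition, in order. A sketch (topic name and payloads are illustrative):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class OrderedChangeWriter {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "kafka:9092");   // placeholder address
            props.put("key.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                      "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Same key ("42") -> same partition -> per-row order is preserved
                producer.send(new ProducerRecord<>("dbserver1.inventory.customers",
                        "42", "{\"op\":\"c\",\"after\":{\"id\":42}}"));
                producer.send(new ProducerRecord<>("dbserver1.inventory.customers",
                        "42", "{\"op\":\"u\",\"after\":{\"id\":42,\"email\":\"x@y.z\"}}"));
            }
        }
    }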
20. What does a change event look like?
• Primary/unique key of the row
• Kind of operation: insert, update, delete
• State of the row after the changes
• State of the row before the changes
• Source-specific provenance metadata
- location in the log
- database name, table name
- transaction ID, source timestamp, …
• Capture timestamp
21. What does a change event look like?
• Key
- Primary/unique key of the row
• Value
- Operation
- State of the row after the changes
- State of the row before the changes (if available)
- Source-specific provenance metadata
- Capture timestamp
• Timestamp
This maps perfectly to a Kafka message!
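For concreteness, here is what such an event can look like on the wire, using Debezium's JSON envelope for an update to a customers row (the row values and source positions are made up, and exact fields vary by connector and version):

    Key:   { "id": 1004 }
    Value:
    {
      "op": "u",
      "before": { "id": 1004, "email": "old@example.com" },
      "after":  { "id": 1004, "email": "new@example.com" },
      "source": { "name": "dbserver1", "db": "inventory", "table": "customers",
                  "file": "mysql-bin.000003", "pos": 805 },
      "ts_ms": 1486500577691
    }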
22. Single Message Transforms
• Simple transformations applied to one message at a time
• Defined as part of Kafka Connect
- Some useful transforms provided out of the box
- Easy to implement your own (see the sketch below)
• Optionally deploy 1+ transforms with each connector
- Modify messages produced by a source connector
- Modify messages sent to sink connectors
• Makes it much easier to mix and match connectors
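A custom SMT is a small class implementing Kafka Connect's Transformation interface. The sketch below is hypothetical (the class and its topic-prefixing behavior are illustrations, not a built-in transform); it reroutes every record to a prefixed topic:

    import java.util.Map;
    import org.apache.kafka.common.config.ConfigDef;
    import org.apache.kafka.connect.connector.ConnectRecord;
    import org.apache.kafka.connect.transforms.Transformation;

    // Hypothetical SMT that reroutes every record to a "cdc."-prefixed topic
    public class PrefixTopic<R extends ConnectRecord<R>> implements Transformation<R> {

        @Override
        public void configure(Map<String, ?> configs) {
            // no settings in this sketch; a real SMT would read its prefix here
        }

        @Override
        public R apply(R record) {
            // newRecord copies the record, letting us change only the topic
            return record.newRecord("cdc." + record.topic(), record.kafkaPartition(),
                    record.keySchema(), record.key(),
                    record.valueSchema(), record.value(),
                    record.timestamp());
        }

        @Override
        public ConfigDef config() {
            return new ConfigDef();
        }

        @Override
        public void close() {
        }
    }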
23. Connectors started long after DBs were created
• Databases don’t keep all past changes
- The logs are not kept indefinitely
• So CDC connectors often start by taking an initial snapshot
- Capture initial state of every row at that time
- Then capture and apply changes committed after initial copy started
- Transition can be tricky, but is easier if changes are idempotent (see the sketch below)
- Must handle failure at any point
• Consumers are eventually consistent with upstream sources
- More sophisticated consumers might process source transactions
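One common way to make applying events idempotent is to upsert on the primary key, so that replaying an event around the snapshot-to-streaming transition (or after a failure) leaves the target row in the same state. A minimal JDBC sketch; the table, columns, and MySQL upsert syntax are illustrative:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class IdempotentApply {
        // Re-applying the same "after" state leaves the row unchanged
        static void upsertCustomer(Connection conn, long id, String email) throws SQLException {
            String sql = "INSERT INTO customers (id, email) VALUES (?, ?) "
                       + "ON DUPLICATE KEY UPDATE email = VALUES(email)";
            try (PreparedStatement stmt = conn.prepareStatement(sql)) {
                stmt.setLong(1, id);
                stmt.setString(2, email);
                stmt.executeUpdate();
            }
        }

        // Deletes are naturally idempotent: deleting an absent row is a no-op
        static void deleteCustomer(Connection conn, long id) throws SQLException {
            try (PreparedStatement stmt =
                     conn.prepareStatement("DELETE FROM customers WHERE id = ?")) {
                stmt.setLong(1, id);
                stmt.executeUpdate();
            }
        }
    }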
24. Debezium connectors
• MySQL connector
- Multiple MySQL topologies
- GTIDs, DDL and DML, table filters, events mirror table structures
• MongoDB connector
- Replica set or sharded cluster
- Only insert events have “after” state; others have patch operation
• PostgreSQL connector
- Provides server-side logical decoding plugin
- Table filters, events mirror table structures
• SQL Server and Oracle connectors coming next
26. Using Debezium + Kafka Connect
• Use existing Kafka cluster
[Diagram: MySQL and an existing Apache Kafka™ cluster]
27. Using Debezium + Kafka Connect
• Use existing Kafka cluster
• Start Kafka Connect cluster
[Diagram: a Kafka Connect cluster added alongside Apache Kafka™ and MySQL]
28. Using Debezium + Kafka Connect
• Use existing Kafka cluster
• Start Kafka Connect cluster
• Deploy Debezium connector(s)
[Diagram: the MySQL connector running inside Kafka Connect, attached to MySQL and Apache Kafka™]
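Deploying a connector is a REST call to the Kafka Connect cluster: POST the connector's JSON configuration (like the sample shown earlier) to Connect's standard /connectors endpoint. For example, with the config saved as mysql-connector.json and Connect listening on its default port 8083:

    curl -X POST -H "Content-Type: application/json" \
         --data @mysql-connector.json http://localhost:8083/connectors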
29. Using Debezium + Kafka Connect
• Use existing Kafka cluster
• Start Kafka Connect cluster
• Deploy Debezium connector(s), begin snapshot
[Diagram: the MySQL connector snapshotting existing MySQL data into Apache Kafka™]
30. Using Debezium + Kafka Connect
• Use existing Kafka cluster
• Start Kafka Connect cluster
• Deploy Debezium connector(s), begin snapshot, capture changes
[Diagram: the MySQL connector streaming ongoing changes into Apache Kafka™]
31. Using Debezium + Kafka Connect
• Use existing Kafka cluster
• Start Kafka Connect cluster
• Deploy Debezium connector(s), begin snapshot, capture changes
• Consume change events
[Diagram: multiple groups of consumers reading the change topics from Apache Kafka™]
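A change-event consumer is an ordinary Kafka consumer. The sketch below (addresses and group id are placeholders; Debezium names topics serverName.databaseName.tableName) reads one table's change stream and maintains a naive in-memory copy of the rows, treating a null value (tombstone) as a delete; a real consumer would persist its state and use proper deserializers:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class CustomerViewConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "kafka:9092");       // placeholder address
            props.put("group.id", "customer-view-builder");     // hypothetical group
            props.put("key.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");

            Map<String, String> view = new HashMap<>();  // key -> latest row state
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("dbserver1.inventory.customers"));
                while (true) {
                    for (ConsumerRecord<String, String> record :
                            consumer.poll(Duration.ofMillis(500))) {
                        if (record.value() == null) {
                            view.remove(record.key());              // tombstone: row deleted
                        } else {
                            view.put(record.key(), record.value()); // upsert latest state
                        }
                    }
                }
            }
        }
    }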
32. Using Debezium + Kafka Connect
• Use existing Kafka cluster
• Start Kafka Connect cluster
• Deploy Debezium connector(s), begin snapshot, capture changes
• Pause, undeploy, or redeploy connector at any time
[Diagram: consumers keep reading from Apache Kafka™ while the MySQL connector is paused or redeployed]
33. Using Debezium + Kafka Connect
• Use existing Kafka cluster
• Start Kafka Connect cluster
• Deploy Debezium connector(s), begin snapshot, capture changes
• Pause, undeploy, or redeploy connector at any time
• Consumers will keep consuming or block until there are more events
[Diagram: consumers continue reading the change topics from Apache Kafka™ regardless of connector state]
34. Using Debezium + Kafka Connect
[Diagram: scaling out — multiple Kafka Connect clusters running several MySQL connectors and a PostgreSQL connector against their databases, all feeding Apache Kafka™, with many groups of consumers reading the change streams]
36. Create data pipelines for data you already have
[Diagram: streaming ETL — a Kafka Connect source connector extracts changes from DB1, Kafka Streams transforms them, and Kafka Connect sink connectors load the results into DB2 and other stores]
37. Create data pipelines for data you already have
[Diagram: the same streaming ETL pipeline — Kafka Connect source connector, Kafka Streams, Kafka Connect sink connectors into DB2 — now also feeding applications & frameworks directly]
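The "transform" step in these pipelines is ordinary Kafka Streams code. A minimal sketch (the application id, topic names, and the toUpperCase stand-in transformation are all placeholders):

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;

    public class TransformStep {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "cdc-transform");  // hypothetical id
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");  // placeholder
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG,
                      Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG,
                      Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            // Read raw change events, transform, write to a topic a sink connector loads
            KStream<String, String> changes = builder.stream("dbserver1.inventory.customers");
            changes.mapValues(value -> value.toUpperCase())   // stand-in transformation
                   .to("customers-transformed");

            new KafkaStreams(builder.build(), props).start();
        }
    }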
38. Summary
• Just configure and deploy connectors - no custom code!
• Continuously captures changes with low latency and without batching
• Fault tolerant
- failures only cause a delay in processing
- events are still processed at least once
- avoids the dual-write problem
• Use stream processing to combine/merge/join multiple low-level events
• CDC is more complex, but the cost is amortized across the many systems that consume the changes
• Works with a limited set of DBMSes (for now) that expose a log or API usable for CDC