Capture the Streams of Database Changes
Randall Hauch
Founder of Debezium project
@rhauch
Apache Kafka™
2
Producers
Consumers
Apache Kafka Streams API
Apache Kafka Connect API
DB
Change Data Capture Connectors
3
See the list at https://www.confluent.io/product/connectors/
Apache Kafka™
Why capture streams of data changes?
4
DB
Application
Streaming data replication
5
DB
Apache Kafka™
DB2
Streaming analytics and machine learning
6
DB
…
Apache Kafka™
Streaming ETL
7
DB2
Extract Transform Load
DB
Apache Kafka™
Shared data in a microservice architecture
8
Bounded context
DB A
Service A
Apache Kafka™
changes changes changes
other
data
other
data
other
data
Bounded context
DB B
Service B
Bounded context
DB C
Service C
materialized
views
materialized
views
materialized
views
Deconstructed applications
9
DB
Application
Cache
Indexes
Cache
Indexes
DB
Apache Kafka™
Cache
Indexes
Application
(dual writes!)
Kafka
Consumers
How do we get a stream of data changes?
10
DB
Application
?
Apache Kafka™
Consumers
How do we get a stream of data changes?
11
Modify the app to write out events?
DB
Application
Application 2 Application 3
What about the other apps that change data?
Dual writes?!
Apache Kafka™
Consumers
How do we get a stream of data changes?
12
Or we can watch the database
DB
Application
Need a connector to do this
Just install, configure and run it, and it will adapt
No need to change our apps!
Change data capture!
Kafka Connect
Connector
Databases 101
13
insert row 1
insert row 2
update row 1
insert row 3
delete row 2
insert row 4
update row 2
• Applications modify rows in transactions
• DBMS records the changes in a log, then updates the tables
• DBMS uses log for recovery, replication, …
- MySQL binlog
- MongoDB oplog
- PostgreSQL WAL
• We can (try to) use the log for CDC*
Application
*mileage may vary
Change Data Capture (CDC) at work
14
• Read the changes from the database
- Using the log or API
- This is the hardest part
• Write them in the same order
• Don’t miss any changes
- Okay, this is hard, too
Table Stream
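The three requirements above (read from the log, keep the order, miss nothing) can be sketched as a loop that remembers how far it has read. This is only an illustration, not how any particular connector is implemented: `LogEntry` and `CdcReader` are hypothetical names, and the "log" is an in-memory list.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class LogEntry:
    position: int   # monotonically increasing location in the database log
    change: str     # e.g. "insert row 1"


class CdcReader:
    """Reads log entries in order and remembers the last position it emitted."""

    def __init__(self, log: List[LogEntry], emit: Callable[[LogEntry], None]):
        self.log = log
        self.emit = emit
        self.committed_position = 0  # nothing emitted yet

    def poll(self) -> int:
        """Emit every entry after the committed position, in log order."""
        emitted = 0
        for entry in self.log:
            if entry.position <= self.committed_position:
                continue  # already captured on an earlier poll
            self.emit(entry)
            # Record progress only after the entry was handed off, so a crash
            # between the two steps re-sends the entry instead of losing it.
            self.committed_position = entry.position
            emitted += 1
        return emitted


log = [LogEntry(1, "insert row 1"), LogEntry(2, "insert row 2")]
stream: List[str] = []
reader = CdcReader(log, lambda e: stream.append(e.change))

first = reader.poll()                    # captures the two existing entries
log.append(LogEntry(3, "update row 1"))  # the database keeps changing
second = reader.poll()                   # captures only the new entry
print(stream)
```

Committing the position after emitting trades possible duplicates for no lost changes, which is the at-least-once behavior the summary slide mentions.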
Change Data Capture (CDC) at work
16
• Read the changes from the database
- Using the log or API
- This is the hardest part
• Write them in the same order
• Don’t miss any changes
- Okay, this is hard, too
Table Stream Table*
Stream-Table Duality
18
We can view a table as a stream
and
We can view a stream as a table
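A small sketch of the duality, assuming a change is a `(key, value)` pair and a `None` value means the row was deleted. The function names are hypothetical and chosen only for this example.

```python
from typing import Dict, List, Optional, Tuple

# A change is (key, new_value); a value of None means the row was deleted.
Change = Tuple[int, Optional[str]]


def stream_to_table(changes: List[Change]) -> Dict[int, str]:
    """View a stream as a table: keep the latest value for each key."""
    table: Dict[int, str] = {}
    for key, value in changes:
        if value is None:
            table.pop(key, None)
        else:
            table[key] = value
    return table


def table_to_stream(table: Dict[int, str]) -> List[Change]:
    """View a table as a stream: one change per current row."""
    return [(key, value) for key, value in sorted(table.items())]


changes: List[Change] = [
    (1, "a"), (2, "b"), (1, "a2"), (3, "c"), (2, None),
]
table = stream_to_table(changes)
# Turning the table back into a stream and replaying it gives the same table.
round_trip = stream_to_table(table_to_stream(table))
print(table)
```

The round trip loses the history (the intermediate value of row 1, the existence of row 2) but preserves the current state, which is why the downstream copy is a table "as of now" rather than a full audit trail.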
What does a change event look like?
20
• Primary/unique key of the row
• Kind of operation: insert, update, delete
• State of the row after the changes
• State of the row before the changes
• Source-specific provenance metadata
- location in the log
- database name, table name
- transaction ID, source timestamp, …
• Capture timestamp
What does a change event look like?
21
• Key
- Primary/unique key of the row
• Value
- Operation
- State of the row after the changes
- State of the row before the changes (if available)
- Source-specific provenance metadata
- Capture timestamp
• Timestamp
This maps perfectly to a Kafka message!
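To make the mapping concrete, here is a sketch of one change event split into the key, value, and timestamp of a message. The field names and the JSON encoding are assumptions for illustration and are not the exact envelope of any connector; using the capture time as the message timestamp is also an assumption.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass
class ChangeEvent:
    key: Dict[str, Any]               # primary/unique key of the row
    op: str                           # "insert", "update", or "delete"
    after: Optional[Dict[str, Any]]   # row state after the change
    before: Optional[Dict[str, Any]]  # row state before the change, if available
    source: Dict[str, Any]            # provenance: log position, db, table, ...
    captured_at_ms: int               # when the change was captured


def to_message(event: ChangeEvent) -> Tuple[bytes, bytes, int]:
    """Split an event into the (key, value, timestamp) parts of a message."""
    key = json.dumps(event.key, sort_keys=True).encode("utf-8")
    value = json.dumps(
        {
            "op": event.op,
            "before": event.before,
            "after": event.after,
            "source": event.source,
            "captured_at_ms": event.captured_at_ms,
        },
        sort_keys=True,
    ).encode("utf-8")
    # Assumption: the capture time doubles as the message timestamp.
    return key, value, event.captured_at_ms


event = ChangeEvent(
    key={"id": 1},
    op="update",
    after={"id": 1, "name": "Ann"},
    before={"id": 1, "name": "Anne"},
    source={"db": "inventory", "table": "customers", "log_position": 42},
    captured_at_ms=1700000000000,
)
key, value, timestamp = to_message(event)
print(key)
```

Because the row's key becomes the message key, all changes to one row land in the same partition and keep their relative order.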
Single Message Transforms
22
• Simple transformations for a single message
• Defined as part of Kafka Connect
- Some useful transforms provided out of the box
- Easily implement your own
• Optionally deploy 1+ transforms with each connector
- Modify messages produced by source connector
- Modify messages sent to sink connectors
• Makes it much easier to mix and match connectors
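The idea of a chain of single-message transforms can be sketched as functions that each take one record and return one record. In Kafka Connect itself, transforms are Java classes configured per connector; the Python below only mirrors the shape, and `mask_field`, `route_to`, and `apply_chain` are hypothetical helpers.

```python
from typing import Any, Callable, Dict, List, Optional

Record = Dict[str, Any]
# A transform takes one record and returns one record, or None to drop it.
Transform = Callable[[Record], Optional[Record]]


def mask_field(field: str) -> Transform:
    """Build a transform that hides one field of the record value."""
    def apply(record: Record) -> Optional[Record]:
        value = dict(record["value"])  # copy so the input stays untouched
        if field in value:
            value[field] = "****"
        return {**record, "value": value}
    return apply


def route_to(topic: str) -> Transform:
    """Build a transform that sends the record to a different topic."""
    def apply(record: Record) -> Optional[Record]:
        return {**record, "topic": topic}
    return apply


def apply_chain(record: Record, chain: List[Transform]) -> Optional[Record]:
    """Run the transforms in order; stop early if one drops the record."""
    current: Optional[Record] = record
    for transform in chain:
        if current is None:
            break
        current = transform(current)
    return current


record = {"topic": "db.customers", "value": {"id": 1, "email": "a@example.com"}}
result = apply_chain(record, [mask_field("email"), route_to("customers")])
print(result)
```

Each transform sees only one message at a time and keeps no state, which is what keeps them simple enough to mix and match across connectors.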
Connectors started long after DBs were created
23
• Databases don’t keep all past changes
- The logs are not kept indefinitely
• So CDC connectors often start by taking an initial snapshot
- Capture initial state of every row at that time
- Then capture and apply changes committed after initial copy started
- Transition can be tricky, but is easier if changes are idempotent
- Must handle failure at any point
• Consumers are eventually consistent with upstream sources
- More sophisticated consumers might process source transactions
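Why idempotent changes make the snapshot-to-streaming transition easier can be seen in a small sketch. It assumes the connector records the log position when the copy starts and that the copy may already contain some later changes; all names here are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (log position, key, value); a value of None means the row was deleted.
LogEntry = Tuple[int, int, Optional[str]]


def apply_change(target: Dict[int, str], key: int, value: Optional[str]) -> None:
    """Idempotent upsert/delete: applying the same change twice is harmless."""
    if value is None:
        target.pop(key, None)
    else:
        target[key] = value


def snapshot_then_stream(
    snapshot_rows: Dict[int, str],
    snapshot_start: int,
    log: List[LogEntry],
) -> Dict[int, str]:
    """Copy the snapshot, then replay every change committed after the
    position recorded when the copy started."""
    target = dict(snapshot_rows)
    for position, key, value in log:
        if position > snapshot_start:
            apply_change(target, key, value)
    return target


log: List[LogEntry] = [
    (1, 1, "a"), (2, 2, "b"), (3, 1, "a2"), (4, 2, None),
]
# The copy started at position 2 but took a while, so it already saw change 3.
snapshot_rows = {1: "a2", 2: "b"}
result = snapshot_then_stream(snapshot_rows, snapshot_start=2, log=log)
print(result)
```

Change 3 is applied a second time during the replay and nothing breaks, because setting row 1 to the same value again leaves the target unchanged.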
Debezium connectors
24
• MySQL connector
- Multiple MySQL topologies
- GTIDs, DDL and DML, table filters, events mirror table structures
• MongoDB connector
- Replica set or sharded cluster
- Only insert events have “after” state; others have patch operation
• PostgreSQL connector
- Provides server-side logical decoding plugin
- Table filters, events mirror table structures
• SQL Server and Oracle connectors coming next
Using Debezium + Kafka Connect
25
MySQL
Using Debezium + Kafka Connect
26
Apache Kafka™
MySQL
• Use existing Kafka cluster
Using Debezium + Kafka Connect
27
Apache Kafka™
Kafka Connect
MySQL
• Use existing Kafka cluster
• Start Kafka Connect cluster
Using Debezium + Kafka Connect
28
Apache Kafka™
Kafka Connect
MySQL
MySQL
Connector
• Use existing Kafka cluster
• Start Kafka Connect cluster
• Deploy Debezium connector(s)
Using Debezium + Kafka Connect
29
Apache Kafka™
Kafka Connect
MySQL
MySQL
Connector
• Use existing Kafka cluster
• Start Kafka Connect cluster
• Deploy Debezium connector(s), begin snapshot, capture changes
Using Debezium + Kafka Connect
30
Apache Kafka™
Kafka Connect
MySQL
MySQL
Connector
• Use existing Kafka cluster
• Start Kafka Connect cluster
• Deploy Debezium connector(s), begin snapshot, capture changes
• Consume change events
Using Debezium + Kafka Connect
31
Apache Kafka™
Kafka Connect
MySQL
Consumers
Consumers
Consumers
MySQL
Connector
• Use existing Kafka cluster
• Start Kafka Connect cluster
• Deploy Debezium connector(s), begin snapshot, capture changes
• Pause, undeploy, or redeploy connector at any time
Using Debezium + Kafka Connect
32
Apache Kafka™
Kafka Connect
MySQL
Consumers
Consumers
Consumers
MySQL
Connector
• Use existing Kafka cluster
• Start Kafka Connect cluster
• Deploy Debezium connector(s), begin snapshot, capture changes
• Pause, undeploy, or redeploy connector at any time
• Consumers will keep consuming or block until there are more events
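Deploying, pausing, and undeploying a connector goes through the Kafka Connect REST interface. The sketch below only builds the requests and does not send them. The host names, credentials, and connector name are placeholders, the configuration is deliberately incomplete, and Debezium property names differ between versions, so the dictionary should be checked against the documentation for the version in use.

```python
import json
from typing import Any, Dict, Optional, Tuple

Request = Tuple[str, str, Optional[str]]  # (HTTP method, path, JSON body)


def create_connector(name: str, config: Dict[str, Any]) -> Request:
    """Request that registers a connector with a Kafka Connect cluster."""
    body = json.dumps({"name": name, "config": config}, indent=2)
    return ("POST", "/connectors", body)


def pause_connector(name: str) -> Request:
    """Request that pauses a running connector."""
    return ("PUT", f"/connectors/{name}/pause", None)


def delete_connector(name: str) -> Request:
    """Request that undeploys a connector."""
    return ("DELETE", f"/connectors/{name}", None)


# Placeholder values throughout; property names differ between Debezium
# versions, so treat this dictionary as an assumption to verify.
mysql_config = {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql.example.internal",
    "database.port": "3306",
    "database.user": "cdc_user",
    "database.password": "change-me",
    "database.server.id": "184054",
    "topic.prefix": "inventory",
}

method, path, body = create_connector("inventory-connector", mysql_config)
print(method, path)
print(body)
```

The same three requests cover the lifecycle on the slide: deploy once, pause or delete at any time, and redeploy by posting the configuration again.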
Using Debezium + Kafka Connect
33
Apache Kafka™
Kafka Connect
MySQL
Consumers
Consumers
Consumers
MySQL
Connector
Using Debezium + Kafka Connect
34
[Diagram labels: Kafka Connect clusters; Apache Kafka™; MySQL databases with MySQL Connectors; PostgreSQL database with PostgreSQL Connector; Consumers]
DB2
Kafka Connect
Sink
Connector
Create data pipelines for data you already have
36
DB1
Extract
Kafka Streams
Transform Load
Kafka Connect
Source
Connector
Create data pipelines for data you already have
37
DB1
DB2
Extract
Kafka Streams
Transform Load
Kafka Connect
Source
Connector
Kafka Connect
Sink
Connector
DB2
Kafka Streams Kafka Connect
Sink
Connector
Applications
Create data pipelines for data you already have
38
DB1 DB2
Kafka Streams
Kafka Connect
Source
Connector
Kafka Connect
Sink
Connector
DB2
Kafka Streams Kafka Connect
Sink
Connector
Applications
&
Frameworks
Summary
39
• Just configure and deploy connectors - no custom code!
• Continuously captures changes with low latency and without batching
• Fault tolerant
- failures only cause a delay in processing
- still process events at least once
- avoid dual-write problems
• Use stream processing to combine/merge/join multiple low-level events
• CDC is more complex, but its cost is amortized across multiple systems
• Works with limited DBMSes (for now) that have APIs for CDC
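As one example of combining low-level events, the sketch below groups row-level change events by the transaction ID carried in their provenance metadata. It is a plain-Python illustration rather than a stream processing job; the event fields (`tx_id`, `table`, `op`) are hypothetical, and a real processor would also need a way to tell when a transaction is complete.

```python
from typing import Any, Dict, List


def group_by_transaction(events: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Combine row-level change events that share a transaction ID."""
    grouped: Dict[str, List[Dict[str, Any]]] = {}
    for event in events:
        # dicts keep insertion order, so transactions stay in arrival order
        grouped.setdefault(event["tx_id"], []).append(event)
    return [
        {
            "tx_id": tx_id,
            "tables": sorted({e["table"] for e in rows}),
            "changes": len(rows),
        }
        for tx_id, rows in grouped.items()
    ]


events = [
    {"tx_id": "t1", "table": "orders", "op": "insert"},
    {"tx_id": "t1", "table": "order_lines", "op": "insert"},
    {"tx_id": "t2", "table": "orders", "op": "update"},
]
summary = group_by_transaction(events)
print(summary)
```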
Interested? Want to contribute?
40
debezium.io
@debezium
Thank you!

UI5ers live - Custom Controls wrapping 3rd-party libs.pptxAndreas Kunz
 
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfExploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfkalichargn70th171
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecturerahul_net
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsSafe Software
 
Effectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorEffectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorTier1 app
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptxVinzoCenzo
 
eSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolseSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolsosttopstonverter
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Cizo Technology Services
 

Recently uploaded (20)

Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonLeveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
 
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptxReal-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
 
Best Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh ITBest Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh IT
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf
 
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and Repair
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdf
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Development
 
What’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesWhat’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 Updates
 
Not a Kubernetes fan? The state of PaaS in 2024
Not a Kubernetes fan? The state of PaaS in 2024Not a Kubernetes fan? The state of PaaS in 2024
Not a Kubernetes fan? The state of PaaS in 2024
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
 
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfExploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecture
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data Streams
 
Effectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorEffectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryError
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptx
 
eSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolseSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration tools
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
 

Capture the Streams of Database Changes

  • 1. Capture the Streams of Database Changes Randall Hauch Founder of Debezium project @rhauch
  • 2. Apache Kafka™ (Diagram: Producers, Consumers, Apache Kafka Streams API, Apache Kafka Connect API, DB)
  • 3. Change Data Capture Connectors: see the list at https://www.confluent.io/product/connectors/
  • 4. Why capture streams of data changes? (Diagram: Application, DB, Apache Kafka™)
  • 6. Streaming analytics and machine learning (Diagram: DB, …, Apache Kafka™)
  • 7. Streaming ETL (Diagram: DB, Extract, Transform, Load, DB2, Apache Kafka™)
  • 8. Shared data in a microservice architecture (Diagram: three bounded contexts, Service A with DB A, Service B with DB B, and Service C with DB C; each sends changes to Apache Kafka™ and keeps materialized views of other data)
  • 10. How do we get a stream of data changes? (Diagram: Application, DB, ?, Kafka, Consumers)
  • 11. How do we get a stream of data changes? Modify the app to write out events? What about the other apps that change data? Dual writes?! (Diagram: Application, Application 2, Application 3, DB, Apache Kafka™, Consumers)
  • 12. How do we get a stream of data changes? Or we can watch the database. Need a connector to do this. Just install, configure and run it, and it will adapt. No need to change our apps! Change data capture! (Diagram: Application, DB, Kafka Connect, Connector, Apache Kafka™, Consumers)
  • 13. Databases 101 • Applications modify rows in transactions • DBMS records the changes in a log, then updates the tables • DBMS uses log for recovery, replication, … - MySQL binlog - MongoDB oplog - PostgreSQL WAL • We can (try to) use the log for CDC* (*mileage may vary) (Diagram: Application writing a log of insert row 1, insert row 2, update row 1, insert row 3, delete row 2, insert row 4, update row 2)
  • 14. Change Data Capture (CDC) at work • Read the changes from the database - Using the log or API - This is the hardest part • Write them in the same order • Don’t miss any changes - Okay, this is hard, too (Diagram: Table, Stream)
  • 16. Change Data Capture (CDC) at work • Read the changes from the database - Using the log or API - This is the hardest part • Write them in the same order • Don’t miss any changes - Okay, this is hard, too (Diagram: Table, Stream, Table*)
  • 18. Stream-Table Duality: We can view a table as a stream, and we can view a stream as a table
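
The duality on this slide can be illustrated with a small sketch. It is not taken from the talk: the `(operation, key, row)` record shape and both function names are assumptions made for illustration. Replaying a stream of changes in order yields a table, and emitting one insert per current row turns that table back into a stream.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical change record: (operation, key, row); row is None for deletes.
Change = Tuple[str, int, Optional[dict]]


def table_to_stream(table: Dict[int, dict]) -> List[Change]:
    """View a table as a stream: emit one insert per current row."""
    return [("insert", key, dict(row)) for key, row in sorted(table.items())]


def stream_to_table(changes: Iterable[Change]) -> Dict[int, dict]:
    """View a stream as a table: replay changes in order, keeping the latest row per key."""
    table: Dict[int, dict] = {}
    for op, key, row in changes:
        if op == "delete":
            table.pop(key, None)
        else:  # insert or update both store the new row state
            table[key] = dict(row)
    return table


changes: List[Change] = [
    ("insert", 1, {"name": "a"}),
    ("insert", 2, {"name": "b"}),
    ("update", 1, {"name": "a2"}),
    ("insert", 3, {"name": "c"}),
    ("delete", 2, None),
]
table = stream_to_table(changes)
round_trip = stream_to_table(table_to_stream(table))
```

Order matters here: replaying the same changes in a different order could leave a different table, which is why CDC has to write changes in the order the database committed them.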
  • 20. What does a change event look like? • Primary/unique key of the row • Kind of operation: insert, update, delete • State of the row after the changes • State of the row before the changes • Source-specific provenance metadata - location in the log - database name, table name - transaction ID, source timestamp, … • Capture timestamp
  • 21. What does a change event look like? • Key - Primary/unique key of the row • Value - Operation - State of the row after the changes - State of the row before the changes (if available) - Source-specific provenance metadata - Capture timestamp • Timestamp. This maps perfectly to a Kafka message!
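
As a rough sketch of how those parts might map onto a Kafka message's key and value, consider the following. The class, its field names, and the sample data are illustrative assumptions, not a documented connector schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass
class ChangeEvent:
    """Hypothetical change event with the parts listed on the slide."""
    key: Dict[str, Any]               # primary/unique key of the row
    op: str                           # "insert", "update", or "delete"
    after: Optional[Dict[str, Any]]   # row state after the change (None for deletes)
    before: Optional[Dict[str, Any]]  # row state before the change, if available
    source: Dict[str, Any]            # provenance: database, table, log position, ...
    captured_at_ms: int               # when the connector captured the change

    def to_kafka_message(self) -> Tuple[bytes, bytes]:
        """Split the event into the (key, value) byte pair a Kafka message carries."""
        value = asdict(self)
        key = value.pop("key")
        return (
            json.dumps(key, sort_keys=True).encode("utf-8"),
            json.dumps(value, sort_keys=True).encode("utf-8"),
        )


event = ChangeEvent(
    key={"id": 1},
    op="update",
    after={"id": 1, "email": "new@example.com"},
    before={"id": 1, "email": "old@example.com"},
    source={"db": "inventory", "table": "customers", "log_position": 154},
    captured_at_ms=1_500_000_000_000,
)
key_bytes, value_bytes = event.to_kafka_message()
```

Using the row's key as the message key means that, with Kafka's default partitioning, all changes to one row land in the same partition and stay in order relative to each other.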
  • 22. Single Message Transforms • Simple transformations for a single message • Defined as part of Kafka Connect - Some useful transforms provided in-the-box - Easily implement your own • Optionally deploy 1+ transforms with each connector - Modify messages produced by source connector - Modify messages sent to sink connectors • Makes it much easier to mix and match connectors
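
The idea of chaining small per-message transforms can be sketched conceptually. Real single message transforms are Java classes configured on a connector. The Python functions below (`rename_topic`, `drop_field`, `apply_chain`) and the dictionary record shape are hypothetical stand-ins and are not the Kafka Connect API.

```python
from typing import Any, Callable, Dict, List, Optional

Record = Dict[str, Any]
Transform = Callable[[Record], Optional[Record]]


def rename_topic(prefix: str) -> Transform:
    """Build a transform that routes the record to a prefixed topic."""
    def apply(record: Record) -> Optional[Record]:
        return {**record, "topic": prefix + record["topic"]}
    return apply


def drop_field(name: str) -> Transform:
    """Build a transform that removes one field from the record value."""
    def apply(record: Record) -> Optional[Record]:
        value = {k: v for k, v in record["value"].items() if k != name}
        return {**record, "value": value}
    return apply


def apply_chain(record: Optional[Record], transforms: List[Transform]) -> Optional[Record]:
    """Apply each transform in order; a transform returning None drops the record."""
    for transform in transforms:
        if record is None:
            return None
        record = transform(record)
    return record


original = {
    "topic": "customers",
    "key": {"id": 1},
    "value": {"id": 1, "email": "a@example.com", "ssn": "000-00-0000"},
}
result = apply_chain(original, [rename_topic("cdc."), drop_field("ssn")])
```

Each transform sees exactly one message and returns a new one, so transforms stay stateless and can be reordered or reused across connectors.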
  • 23. Connectors started long after DBs were created • Databases don’t keep all past changes - The logs are not kept indefinitely • So CDC connectors often start by taking an initial snapshot - Capture initial state of every row at that time - Then capture and apply changes committed after initial copy started - Transition can be tricky, but is easier if changes are idempotent - Must handle failure at any point • Consumers are eventually consistent with upstream sources - More sophisticated consumers might process source transactions
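
A minimal sketch shows why idempotent changes make the snapshot-to-streaming transition easier. It assumes each log entry carries the full row state after the change. The log format and function names are illustrative and do not describe how any particular connector is implemented.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical log entry: (operation, key, full row state after the change).
LogEntry = Tuple[str, int, Optional[dict]]


def apply_change(replica: Dict[int, dict], entry: LogEntry) -> None:
    """Apply one change idempotently: upsert the full row, or delete by key."""
    op, key, row = entry
    if op == "delete":
        replica.pop(key, None)
    else:
        replica[key] = dict(row)


def snapshot_then_stream(
    snapshot_rows: Dict[int, dict], log: List[LogEntry], start_position: int
) -> Dict[int, dict]:
    """Load the snapshot, then replay every change committed since the snapshot began."""
    replica = {key: dict(row) for key, row in snapshot_rows.items()}
    for entry in log[start_position:]:
        apply_change(replica, entry)
    return replica


log: List[LogEntry] = [
    ("insert", 1, {"v": "a"}),   # position 0, committed before the snapshot
    ("insert", 2, {"v": "b"}),   # position 1, committed before the snapshot
    ("update", 1, {"v": "a2"}),  # position 2, committed while the snapshot ran
    ("delete", 2, None),         # position 3, committed while the snapshot ran
]

# The snapshot began at position 2. It happened to read row 1 after its update
# and row 2 before its delete, so it already reflects part of the later log.
snapshot_rows = {1: {"v": "a2"}, 2: {"v": "b"}}
replica = snapshot_then_stream(snapshot_rows, log, start_position=2)
```

The update to row 1 is applied a second time and changes nothing, while the delete of row 2 brings the replica in line with the source. If entries were deltas such as "increment by 1", the overlap would corrupt the replica instead.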
  • 24. Debezium connectors • MySQL connector - Multiple MySQL topologies - GTIDs, DDL and DML, table filters, events mirror table structures • MongoDB connector - Replica set or sharded cluster - Only insert events have “after” state; others have patch operation • PostgreSQL connector - Provides server-side logical decoding plugin - Table filters, events mirror table structures • SQL Server and Oracle connectors coming next
  • 25. Using Debezium + Kafka Connect (Diagram: MySQL)
  • 26. Using Debezium + Kafka Connect • Use existing Kafka cluster (Diagram: MySQL, Apache Kafka™)
  • 27. Using Debezium + Kafka Connect • Use existing Kafka cluster • Start Kafka Connect cluster (Diagram: MySQL, Kafka Connect, Apache Kafka™)
  • 28. Using Debezium + Kafka Connect • Use existing Kafka cluster • Start Kafka Connect cluster • Deploy Debezium connector(s) (Diagram: MySQL, Kafka Connect, MySQL Connector, Apache Kafka™)
  • 29. Using Debezium + Kafka Connect • Use existing Kafka cluster • Start Kafka Connect cluster • Deploy Debezium connector(s), begin snapshot (Diagram: MySQL, Kafka Connect, MySQL Connector, Apache Kafka™)
  • 30. Using Debezium + Kafka Connect • Use existing Kafka cluster • Start Kafka Connect cluster • Deploy Debezium connector(s), begin snapshot, capture changes (Diagram: MySQL, Kafka Connect, MySQL Connector, Apache Kafka™)
  • 31. Using Debezium + Kafka Connect • Use existing Kafka cluster • Start Kafka Connect cluster • Deploy Debezium connector(s), begin snapshot, capture changes • Consume change events (Diagram: MySQL, Kafka Connect, MySQL Connector, Apache Kafka™, Consumers)
  • 32. Using Debezium + Kafka Connect • Use existing Kafka cluster • Start Kafka Connect cluster • Deploy Debezium connector(s), begin snapshot, capture changes • Pause, undeploy, or redeploy connector at any time (Diagram: MySQL, Kafka Connect, MySQL Connector, Apache Kafka™, Consumers)
  • 33. Using Debezium + Kafka Connect • Use existing Kafka cluster • Start Kafka Connect cluster • Deploy Debezium connector(s), begin snapshot, capture changes • Pause, undeploy, or redeploy connector at any time • Consumers will keep consuming or block until there are more events (Diagram: MySQL, Kafka Connect, MySQL Connector, Apache Kafka™, Consumers)
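
The "deploy Debezium connector(s)" step is typically a call to the Kafka Connect REST interface, sketched here under stated assumptions. The helper name, connector name, host names, and credentials are placeholders. The configuration is deliberately incomplete: property names vary by Debezium version (for example, `topic.prefix` in newer releases versus `database.server.name` in older ones), and other required settings such as schema history storage are omitted, so check the documentation for the version in use.

```python
import json
import urllib.request
from typing import Dict


def build_connector_request(
    connect_url: str, name: str, config: Dict[str, str]
) -> urllib.request.Request:
    """Build (but do not send) a POST /connectors request for a Kafka Connect worker."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=connect_url.rstrip("/") + "/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values throughout; incomplete on purpose (see the note above).
mysql_config = {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql.example.internal",
    "database.port": "3306",
    "database.user": "cdc_user",
    "database.password": "change-me",
    "database.server.id": "184054",
    "topic.prefix": "inventory",  # older Debezium versions use database.server.name
}

request = build_connector_request(
    "http://localhost:8083", "inventory-connector", mysql_config
)
# To actually deploy, send it to a running Kafka Connect worker:
#   urllib.request.urlopen(request)
```

Pausing or removing the connector later goes through the same REST interface, which is what makes it possible to manage connectors without touching the applications that write to the database.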
  • 34. Using Debezium + Kafka Connect (Diagram: several MySQL databases and a PostgreSQL database, each with its own MySQL Connector or PostgreSQL Connector in Kafka Connect, Apache Kafka™, many Consumers)
  • 35. Create data pipelines for data you already have (Diagram: DB1, Kafka Connect Source Connector, Kafka Streams, Kafka Connect Sink Connector, DB2; stages labeled Extract, Transform, Load)
  • 36. Create data pipelines for data you already have (Diagram: DB1, Kafka Connect Source Connector, Kafka Streams, Kafka Connect Sink Connector, DB2, plus a second Kafka Streams and Kafka Connect Sink Connector feeding another DB2; stages labeled Extract, Transform, Load)
  • 37. Create data pipelines for data you already have (Diagram: DB1, Kafka Connect Source Connector, Kafka Streams, Kafka Connect Sink Connector, DB2, plus a second Kafka Streams and Kafka Connect Sink Connector feeding another DB2, and Applications & Frameworks)
  • 38. Summary • Just configure and deploy connectors - no custom code! • Continuously captures changes with low latency and without batching • Fault tolerant - failures only cause a delay in processing - still process events at least once - avoid dual-write problems • Use stream processing to combine/merge/join multiple low-level events • CDC is more complex, but amortize across multiple systems • Works with limited DBMSes (for now) that have APIs for CDC
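
Because events are processed at least once, a consumer may see some of them again after a failure. One way a consumer might cope is sketched below. It assumes a single, totally ordered stream in which every event carries its source log position. The class and the event shape are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical event: (source log position, operation, key, row after the change).
Event = Tuple[int, str, int, Optional[dict]]


class DedupingApplier:
    """Applies change events in order and skips any event it has already applied."""

    def __init__(self) -> None:
        self.table: Dict[int, dict] = {}
        self.last_position = -1
        self.skipped = 0

    def handle(self, event: Event) -> None:
        position, op, key, row = event
        if position <= self.last_position:
            self.skipped += 1  # redelivered after a failure; already applied
            return
        if op == "delete":
            self.table.pop(key, None)
        else:
            self.table[key] = dict(row)
        self.last_position = position


e0: Event = (0, "insert", 1, {"v": 1})
e1: Event = (1, "insert", 2, {"v": 2})
e2: Event = (2, "delete", 1, None)
e3: Event = (3, "update", 2, {"v": 3})

# A failure after e2 causes e1 and e2 to be delivered a second time.
delivered: List[Event] = [e0, e1, e2, e1, e2, e3]

applier = DedupingApplier()
for event in delivered:
    applier.handle(event)
```

In a real consumer the last applied position would need to be stored durably alongside the data it protects, otherwise a restart loses track of what was already applied.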
  • 39. Interested? Want to contribute? debezium.io @debezium