Apache Kafka is a popular distributed streaming data platform. A Kafka cluster stores streams of records (messages) in categories called topics, and it often serves as the architectural backbone of modern data analytics. Data flowing into Kafka often originates from native data streams such as social media feeds, telemetry data, and financial transactions. But these data streams contain only part of the information. Much of the data needed for stream processing is stored in traditional systems backed by relational databases. New real-time solutions need an up-to-date view of that information. So how do we make sure that information can flow between the RDBMS and Kafka, so that changes are available in Kafka in near-real-time? In this session, we present different approaches for integrating relational databases with Kafka, such as Kafka Connect, Oracle GoldenGate, and bridging Kafka with Oracle Advanced Queuing (AQ).
3. gschmutz
Guido Schmutz
Bi-directional integration between Oracle RDBMS & Apache Kafka
Working at Trivadis for more than 22 years
Oracle Groundbreaker Ambassador & Oracle ACE Director
Consultant, Trainer, Software Architect for Java, Oracle, SOA and Big Data / Fast Data
Head of Trivadis Architecture Board
Technology Manager @ Trivadis
More than 30 years of software development experience
Contact: guido.schmutz@trivadis.com
Blog: http://guidoschmutz.wordpress.com
Slideshare: http://www.slideshare.net/gschmutz
Twitter: gschmutz
145th edition
6. gschmutz
Unified Architecture for Modern Data Analytics Solutions
[Figure: reference architecture. Bulk sources (DB, extract, file) and event sources (location, IoT data, mobile apps, social, telemetry) deliver data through file/SQL import and event streams to an Event Hub. Connected to it are a big data platform (Hadoop cluster with raw and refined storage, parallel processing, results), an enterprise data warehouse used by SQL, search, and BI tools, stream analytics (stream processor with state and API), and microservices and enterprise apps.]
8. gschmutz
Event Hub: Apache Kafka
• Highly available Pub/Sub infrastructure
• Highly scalable
• Distributed log at the core
Logs do not (necessarily) forget. Retention options:
• Never (keep everything)
• Time (TTL) or size-based
• Log-compaction based
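The retention options above can be made concrete with a small simulation. The sketch below is not Kafka code: it models a log as a Python list and shows what a log-compacted topic conceptually retains, namely the most recent record per key. The function name `compact` and the sample records are invented for illustration.

```python
from typing import List, Optional, Tuple

Record = Tuple[str, Optional[str]]  # (key, value); value None models a tombstone


def compact(log: List[Record]) -> List[Record]:
    """Keep only the latest record per key, preserving log order.

    Simplification: real Kafka compacts in the background and keeps
    tombstones for a while; here keys whose latest value is None are dropped.
    """
    latest_index = {key: i for i, (key, _) in enumerate(log)}
    return [
        (key, value)
        for i, (key, value) in enumerate(log)
        if latest_index[key] == i and value is not None
    ]


log = [
    ("customer-1", "Peter, Bern"),
    ("customer-2", "Anna, Basel"),
    ("customer-1", "Peter, Zurich"),  # newer value for customer-1
    ("customer-2", None),             # tombstone: customer-2 deleted
]
compacted = compact(log)
print(compacted)
```

In Kafka itself these choices are made per topic through settings such as `cleanup.policy`, `retention.ms`, and `retention.bytes`.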
9. gschmutz
Use Case
[Figure: use case. A Customer Microservice (Customer API, Customer Logic, Customer store) and an Order Processing System (Order API, Order Logic, Order store, Customer materialized view), each exposing a REST API, are connected through the Event Hub, which holds an Order topic and a compacted Customer topic. A Notification Microservice (Notification Logic) and a Schema Registry are also attached to the Event Hub.]
11. gschmutz
Blueprints Oracle RDBMS => Apache Kafka (DB-K)
[Figure: the use-case diagram, with markers DB-K_1 to DB-K_5 showing where each pattern applies]
DB-K_1: Regular Polling of RDBMS table/view
DB-K_2: Regular Polling of RDBMS API
DB-K_3: Change Data Capture (CDC) on RDBMS
DB-K_4: Produce Messages to Event Hub from RDBMS
DB-K_5: Queuing on RDBMS to bridge to Event Hub
12. gschmutz
DB-K_1: Regular Polling of RDBMS table/view
[Figure: data flow diagram. Stream data integration polls the RDBMS behind the application logic and publishes to the Event Hub, which feeds stream analytics. Scales: message (flat to aggregate), latency (low to high), coupling (tight to loose).]
13. gschmutz
DB-K_1: Kafka Connect with JDBC Polling
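Polling a table with Kafka Connect comes down to a connector configuration. The following is a minimal sketch, assuming the Confluent JDBC source connector and an `order_t` table with a `modified_at` timestamp column (names borrowed from the ORDS query later in the deck). The connector name, connection URL, credentials, and topic prefix are placeholders, not values from the talk.

```python
import json

# Hypothetical connection details; replace with real values.
jdbc_source = {
    "name": "jdbc-order-source",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "tasks.max": "1",
        "connection.url": "jdbc:oracle:thin:@//dbhost:1521/XEPDB1",
        "connection.user": "order_processing",
        "connection.password": "change-me",
        "table.whitelist": "ORDER_T",
        # Poll for rows whose timestamp column advanced since the last poll.
        "mode": "timestamp",
        "timestamp.column.name": "MODIFIED_AT",
        "poll.interval.ms": "5000",
        # Records land in topic <prefix><table>, here "db-ORDER_T".
        "topic.prefix": "db-",
    },
}

payload = json.dumps(jdbc_source, indent=2)
print(payload)
```

The resulting JSON can be posted to the Kafka Connect REST API (`POST /connectors`). Note that timestamp-based polling sees inserts and updates only; deleted rows are not detected.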
14. gschmutz
Stream Data Integration: Kafka Connect
• Single Message Transforms (SMT) allow simple transformations
• Connectors are available from Confluent as well as the community; check hub.confluent.io for the list
• Declarative-style data flows
• Simplicity: “simple things done simple”
• Very well integrated with Kafka: the Kafka Connect framework is part of Apache Kafka

curl -X POST \
  "http://192.168.69.138:8083/connectors" \
  -H "Content-Type: application/json" \
  -d '{
  "name": "mqtt-source",
  "config": {
    "connector.class": "...MqttSourceConnector",
    "tasks.max": "1",
    "name": "mqtt-source",
    "mqtt.server.uri": "tcp://mosquitto:1883",
    "mqtt.topics": "truck/+/position",
    "kafka.topic": "truck_position"
  }
}'
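As a sketch of what a Single Message Transform looks like in practice, the fragment below adds Kafka's built-in `RegexRouter` transform to a connector configuration so that records are routed to a renamed topic. The alias `route` and the target name `vehicle_...` are arbitrary choices for this example; the renaming is emulated locally with Python's `re` module only to show the effect.

```python
import json
import re

# Extra properties to merge into a connector's "config" object.
smt_config = {
    "transforms": "route",  # alias chosen for this example
    "transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
    "transforms.route.regex": "truck_(.*)",
    "transforms.route.replacement": "vehicle_$1",
}

# Emulate the renaming locally (Java's $1 is \1 in Python's re module).
renamed = re.sub(r"^truck_(.*)$", r"vehicle_\1", "truck_position")
print(json.dumps(smt_config, indent=2))
print(renamed)
```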
16. gschmutz
DB-K_2: Regular Polling of RDBMS API
[Figure: data flow diagram. Stream data integration polls an API in front of the application logic and RDBMS and publishes to the Event Hub, which feeds stream analytics. Scales: message (flat to aggregate), latency (low to high), coupling (tight to loose).]
17. gschmutz
DB-K_2: Kafka Connect & Oracle REST Data Services (ORDS)
18. gschmutz
Oracle REST Data Services (ORDS)
• Makes it easy to develop modern REST interfaces for relational data in the Oracle Database and the Oracle Database 18c JSON Document Store
• Maps HTTP(S) verbs (GET, POST, PUT, DELETE, etc.) to database transactions and returns any results formatted as JSON
• Runs as a Java middle-tier application on WebLogic, Tomcat, or Docker, or standalone (for development)
20. gschmutz
DB-K_2 – Setup ORDS (II)
-- Handler for GET changes/:offset: returns orders modified after :offset as JSON
ORDS.DEFINE_HANDLER(
  p_module_name    => 'order_processing',
  p_pattern        => 'changes/:offset',
  p_method         => 'GET',
  p_source_type    => 'resource/lob',
  p_items_per_page => 25,
  p_comments       => NULL,
  p_source         =>
'SELECT ''application/json'', json_object(''orderId'' VALUE po.id,
          ''orderDate'' VALUE po.order_date,
          ''orderMode'' VALUE po.order_mode,
          ''customer'' VALUE
              json_object(''firstName'' VALUE cu.first_name,
                          ''lastName'' VALUE cu.last_name),
          ''lineItems'' VALUE (SELECT json_arrayagg(
              json_object(''ItemNumber'' VALUE li.id,
                          ''Product'' VALUE
                              json_object(''productId'' VALUE li.product_id,
                                          ''unitPrice'' VALUE li.unit_price),
                          ''quantity'' VALUE li.quantity))
              FROM order_item_t li WHERE po.id = li.order_id),
          ''offset'' VALUE TO_CHAR(po.modified_at, ''YYYYMMDDHH24MISS''))
   FROM order_t po LEFT JOIN customer_t cu ON (po.customer_id = cu.id)
  WHERE po.modified_at > TO_DATE(:offset, ''YYYYMMDDHH24MISS'')'
);
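On the consuming side, a poller calls this handler repeatedly and passes the last offset it has seen. The sketch below shows only that offset bookkeeping. `fetch` is a stand-in for an HTTP GET against the `changes/:offset` resource, and the response shape (a list of order documents, each carrying an `offset` field) is an assumption based on the query above, not something the slides specify.

```python
from typing import Callable, Dict, List, Tuple

Order = Dict[str, object]


def poll_once(fetch: Callable[[str], List[Order]], last_offset: str) -> Tuple[List[Order], str]:
    """Fetch orders changed after last_offset and return them with the new offset.

    Offsets use the handler's YYYYMMDDHH24MISS format, so plain string
    comparison orders them chronologically.
    """
    changes = fetch(last_offset)
    new_offset = max([last_offset] + [str(o["offset"]) for o in changes])
    return changes, new_offset


def fake_fetch(offset: str) -> List[Order]:
    """Stand-in for GET .../changes/<offset>; returns canned data."""
    data = [
        {"orderId": 1, "offset": "20190301120000"},
        {"orderId": 2, "offset": "20190301120500"},
    ]
    return [o for o in data if o["offset"] > offset]


changes, offset = poll_once(fake_fetch, "20190301000000")
print(len(changes), offset)
changes_again, offset_again = poll_once(fake_fetch, offset)
print(len(changes_again), offset_again)
```

One design caveat: because the offset has one-second resolution and the query uses a strict `>`, a row committed in the same second as the last seen offset could be skipped. A production poller would need to account for that.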
21. gschmutz
Stream Data Integration: StreamSets
• Continuous, open source, intent-driven big data ingest
• Visible, record-oriented approach fixes combinatorial explosion
• Both stream and batch processing: standalone, Spark cluster, MapReduce cluster
• IDE for pipeline development by ‘civilians’
• Special option for edge computing
• Custom sources, sinks, processors
• Supported by StreamSets
23. gschmutz
DB-K_3: Change Data Capture (CDC) on RDBMS
[Figure: data flow diagram. Changes are captured from the RDBMS redo log and delivered to the Event Hub, through stream data integration or through REST (label: REST to Event Hub). Scales: message (flat to aggregate), latency (low to high), coupling (tight to loose).]
24. gschmutz
DB-K_3: Oracle GoldenGate and Kafka Connect / REST
[Figure: data flow diagram. Changes are read from the RDBMS redo log and reach the Event Hub through stream data integration or through the REST Proxy. Scales: message (flat to aggregate), latency (low to high), coupling (tight to loose).]
25. gschmutz
DB-K_3: Debezium and Kafka Connect
[Figure: data flow diagram. Changes are read from the RDBMS redo log and delivered to the Event Hub through stream data integration. Scales: message (flat to aggregate), latency (low to high), coupling (tight to loose).]
Alternatives:
• StreamSets Data Collector
• Attunity
• …
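Whatever CDC tool is used, the consumer sees a stream of change events rather than table rows. The following is a minimal sketch of applying such events to an in-memory materialized view, assuming a simplified Debezium-style envelope with `op` (`c` create, `u` update, `d` delete, `r` snapshot read), `before`, and `after`. The customer records are invented.

```python
from typing import Any, Dict, List


def apply_change(view: Dict[Any, Dict[str, Any]], event: Dict[str, Any]) -> None:
    """Apply one change event to an in-memory view keyed by the row's id."""
    op = event["op"]
    if op in ("c", "u", "r"):  # create, update, snapshot read
        after = event["after"]
        view[after["id"]] = after
    elif op == "d":  # delete: only the before image is populated
        view.pop(event["before"]["id"], None)


events: List[Dict[str, Any]] = [
    {"op": "c", "before": None, "after": {"id": 1, "city": "Bern"}},
    {"op": "c", "before": None, "after": {"id": 2, "city": "Basel"}},
    {"op": "u", "before": {"id": 1, "city": "Bern"}, "after": {"id": 1, "city": "Zurich"}},
    {"op": "d", "before": {"id": 2, "city": "Basel"}, "after": None},
]

customer_view: Dict[Any, Dict[str, Any]] = {}
for e in events:
    apply_change(customer_view, e)
print(customer_view)
```

Unlike polling, this approach also surfaces deletes, because they appear in the redo log as explicit events.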
26. gschmutz
DB-K_4: Produce Messages to Event Hub from RDBMS
[Figure: data flow diagram. The RDBMS itself produces messages to the Event Hub (label: REST to Event Hub), which feeds stream analytics. Scales: message (flat to aggregate), latency (low to high), coupling (tight to loose).]
27. gschmutz
DB-K_4: Produce Messages to REST Proxy
[Figure: data flow diagram. The RDBMS produces messages to the Event Hub through the REST Proxy; the call from the database to the proxy is marked with a question mark. Scales: message (flat to aggregate), latency (low to high), coupling (tight to loose).]
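Whichever database mechanism issues the call, the HTTP request itself has a fixed shape. The sketch below builds (but does not send) such a request, assuming the Confluent REST Proxy v2 API. The host `rest-proxy:8082`, the topic `order`, and the sample record are placeholders.

```python
import json
import urllib.request


def build_produce_request(base_url: str, topic: str, records: list) -> urllib.request.Request:
    """Build a POST that produces JSON records to a topic via the REST Proxy."""
    body = json.dumps({"records": records}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/topics/{topic}",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/vnd.kafka.json.v2+json",
            "Accept": "application/vnd.kafka.v2+json",
        },
    )


request = build_produce_request(
    "http://rest-proxy:8082",  # hypothetical host and port
    "order",
    [{"key": "1", "value": {"orderId": 1, "orderMode": "online"}}],
)
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request)
```

From inside Oracle, an equivalent request could be issued with a PL/SQL HTTP client such as `UTL_HTTP`; that part is not shown here.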
28. gschmutz
DB-K_4: Oracle Big Data SQL writes to Kafka topic
[Figure: data flow diagram. Oracle Big Data SQL in the RDBMS writes to a topic in the Event Hub. Scales: message (flat to aggregate), latency (low to high), coupling (tight to loose).]
29. gschmutz
DB-K_5: Queuing on RDBMS to bridge to Event Hub
[Figure: data flow diagram. The application writes to a queue in the RDBMS; stream data integration reads the queue and publishes to the Event Hub. Scales: message (flat to aggregate), latency (low to high), coupling (tight to loose).]
30. gschmutz
DB-K_5: Oracle Advanced Queuing & Kafka Connect JMS
[Figure: data flow diagram. The queue in the RDBMS is Oracle Advanced Queuing (AQ); stream data integration reads it and publishes to the Event Hub. Scales: message (flat to aggregate), latency (low to high), coupling (tight to loose).]
35. gschmutz
Blueprints Apache Kafka => Oracle RDBMS (K-DB)
[Figure: the use-case diagram, with markers for the DB-K patterns (DB-K_1 to DB-K_5) and for the K-DB patterns listed below]
K-DB_1: Write directly to RDBMS table/view
K-DB_2: Write over RDBMS API
K-DB_3: Consume from Event Hub
K-DB_4: Queuing on RDBMS to bridge with Event Hub
36. gschmutz
K-DB_1: Write directly to RDBMS table/view
[Figure: data flow diagram. Stream data integration reads from the Event Hub and writes directly to an RDBMS table or view. Scales: message (flat to aggregate), latency (low to high), coupling (tight to loose).]
37. gschmutz
K-DB_1: Write to RDBMS table/view
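One way to write a topic into a table is the Kafka Connect JDBC sink connector. This choice is an assumption made by analogy with the JDBC polling pattern; the slide itself names no tool. The sketch below targets a `customer_t` table (the table name appears in the ORDS query earlier in the deck); the connector name, connection details, and key column `ID` are placeholders.

```python
import json

# Hypothetical connection details; replace with real values.
jdbc_sink = {
    "name": "jdbc-customer-sink",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
        "tasks.max": "1",
        "topics": "customer",
        "connection.url": "jdbc:oracle:thin:@//dbhost:1521/XEPDB1",
        "connection.user": "order_processing",
        "connection.password": "change-me",
        # Upsert by primary key so replays and updates stay idempotent.
        "insert.mode": "upsert",
        "pk.mode": "record_key",
        "pk.fields": "ID",
        "table.name.format": "CUSTOMER_T",
        "auto.create": "false",
    },
}
print(json.dumps(jdbc_sink, indent=2))
```

The sink needs records that carry a schema (for example Avro with the Schema Registry), since it has to map fields to table columns.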
38. gschmutz
K-DB_2: Write over RDBMS API
[Figure: data flow diagram. Stream data integration reads from the Event Hub and writes to the RDBMS through an API in front of the application logic. Scales: message (flat to aggregate), latency (low to high), coupling (tight to loose).]
39. gschmutz
K-DB_2: Write over Oracle REST Data Services (ORDS)
41. gschmutz
K-DB_3: Consume from Event Hub
[Figure: data flow diagram. The RDBMS itself consumes messages from the Event Hub (label: REST to Event Hub). Scales: message (flat to aggregate), latency (low to high), coupling (tight to loose).]
42. gschmutz
K-DB_3: Oracle Big Data SQL exposes topic as table
[Figure: data flow diagram. Oracle Big Data SQL in the RDBMS exposes a topic of the Event Hub as a table. Scales: message (flat to aggregate), latency (low to high), coupling (tight to loose).]
43. gschmutz
K-DB_4: Queuing on RDBMS to bridge with Event Hub
[Figure: data flow diagram. Stream data integration reads from the Event Hub and writes to a queue in the RDBMS, from which the application logic consumes. Scales: message (flat to aggregate), latency (low to high), coupling (tight to loose).]
44. gschmutz
K-DB_4: Use AQ on RDBMS to bridge with Event Hub
[Figure: data flow diagram. The queue in the RDBMS is Oracle Advanced Queuing (AQ), fed from the Event Hub through stream data integration. Scales: message (flat to aggregate), latency (low to high), coupling (tight to loose).]
45. gschmutz
K-DB_4: Use AQ on RDBMS to bridge with Event Hub
[Figure: data flow diagram. Variant of the AQ bridge in which MirrorMaker connects the Event Hub to the queue through AQ (Kafka API). Scales: message (flat to aggregate), latency (low to high), coupling (tight to loose).]
47. gschmutz
Demo
[Figure: the use-case diagram, annotated with the integration patterns DB-K_1 to DB-K_5 and K-DB_1 to K-DB_4]