3. The Consequence of Specialization in Data Systems
Data consistency is critical
Data flow is essential
4. Two Ways to Capture Changes
Extract changes from the database commit log
– Tough but possible
– Consistent
Application code dual-writes to the database and a pub-sub system
– Easy on the surface
– Consistent?
5. Change Extract: Databus
[Diagram: updates flow into the Primary Data Store; Databus carries the resulting data change events, with standardization, to downstream systems such as the Search Index, Graph Index, and Read Replicas]
6. Example: External Indexes
Description
– Full-text and faceted search
over profile data
Requirements
– Timeline consistency
– Guaranteed delivery
– Low latency
– User-space visibility
[Diagram: members update their skills on linkedin.com; change events flow through Databus into the People Search Index, which serves search results to recruiters on recruiter.linkedin.com]
7. A brief history of Databus
2006-2010: Databus became an established and vital piece of infrastructure for consistent data flow from Oracle
2011: Databus (V2) addressed scalability and operability issues
2012: Databus supported change capture from Espresso
2013: Databus open-sourced
– https://github.com/linkedin/databus
8. Databus Eco-system: Participants
[Diagram: the Primary Data Store feeds Change Data Capture, which publishes change events into the Change Event Stream, consumed by Consumer Applications]
Source
• Supports transactions
Change Data Capture
• Extracts the changed data of committed transactions
• Transforms it to ‘user-space’ events
• Preserves atomicity
Consumer Application
• Receives change events quickly
• Preserves consistency with the source
9. Databus Eco-System : Realities
[Diagram: the source feeds Change Data Capture, which publishes into the Change Event Stream; a fast consumer wants changes since the last 5 seconds, a slow consumer wants changes since last week, and a new consumer wants every change; meanwhile, schemas evolve]
• The source cannot be burdened by ‘long look back’ extracts
• Applications cannot be forced to move to the latest version of a schema all at once
10. Key Design Decisions : Semantics
Change Data Capture uses logical clocks attached to the
source (SCN)
– Change data stream is ordered by SCN
– Simplifies data portability: the change stream is f(SourceState, SCN)
Applications are idempotent
– At least once delivery
– Track progress reliably (SCN)
– Timeline consistency
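The two decisions above (at-least-once delivery plus idempotent consumers tracking an SCN) can be sketched as follows. This is a minimal illustration, not the Databus client API; all class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Sketch: an idempotent consumer that tracks progress with the source's
// logical clock (SCN). Under at-least-once delivery, re-delivered events
// are skipped, so applying the stream twice leaves the same end state.
class IdempotentConsumer {
    private final Map<String, String> derivedStore = new HashMap<>();
    private final AtomicLong lastAppliedScn = new AtomicLong(-1);

    // Apply a change event only if its SCN is beyond our checkpoint.
    // (A real system would checkpoint per consistency window, not per event.)
    void onDataEvent(long scn, String key, String value) {
        if (scn <= lastAppliedScn.get()) {
            return; // duplicate delivery: already applied, safely ignore
        }
        derivedStore.put(key, value);
        lastAppliedScn.set(scn); // durable persistence of the SCN omitted here
    }

    long checkpoint() { return lastAppliedScn.get(); }
    String get(String key) { return derivedStore.get(key); }
}
```

On restart, the consumer resumes the stream from `checkpoint()`, which is why timeline consistency and reliable progress tracking come for free once events are ordered by SCN.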
11. Key Design Decisions : Systems
Isolate fast consumers from slow consumers
– Workload separation between online (recent), catch-up (old), and bootstrap (all)
Isolate sources from consumers
– Schema changes
– Physical layout changes
– Speed mismatch
Schema-awareness
– Compatibility checks
– Filtering at change stream
12. The Components of Databus
[Diagram: change data flows from the DB through Change Capture into the Relay’s in-memory Event Buffer; the Databus Client library delivers online changes to consumer applications; a Bootstrap Consumer feeds a Bootstrap service whose Log Store and Snapshot Store serve older changes to slow applications and consistent snapshots to new applications; metadata is shared across components]
13. Change Data Capture
Contains logic to extract changes from the source, starting from a specified SCN
Implementations
– Oracle
Trigger-based
Commit ordering
Special instrumentation required
– MySQL
Custom-storage-engine based
EventProducer
start(SCN) //capture changes from specified SCN
SCN getSCN() //return latest SCN
[Diagram: Change Data Capture pulls from the database, guided by the SCN and the database schemas]
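The EventProducer contract above can be illustrated with a toy in-memory change log: `start(scn)` replays committed changes from a given SCN, and `getSCN()` reports the latest SCN seen. This is only a sketch; a real implementation tails Oracle trigger tables or a MySQL storage engine.

```java
import java.util.ArrayList;
import java.util.List;

// Toy change log illustrating the EventProducer contract from the slide.
class ToyEventProducer {
    static class Change {
        final long scn; final String row;
        Change(long scn, String row) { this.scn = scn; this.row = row; }
    }

    private final List<Change> commitLog = new ArrayList<>();
    private long latestScn = -1;

    // Simulate a committed change arriving in commit order.
    void commit(long scn, String row) {
        commitLog.add(new Change(scn, row));
        latestScn = scn;
    }

    // Capture changes at or after the specified SCN, in commit order.
    List<String> start(long fromScn) {
        List<String> out = new ArrayList<>();
        for (Change c : commitLog)
            if (c.scn >= fromScn) out.add(c.scn + ":" + c.row);
        return out;
    }

    long getSCN() { return latestScn; }
}
```

Because the replay is a pure function of (source state, SCN), a relay can restart from any checkpoint without extra coordination.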
14. MySQL : Change Data Capture
[Diagram: a MySQL master replicates to a custom MySQL slave; the slave pushes events to the Relay over a TCP channel]
• MySQL replication takes care of
• bin-log parsing
• The protocol between master and slave
• Handling restarts
• The Relay
• Provides a TCP protocol interface for pushed events
• Controls and manages the MySQL slave
15. Publish – Subscribe API
[Diagram: Change Data Capture extracts (src, SCN) from the DB and publishes into the in-memory Event Buffer; consumers subscribe (src, SCN) to the stream]
EventBuffer
startEvents() //e.g. new txn
appendEvent(DbusEvent, ...) //DbusEvent(enc(schema,changeData), src, pk)
endEvents(SCN) //e.g. end of txn; commit
rollbackEvents() //abort this window
Consumer
register(source, ‘Callback’)
onStartConsumption() //once
onStartDataEventSequence(SCN)
onStartSource(src,Schema)
onDataEvent(DbusEvent e,…)
onEndSource(src,Schema)
onEndDataEventSequence(SCN)
onRollback(SCN)
onStopConsumption() //once
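The window semantics of this API can be sketched with a toy buffer: events appended after `startEvents()` become visible to subscribers only once `endEvents(SCN)` commits the window, and `rollbackEvents()` discards them. Method names mirror the slide, but the implementation is purely illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the transaction-window (consistency-window) semantics of the
// publish API: consumers see events one committed window at a time.
class ToyEventBuffer {
    private final List<String> committed = new ArrayList<>();
    private final List<String> pending = new ArrayList<>();
    private long lastWindowScn = -1;

    void startEvents() { pending.clear(); }                 // e.g. new txn
    void appendEvent(String event) { pending.add(event); }  // buffered, invisible
    void endEvents(long scn) {                              // end of txn; commit
        committed.addAll(pending);
        pending.clear();
        lastWindowScn = scn;
    }
    void rollbackEvents() { pending.clear(); }              // abort this window

    // Subscribers see only whole, committed consistency windows.
    List<String> visibleEvents() { return new ArrayList<>(committed); }
    long lastScn() { return lastWindowScn; }
}
```

This is why a consumer's `onDataEvent` callbacks are always bracketed by `onStartDataEventSequence(SCN)` and `onEndDataEventSequence(SCN)`: atomicity of the source transaction is preserved end to end.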
16. The Databus Change Event Stream
[Diagram: the change event stream comprises the Relay’s in-memory Event Buffer for online changes and the Bootstrap service’s Log Store and Snapshot Store]
• Provide APIs to obtain change events
• Query API specifies a logical clock (SCN) and a source
• ‘Get change events greater than SCN’
• Filtering at source possible
• MOD, RANGE filter functions
applied to primary key of the event
• Batching/Chunking to guarantee
progress
• Does not contain state of consumers
• Contains references to metadata and
schemas
• Implementation
• HTTP server
• Persistent connection to clients
• REST API
Change Event Stream
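The "events greater than SCN" query with a server-side MOD filter over the primary key can be sketched as below. This is an illustrative in-memory model, not the actual Databus REST API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Sketch of server-side filtering at the change stream: a MOD filter over
// the event's primary key lets a consumer fetch only its partition,
// cutting network traffic between relay and client.
class ModFilterRelay {
    static class Event {
        final long scn; final long pk;
        Event(long scn, long pk) { this.scn = scn; this.pk = pk; }
    }

    private final List<Event> buffer = new ArrayList<>();

    void append(long scn, long pk) { buffer.add(new Event(scn, pk)); }

    // "Get change events greater than SCN", restricted to pk MOD n == i.
    List<Long> pull(long sinceScn, int n, int i) {
        return buffer.stream()
                .filter(e -> e.scn > sinceScn && Math.floorMod(e.pk, n) == i)
                .map(e -> e.pk)
                .collect(Collectors.toList());
    }
}
```

A RANGE filter works the same way, with the predicate testing whether the key falls in the consumer's assigned key range.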
17. Meta-data Management
Event definition, serialization and transport
– Avro
Oracle, MySQL
– Table schema generates Avro definition
Schema evolution
– Only backwards-compatible changes allowed
Isolation of applications from changes in the source schema
Many versions of a source are used by applications, but only one version (the latest) of the change stream exists
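The "only backwards-compatible changes allowed" rule can be illustrated with a toy check: a new schema version may only add fields (with defaults) and must keep the existing fields' types, so old consumers can still read the latest stream. Databus actually relies on Avro's schema-resolution rules; this simplified stand-in is only a sketch.

```java
import java.util.Map;
import java.util.Set;

// Toy backward-compatibility check in the spirit of Avro's rules:
// schemas are modeled as field-name -> type maps, and newly added
// fields must carry defaults so old readers can resolve them.
class SchemaCompat {
    static boolean backwardCompatible(Map<String, String> oldSchema,
                                      Map<String, String> newSchema,
                                      Set<String> newFieldsWithDefaults) {
        // Every old field must survive with the same type.
        for (Map.Entry<String, String> f : oldSchema.entrySet()) {
            String newType = newSchema.get(f.getKey());
            if (newType == null || !newType.equals(f.getValue()))
                return false; // field removed or retyped: breaking change
        }
        // Any added field must have a default value.
        for (String f : newSchema.keySet()) {
            if (!oldSchema.containsKey(f) && !newFieldsWithDefaults.contains(f))
                return false;
        }
        return true;
    }
}
```

With a check like this run at registration time, schema evolution at the source never forces all applications to upgrade at once.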
18. The Databus Relay
[Diagram: the Relay encapsulates Change Capture, the in-memory Event Buffer, and per-source metadata, backed by the database schemas and a local SCN store]
• Encapsulates change capture logic and
change event stream
• Source aware, schema aware
• Multi-tenant: Multiple Event Buffers
representing change events of different
databases
• Optimizations
• Index on SCN exists to quickly
locate physical offset in EventBuffer
• Locally stores SCN per source for
efficient restarts
• Large Event Buffers possible (> 2G)
19. Scaling Databus Relay
Option 1: peer relays, independent
• Each relay connects directly to the DB
• Increased load on the source DB with each additional relay instance
Option 2: relays in a leader-follower cluster
• Only the leader reads from the DB; followers read from the leader
• Leadership assigned dynamically
• Small period of stream unavailability during leadership transfer
20. The Bootstrap Service
Bridges the continuum between stream and
batch systems
Catch-all for slow / new consumers
Isolate source instance from large scans
Snapshot store has to be seeded once
Optimizations
– Periodic merge
– Filtering pushed down to store
– Catch-up versus full bootstrap
Guaranteed progress for consumers via
chunking
Multi-tenant - can contain data from many
different databases
Implementations
– Database (MySQL)
– Raw Files
[Diagram: a Bootstrap Consumer receives online changes from the Relay and writes them to the Bootstrap service’s Log Store and Snapshot Store; the Snapshot Store is seeded once from the database]
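The "periodic merge" optimization above can be sketched as follows: the log store keeps raw change events, and a merge folds them into the snapshot store by primary key, so a bootstrapping consumer reads one consistent row per key instead of replaying every change. Illustrative only; the real stores are MySQL tables or raw files.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the bootstrap service's two stores and the periodic merge
// that compacts the log store into the snapshot store keyed by pk.
class ToyBootstrap {
    static class Change {
        final long scn; final long pk; final String row;
        Change(long scn, long pk, String row) { this.scn = scn; this.pk = pk; this.row = row; }
    }

    private final List<Change> logStore = new ArrayList<>();     // every change event
    private final Map<Long, String> snapshotStore = new HashMap<>(); // latest row per pk
    private long snapshotScn = -1;

    void appendToLog(long scn, long pk, String row) { logStore.add(new Change(scn, pk, row)); }

    // Periodic merge: apply log entries newer than the snapshot's SCN
    // (log entries are assumed to arrive in SCN order).
    void merge() {
        long maxScn = snapshotScn;
        for (Change c : logStore) {
            if (c.scn > snapshotScn) {
                snapshotStore.put(c.pk, c.row);
                maxScn = Math.max(maxScn, c.scn);
            }
        }
        snapshotScn = maxScn;
    }

    int snapshotSize() { return snapshotStore.size(); }
    String latest(long pk) { return snapshotStore.get(pk); }
}
```

This also shows the catch-up-vs-full-bootstrap trade-off: a consumer slightly behind replays the log, while a brand-new consumer reads the compacted snapshot.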
21. The Databus Client Library
Glue between Databus Change
Stream and business logic in the
Consumer
Switches between relay and bootstrap
as needed
Optimizations
– Change events use a batch write API, without deserialization
Periodically persists SCN for lossless
recovery
Built-in support for parallelism
– Consumers need to be thread-safe
– Useful for scaling large batch processing
(bootstrap)
[Diagram: within the Databus Client Library, a Change Stream Client reads from the Databus change stream and writes into a local EventBuffer; a Dispatcher iterates over the buffer and issues callbacks to the registered stream and bootstrap consumers; an SCN store tracks progress]
23. Scaling Applications - I
[Diagram: the change stream is partitioned as i = pk MOD N; one client application processes partitions 0..k-1, another processes partitions k..N-1]
• Databus clients consume partitioned streams
• Partitioning strategy: range or hash
• Partitioning function applied at the source
• Number of partitions (N) and list of partitions (i) specified statically in configuration
• Not easy to add/remove nodes
• Needs a configuration change on all nodes
• Client nodes are uniform: any node can process any partition(s)
• Clients distribute the processing load
24. Scaling Applications - II
[Diagram: the Databus stream is partitioned as i = pk mod N; the N partitions are distributed evenly and dynamically amongst m client nodes, each handling N/m partitions; SCNs are written to a central location]
• Databus clients consume partitioned streams
• Partitioning strategy: MOD
• Partition function applied at the source
• Number of partitions (N) and cluster name specified statically in configuration
• Easy to add or remove nodes
• Dynamic redistribution of partitions
• Fault tolerance for client nodes
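The even distribution of N partitions over m nodes can be sketched as a simple assignment function: node j owns partitions {i : i mod m == j}, so adding or removing a node only requires recomputing this map (in the real deployment, Helix performs the dynamic reassignment). This helper is hypothetical, not the Databus client API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of distributing N MOD-partitions evenly amongst m client nodes.
class PartitionAssigner {
    // Returns node index -> list of owned partition indices.
    static Map<Integer, List<Integer>> assign(int numPartitions, int numNodes) {
        Map<Integer, List<Integer>> byNode = new HashMap<>();
        for (int j = 0; j < numNodes; j++) byNode.put(j, new ArrayList<>());
        // Round-robin: partition i goes to node (i mod m).
        for (int i = 0; i < numPartitions; i++)
            byNode.get(i % numNodes).add(i);
        return byNode;
    }
}
```

When a node joins or leaves, recomputing `assign` with the new m yields the redistributed ownership; each node then resumes its new partitions from the centrally stored SCNs.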
25. Databus: Current Implementation
Runs on Linux; written in Java, requires Java 6
All components have HTTP interfaces
Databus Client: Java
– Other language bindings possible
– All communication with the change stream is over HTTP
Libraries
– Netty, for HTTP clients and servers
– Avro, for serialization of change events
– Helix, for cluster awareness
28. Databus Performance : Relay
Relay
– Saturates the network with low CPU utilization
CPU utilization increases with more clients
An increased poll interval (higher consumer latency) reduces CPU utilization
– Scales to hundreds of consumers (client instances)
30. Databus Performance : Consumer
Consumer
– Latency primarily governed by the ‘poll interval’
– Low overhead of the library in event fetch
Spikes in latency are due to network saturation at the relay
Scaling the number of consumers
Use partitioned consumption (filtering at the relay)
– Reduces network utilization, but some increase in latency due to filtering
Increase the ‘poll interval’, tolerate higher latencies
33. Databus Bootstrap :Performance
Bootstrap
– Should we serve from the ‘catch-up store’ or the ‘snapshot store’?
– It depends on traffic patterns in the spectrum from ‘all updates’ to ‘all inserts’
– Tune the service depending on the fraction of updates and inserts
Favour snapshot-based serving for update-heavy traffic
35. Databus at LinkedIn
[Diagram: Oracle and Espresso change event streams are exposed through the managed Databus service]
• The Databus change stream is a managed service
• Applications discover/look up the coordinates of sources
• Multi-tenant, chained relays
• Many sources can be bootstrapped from SCN 0 (the beginning of time)
• Automated change stream provisioning is a work in progress
36. Databus at LinkedIn : Monitoring
Available out of the box as JMX Mbean
Metrics for health
– Lag between the update time at the DB and the time at which it was received by the application
– Time of last contact with the change event stream and source
Metrics for capacity planning
– Event rate / size
– Request rate
– Threads / connections
37. Databus at LinkedIn: The Good
Source isolation: Bootstrap benefits
– Typically, data extracted from sources just once (seeding)
– Bootstrap service used during launch of new applications
– Primary data store not subject to unpredictable high loads due to
lagging applications
Common Data Format
– Avro offers ease of use, flexibility, and performance improvements
(larger retention periods of change events in the Relay)
Partitioned Stream Consumption
– Applications horizontally scaled to hundreds of instances
38. Databus at LinkedIn: Operational Niggles
Oracle Change Capture Performance Bottlenecks
– Complex joins
– BLOBs and CLOBs
– High update rate driven contention on trigger table
Bootstrap: Snapshot store seeding
– Consistent snapshot extraction from large sources
Semi-automated change stream provisioning
39. Quick Review
Specialization in Data Systems
– The CDC pipeline is a first-class infrastructure citizen, up there with
stores and indexes
Source Independent
– Change capture logic can be plugged in
Use of SCN – an external clock attached to source
– Makes change stream more ‘portable’
– Easy for applications to reason about consistency with source
The Pub-Sub API supports the atomicity semantics of transactions
Bootstrap Service
– Isolates the source from abusive scans
– Serves both streaming and batch use-cases
43. Databus: First attempt (2007)
Issues
Source database pressure
caused by slow consumers
Brittle serialization
Editor's Notes
Large-scale storage systems are, by nature, distributed: data is stored on multiple machines. It is also often the case that the same content is stored in different places to support different access patterns, efficient retrieval, or quick look-up of derived data (as opposed to computing it during a look-up). In order to have such distributed systems work together to provide a service, two things are needed. Data flow between these systems: when data is updated on one system, it should be reflected in the other parts that store the same content. Data consistency: the different parts of the system must converge at some point in time. So, we need a change capture system that supports such a distributed system.
Basically, there are two ways changes can be captured: applications can dual-write data into the database and the change stream, or we can capture the data change list from the commit logs that most databases have. Dual writes appear really easy on the surface, but when we start considering transient failure scenarios, achieving consistency gets harder, sometimes impossible; you may need to get into two-phase commits, etc., compromising on performance or availability. On the other hand, extracting changes from the database is almost like post-processing: minimal or no performance penalty, the application is unaware of the existence of a change capture system, and consistency is not an issue as long as you see the commit logs in their entirety. No such thing as a free lunch, of course: commit log formats are proprietary, and extracting from them can be tough. We chose this approach of change extraction.
Here are some of the use cases for Databus. In this picture, we have one source of truth for data: the primary database. Changes to the data are observed (or consumed) by consumers, which may then turn around and update derived data-serving systems. Or data may be extracted into Hadoop (for example), to be re-loaded into some derived systems. At LinkedIn, our primary database is Oracle. For example, we have a database that holds member information. When rows in these databases are altered, the change events need to be propagated to a search index. The search index is used to serve queries from recruiters looking for appropriate candidates. Similarly, there are other consumers of different databases, each having their own business logic to build derived data out of the primary database. Databus is used to capture these changes and provide them as change events to consumers. So, what would be the requirements of such a system?
… Let’s look at a brief biography of Databus
… Now for a close look at Databus
‘Change’ in ‘change data’ is used more as a noun than a verb: ‘data that has changed’ rather than ‘alter the data’; this is the industry-standard term. Change capture logic (CDC) extracts changes in a consistent way, preserving consistency and ordering (e.g. ways to extract the order of commits, and the delivery semantics). Publisher and subscriber APIs let the CDC transform the extracted changes and publish those events with the atomicity guarantees of the source. Applications preserve consistency when they apply the changes they receive in a timely manner. And then there were realities…
There are different types of applications, and schemas evolve at the source. But the source cannot be burdened (a typical problem with V1), and applications cannot be forced to move to the latest version (which resulted in a proliferation of different versions of change streams of the same source).
An external clock is attached to the source, with ordering defined by the source: e.g. commit ordering in Oracle yields increasing SCNs; in the MySQL binlog, increasing transactions could be the SCNs. There is no additional source of truth and no additional point of failure, and the event stream can be recreated given an SCN and a source. For applications, the ordering of events is the same as that seen by the source, so eventually the source and the apps will converge. The SCN is used to track progress on the app side, and apps can reason about consistency with the source: the SCN is an external, logical clock, not tied to any particular change stream node. Apps need to be idempotent, as they can see a change more than once. Derived stores can also reason about consistency amongst each other, as they have SCN visibility, a concept that is useful for comparing consistency across applications. Timeline consistency with an at-least-once guarantee: the order of change events is the same as at the source DB, no updates are missed, and all apps listening to the change stream see the same order of change events.
Pull model, as opposed to push, where producers keep track of their consumers' progress and call clients as long as they are available: the pull model assumes the state required to serve a request lies with the consumer. Restartability is easier, as the state can be computed from (source, SCN) on any machine; this is true at both the change event stream and the consumer. Separation of concerns between the use cases of ‘online consumption’ (recent changes) and ‘catch-up/bootstrap’ (where older changes are required), which have different scalability properties. Isolate sources and consumers: sources can move, schemas can change, and, of course, producer and consumption speeds can vastly differ. We are not just transport: we support metadata, such as schemas. We ensure that consumers have a good experience while the change stream also becomes more manageable, ultimately helping provisioning and consumer robustness. This also gives the option of adding more filtering at the change stream.
Point: change capture is within the relay: each relay is self-sufficient, i.e. since eventBufferState = fn(source, SCN), it has the change capture logic to pull in the changes; if change capture were outside, then the change capture logic and fan-out would have to take care of replication or write to a leader-follower relay cluster. The EventBuffer wraps around if it runs out of memory. Point: the client library fetches changes from the Databus stream (which is now the Relay plus the Bootstrap service). Point: workload separation between the cases of recent changes, older changes, and snapshots: we cannot rely solely on all changes fitting into memory. Point: the Bootstrap Consumer is a special application that listens to changes and updates its log store (persistent change events) and snapshot store (a persistent copy of the database, storing change events in user-space). Point: remember, the client library automatically switches between the appropriate service, relay or bootstrap, depending on the SCN requested by the application. Point: metadata is used by the relay stream (schema awareness). Point: DBs are saved from abusive scans by lagging consumers (isolation). Counterpoints: a push model requires additional state about consumers, or the speeds of consumption and production have to match, and it is harder to maintain lossless guarantees.
Are all of these open-sourced? Oracle is.
A custom MySQL replication setup with a custom MySQL slave instance: specifically, the custom storage engine of the slave writes to a TCP channel instead of disk. The slave state has an SCN (offset, log number) that can be controlled, with up to 3 days' worth of rewindability (configurable).
The control flow is depicted. Note the pull model: at the CDC end, it is easy to make data portable; (SCN, source) is sufficient to re-create the state in the EventBuffer, making restarts easier, and no state about subscribers needs to be maintained on the upstream system (as in the case of a push model). Publish does not require persistence/durability guarantees; these are obtained from the source of truth and the fact that the change stream is a f(Source, SCN). At one end point, the CDC captures changes from the database and publishes them to an event buffer; at the other end, applications subscribe to the change stream and receive callbacks when change data from the sources they have subscribed to becomes available. Point: the end points have APIs supporting transaction semantics (atomicity). Point: ‘windows’ or consistency windows are points in the stream that are consistent with the source at the specified SCN. Point: consumers see events one consistency window at a time, i.e. events are visible to the consumers after the ‘end of window’ has been written. Question: what if CDC were outside? CDC can be a pull model, but can it push off-box to the event stream? Yes, but then the event stream isn't simple: cluster state (leader-follower) is shared between CDC and Relay, and failure of CDC needs to be treated and monitored separately; packaging CDC with the event stream has operability advantages. Question: is onRollback() triggered at the same time the rollback appears in the buffer? No. This isn't about one-to-one correspondence in time but in semantics: both have a notion of ‘transactions’, apps don't see uncommitted events, and the output of apps has the option of seeing the whole transaction in its entirety as well, which is very important, for example, in relay chaining.
Both the Relay (online changes) and the Bootstrap service (older changes) together constitute the change event stream. They do not share the exact same API, but semantically say the same thing: get events since a point in the logical clock. Both have the ability to perform simple filters on the service side, and both have chunking/progress guarantees. HTTP-based implementation, with efficient communication to clients.
Database schemas are converted to a neutral format for the Databus events; we chose Avro. Tools are available to publish schemas to a ‘schema registry’ and to generate schemas from different source types. Schemas are generated and stored in a place accessible by the change stream, which ensures backward compatibility (relevant for bootstrap). Schemas are available to consumers for deserialization.
The Relay encapsulates the change capture logic, the event buffer (remember the publish API), implemented as a circular buffer, and the metadata. It constitutes the online, most frequently used part of the change stream, addressing 98% of requests on a typical day.
Relays talk to the database directly, since they contain the change capture. This has horizontal scalability limits.
The Bootstrap Consumer is a special application that consumes events from relays and writes to a persistence layer called the ‘log store’. Another process applies changes to the snapshot store, using the pk that was in the publish API (this separate thread is not shown here). Seeding: bootstrapping the bootstrap.
The Databus client library orchestrates consumption of the change stream from the bootstrap/relay. It uses an HTTP fetch to get events from upstream and writes to the event buffer using the efficient readEvents call; currently a polling mechanism is used to get events from upstream. The dispatcher uses the iterator interface of the EventBuffer to read the events and then calls the user-specified consumer implementations. The client library by default persists the SCN for lossless recovery. Consumers need to be thread-safe and can take advantage of parallelism. Let's look at a typical application.
Key: a single instance of the client library can handle multiple consumers subscribing to multiple change streams. Different logic and tuning are required for the bootstrap and online cases; facilities are provided. Schema-aware apps can force type conversion from one schema to another, as long as backward compatibility is preserved amongst the change data. An override of the persisted SCN is possible for cases where flush() is not guaranteed by the application (e.g. an index): apps store the SCN in the index and retrieve it on startup. Applications typically are distributed, so they have some notion of partitions / partition awareness. It can be tempting to consume the entire event stream of an unpartitioned upstream store and then drop (n-1)/n-th of the partitions on the floor, but that is inefficient and expensive (for the relay, and latency-wise for the consumer, as we shall see). Instead…
Here, client nodes refer to one instance of the client library, so a node can be an application instance. Applications themselves are partition-aware: they write to partitioned indexes/stores and need to distribute the processing load. The partition function is applied at the source on the primary key, on the fly; the source itself needn't be partitioned. Partitions can be changed as more nodes are added, if the application accounts for ‘repartitioning’: checkpoints need to be reset and the configuration needs to be changed. But this is hardly operation-friendly…
The clients are partition-aware, but the partition assignment is dynamic. Cluster awareness is introduced via client app clusters, with operability advantages: the ability to add or remove nodes with dynamic redistribution. Helix is used to manage client clusters, and as the SCN store. Now, let's look at some aspects of the current implementation.
… And on to some code – let’s take a look at the application
Points to note: how sources are specified; what a consumer looks like; and how a Databus client uses subscription (register).
Key: show how the payload is extracted. We have visited these APIs earlier. Now to dwell on performance.
Setup: measure relay serving throughput and CPU utilization; vary the number of consumers and the poll interval (tpt_10 means throughput with a poll interval of 10 ms, cpu_55 is CPU utilization with a 55 ms poll interval, etc.); consumers pulling at max speed (no additional processing); event size is 2.5 KB; no write traffic, relay buffer pre-filled. The hypothesis was that we can support more consumers if the poll interval is long, and that is confirmed by the observations: the relay can easily saturate the network with minimal CPU utilization; once the network is saturated, CPU increases with the number of consumers due to networking overhead (context switching); even with 200 consumers, CPU utilization is less than 10%; higher poll intervals generally lead to less CPU utilization.
Setup: measure the read throughput of each consumer with update traffic on the relay; vary the number of consumers and the update rate; consumers pulling at max speed (no additional processing); poll interval is 10 ms; event size is 2.5 KB. Observations: drops mean a consumer is no longer able to keep up; the reason is network saturation on the relay side, e.g. 2000 updates/s * 20 consumers * 2.5 KB = 100 MBps < max network bandwidth < 200 MBps = 2000 updates/s * 40 consumers * 2.5 KB.
Setup: same as above, but measure the time in milliseconds for events to reach the consumer; added partitioning through server-side filtering to see what happens if the network is not a bottleneck. Observations: latency knees due to relay network saturation, as before; latency without SSF (server-side filtering) is around 10-20 ms (including an average 5 ms overhead due to the poll interval); with SSF the network is no longer a bottleneck, and latency goes up to 15-45 ms due to SSF computation overhead. So the relay can scale to hundreds of consumers if they can tolerate a little bit of latency.
E2E latency has no meaning for the bootstrap service, and it can easily saturate the network with multiple clients, so we focused on comparing serving out of the log store vs the snapshot store. Setup: compare serving deltas vs serving all updates, with a synthetic workload, varying the number of updates to existing keys vs new keys (i.e. inserts). Observations: catch-up time is constant, as it does not distinguish updates vs inserts; the break-even point is around 1:1 updates vs inserts. For a small number of inserts, the benefit of the snapshot is overwhelming. The break-even point seems to be when half of the changes are updates; we monitor the update rate in production and tune the bootstrap service.
The Databus stream for Oracle: things that scale with memberId and things that scale with connections (multiplicative, only inserts); small sources such as advertiser data (but consistency is important). Applications: search, with multiple instances in a large distributed deployment, low latency requirements, and consistency. Bootstrap is used in new ways: to automatically provision new index nodes, for new in-memory advertising data sets, and to fix legacy stores. Espresso as a source of truth, 2013 and beyond: a partitioned primary data store (transactional) based on a MySQL storage engine, horizontally scalable, with the change stream partitioned at the source of truth rather than at the change stream. The change stream still requires trigger-based ‘databusification’ in Oracle. Relay provisioning is still manual, in the sense that there is no self-serve mechanism to specify a source and no automatic source discovery; relays are provisioned in the ‘cloud’ depending on capacity estimates. Let's look at some change capture implementations we have.
Overall: is external clock propagation a good idea? Is it necessary or a nice-to-have? It becomes important in the case of bootstrap. Are checkpoints portable? If a mapping exists between SCN and CDC-GEN-UNIQ-NUM, or if an index on SCN exists at every layer (bootstrap and relay), then it can be handled as a system-level implementation and the client needn't use the SCN explicitly. The SCN, an external clock, is a convenient way of storing logical state across instances of the change stream.