Schema Evolution Patterns
Alex Rasmussen
alex@bitsondisk.com
John Gould (14.Sep.1804 - 3.Feb.1881) [Public domain]
Hi, I’m Alex!
https://www.bitsondisk.com/
LA-based Data Engineering Consultant
Twitter/GitHub/LinkedIn/…: @alexras
A Deceptively Hard Problem
•Classic three-tier web service
- Multiple servers for scalability
- Rolling updates for high availability
- API for extensibility
•How do we make changes to data?
- Let’s focus on one table, people
[Diagram: Users and API Clients reach an LB / API Gateway, which fronts four App servers sharing one DB]
Deceptively Hard Problem #1
•Add administrative users
- Need to add is_admin to people table
- … but clients with the old schema will fail
to write if they don’t provide is_admin!
Won’t they?
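A minimal sketch of why the answer can be "no", assuming the new column is added with a default and old clients name the columns they write. SQLite is used only because it ships with Python; the table layout is an assumption, not the talk's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT NOT NULL, age INTEGER NOT NULL)")

# A client on the old schema writes a row.
conn.execute("INSERT INTO people (name, age) VALUES (?, ?)", ("Bob Jones", 42))

# The schema changes: is_admin is added with a default value.
conn.execute("ALTER TABLE people ADD COLUMN is_admin INTEGER NOT NULL DEFAULT 0")

# The old client still names only the columns it knows about,
# and the write succeeds because the database fills in the default.
conn.execute("INSERT INTO people (name, age) VALUES (?, ?)", ("Alice Smith", 29))

rows = conn.execute(
    "SELECT name, age, is_admin FROM people ORDER BY name"
).fetchall()
print(rows)
```

Without the DEFAULT clause on a NOT NULL column, the second insert would be the one at risk, which is the failure the slide is worried about.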
Deceptively Hard Problem #2
•Splitting name into first_name, last_name
- Old clients will keep writing to name
- New clients will expect first_name and
last_name to be defined in old data
•How do we do this update safely?
Schema Evolution
•When your data’s shape (its schema) changes
•Why is this hard?
- Schemas can’t change everywhere instantly
- Client code can be very difficult to update
- If client and data schemas don’t agree,
it can cause serious problems
How Do We Handle This?
•We need to give the illusion of instant
schema change to clients, with minimal
code change.
•In this talk, we’ll look at how.
Goal of This Talk
•Broadly-applicable concepts, techniques,
and patterns for schema evolution
- Schema compatibility for transparent schema change
- Data migration when compatibility isn’t possible or practical
•How this looks in practice
1. SCHEMA COMPATIBILITY
2. DATA MIGRATION
a. SINGLE SCHEMA
b. MULTI-SCHEMA
3. TAKEAWAYS
The Illusion of Instant Change
•Instant schema change everywhere isn’t possible,
but we want to give the illusion that it is
- Goal #1: Clients can still read and write safely,
even if their schemas are different
- Goal #2: Code change to clients is minimized
•Schema compatibility makes this easier to do
Schema Compatibility
•If two schemas are compatible, evolving
from one schema to another can be done
automatically on read
•Clients can be oblivious to schema change
•Two directions: backwards and forwards
Compatibility
•Backwards-Compatibility: data written with old schema (X)
readable by clients with new schema (X+1)
•Forwards-Compatibility: data written with new schema (X+1)
readable by clients with old schema (X)
Add a Field With a Default
Schema X+1:
name: string,
age: integer,
is_admin: boolean (default: false)
Record written with X:
name: “Bob Jones”,
age: 42
Record written with X+1:
name: “Tom Peters”,
age: 32,
is_admin: false
•Backwards: reading the X record, client CX+1 adds is_admin = false
•Forwards: reading the X+1 record, client CX ignores is_admin
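A rough sketch of how a reader could resolve a stored record against its own schema at read time. The schema representation (a dict of field name to default, with a REQUIRED sentinel) and the read_as helper are hypothetical, chosen only to illustrate the slide.

```python
# Hypothetical schema representation: field name -> default value.
# REQUIRED marks a field that has no default.
REQUIRED = object()

SCHEMA_X = {"name": REQUIRED, "age": REQUIRED}
SCHEMA_X1 = {"name": REQUIRED, "age": REQUIRED, "is_admin": False}


def read_as(record, reader_schema):
    """Project a stored record onto the reader's schema."""
    result = {}
    for field, default in reader_schema.items():
        if field in record:
            result[field] = record[field]
        elif default is not REQUIRED:
            result[field] = default  # fill in the declared default
        else:
            raise ValueError(f"missing required field {field!r}")
    return result  # fields the reader does not know about are dropped


bob = {"name": "Bob Jones", "age": 42}                       # written with X
tom = {"name": "Tom Peters", "age": 32, "is_admin": False}   # written with X+1

backwards = read_as(bob, SCHEMA_X1)  # new client, old data
forwards = read_as(tom, SCHEMA_X)    # old client, new data
print(backwards, forwards)
```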
Remove a Field With a Default
Schema X:
name: string,
age: integer,
is_admin: boolean (default: false),
pto_days_left: integer (default: 0)
Record written with X:
name: “Alice Smith”,
age: 29,
is_admin: true,
pto_days_left: 16
Record written with X+1:
name: “Carol Danvers”,
age: 34,
is_admin: true
•Backwards: reading the X record, client CX+1 ignores pto_days_left
•Forwards: reading the X+1 record, client CX adds pto_days_left = 0
Other Types of Changes
•Without defaults:
- Adding a field breaks backwards-compatibility
(in older data, field value is undefined)
- Removing a field breaks forwards-compatibility
(for older clients, field value is undefined)
•Renaming (e.g. ssn to social_security_number):
it depends on how the format identifies fields (by name or by number)
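The rules above can be sketched as a small checker. This assumes fields are matched by name and that a schema is a dict of field name to default; can_read, compatibility and the REQUIRED sentinel are hypothetical names for illustration.

```python
REQUIRED = object()  # marks a field with no default


def can_read(reader_schema, writer_schema):
    """True if every field the reader needs is either written or defaulted."""
    return all(
        field in writer_schema or default is not REQUIRED
        for field, default in reader_schema.items()
    )


def compatibility(old, new):
    return {
        "backwards": can_read(new, old),  # new clients reading old data
        "forwards": can_read(old, new),   # old clients reading new data
    }


base = {"name": REQUIRED, "age": REQUIRED}
with_default = {**base, "is_admin": False}
without_default = {**base, "is_admin": REQUIRED}

print(compatibility(base, with_default))      # add with default
print(compatibility(base, without_default))   # add without default
print(compatibility(without_default, base))   # remove without default
# A rename looks like a removal plus an addition when matching by name.
print(compatibility({"ssn": REQUIRED}, {"social_security_number": REQUIRED}))
```

Matching by name is why the rename comes out incompatible in both directions here; a format that matches fields some other way can reach a different answer.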
In Practice: API Design
•So far, focused on DBs
•Compatibility is especially important for APIs
- Lots of clients you might not control
- API version bumps need to happen when
incompatible schema changes happen
In Practice - Protocol Buffers
syntax = "proto2";  // required / optional with defaults is proto2 syntax

message Person {
  required string name = 1;
  required int32 age = 2;
  optional bool is_admin = 3 [default = false];
}
•Field numbers make renames compatible
•In version 3, no required or optional -
required broke backwards-compatibility too often
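A toy model of why numbering fields makes renames compatible. This is not the Protocol Buffers wire format or API; the encode / decode helpers and the two schema dicts are hypothetical, and the ssn rename is borrowed from the earlier slide.

```python
# Writer and reader agree on field numbers, not on field names.
SCHEMA_V1 = {1: "name", 2: "ssn"}
SCHEMA_V2 = {1: "name", 2: "social_security_number"}  # field 2 renamed


def encode(record, schema):
    """Replace field names with their numbers before storing or sending."""
    numbers = {name: number for number, name in schema.items()}
    return {numbers[name]: value for name, value in record.items()}


def decode(payload, schema):
    """Map numbers back to whatever names the reader's schema uses."""
    return {
        schema[number]: value
        for number, value in payload.items()
        if number in schema  # unknown field numbers are skipped
    }


payload = encode({"name": "Bob Jones", "ssn": "000-00-0000"}, SCHEMA_V1)
decoded = decode(payload, SCHEMA_V2)
print(payload, decoded)
```

The stored payload never mentions a field name, so the reader is free to call field 2 anything it likes.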
In Practice: Stripe
•Goal: API responses readable by all old clients w/o code change.
•API server has latest schema, but clients keep schema forever
•Solution: Version change modules applied in reverse order from
server’s version to client’s version (they admit: this is hard)
[Diagram: server S at version 3 applies 3to2( ) and then 2to1( ) to a response for client C at version 1]
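A sketch of the "version change modules applied in reverse order" idea, under invented assumptions: three API versions, with the changes between them made up for illustration (they are not Stripe's actual changes or code). v3_to_v2, v2_to_v1, DOWNGRADES and render_for are hypothetical names.

```python
def v3_to_v2(response):
    # Hypothetical change: version 3 split name into first_name / last_name.
    out = dict(response)
    out["name"] = f"{out.pop('first_name')} {out.pop('last_name')}"
    return out


def v2_to_v1(response):
    # Hypothetical change: version 2 added is_admin.
    out = dict(response)
    out.pop("is_admin", None)
    return out


# Each module undoes the change introduced by the version it is keyed on.
DOWNGRADES = {3: v3_to_v2, 2: v2_to_v1}
SERVER_VERSION = 3


def render_for(response, client_version):
    """Walk from the server's version down to the client's version."""
    for version in range(SERVER_VERSION, client_version, -1):
        response = DOWNGRADES[version](response)
    return response


latest = {"first_name": "Bob", "last_name": "Jones", "age": 42, "is_admin": False}
for_v1 = render_for(latest, 1)
for_v2 = render_for(latest, 2)
for_v3 = render_for(latest, 3)
print(for_v1)
```

The server code only ever produces the latest shape; each older client's view is derived by composing the downgrade steps.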
Recap
•Compatibility allows for transparent
movement between schemas
•Changes can be backwards-compatible,
forwards-compatible, both, or neither
•Ease-of-compatibility drives the design of many
messaging formats
1. SCHEMA COMPATIBILITY
2. DATA MIGRATION
a. SINGLE SCHEMA
b. MULTI-SCHEMA
3. TAKEAWAYS
Crossing Compatibility Gaps
•Need a plan for when compatibility
isn’t an option
- Not all schema changes are compatible
- Not all incompatibilities are simple
- Not all compatible changes are practical
Complex Changes
name: string,
first_name: string,
last_name: string,
age: integer,
is_admin: boolean (default: false)
•Not obvious how to split; code changes required
•Two field additions without defaults: not backwards-compatible
•Field removal without default: not forwards-compatible
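To make "not obvious how to split" concrete, here is one possible heuristic: split on the last space. It is only a sketch. split_name is a hypothetical helper, and the rule gets plenty of real names wrong (for example "Ludwig van Beethoven"), which is part of why this change needs code and a migration rather than a default.

```python
def split_name(name):
    """Split on the last space; everything before it becomes the first name."""
    first, _, last = name.strip().rpartition(" ")
    if not first:
        # Single-word name: rpartition puts the whole string in `last`.
        return last, ""
    return first, last


for full_name in ["Bob Jones", "Alice Smith", "Jamie Lee Curtis", "Cher"]:
    print(full_name, "->", split_name(full_name))
```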
Impractical Changes
•e.g. Adding a column in MySQL (<v8)
requires locking/copying the table
- Days to weeks not unheard of for tables
with millions of rows
Crossing Compatibility Gaps
•Compatibility gaps are crossed with
data migrations - minimally disruptive
movement between schemas
•We’ll look at:
- Single-schema stores (e.g. RDBMS)
- Multi-schema stores (e.g. MongoDB, Kafka)
1. SCHEMA COMPATIBILITY
2. DATA MIGRATION
a. SINGLE SCHEMA
b. MULTI-SCHEMA
3. TAKEAWAYS
Three-Tier Web Architecture
[Diagram: a Load Balancer in front of clients C1, C2, C3 and C4, which share store S]
name                  first_name     last_name
“Bob Jones”           “Bob”          “Jones”
“Alice Smith”         “Alice”        “Smith”
“Jamie Lee Curtis”    “Jamie Lee”    “Curtis”
Single-Schema Migration
•Move store S (used by clients C1, C2, C3, C4) from X
to (incompatible) X + 1
without downtime
Step 1: Create and migrate temporary store S’
[Diagram: S holds schema X, S’ holds schema X+1]
Step 2: Create a copier and an updater
[Diagram: copier C and updater U sit between S and S’]
Step 3.1: Move clients over to new schema
Step 3.2: Copy data, record / apply updates
Step 4: Cutover - S’ becomes S
[Diagram: the old store is now Sold; updater U is still present]
Step 5: Drain updater, delete Sold
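A minimal in-memory sketch of the copier / updater / cutover steps above. Everything here is an assumption made for illustration: the stores are plain dicts, and old_store, new_store, change_log, client_write, copy_chunk and drain_updates are hypothetical names. It does not model how clients are moved to the new schema (step 3.1), and it is not how any particular migration tool is implemented.

```python
def to_new_schema(row):
    """Transform a schema-X row into schema X+1 (split on the last space)."""
    first, _, last = row["name"].rpartition(" ")
    return {"first_name": first, "last_name": last}


old_store = {                      # S, schema X
    1: {"name": "Bob Jones"},
    2: {"name": "Alice Smith"},
    3: {"name": "Jamie Lee Curtis"},
}
new_store = {}                     # Step 1: S', schema X+1
change_log = []                    # Step 2: writes recorded for the updater


def client_write(row_id, row):
    """A write during the migration goes to S and is recorded."""
    old_store[row_id] = row
    change_log.append(row_id)


def copy_chunk(row_ids):
    """Copier: transform and copy one chunk of rows from S to S'."""
    for row_id in row_ids:
        new_store[row_id] = to_new_schema(old_store[row_id])


def drain_updates():
    """Updater: re-apply rows that changed after (or before) being copied."""
    while change_log:
        row_id = change_log.pop(0)
        new_store[row_id] = to_new_schema(old_store[row_id])


# Step 3.2: copy in chunks while writes keep arriving.
copy_chunk([1, 2])
client_write(1, {"name": "Robert Jones"})   # row 1 was already copied
client_write(4, {"name": "Carol Danvers"})  # a brand-new row
copy_chunk([3])

store = new_store   # Step 4: cutover, S' becomes S
drain_updates()     # Step 5: drain the updater ...
old_store = None    # ... and delete the old store
print(store)
```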
In Practice - Percona
• pt-online-schema-change
- Copier: scan/copy in timed chunks
- Updater: synchronous table triggers
- Cutover: RENAME TABLE
In Practice - GitHub
•gh-ost
- Copier: chunked reads/writes
- Updater: read binlog, interleave copies
- Cutover: 2-step blocking swap
Recap
•In single-schema stores:
- Migrate clients gradually, maintaining the
illusion of the old schema to old clients
- Migrate data to new schema over time,
applying updates to old and new copies
- When migration complete, then cut over
1. SCHEMA COMPATIBILITY
2. DATA MIGRATION
a. SINGLE SCHEMA
b. MULTI-SCHEMA
3. TAKEAWAYS
Multi-Schema Stores
{
  "name": "Alice Smith",
  "age": 29,
  "organization": "Engineering"
}
{
  "name": "Bob Jones",
  "age": 42
}
{
  "name": "Carol Danvers",
  "age": 34,
  "organization": "Security"
}
•Data with different schemas coexisting in the same store
•MongoDB: collections of documents
•Kafka: topics of messages
•Want illusion of single schema
Multi-Schema Migration
•Move data from schema X to
(backwards-incompatible) schema X + 1
without blocking clients
[Diagram: clients C1, C2, C3 move from schema X to X+1 across the steps]
Step 1: Old clients write with new schema,
continue reading with old schema
(old clients are still compatible!)
Step 2: Migrate old data to new schema
Step 3: Old clients read and write
with new schema
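The three steps can be sketched against a list of documents standing in for a collection or topic. The assumptions: schema X+1 adds a required organization field with no default, and old data is backfilled with the placeholder "Unassigned". readable_with_new_schema, read_old and migrate are hypothetical helpers, and the backfill value is invented.

```python
collection = [
    {"name": "Alice Smith", "age": 29, "organization": "Engineering"},
    {"name": "Bob Jones", "age": 42},  # written with schema X
    {"name": "Carol Danvers", "age": 34, "organization": "Security"},
]

NEW_FIELDS = ("name", "age", "organization")  # schema X+1, no defaults


def readable_with_new_schema(doc):
    """A schema X+1 reader needs every field to be present."""
    return all(field in doc for field in NEW_FIELDS)


def read_old(doc):
    """Step 1: a schema X reader ignores the extra field in new data."""
    return {"name": doc["name"], "age": doc["age"]}


def migrate(docs, backfill):
    """Step 2: rewrite old documents into the new schema."""
    for doc in docs:
        if "organization" not in doc:
            doc["organization"] = backfill(doc)


before = [readable_with_new_schema(doc) for doc in collection]
old_view = [read_old(doc) for doc in collection]

migrate(collection, lambda doc: "Unassigned")  # hypothetical backfill rule

# Step 3: every document is now safe to read with the new schema.
after = [readable_with_new_schema(doc) for doc in collection]
print(before, after)
```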
In Practice: Kafka (Confluent)
•Schema-aware clients transparently
apply compatible changes
•Backwards-incompatible changes:
update writers first
•Forwards-incompatible changes:
update readers first
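The ordering rule on this slide can be written down as a small decision function. rollout_order is a hypothetical helper, not part of any Kafka or Confluent API, and the "data migration required" branch is an assumption taken from the rest of the talk rather than from this slide.

```python
def rollout_order(backwards_compatible, forwards_compatible):
    """Suggest which side to upgrade first for a given schema change."""
    if backwards_compatible and forwards_compatible:
        return "any order"
    if forwards_compatible:
        # Old readers can read new data, but new readers cannot read old data.
        return "writers first"
    if backwards_compatible:
        # New readers can read old data, but old readers cannot read new data.
        return "readers first"
    return "data migration required"


print(rollout_order(False, True))
print(rollout_order(True, False))
```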
Recap
•In multi-schema stores:
- Make old clients generate compatible data
(by writing or reading with new schema)
- Migrate old data to new schema
- Old clients read and write with new schema
1. SCHEMA COMPATIBILITY
2. DATA MIGRATION
a. SINGLE SCHEMA
b. MULTI-SCHEMA
3. TAKEAWAYS
Summary
•Schemas can’t change everywhere instantly
•Schema compatibility can transparently
provide the illusion of instant change
•Data migrations fill in compatibility gaps,
carefully keeping clients working
Takeaways
•This applies to DB schema changes and
API versioning, but it also applies to
CSV/JSON/Excel, etc.
•If your data has structure, it probably
has a schema, & these concepts apply
Takeaways
•Reason about schema evolution up-front to guide
your architecture choices
- Prefer compatible changes
•Have a plan for dealing with incompatibility
- Present the illusion of instant schema change
•Remember: this is a hard problem for everyone!
Thank You! Questions?
https://www.bitsondisk.com/
Consulting Inquiries: alex@bitsondisk.com
John Gould (14.Sep.1804 - 3.Feb.1881) [Public domain]

Schema Evolution Patterns - Texas Scalability Summit 2019

  • 1. Schema Evolution 
 Patterns Alex Rasmussen alex@bitsondisk.com John Gould (14.Sep.1804 - 3.Feb.1881) [Public domain]
  • 2. Hi, I’m Alex! https://www.bitsondisk.com/ LA-based 
 Data Engineering 
 Consultant Twitter/GitHub/LinkedIn/…: 
 @alexras
  • 3. A Deceptively Hard Problem •Classic three-tier web service - Multiple servers for scalability - Rolling updates for high availability - API for extensibility •How do we make changes to data? - Let’s focus on one table, people LB / API Gateway DB App App App App Users API Clients
  • 4. Deceptively Hard Problem #1 •Add administrative users - Need to add is_admin to people table - … but clients with the old schema will fail to write if they don’t provide is_admin! Won’t they?
  • 5. Deceptively Hard Problem #2 •Splitting name into first_name, last_name - Old clients will keep writing to name - New clients will expect first_name and last_name to be defined in old data •How do we do this update safely?
  • 6. Schema Evolution •When your data’s shape (it’s schema) changes •Why is this hard? - Schemas can’t change everywhere instantly - Client code can be very difficult to update - If client and data schemas don’t agree, 
 it can cause serious problems
  • 7. How Do We Handle This? •We need to give the illusion of instant schema change to clients, with minimal code change. •In this talk, we’ll look at how.
  • 8. Goal of This Talk •Broadly-applicable concepts, techniques,
 and patterns for schema evolution - Schema compatibility for 
 transparent schema change - Data migration when compatibility 
 isn’t possible or practical •How this looks in practice
  • 9. 1. SCHEMA COMPATIBILITY 2. DATA MIGRATION a. SINGLE SCHEMA b. MULTI-SCHEMA 3. TAKEAWAYS
  • 10. 1. SCHEMA COMPATIBILITY 2. DATA MIGRATION a. SINGLE SCHEMA b. MULTI-SCHEMA 3. TAKEAWAYS
  • 11. The Illusion of Instant Change •Instant schema change everywhere isn’t possible, but we want to give the illusion that it is - Goal #1: Clients can still read and write safely, even if their schemas are different - Goal #2: Code change to clients is minimized •Schema compatibility makes this easier to do
  • 12. Schema Compatibility •If two schemas are compatible, evolving from one schema to another can be done automatically on read •Clients can be oblivious to schema change •Two directions: backwards and forwards
  • 13. Compatibility •Backwards-compatibility: data written with old schema (X) is readable by clients with new schema (X+1) •Forwards-compatibility: data written with new schema (X+1) is readable by clients with old schema (X)
  • 14. Add a Field With a Default •Schema X+1: name: string, age: integer, is_admin: boolean (default: false) •Written with X: name: “Bob Jones”, age: 42 •Written with X+1: name: “Tom Peters”, age: 32, is_admin: false •Backwards: reading data written with X, client CX+1 adds is_admin = false •Forwards: reading data written with X+1, client CX ignores is_admin
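The read-time behavior on this slide can be sketched in Python. This is a minimal illustration under stated assumptions, not any particular library's API: the `SCHEMA_X` and `SCHEMA_X1` dictionaries, the `NO_DEFAULT` sentinel, and the `read_record` function are all hypothetical names.

```python
# Hypothetical sentinel marking a field that has no default value.
NO_DEFAULT = object()

# Schemas map field name -> default value (illustrative names only).
SCHEMA_X = {"name": NO_DEFAULT, "age": NO_DEFAULT}
SCHEMA_X1 = {"name": NO_DEFAULT, "age": NO_DEFAULT, "is_admin": False}


def read_record(record, reader_schema):
    """Project a stored record onto the reader's schema."""
    result = {}
    for field, default in reader_schema.items():
        if field in record:
            result[field] = record[field]
        elif default is not NO_DEFAULT:
            result[field] = default  # fill in the reader's default
        else:
            raise ValueError(f"missing field with no default: {field}")
    return result  # fields the reader does not know about are dropped


bob = {"name": "Bob Jones", "age": 42}                      # written with X
tom = {"name": "Tom Peters", "age": 32, "is_admin": False}  # written with X+1

new_client_view = read_record(bob, SCHEMA_X1)  # backwards: default added
old_client_view = read_record(tom, SCHEMA_X)   # forwards: is_admin ignored
```

The same projection also covers removing a field that has a default: the reader that still expects the field fills in its default, and the reader that no longer expects it drops the value.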
  • 15. Remove a Field With a Default •Schema X: name: string, age: integer, is_admin: boolean (default: false), pto_days_left: integer (default: 0); schema X+1 removes pto_days_left •Written with X: name: “Alice Smith”, age: 29, is_admin: true, pto_days_left: 16 •Written with X+1: name: “Carol Danvers”, age: 34, is_admin: true •Backwards: reading data written with X, client CX+1 ignores pto_days_left •Forwards: reading data written with X+1, client CX adds pto_days_left = 0
  • 16. Other Types of Changes •Without defaults: - Adding a field breaks backwards-compatibility (in older data, field value is undefined) - Removing a field breaks forwards-compatibility (for older clients, field value is undefined) •Renaming (e.g. ssn to social_security_number): it depends
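The rules for adding and removing fields can be turned into a small compatibility check. The sketch below is a simplification that only considers field presence and defaults (it ignores type changes and renames); `NO_DEFAULT`, `can_read`, and `classify` are hypothetical names introduced for illustration.

```python
NO_DEFAULT = object()  # hypothetical sentinel: field has no default


def can_read(reader_schema, writer_schema):
    """True if every reader field is either written or has a default."""
    return all(
        field in writer_schema or default is not NO_DEFAULT
        for field, default in reader_schema.items()
    )


def classify(old_schema, new_schema):
    """Return (backwards_compatible, forwards_compatible) for old -> new."""
    backwards = can_read(new_schema, old_schema)  # new clients, old data
    forwards = can_read(old_schema, new_schema)   # old clients, new data
    return backwards, forwards


base = {"name": NO_DEFAULT, "age": NO_DEFAULT}
with_default = {**base, "is_admin": False}
without_default = {**base, "is_admin": NO_DEFAULT}

add_with_default = classify(base, with_default)
add_without_default = classify(base, without_default)
remove_without_default = classify(without_default, base)
```

Under these assumptions, adding `is_admin` with a default is compatible in both directions, adding it without a default is only forwards-compatible, and removing it without a default is only backwards-compatible.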
  • 17. In Practice: API Design •So far, focused on DBs •Compatibility is especially important for APIs - Lots of clients you might not control - API version bumps need to happen when incompatible schema changes happen
  • 18. In Practice - Protocol Buffers message Person { required string name = 1; required int32 age = 2; optional bool is_admin = 3 [default = false]; } •Field numbers make renames compatible •In version 3, no required or optional - required broke backwards-compatibility too often
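To see why field numbers make renames compatible, here is a toy Python model. It is not the real Protocol Buffers wire format or library API; `SCHEMA_OLD`, `SCHEMA_NEW`, `encode`, and `decode` are hypothetical names, and the third field is an assumed example of a rename.

```python
# Hypothetical schemas: field number -> field name, mimicking how
# Protocol Buffers identifies fields by number rather than by name.
SCHEMA_OLD = {1: "name", 2: "age", 3: "ssn"}
SCHEMA_NEW = {1: "name", 2: "age", 3: "social_security_number"}


def encode(record, schema):
    """Replace field names with field numbers (toy stand-in for the wire format)."""
    numbers = {name: number for number, name in schema.items()}
    return {numbers[name]: value for name, value in record.items()}


def decode(payload, schema):
    """Map field numbers back to the reader's names; skip unknown numbers."""
    return {schema[n]: v for n, v in payload.items() if n in schema}


# Placeholder value, not real data.
payload = encode({"name": "Bob Jones", "age": 42, "ssn": "000-00-0000"}, SCHEMA_OLD)
decoded = decode(payload, SCHEMA_NEW)
```

Because only the numbers travel with the data, a reader holding the renamed schema still finds the value under field 3.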
  • 19. In Practice: Stripe •Goal: API responses readable by all old clients w/o code change. •API server has latest schema, but clients keep schema forever •Solution: Version change modules applied in reverse order from server’s version to client’s version (they admit: this is hard) (diagram: server S at version 3 applies 3to2() and then 2to1() for client C at version 1)
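The idea of chaining version change modules can be sketched as follows. This is a hedged illustration of the pattern only, not Stripe's code: the functions `v3_to_v2` and `v2_to_v1`, the `DOWNGRADES` table, and the specific changes assumed for each version are all made up for the example.

```python
# Hypothetical version change modules: each one rewrites a response from
# schema version N down to version N - 1.
def v3_to_v2(response):
    # Assumed change in v3: name was split into first_name / last_name.
    out = dict(response)
    out["name"] = f"{out.pop('first_name')} {out.pop('last_name')}"
    return out


def v2_to_v1(response):
    # Assumed change in v2: is_admin was added.
    out = dict(response)
    out.pop("is_admin", None)
    return out


DOWNGRADES = {3: v3_to_v2, 2: v2_to_v1}
SERVER_VERSION = 3


def render_for_client(response, client_version):
    """Apply downgrade modules from the server's version to the client's."""
    for version in range(SERVER_VERSION, client_version, -1):
        response = DOWNGRADES[version](response)
    return response


latest = {"first_name": "Alice", "last_name": "Smith", "age": 29, "is_admin": True}
v1_view = render_for_client(latest, 1)  # runs v3_to_v2, then v2_to_v1
v3_view = render_for_client(latest, 3)  # no modules needed
```

Each new server version only needs one new module; older clients are served by composing the existing chain.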
  • 20. Recap •Compatibility allows for transparent movement between schemas •Changes can be backwards-compatible, forwards-compatible, both, or neither •Ease-of-compatibility drives the design of many messaging formats
  • 21. 1. SCHEMA COMPATIBILITY 2. DATA MIGRATION a. SINGLE SCHEMA b. MULTI-SCHEMA 3. TAKEAWAYS
  • 22. Crossing Compatibility Gaps •Need a plan for when compatibility isn’t an option - Not all schema changes are compatible - Not all incompatibilities are simple - Not all compatible changes are practical
  • 23. Complex Changes •Schema change: remove name: string; add first_name: string and last_name: string; keep age: integer and is_admin: boolean (default: false) •Not obvious how to split; code changes required •Two field additions without defaults: not backwards-compatible •Field removal without default: not forwards-compatible
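A small Python sketch shows why the split is not obvious. The `split_name` function is a hypothetical heuristic (last word is the family name), chosen for illustration rather than taken from the talk.

```python
def split_name(name):
    """Hypothetical heuristic: treat the last word as the family name."""
    first, _, last = name.rpartition(" ")
    return first, last


examples = ["Bob Jones", "Alice Smith", "Jamie Lee Curtis"]
split = [split_name(n) for n in examples]
```

This heuristic yields ("Jamie Lee", "Curtis") for a three-word name, but it would mis-split names with multi-word family names, and a single-word name ends up with an empty first name. No default value can capture this logic, which is why the change needs code rather than compatibility rules.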
  • 24. Impractical Changes •e.g. Adding a column in MySQL (<v8) requires locking/copying the table - Days to weeks not unheard of for tables with millions of rows
  • 25. Crossing Compatibility Gaps •Compatibility gaps are crossed with data migrations - minimally disruptive movement between schemas •We’ll look at: - Single-schema stores (e.g. RDBMS) - Multi-schema stores (e.g. MongoDB, Kafka)
  • 26. 1. SCHEMA COMPATIBILITY 2. DATA MIGRATION a. SINGLE SCHEMA b. MULTI-SCHEMA 3. TAKEAWAYS
  • 27. Three-Tier Web Architecture (diagram: load balancer, clients C1, C2, C3, C4, store S) •Old table (name): “Bob Jones”; “Alice Smith”; “Jamie Lee Curtis” •New table (first_name, last_name): “Bob”, “Jones”; “Alice”, “Smith”; “Jamie Lee”, “Curtis”
  • 28. Single-Schema Migration •Move from X to (incompatible) X + 1 without downtime (diagram: clients C1, C2, C3, C4, store S)
  • 29. Step 1: Create and migrate temporary store S’ (S holds schema X, S’ holds schema X+1)
  • 30. Step 2: Create a copier (C) and an updater (U) between S and S’
  • 31. Step 3.1: Move clients over to new schema
  • 32. Step 3.2: Copy data, record / apply updates
  • 33. Step 4: Cutover - S’ becomes S (the old store becomes Sold)
  • 34. Step 5: Drain updater, delete Sold
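The copier and updater from the steps above can be simulated with in-memory dictionaries. This is a simplified sketch under stated assumptions: it models only writes from clients still on the old schema, ignores deletes and concurrency, and every name in it (`to_new_schema`, `write_old`, `copy_chunk`, `drain_updates`) is hypothetical.

```python
def to_new_schema(row):
    """Hypothetical X -> X+1 transform: split name on the last space."""
    first, _, last = row["name"].rpartition(" ")
    return {"first_name": first, "last_name": last}


old_store = {1: {"name": "Bob Jones"}, 2: {"name": "Alice Smith"}}  # S
new_store = {}                                                      # S'
update_log = []  # writes recorded while the copy is in progress


def write_old(key, row):
    """A write from a client still using schema X during the migration."""
    old_store[key] = row
    update_log.append(key)  # the updater will replay this onto S'


def copy_chunk(keys):
    """Copier: move a chunk of existing rows into S'."""
    for key in keys:
        new_store[key] = to_new_schema(old_store[key])


def drain_updates():
    """Updater: re-apply recorded writes so S' catches up with S."""
    while update_log:
        key = update_log.pop(0)
        new_store[key] = to_new_schema(old_store[key])


copy_chunk([1])
write_old(1, {"name": "Robert Jones"})      # lands after row 1 was copied
write_old(3, {"name": "Jamie Lee Curtis"})  # new row during the migration
copy_chunk([2])
store = new_store   # step 4, cutover: S' becomes S
drain_updates()     # step 5: drain the updater...
old_store.clear()   # ...then delete the old store
```

The update log is what keeps the copy correct: without it, the change to row 1 and the new row 3 would be lost at cutover.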
  • 35. In Practice - Percona • pt-online-schema-change - Copier: scan/copy in timed chunks - Updater: synchronous table triggers - Cutover: RENAME TABLE
  • 36. In Practice - GitHub •gh-ost - Copier: chunked reads/writes - Updater: read binlog, interleave copies - Cutover: 2-step blocking swap
  • 37. Recap •In single-schema stores: - Migrate clients gradually, maintaining the illusion of the old schema to old clients - Migrate data to new schema over time, applying updates to old and new copies - When migration complete, then cut over
  • 38. 1. SCHEMA COMPATIBILITY 2. DATA MIGRATION a. SINGLE SCHEMA b. MULTI-SCHEMA 3. TAKEAWAYS
  • 39. Multi-Schema Stores { “name”: “Alice Smith”, “age”: 29, “organization”: “Engineering” } { “name”: “Bob Jones”, “age”: 42 } { “name”: “Carol Danvers”, “age”: 34, “organization”: “Security” } •Data with different schemas coexisting in the same store •MongoDB: collections of documents •Kafka: topics of messages •Want illusion of single schema
  • 40. Multi-Schema Migration •Move data from schema X to (backwards-incompatible) schema X + 1 without blocking clients (diagram: clients C1, C2, C3)
  • 41. Step 1: Old clients write with new schema, continue reading with old schema (old clients are still compatible!)
  • 42. Step 2: Migrate old data to new schema
  • 43. Step 3: Old clients read and write with new schema
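The three steps can be sketched against an in-memory collection of documents. This assumes the backwards-incompatible change is adding a required organization field without a default; the function names and the "Unassigned" backfill value are hypothetical choices for the example, not part of the talk.

```python
# Mixed-schema collection, as in a document store.
collection = [
    {"name": "Alice Smith", "age": 29, "organization": "Engineering"},  # X+1
    {"name": "Bob Jones", "age": 42},                                   # X
]


def read_old(doc):
    """Schema X reader: ignores fields it does not know about."""
    return {"name": doc["name"], "age": doc["age"]}


def read_new(doc):
    """Schema X+1 reader: organization is required (KeyError if absent)."""
    return {**read_old(doc), "organization": doc["organization"]}


def write_new(doc):
    """Step 1: every writer, including old clients, emits schema X+1."""
    if "organization" not in doc:
        raise ValueError("writers must supply organization")
    collection.append(doc)


def migrate(placeholder):
    """Step 2: backfill documents that were written with schema X."""
    for doc in collection:
        doc.setdefault("organization", placeholder)


write_new({"name": "Carol Danvers", "age": 34, "organization": "Security"})
old_views = [read_old(doc) for doc in collection]  # still works during step 1
migrate("Unassigned")                              # hypothetical backfill value
new_views = [read_new(doc) for doc in collection]  # step 3: safe for all docs
```

Reading with `read_new` before the backfill would fail on the document written with schema X, which is why readers switch last.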
  • 44. In Practice: Kafka (Confluent) •Schema-aware clients transparently apply compatible changes •Backwards-incompatible changes: update writers first •Forwards-incompatible changes: update readers first
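The ordering rule on this slide can be written as a small decision function. This is only a restatement of the rule in Python, not a Confluent API; `rollout_order` and its return strings are hypothetical.

```python
def rollout_order(backwards_compatible, forwards_compatible):
    """Suggest which side to upgrade first for a schema change."""
    if backwards_compatible and forwards_compatible:
        return "any order"
    if forwards_compatible:
        # Backwards-incompatible: old readers can still read new data,
        # so writers can move to the new schema first.
        return "writers first"
    if backwards_compatible:
        # Forwards-incompatible: new readers can still read old data,
        # so readers must move to the new schema first.
        return "readers first"
    return "data migration needed"


add_field_without_default = rollout_order(False, True)
remove_field_without_default = rollout_order(True, False)
```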
  • 45. Recap •In multi-schema stores: - Make old clients generate compatible data (by writing or reading with new schema) - Migrate old data to new schema - Old clients read and write with new schema
  • 46. 1. SCHEMA COMPATIBILITY 2. DATA MIGRATION a. SINGLE SCHEMA b. MULTI-SCHEMA 3. TAKEAWAYS
  • 47. Summary •Schemas can’t change everywhere instantly •Schema compatibility can transparently provide the illusion of instant change •Data migrations fill in compatibility gaps, carefully keeping clients working
  • 48. Takeaways •This applies to DB schema changes and API versioning, but it also applies to CSV/JSON/Excel, etc. •If your data has structure, it probably has a schema, & these concepts apply
  • 49. Takeaways •Reason about schema evolution up-front to guide your architecture choices - Prefer compatible changes •Have a plan for dealing with incompatibility - Present the illusion of instant schema change •Remember: this is a hard problem for everyone!
  • 50. Thank You! Questions? https://www.bitsondisk.com/ Consulting Inquiries: alex@bitsondisk.com John Gould (14.Sep.1804 - 3.Feb.1881) [Public domain]