pg_chameleon
MySQL to PostgreSQL replica made easy
Federico Campoli
Transferwise
PGCon, Ottawa
01 Jun 2018
http://www.pgdba.org
@4thdoctor_scarf
Federico Campoli (Transferwise) pg chameleon PGCon, Ottawa01 Jun 2018 1 / 46
Few words about the speaker
Born in 1972
Passionate about IT since 1982
mostly because of the TRON movie
Joined the Oracle DBA secret society in 2004
In love with PostgreSQL since 2006
Devrim PostgreSQL tattoo’s copycat
Works at Transferwise as Data Engineer
Disclaimer
I’m not a developer
I’m a DBA...which means being hated by everybody and hating everybody
So, to put things in the right perspective...I use tabs
[Image: Palpatine]
Table of contents
1 History
2 MySQL Replica in a nutshell
3 A chameleon in the middle
4 Replica in action
5 Lessons learned
6 Wrap up
History
The beginnings
Years 2006/2012
neo_my2pg.py
I wrote the script because of a struggling phpBB on MySQL
The database migration was successful
However, phpBB didn’t work very well with PostgreSQL [1]
The script is written in Python 2.6
It’s a monolithic script
And it’s slow, very slow
It’s a good checklist of things to avoid when coding
https://github.com/the4thdoctor/neo_my2pg
[1] Opening a new connection for each query is not the smartest thing to do.
I’m not scared of using the ORMs
Years 2013/2015
First attempt at pg_chameleon
Developed in Python 2.7
Used SQLAlchemy for extracting MySQL’s metadata
Proof of concept only
It was built during the years of the life on a roller coaster [2]
Therefore it was just a way to discharge frustration
Abandoned after a while
SQLAlchemy’s limitations were frustrating as well (see slide 3)
And pgloader did the same job much, much better
[2] Recording available here: http://www.pgbrighton.uk/post/backup_recovery/
pg_chameleon reborn
Year 2016
I needed to replicate the data from MySQL to PostgreSQL
http://tech.transferwise.com/scaling-our-analytics-database/
The amazing library python-mysql-replication allowed me to build a proof of concept
It later evolved into pg_chameleon 1.x
Kudos to the python-mysql-replication team!
https://github.com/noplay/python-mysql-replication
pg_chameleon 1.x
Developed on the London to Brighton commute
Released as stable on 7 May 2017
Followed by 8 bugfix releases
Compatible with CPython 2.7/3.3+
No more SQLAlchemy
The MySQL driver changed from MySQLdb to PyMySQL
Command line helper
Supports type override on the fly (Danger!)
Installs in a virtualenv and system wide via PyPI
Can detach the replica for minimal downtime migrations
pg_chameleon version 1’s limitations
All the affected tables are locked in read only mode during the init replica process
During the init replica the data is not accessible
Tables require primary keys to be replicated
No daemon, the process always stays in the foreground
Single schema replica
One process for each schema
Network inefficient
Read and replay are not concurrent, with the risk of high lag
The optional threaded mode is very inefficient and fragile
A single error in the replay process and the replica is broken
MySQL Replica in a nutshell
MySQL Replica
The MySQL replica is logical
When the replica is enabled the data changes are stored in the master’s binary log files
The slave reads the changes from the master’s binary log files
The slave saves the stream of data into local relay logs
The relay logs are replayed against the slave
[Figure: MySQL Replica]
Log formats
MySQL has three ways of storing the changes in the binary logs.
STATEMENT: It logs the statements, which are replayed on the slave.
It’s the best solution for the bandwidth. However, when replaying statements
with non-deterministic functions this format generates different values on the
slave (e.g. using an insert with a column autogenerated by the uuid function).
ROW: It’s deterministic. This format logs the row images.
MIXED: It takes the best of both worlds. The master logs the statements unless
a non-deterministic function is used. In that case it logs the row image.
All three formats always log the DDL query events.
The python-mysql-replication library, and therefore pg_chameleon, requires the
ROW format to work properly.
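The difference between STATEMENT and ROW can be illustrated with a small Python sketch. This is not MySQL or pg_chameleon code: master and slave are modelled as plain lists, the function names are made up for the example, and uuid.uuid4 stands in for MySQL's uuid function.

```python
import uuid

def run_statement(table):
    """Execute an 'INSERT ... VALUES (uuid())' locally: each call draws a new value."""
    row = {"id": str(uuid.uuid4())}
    table.append(row)
    return row

def replicate_statement(master, slave):
    # STATEMENT format: the slave re-executes the statement, so uuid() runs again.
    run_statement(master)
    run_statement(slave)

def replicate_row(master, slave):
    # ROW format: the master logs the row image and the slave applies it unchanged.
    row_image = run_statement(master)
    slave.append(dict(row_image))

master_s, slave_s = [], []
replicate_statement(master_s, slave_s)
statement_in_sync = master_s == slave_s  # the two sides hold different UUIDs

master_r, slave_r = [], []
replicate_row(master_r, slave_r)
row_in_sync = master_r == slave_r        # identical row image on both sides
```

With ROW the two sides hold the same values by construction, which is presumably why a row-based consumer such as python-mysql-replication needs that format.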
A chameleon in the middle
pg_chameleon
pg_chameleon mimics a MySQL slave’s behaviour
It performs the initial load for the replicated tables
It connects to the MySQL replica protocol
It stores the row images into a PostgreSQL table
A PL/pgSQL function decodes the rows and replays the changes
It can detach the replica for minimal downtime migrations
PostgreSQL acts as relay log and replication slave
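The store-then-replay idea can be sketched in plain Python. This is only a toy model under stated assumptions: the relay log is a list of JSON strings rather than a PostgreSQL table, the replay is a Python function rather than PL/pgSQL, and the names store_row_image and replay are hypothetical. Updates that change the primary key are not handled.

```python
import json

def store_row_image(relay_log, action, table, row):
    """Append one row image to the relay log, serialised as JSON text."""
    relay_log.append(json.dumps({"action": action, "table": table, "row": row}))

def replay(relay_log, tables, pkey="id"):
    """Decode the stored row images in order and apply them to the target tables."""
    for entry in relay_log:
        event = json.loads(entry)
        target = tables.setdefault(event["table"], {})
        key = event["row"][pkey]
        if event["action"] == "delete":
            target.pop(key, None)
        else:
            # insert and update both leave the latest row image in place
            target[key] = event["row"]
    relay_log.clear()

relay_log, tables = [], {}
store_row_image(relay_log, "insert", "actor", {"id": 1, "name": "PENELOPE"})
store_row_image(relay_log, "update", "actor", {"id": 1, "name": "NICK"})
store_row_image(relay_log, "insert", "actor", {"id": 2, "name": "ED"})
store_row_image(relay_log, "delete", "actor", {"id": 2, "name": "ED"})
replay(relay_log, tables)
```

Keeping the read (store) and the replay as separate steps is what lets the two run at different speeds without losing changes.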
[Figure: MySQL replica + pg_chameleon]
pg_chameleon 2.0 #1
Developed at pgconf.eu 2017 and on the commute
Released as stable on 1 January 2018
Compatible with Python 3.3+
Installs in a virtualenv and system wide via PyPI
Replicates multiple schemas from a single MySQL into a target PostgreSQL database
Conservative approach to the replica. Tables which generate errors are automatically excluded from the replica
Daemonised replica process with two distinct subprocesses, for concurrent read and replay
pg_chameleon 2.0 #2
Soft locking replica initialisation. The tables are locked only during the copy.
Rollbar integration for simpler error detection and messaging
Experimental support for the PostgreSQL source type
The tables are loaded in a separate schema which is swapped with the existing one.
This approach requires more space but it makes the init replica virtually
painless, leaving the old data accessible until the init replica is complete.
The DDL is translated into the PostgreSQL dialect, keeping the schema in
sync with MySQL automatically
Version 2.0’s limitations
Tables require primary or unique keys to be replicated
When detaching the replica the foreign keys are always created ON DELETE/UPDATE RESTRICT
The PostgreSQL source type supports only the init replica process
Replica initialisation
The replica initialisation follows the same workflow as stated in the MySQL online
manual.
Flush the tables with read lock
Get the master’s coordinates
Copy the data
Release the locks
However...
pg_chameleon flushes the tables with read lock one by one. The lock is held only
during the copy.
The log coordinates are stored in the replica catalogue along with the table’s name and
used by the replica process to determine whether the table’s binlog data should be
used or not.
The replica starts inconsistent and gains consistency over time.
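The per-table coordinate check can be sketched as follows. This is a minimal illustration, not pg_chameleon's code: the event layout, the function names and the choice to treat an event exactly at the stored position as applicable are all assumptions.

```python
def binlog_sequence(log_file):
    """Numeric suffix of a binlog file name, e.g. 'mysql-bin.000004' -> 4."""
    return int(log_file.rsplit(".", 1)[1])

def should_apply(event, table_coordinates):
    """Apply an event only if it was logged at or after the table's copy coordinates."""
    start = table_coordinates.get(event["table"])
    if start is None:
        return False  # the table is not part of the replica
    event_pos = (binlog_sequence(event["log_file"]), event["log_pos"])
    start_pos = (binlog_sequence(start[0]), start[1])
    # Tuples compare file sequence first, then position inside the file.
    return event_pos >= start_pos

# Coordinates saved when each table was copied (hypothetical values).
coordinates = {"actor": ("mysql-bin.000004", 500)}

before_copy = {"table": "actor", "log_file": "mysql-bin.000004", "log_pos": 120}
after_copy = {"table": "actor", "log_file": "mysql-bin.000004", "log_pos": 900}
next_file = {"table": "actor", "log_file": "mysql-bin.000005", "log_pos": 4}
unknown = {"table": "film", "log_file": "mysql-bin.000005", "log_pos": 4}

decisions = [should_apply(e, coordinates) for e in (before_copy, after_copy, next_file, unknown)]
```

Because each table carries its own coordinates, changes already included in a table's copy are skipped for that table only, which is how the replica can start inconsistent and converge over time.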
Fallback on failure
The data is pulled from MySQL in slices using the CSV format. This approach
prevents memory overload.
Once the file is saved it is pushed into PostgreSQL using the COPY command.
However...
COPY is fast but runs in a single transaction
One failure and the entire batch is rolled back
If this happens the procedure loads the same data using INSERT statements
Which can be very slow
The process attempts to clean the NUL markers which are allowed by MySQL
If the row still fails on insert then it’s discarded
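The fallback logic can be sketched without a database. In this illustration bulk_copy and insert_row are hypothetical callables standing in for COPY and INSERT, the target is an in-memory list, and the NUL characters are stripped before every insert attempt, which is a simplification.

```python
def clean_nul(row):
    """Strip NUL characters, which MySQL accepts in text but PostgreSQL rejects."""
    return tuple(v.replace("\x00", "") if isinstance(v, str) else v for v in row)

def load_slice(rows, bulk_copy, insert_row):
    """Try the all-or-nothing bulk load first, then fall back to one insert per row.

    Returns the rows that had to be discarded.
    """
    try:
        bulk_copy(rows)
        return []
    except Exception:
        pass  # the whole batch was rolled back, retry row by row
    discarded = []
    for row in rows:
        try:
            insert_row(clean_nul(row))
        except Exception:
            discarded.append(row)
    return discarded

# In-memory stand-ins for the PostgreSQL side (assumptions for the example).
target = []

def check(row):
    if row[0] is None:
        raise ValueError("null value in key column")
    if any(isinstance(v, str) and "\x00" in v for v in row):
        raise ValueError("NUL character not allowed")

def bulk_copy(rows):
    for row in rows:
        check(row)        # validate everything before writing anything
    target.extend(rows)

def insert_row(row):
    check(row)
    target.append(row)

rows = [(1, "ok"), (2, "bad\x00value"), (None, "no key")]
discarded = load_slice(rows, bulk_copy, insert_row)
```

The slow path only runs for slices that actually fail, so clean data still loads at bulk speed.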
Replica in action
MySQL configuration
The MySQL configuration file is usually stored in /etc/mysql/my.cnf
To enable binary logging, find the section [mysqld] and check that the
following parameters are set.
binlog_format = ROW
log-bin = mysql-bin
server-id = 1
binlog-row-image = FULL
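A quick way to check these settings is to parse the configuration text with Python's configparser. This is a rough sketch, not part of pg_chameleon: the function name is made up, and a real my.cnf may contain directives such as include lines that configparser does not interpret.

```python
import configparser

def missing_binlog_settings(cnf_text):
    """Return the [mysqld] settings listed above that are absent or different."""
    parser = configparser.ConfigParser(
        allow_no_value=True, strict=False, interpolation=None
    )
    parser.read_string(cnf_text)
    expected = {
        "binlog_format": "ROW",
        "log_bin": None,          # any value is accepted
        "server_id": None,        # any value is accepted
        "binlog_row_image": "FULL",
    }
    found = {}
    if parser.has_section("mysqld"):
        for key, value in parser.items("mysqld"):
            # MySQL accepts dashes and underscores interchangeably in option names
            found[key.replace("-", "_")] = (value or "").strip()
    problems = []
    for key, wanted in expected.items():
        if key not in found:
            problems.append(key)
        elif wanted is not None and found[key].upper() != wanted:
            problems.append(key)
    return problems

good = """
[mysqld]
binlog_format= ROW
log-bin = mysql-bin
server-id = 1
binlog-row-image = FULL
"""

bad = """
[mysqld]
binlog_format = MIXED
server-id = 1
"""

good_result = missing_binlog_settings(good)
bad_result = missing_binlog_settings(bad)
```

The check only reads the file contents; the values actually in effect on a running server may differ.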
MySQL user for replica
Set up a replication user on MySQL
CREATE USER usr_replica;
SET PASSWORD FOR usr_replica=PASSWORD('replica');
GRANT ALL ON sakila.* TO 'usr_replica';
GRANT RELOAD ON *.* TO 'usr_replica';
GRANT REPLICATION CLIENT ON *.* TO 'usr_replica';
GRANT REPLICATION SLAVE ON *.* TO 'usr_replica';
FLUSH PRIVILEGES;
In our example we are using the sakila test database.
https://dev.mysql.com/doc/sakila/en/
PostgreSQL setup
Add a user on PostgreSQL able to create schemas and relations in the
destination database
CREATE USER usr_replica WITH PASSWORD 'replica';
CREATE DATABASE db_replica WITH OWNER usr_replica;
Install pg_chameleon
Install pg_chameleon and create the configuration files
pip install pip --upgrade
pip install pg_chameleon
chameleon set_configuration_files
cd ~/.pg_chameleon/configuration
cp config-example.yml default.yml
Edit the file default.yml setting the correct values for connection and source.
Configure global settings in default.yml
PostgreSQL connection
pg_conn:
  host: "localhost"
  port: "5432"
  user: "usr_replica"
  password: "replica"
  database: "db_replica"
  charset: "utf8"
Rollbar configuration
rollbar_key: '<rollbar_long_key>'
rollbar_env: 'pgcon-demo'
Type override (optional)
type_override:
  "tinyint(1)":
    override_to: boolean
    override_tables:
      - "*"
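How a type override of this shape could be resolved is sketched below. The dictionary mirrors the configuration keys, but resolve_type and the DEFAULT_TYPES table are assumptions for illustration and are not pg_chameleon's real conversion logic.

```python
# Illustrative defaults only; not pg_chameleon's real conversion table.
DEFAULT_TYPES = {
    "tinyint": "smallint",
    "int": "integer",
    "varchar": "character varying",
}

def resolve_type(column_type, table, type_override):
    """Pick the PostgreSQL type for a MySQL column, honouring the override rules."""
    rule = type_override.get(column_type)
    if rule is not None:
        tables = rule.get("override_tables", [])
        if "*" in tables or table in tables:
            return rule["override_to"]
    # No matching rule: fall back to the default map, ignoring the dimension.
    base_type = column_type.split("(", 1)[0].strip().lower()
    return DEFAULT_TYPES.get(base_type, base_type)

# Same content as the YAML above, expressed as a Python dictionary.
type_override = {
    "tinyint(1)": {"override_to": "boolean", "override_tables": ["*"]},
}
# A rule restricted to a single table (hypothetical example).
restricted = {
    "tinyint(1)": {"override_to": "boolean", "override_tables": ["sakila.actor"]},
}

everywhere = resolve_type("tinyint(1)", "sakila.film", type_override)
other_width = resolve_type("tinyint(4)", "sakila.film", type_override)
not_listed = resolve_type("tinyint(1)", "sakila.film", restricted)
```

The rule matches the full column type including its dimension, so tinyint(4) is left alone while tinyint(1) becomes boolean.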
Configure the mysql source
sources:
  mysql:
    db_conn:
      host: "localhost"
      port: "3306"
      user: "usr_replica"
      password: "replica"
      charset: 'utf8'
      connect_timeout: 10
    schema_mappings:
      sakila: loxodonta_africana
    limit_tables:
    skip_tables:
    grant_select_to:
      - usr_readonly
    lock_timeout: "120s"
    my_server_id: 100
    replica_batch_size: 10000
    replay_max_rows: 10000
    batch_retention: '1 day'
    copy_max_memory: "300M"
    copy_mode: 'file'
    out_dir: /tmp
    sleep_loop: 1
    on_error_replay: continue
    on_error_read: continue
    auto_maintenance: "1 day"
    type: mysql
Add the source and initialise the replica
Add the source mysql and initialise the replica for it. We are using debug in order
to get the logging on the console.
chameleon create_replica_schema --debug
chameleon add_source --config default --source mysql --debug
chameleon init_replica --config default --source mysql --debug
Start the replica
Start the replica process
chameleon start_replica --config default --source mysql
Show the replica status
chameleon show_status --config default --source mysql
Time for a demo
Demo!
The demo will fail miserably for sure and you will hate this project forever.
Lessons learned
Strictness is an illusion. MySQL doubly so
MySQL’s lack of strictness is not a mystery.
The funny way MySQL manages the default with NOT NULL can break the
replica.
Therefore any field with NOT NULL added after the initialisation is always
created as NULLable in PostgreSQL.
The DDL. A real pain in the back
I initially tried to use sqlparse for tokenising the DDL emitted by MySQL.
Unfortunately it didn’t work as I expected.
So I decided to use regular expressions.
Some people, when confronted with a problem,
think "I know, I’ll use regular expressions."
Now they have two problems.
-- Jamie Zawinski
MySQL, even in ROW format, emits the DDL as statements
The class sql_token uses regular expressions to tokenise the DDL
The tokenised data is used to build the DDL in the PostgreSQL dialect
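The tokenise-then-rebuild approach can be illustrated with a deliberately small sketch that handles a single case, ALTER TABLE ... ADD COLUMN. The regular expression, the TYPE_MAP table and translate_add_column are assumptions written for this example and are not taken from the sql_token class. Column constraints such as NOT NULL are dropped, in line with the NULLable behaviour described earlier.

```python
import re

# Matches "ALTER TABLE t ADD [COLUMN] c TYPE[(dimension)]" and nothing else.
ADD_COLUMN = re.compile(
    r"^\s*ALTER\s+TABLE\s+`?(?P<table>\w+)`?\s+"
    r"ADD\s+(?:COLUMN\s+)?"
    r"(?!(?:INDEX|KEY|CONSTRAINT|PRIMARY|UNIQUE|FOREIGN|FULLTEXT|SPATIAL)\b)"
    r"`?(?P<column>\w+)`?\s+"
    r"(?P<type>\w+)(?:\s*\((?P<dimension>[\d,\s]+)\))?",
    re.IGNORECASE,
)

# Illustrative type map only.
TYPE_MAP = {
    "int": "integer",
    "tinyint": "smallint",
    "varchar": "character varying",
    "datetime": "timestamp without time zone",
}

def translate_add_column(mysql_ddl):
    """Tokenise a MySQL ADD COLUMN statement and rebuild it for PostgreSQL.

    Returns None when the statement is not a recognised ADD COLUMN.
    """
    match = ADD_COLUMN.match(mysql_ddl)
    if match is None:
        return None
    token = match.groupdict()
    pg_type = TYPE_MAP.get(token["type"].lower(), token["type"].lower())
    if token["dimension"] and pg_type == "character varying":
        pg_type += "(%s)" % token["dimension"].strip()
    return 'ALTER TABLE "%s" ADD COLUMN "%s" %s;' % (
        token["table"], token["column"], pg_type
    )

varchar_ddl = translate_add_column(
    "ALTER TABLE `actor` ADD COLUMN `nickname` VARCHAR(45) NOT NULL"
)
int_ddl = translate_add_column("ALTER TABLE film ADD rating_id INT(11)")
index_ddl = translate_add_column("ALTER TABLE film ADD INDEX idx_rating (rating_id)")
```

Even this single statement type needs a lookahead to avoid mistaking ADD INDEX for a column, which gives a flavour of the two problems in the quote above.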
Wrap up
To boldly go where no chameleon has gone before
Short term goals, version 2.0
Automatically re-sync the tables when they error on replay
Improve the replay speed and CPU efficiency
GTID support for the MySQL source
Medium term goals, version 2.1
Parallel copy and index creation in order to speed up the init replica process
Logical replica from PostgreSQL
Improve the default column handling
Igor, the green little guy
The chameleon logo has been developed by Elena Toma, a talented Italian lady.
https://www.facebook.com/Tonkipapperoart/
The name Igor is inspired by Marty Feldman’s Igor, as portrayed in the movie Young
Frankenstein.
Feedback please!
Please report any issue on GitHub and follow pg_chameleon on Twitter for the
announcements.
https://github.com/the4thdoctor/pg_chameleon
@pg_chameleon
Did you say hire?
WE ARE HIRING!
https://transferwise.com/jobs/
That’s all folks!
Thank you for listening!
Any questions?
Please be very basic, I’m just an electrician after all.
Image credits
Palpatine, Dr. Evil disclaimer, It could work (Young Frankenstein), source
memegenerator
MySQL Image source, WikiCommons
Hard Disk image, source WikiCommons
Tron image, source Tron Wikia
Twitter icon, source Open Icon Library
The PostgreSQL logo, copyright the PostgreSQL global development group
Boromir get rid of mysql, source imgflip
Morpheus, source imgflip
Keep calm chameleon, source imgflip
The dolphin picture - Copyright artnoose
Perseus, Framed - Copyright Federico Campoli
Pinkie Pie that’s all folks, Copyright by dan232323, used with permission
Doom, source RetroPie
License
This document is distributed under the terms of the Creative Commons
Attribution, Not Commercial, Share Alike
Federico Campoli (Transferwise) pg chameleon PGCon, Ottawa01 Jun 2018 44 / 46
pg chameleon
MySQL to PostgreSQL replica made easy
Federico Campoli
Transferwise
PGCon, Ottawa
01 Jun 2018
http://www.pgdba.org
@4thdoctor scarf
Federico Campoli (Transferwise) pg chameleon PGCon, Ottawa01 Jun 2018 45 / 46

More Related Content

What's hot

Understanding PostgreSQL LW Locks
Understanding PostgreSQL LW LocksUnderstanding PostgreSQL LW Locks
Understanding PostgreSQL LW Locks
Jignesh Shah
 
patroni-based citrus high availability environment deployment
patroni-based citrus high availability environment deploymentpatroni-based citrus high availability environment deployment
patroni-based citrus high availability environment deployment
hyeongchae lee
 
Patroni: Kubernetes-native PostgreSQL companion
Patroni: Kubernetes-native PostgreSQL companionPatroni: Kubernetes-native PostgreSQL companion
Patroni: Kubernetes-native PostgreSQL companion
Alexander Kukushkin
 

What's hot (20)

MySQL InnoDB Clusterによる高可用性構成(DB Tech Showcase 2017)
MySQL InnoDB Clusterによる高可用性構成(DB Tech Showcase 2017)MySQL InnoDB Clusterによる高可用性構成(DB Tech Showcase 2017)
MySQL InnoDB Clusterによる高可用性構成(DB Tech Showcase 2017)
 
Untangling Cluster Management with Helix
Untangling Cluster Management with HelixUntangling Cluster Management with Helix
Untangling Cluster Management with Helix
 
MariaDB MaxScale monitor 매뉴얼
MariaDB MaxScale monitor 매뉴얼MariaDB MaxScale monitor 매뉴얼
MariaDB MaxScale monitor 매뉴얼
 
ChatGPTのデータソースにPostgreSQLを使う[詳細版](オープンデベロッパーズカンファレンス2023 発表資料)
ChatGPTのデータソースにPostgreSQLを使う[詳細版](オープンデベロッパーズカンファレンス2023 発表資料)ChatGPTのデータソースにPostgreSQLを使う[詳細版](オープンデベロッパーズカンファレンス2023 発表資料)
ChatGPTのデータソースにPostgreSQLを使う[詳細版](オープンデベロッパーズカンファレンス2023 発表資料)
 
MySQL Innovation Day Chicago - MySQL HA So Easy : That's insane !!
MySQL Innovation Day Chicago  - MySQL HA So Easy : That's insane !!MySQL Innovation Day Chicago  - MySQL HA So Easy : That's insane !!
MySQL Innovation Day Chicago - MySQL HA So Easy : That's insane !!
 
High Availability PostgreSQL with Zalando Patroni
High Availability PostgreSQL with Zalando PatroniHigh Availability PostgreSQL with Zalando Patroni
High Availability PostgreSQL with Zalando Patroni
 
Understanding PostgreSQL LW Locks
Understanding PostgreSQL LW LocksUnderstanding PostgreSQL LW Locks
Understanding PostgreSQL LW Locks
 
Galera cluster for high availability
Galera cluster for high availability Galera cluster for high availability
Galera cluster for high availability
 
patroni-based citrus high availability environment deployment
patroni-based citrus high availability environment deploymentpatroni-based citrus high availability environment deployment
patroni-based citrus high availability environment deployment
 
Openstack live migration
Openstack live migrationOpenstack live migration
Openstack live migration
 
Pacemakerを使いこなそう
Pacemakerを使いこなそうPacemakerを使いこなそう
Pacemakerを使いこなそう
 
Git branching strategies
Git branching strategiesGit branching strategies
Git branching strategies
 
Anthos Security: modernize your security posture for cloud native applications
Anthos Security: modernize your security posture for cloud native applicationsAnthos Security: modernize your security posture for cloud native applications
Anthos Security: modernize your security posture for cloud native applications
 
Pacemaker + PostgreSQL レプリケーション構成(PG-REX)のフェイルオーバー高速化
Pacemaker + PostgreSQL レプリケーション構成(PG-REX)のフェイルオーバー高速化Pacemaker + PostgreSQL レプリケーション構成(PG-REX)のフェイルオーバー高速化
Pacemaker + PostgreSQL レプリケーション構成(PG-REX)のフェイルオーバー高速化
 
The Full MySQL and MariaDB Parallel Replication Tutorial
The Full MySQL and MariaDB Parallel Replication TutorialThe Full MySQL and MariaDB Parallel Replication Tutorial
The Full MySQL and MariaDB Parallel Replication Tutorial
 
Patroni: Kubernetes-native PostgreSQL companion
Patroni: Kubernetes-native PostgreSQL companionPatroni: Kubernetes-native PostgreSQL companion
Patroni: Kubernetes-native PostgreSQL companion
 
Keepalived+MaxScale+MariaDB_운영매뉴얼_1.0.docx
Keepalived+MaxScale+MariaDB_운영매뉴얼_1.0.docxKeepalived+MaxScale+MariaDB_운영매뉴얼_1.0.docx
Keepalived+MaxScale+MariaDB_운영매뉴얼_1.0.docx
 
Making Cassandra more capable, faster, and more reliable (at ApacheCon@Home 2...
Making Cassandra more capable, faster, and more reliable (at ApacheCon@Home 2...Making Cassandra more capable, faster, and more reliable (at ApacheCon@Home 2...
Making Cassandra more capable, faster, and more reliable (at ApacheCon@Home 2...
 
Innodb Deep Talk #2 でお話したスライド
Innodb Deep Talk #2 でお話したスライドInnodb Deep Talk #2 でお話したスライド
Innodb Deep Talk #2 でお話したスライド
 
PostgreSQL : Introduction
PostgreSQL : IntroductionPostgreSQL : Introduction
PostgreSQL : Introduction
 

Similar to pg_chameleon MySQL to PostgreSQL replica made easy

pgpool: Features and Development
pgpool: Features and Developmentpgpool: Features and Development
pgpool: Features and Development
elliando dias
 
Infrastructure as code might be literally impossible part 2
Infrastructure as code might be literally impossible part 2Infrastructure as code might be literally impossible part 2
Infrastructure as code might be literally impossible part 2
ice799
 

Similar to pg_chameleon MySQL to PostgreSQL replica made easy (20)

Pg chameleon MySQL to PostgreSQL replica
Pg chameleon MySQL to PostgreSQL replicaPg chameleon MySQL to PostgreSQL replica
Pg chameleon MySQL to PostgreSQL replica
 
pg_chameleon a MySQL to PostgreSQL replica
pg_chameleon a MySQL to PostgreSQL replicapg_chameleon a MySQL to PostgreSQL replica
pg_chameleon a MySQL to PostgreSQL replica
 
The ninja elephant, scaling the analytics database in Transwerwise
The ninja elephant, scaling the analytics database in TranswerwiseThe ninja elephant, scaling the analytics database in Transwerwise
The ninja elephant, scaling the analytics database in Transwerwise
 
PostgreSQL, the big the fast and the (NOSQL on) Acid
PostgreSQL, the big the fast and the (NOSQL on) AcidPostgreSQL, the big the fast and the (NOSQL on) Acid
PostgreSQL, the big the fast and the (NOSQL on) Acid
 
PostgreSQL - backup and recovery with large databases
PostgreSQL - backup and recovery with large databasesPostgreSQL - backup and recovery with large databases
PostgreSQL - backup and recovery with large databases
 
The ninja elephant, scaling the analytics database in Transwerwise
The ninja elephant, scaling the analytics database in TranswerwiseThe ninja elephant, scaling the analytics database in Transwerwise
The ninja elephant, scaling the analytics database in Transwerwise
 
a look at the postgresql engine
a look at the postgresql enginea look at the postgresql engine
a look at the postgresql engine
 
Hitchikers guide handout
Hitchikers guide handoutHitchikers guide handout
Hitchikers guide handout
 
A couple of things about PostgreSQL...
A couple of things  about PostgreSQL...A couple of things  about PostgreSQL...
A couple of things about PostgreSQL...
 
The hitchhiker's guide to PostgreSQL
The hitchhiker's guide to PostgreSQLThe hitchhiker's guide to PostgreSQL
The hitchhiker's guide to PostgreSQL
 
Opencast Architecture
Opencast ArchitectureOpencast Architecture
Opencast Architecture
 
pgpool: Features and Development
pgpool: Features and Developmentpgpool: Features and Development
pgpool: Features and Development
 
Solving Cross-Cutting Concerns in PHP - DutchPHP Conference 2016
Solving Cross-Cutting Concerns in PHP - DutchPHP Conference 2016 Solving Cross-Cutting Concerns in PHP - DutchPHP Conference 2016
Solving Cross-Cutting Concerns in PHP - DutchPHP Conference 2016
 
Distributed System explained (with Java Microservices)
Distributed System explained (with Java Microservices)Distributed System explained (with Java Microservices)
Distributed System explained (with Java Microservices)
 
JPA Week3 Entity Mapping / Hexagonal Architecture
JPA Week3 Entity Mapping / Hexagonal ArchitectureJPA Week3 Entity Mapping / Hexagonal Architecture
JPA Week3 Entity Mapping / Hexagonal Architecture
 
Infrastructure as code might be literally impossible part 2
Infrastructure as code might be literally impossible part 2Infrastructure as code might be literally impossible part 2
Infrastructure as code might be literally impossible part 2
 
Migrating PostgreSQL to the Cloud
Migrating PostgreSQL to the CloudMigrating PostgreSQL to the Cloud
Migrating PostgreSQL to the Cloud
 
Pg big fast ugly acid
Pg big fast ugly acidPg big fast ugly acid
Pg big fast ugly acid
 
Puppetconf 2015 - Puppet Reporting with Elasticsearch Logstash and Kibana
Puppetconf 2015 - Puppet Reporting with Elasticsearch Logstash and KibanaPuppetconf 2015 - Puppet Reporting with Elasticsearch Logstash and Kibana
Puppetconf 2015 - Puppet Reporting with Elasticsearch Logstash and Kibana
 
MySQL Software Repositories
MySQL Software RepositoriesMySQL Software Repositories
MySQL Software Repositories
 

More from Federico Campoli

More from Federico Campoli (8)

Pg chameleon, mysql to postgresql replica made easy
Pg chameleon, mysql to postgresql replica made easyPg chameleon, mysql to postgresql replica made easy
Pg chameleon, mysql to postgresql replica made easy
 
Life on a_rollercoaster
Life on a_rollercoasterLife on a_rollercoaster
Life on a_rollercoaster
 
Backup recovery with PostgreSQL
Backup recovery with PostgreSQLBackup recovery with PostgreSQL
Backup recovery with PostgreSQL
 
Don't panic! - Postgres introduction
Don't panic! - Postgres introductionDon't panic! - Postgres introduction
Don't panic! - Postgres introduction
 
Streaming replication
Streaming replicationStreaming replication
Streaming replication
 
PostgreSql query planning and tuning
PostgreSql query planning and tuningPostgreSql query planning and tuning
PostgreSql query planning and tuning
 
PostgreSQL, The Big, The Fast and The Ugly
PostgreSQL, The Big, The Fast and The UglyPostgreSQL, The Big, The Fast and The Ugly
PostgreSQL, The Big, The Fast and The Ugly
 
Postgresql database administration volume 1
Postgresql database administration volume 1Postgresql database administration volume 1
Postgresql database administration volume 1
 

pg_chameleon MySQL to PostgreSQL replica made easy

  • 1. pg_chameleon
    MySQL to PostgreSQL replica made easy
    Federico Campoli, Transferwise
    PGCon, Ottawa, 01 Jun 2018
    http://www.pgdba.org
    @4thdoctor_scarf
  • 4. Few words about the speaker
    Born in 1972
    Passionate about IT since 1982, mostly because of the TRON movie
    Joined the Oracle DBA secret society in 2004
    In love with PostgreSQL since 2006
    Devrim PostgreSQL tattoo’s copycat
    Works at Transferwise as Data Engineer
  • 8. Disclaimer
    I’m not a developer
    I’m a DBA... which means being hated by everybody and hating everybody
    So, to put things in the right perspective... I use tabs
  • 9. Palpatine
  • 10. Table of contents
    1 History
    2 MySQL Replica in a nutshell
    3 A chameleon in the middle
    4 Replica in action
    5 Lessons learned
    6 Wrap up
  • 11. History
  • 14. The beginnings
    Years 2006/2012: neo_my2pg.py
    I wrote the script because of a struggling phpbb on MySQL
    The database migration was successful
    However, phpbb didn’t work very well with PostgreSQL [1]
    The script is written in Python 2.6
    It’s a monolithic script
    And it’s slow, very slow
    It’s a good checklist for things to avoid when coding
    https://github.com/the4thdoctor/neo_my2pg
    [1] Opening a new connection for each query is not the smartest thing to do.
  • 16. I’m not scared of using the ORMs
    Years 2013/2015: first attempt at pg_chameleon
    Developed in Python 2.7
    Used SQLAlchemy for extracting MySQL’s metadata
    Proof of concept only
    It was built during the years of the life on a roller coaster [2]
    Therefore it was just a way to discharge frustration
    Abandoned after a while
    SQLAlchemy’s limitations were frustrating as well (see the Disclaimer slide)
    And pgloader did the same job much, much better
    [2] Recording available here: http://www.pgbrighton.uk/post/backup_recovery/
  • 18. pg_chameleon reborn
    Year 2016
    I needed to replicate the data from MySQL to PostgreSQL
    http://tech.transferwise.com/scaling-our-analytics-database/
    The amazing library python-mysql-replication allowed me to build a proof of concept
    It later evolved into pg_chameleon 1.x
    Kudos to the python-mysql-replication team!
    https://github.com/noplay/python-mysql-replication
  • 20. pg_chameleon 1.x
    Developed on the London to Brighton commute
    Released as stable on the 7th of May 2017
    Followed by 8 bugfix releases
    Compatible with CPython 2.7/3.3+
    No more SQLAlchemy
    The MySQL driver changed from MySQLdb to PyMySQL
    Command line helper
    Supports type override on the fly (Danger!)
    Installs in virtualenv and system wide via PyPI
    Can detach the replica for minimal downtime migrations
  • 23. pg_chameleon version 1’s limitations
    All the affected tables are locked in read only mode during the init replica process
    During the init replica the data is not accessible
    Tables must have a primary key to be replicated
    No daemon, the process always stays in foreground
    Single schema replica
    One process per schema
    Network inefficient
    Read and replay are not concurrent, with risk of high lag
    The optional threaded mode is very inefficient and fragile
    A single error in the replay process and the replica is broken
  • 24. MySQL Replica in a nutshell
  • 25. MySQL Replica
    The MySQL replica is logical
    When the replica is enabled, the data changes are stored in the master’s binary log files
    The slave gets the data changes from the master’s binary log files
    The slave saves the stream of data into local relay logs
    The relay logs are replayed against the slave
  • 26. MySQL Replica
  • 27. Log formats
    MySQL has three ways of storing the changes in the binary logs.
    STATEMENT: it logs the statements, which are replayed on the slave. It’s the best solution for bandwidth. However, when replaying statements with non-deterministic functions this format generates different values on the slave (e.g. an insert with a column autogenerated by the uuid function).
    ROW: it’s deterministic. This format logs the row images.
    MIXED: it takes the best of both worlds. The master logs the statements unless a non-deterministic function is used. In that case it logs the row image.
    All three formats always log the DDL query events.
    The python-mysql-replication library, and therefore pg_chameleon, requires the ROW format to work properly.
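The difference between STATEMENT and ROW can be illustrated with a small simulation. This is only a sketch: the function names are hypothetical, and Python's uuid4 merely stands in for MySQL's uuid function.

```python
import uuid


def insert_with_uuid():
    # Stand-in for: INSERT INTO t (id) VALUES (uuid())
    return {"id": str(uuid.uuid4())}


def replay_statement(statement):
    # STATEMENT format: the slave executes the statement again,
    # so a non-deterministic function is evaluated a second time.
    return statement()


def replay_row_image(row_image):
    # ROW format: the slave receives the values produced on the master.
    return dict(row_image)


master_row = insert_with_uuid()
statement_slave_row = replay_statement(insert_with_uuid)
row_slave_row = replay_row_image(master_row)

print(master_row == statement_slave_row)  # the two uuids differ
print(master_row == row_slave_row)        # the row image is identical
```

With the row image the slave never evaluates the function, which is why a tool that decodes and replays changes outside MySQL needs the ROW format.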
  • 28. A chameleon in the middle
  • 29. pg_chameleon
    pg_chameleon mimics a MySQL slave’s behaviour
    It performs the initial load for the replicated tables
    It connects to the MySQL replica protocol
    It stores the row images into a PostgreSQL table
    A PL/pgSQL function decodes the rows and replays the changes
    It can detach the replica for minimal downtime migrations
    PostgreSQL acts as relay log and replication slave
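The store-then-replay idea can be sketched in a few lines of Python. This is not pg_chameleon's code: the real replay happens inside PostgreSQL, and the row image layout used here (action, before, after) is an assumption made for the illustration.

```python
def replay(table, row_images, pkey):
    """Apply stored row images, in order, to a table kept as {pkey: row}."""
    for image in row_images:
        action = image["action"]
        if action == "insert":
            table[image["after"][pkey]] = dict(image["after"])
        elif action == "update":
            # The key itself may change, so drop the old row first.
            table.pop(image["before"][pkey], None)
            table[image["after"][pkey]] = dict(image["after"])
        elif action == "delete":
            table.pop(image["before"][pkey], None)
    return table


# Hypothetical batch of row images, as if read from the binary log.
batch = [
    {"action": "insert", "after": {"actor_id": 1, "name": "PENELOPE"}},
    {"action": "insert", "after": {"actor_id": 2, "name": "NICK"}},
    {"action": "update", "before": {"actor_id": 2, "name": "NICK"},
     "after": {"actor_id": 2, "name": "NICHOLAS"}},
    {"action": "delete", "before": {"actor_id": 1, "name": "PENELOPE"}},
]

actor = replay({}, batch, pkey="actor_id")
print(actor)
```

Locating the row through its key is also the reason the replicated tables need a primary or unique key.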
  • 30. MySQL replica + pg_chameleon
  • 31. pg_chameleon 2.0 #1
    Developed at pgconf.eu 2017 and on the commute
    Released as stable on the 1st of January 2018
    Compatible with Python 3.3+
    Installs in virtualenv and system wide via PyPI
    Replicates multiple schemas from a single MySQL into a target PostgreSQL database
    Conservative approach to the replica. Tables which generate errors are automatically excluded from the replica
    Daemonised replica process with two distinct subprocesses, for concurrent read and replay
  • 32. pg_chameleon 2.0 #2
    Soft locking replica initialisation. The tables are locked only during the copy.
    Rollbar integration for simpler error detection and messaging
    Experimental support for the PostgreSQL source type
    The tables are loaded in a separate schema which is swapped with the existing one. This approach requires more space but it makes the init replica virtually painless, leaving the old data accessible until the init replica is complete.
    The DDL is translated into the PostgreSQL dialect, keeping the schema in sync with MySQL automatically
  • 33. Version 2.0’s limitations
    Tables must have a primary or unique key to be replicated
    When detaching the replica the foreign keys are always created ON DELETE/UPDATE RESTRICT
    The source type PostgreSQL supports only the init replica process
  • 35. Replica initialisation
    The replica initialisation follows the same workflow as stated in the MySQL online manual.
    Flush the tables with read lock
    Get the master’s coordinates
    Copy the data
    Release the locks
    However...
    pg_chameleon flushes the tables with read lock one by one. The lock is held only during the copy.
    The log coordinates are stored in the replica catalogue along with the table’s name and used by the replica process to determine whether the table’s binlog data should be used or not.
    The replica starts inconsistent and gains consistency over time.
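How per-table coordinates let the replica decide which binlog events to use can be sketched as follows. This is an illustration under stated assumptions, not pg_chameleon's implementation: the names are hypothetical, an event is identified by the offset where it starts, and binlog file names are assumed to end with a numeric sequence (as in mysql-bin.000003).

```python
from typing import NamedTuple


class BinlogPosition(NamedTuple):
    log_file: str  # e.g. "mysql-bin.000003"
    log_pos: int   # byte offset inside the file


def sort_key(position):
    # Compare the numeric file suffix first, then the offset.
    sequence = int(position.log_file.rsplit(".", 1)[1])
    return (sequence, position.log_pos)


def should_replay(event_position, table_copy_position):
    # Events logged before the table was copied are already in the copy.
    return sort_key(event_position) >= sort_key(table_copy_position)


# Hypothetical replica catalogue: coordinates captured while each
# table was locked for its own copy.
catalogue = {
    "actor": BinlogPosition("mysql-bin.000003", 500),
    "film": BinlogPosition("mysql-bin.000003", 900),
}

event = BinlogPosition("mysql-bin.000003", 700)
print(should_replay(event, catalogue["actor"]))  # actor was copied earlier
print(should_replay(event, catalogue["film"]))   # film was copied later
```

Because every table has its own starting point, the tables catch up at different moments, which is what "starts inconsistent and gains consistency over time" refers to.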
  • 37. Fallback on failure
    The data is pulled from MySQL using the CSV format in slices. This approach prevents memory overload.
    Once the file is saved it is pushed into PostgreSQL using the COPY command.
    However...
    COPY is fast but runs in a single transaction
    One failure and the entire batch is rolled back
    If this happens the procedure loads the same data using INSERT statements
    Which can be very slow
    The process attempts to clean the NUL markers which are allowed by MySQL
    If the row still fails on insert then it’s discarded
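The fallback logic can be sketched without a database. In this illustration the target table is a plain list, copy_batch stands in for COPY (all or nothing) and insert_row for a single INSERT; the validation rules are assumptions chosen to mimic a NUL byte in a text value and a NOT NULL column. None of these names come from pg_chameleon.

```python
class RowRejected(Exception):
    """Raised when the stand-in target refuses a row."""


def validate(row):
    for value in row:
        if isinstance(value, str) and "\x00" in value:
            raise RowRejected("NUL byte in text value")
    if row[1] is None:
        raise RowRejected("name cannot be NULL")


def copy_batch(table, rows):
    # All or nothing: nothing is stored unless every row is valid.
    for row in rows:
        validate(row)
    table.extend(rows)


def insert_row(table, row):
    validate(row)
    table.append(row)


def clean_nul(row):
    return tuple(v.replace("\x00", "") if isinstance(v, str) else v for v in row)


def load_with_fallback(table, rows):
    """Try the fast path first, then fall back to row by row inserts."""
    try:
        copy_batch(table, rows)
        return []
    except RowRejected:
        pass
    discarded = []
    for row in rows:
        try:
            insert_row(table, clean_nul(row))
        except RowRejected:
            discarded.append(row)
    return discarded


actor = []
discarded = load_with_fallback(actor, [(1, "PENELOPE"), (2, "NI\x00CK"), (3, None)])
print(actor)
print(discarded)
```

The slow path is only paid by batches that contain at least one bad row, which keeps the common case as fast as a plain COPY.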
  • 38. Replica in action
  • 39. MySQL configuration
    The MySQL configuration file is usually stored in /etc/mysql/my.cnf
    To enable the binary logging find the section [mysqld] and check that the following parameters are set.
        binlog_format = ROW
        log-bin = mysql-bin
        server-id = 1
        binlog-row-image = FULL
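A quick way to check those parameters is to parse the file with Python's configparser. This is a sketch, not part of pg_chameleon: the function name is made up, the sample text replaces a real /etc/mysql/my.cnf, and a real file may contain directives such as !includedir that configparser does not understand.

```python
import configparser

SAMPLE_MY_CNF = """\
[mysqld]
binlog_format= ROW
log-bin = mysql-bin
server-id = 1
binlog-row-image = FULL
"""


def check_binlog_settings(cnf_text):
    """Return a list of problems found in the [mysqld] section."""
    parser = configparser.ConfigParser(
        allow_no_value=True, strict=False, interpolation=None
    )
    parser.read_string(cnf_text)
    if not parser.has_section("mysqld"):
        return ["section [mysqld] not found"]
    # MySQL accepts both dashes and underscores in option names.
    settings = {
        key.replace("-", "_"): (value or "").strip()
        for key, value in parser.items("mysqld")
    }
    problems = []
    for key, expected in (("binlog_format", "ROW"), ("binlog_row_image", "FULL")):
        found = settings.get(key, "")
        if found.upper() != expected:
            problems.append(f"{key} should be {expected}, found {found or 'nothing'}")
    if "log_bin" not in settings:
        problems.append("log_bin is not set")
    if not settings.get("server_id"):
        problems.append("server_id is not set")
    return problems


problems = check_binlog_settings(SAMPLE_MY_CNF)
print(problems)
```

The check only reads the file; the values actually in use on a running server can differ and are better confirmed on the server itself.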
  • 40. MySQL user for replica
    Set up a replication user on MySQL
        CREATE USER usr_replica;
        SET PASSWORD FOR usr_replica = PASSWORD('replica');
        GRANT ALL ON sakila.* TO 'usr_replica';
        GRANT RELOAD ON *.* TO 'usr_replica';
        GRANT REPLICATION CLIENT ON *.* TO 'usr_replica';
        GRANT REPLICATION SLAVE ON *.* TO 'usr_replica';
        FLUSH PRIVILEGES;
    In our example we are using the sakila test database.
    https://dev.mysql.com/doc/sakila/en/
  • 41. PostgreSQL setup
    Add a user on PostgreSQL able to create schemas and relations in the destination database
        CREATE USER usr_replica WITH PASSWORD 'replica';
        CREATE DATABASE db_replica WITH OWNER usr_replica;
  • 42. Install pg_chameleon
    Install pg_chameleon and create the configuration files
        pip install pip --upgrade
        pip install pg_chameleon
        chameleon set_configuration_files
        cd ~/.pg_chameleon/configuration
        cp config-example.yml default.yml
    Edit the file default.yml setting the correct values for connection and source.
  • 45. Configure global settings in default.yml
    PostgreSQL connection
        pg_conn:
          host: "localhost"
          port: "5432"
          user: "usr_replica"
          password: "replica"
          database: "db_replica"
          charset: "utf8"
    Rollbar configuration
        rollbar_key: '<rollbar_long_key>'
        rollbar_env: 'pgcon-demo'
    Type override (optional)
        type_override:
          "tinyint(1)":
            override_to: boolean
            override_tables:
              - "*"
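The effect of the type override can be pictured with a small lookup. This is a guess at the behaviour for illustration only: the function name, the table naming and the default type passed in are assumptions, and pg_chameleon's own matching rules may differ.

```python
# The type_override section above, as it would look once loaded from YAML.
TYPE_OVERRIDE = {
    "tinyint(1)": {
        "override_to": "boolean",
        "override_tables": ["*"],
    }
}


def resolve_type(table, mysql_type, default_pg_type, overrides=TYPE_OVERRIDE):
    """Pick the PostgreSQL type for a column, honouring the overrides."""
    rule = overrides.get(mysql_type.lower())
    if rule is None:
        return default_pg_type
    tables = rule["override_tables"]
    if "*" in tables or table in tables:
        return rule["override_to"]
    return default_pg_type


print(resolve_type("sakila.film", "tinyint(1)", "smallint"))
print(resolve_type("sakila.film", "int(11)", "integer"))
```

An override changes how the incoming values must be converted, which is why the feature is flagged as dangerous: a tinyint(1) column holding values other than 0 and 1 no longer fits the target type.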
  • 48. Configure the mysql source
        sources:
          mysql:
            db_conn:
              host: "localhost"
              port: "3306"
              user: "usr_replica"
              password: "replica"
              charset: 'utf8'
              connect_timeout: 10
            schema_mappings:
              sakila: loxodonta_africana
            limit_tables:
            skip_tables:
            grant_select_to:
              - usr_readonly
            lock_timeout: "120s"
            my_server_id: 100
            replica_batch_size: 10000
            replay_max_rows: 10000
            batch_retention: '1 day'
            copy_max_memory: "300M"
            copy_mode: 'file'
            out_dir: /tmp
            sleep_loop: 1
            on_error_replay: continue
            on_error_read: continue
            auto_maintenance: "1 day"
            type: mysql
  • 49. Add the source and initialise the replica
    Add the source mysql and initialise the replica for it. We are using debug in order to get the logging on the console.
        chameleon create_replica_schema --debug
        chameleon add_source --config default --source mysql --debug
        chameleon init_replica --config default --source mysql --debug
  • 51. Start the replica
    Start the replica process
        chameleon start_replica --config default --source mysql
    Show the replica status
        chameleon show_status --config default --source mysql
  • 52. Time for a demo
    Demo!
    The demo will fail miserably for sure and you will hate this project forever.
  • 53. Lessons learned
  • 54. Strictness is an illusion. MySQL doubly so
    MySQL’s lack of strictness is not a mystery.
    The funny way MySQL manages defaults on NOT NULL columns can break the replica.
    Therefore any field with NOT NULL added after the initialisation is always created as NULLable in PostgreSQL.
  • 57. The DDL. A real pain in the back
    I initially tried to use sqlparse for tokenising the DDL emitted by MySQL.
    Unfortunately it didn’t work as I expected.
    So I decided to use regular expressions.
    Some people, when confronted with a problem, think "I know, I’ll use regular expressions." Now they have two problems. -- Jamie Zawinski
    MySQL even in ROW format emits the DDL as statements
    The class sql_token uses regular expressions to tokenise the DDL
    The tokenised data is used to build the DDL in the PostgreSQL dialect
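A toy version of the tokenise-and-rebuild approach is shown below. It is not the sql_token class: the regular expression, the tiny type map and the function name are assumptions for the illustration, and it handles only a simple ALTER TABLE ... ADD COLUMN. Anything after the type, NOT NULL included, is dropped on purpose, mirroring the lesson that such columns end up NULLable in PostgreSQL.

```python
import re

# Captures table, column, type and optional dimension; ignores the rest.
ALTER_ADD_COLUMN = re.compile(
    r"ALTER\s+TABLE\s+`?(?P<table>\w+)`?\s+"
    r"ADD\s+(?:COLUMN\s+)?`?(?P<column>\w+)`?\s+"
    r"(?P<type>\w+)(?:\((?P<dimension>[\d,\s]+)\))?",
    re.IGNORECASE,
)

# Deliberately tiny map, just enough for the example.
TYPE_MAP = {
    "int": "integer",
    "tinyint": "smallint",
    "varchar": "character varying",
    "datetime": "timestamp without time zone",
}


def translate_add_column(mysql_ddl):
    """Return the PostgreSQL statement, or None if the DDL is not recognised."""
    match = ALTER_ADD_COLUMN.match(mysql_ddl.strip())
    if match is None:
        return None
    table = match.group("table")
    column = match.group("column")
    mysql_type = match.group("type").lower()
    pg_type = TYPE_MAP.get(mysql_type, mysql_type)
    dimension = match.group("dimension")
    if mysql_type == "varchar" and dimension:
        pg_type = pg_type + "(" + dimension.strip() + ")"
    return 'ALTER TABLE "{}" ADD COLUMN "{}" {};'.format(table, column, pg_type)


ddl = "ALTER TABLE `film` ADD COLUMN `rating_note` VARCHAR(30) NOT NULL"
print(translate_add_column(ddl))
```

Even this small pattern has blind spots (for instance ADD INDEX would be read as a column named INDEX), which hints at why the quote about regular expressions sits on the slide.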
  • 58. Wrap up
  • 59. To boldly go where no chameleon has gone before
    Short term goals, version 2.0
    Automatically re-sync the tables when they error on replay
    Improve the replay speed and CPU efficiency
    GTID support for MySQL source
    Medium term goals, version 2.1
    Parallel copy and index creation in order to speed up the init replica process
    Logical replica from PostgreSQL
    Improve the default column handling
  • 60. Igor, the green little guy
    The chameleon logo has been developed by Elena Toma, a talented Italian lady.
    https://www.facebook.com/Tonkipapperoart/
    The name Igor is inspired by Marty Feldman’s Igor, portrayed in the movie Young Frankenstein.
  • 61. Feedback please!
    Please report any issue on GitHub and follow pg_chameleon on Twitter for the announcements.
    https://github.com/the4thdoctor/pg_chameleon
    @pg_chameleon
  • 62. Did you say hire?
    WE ARE HIRING!
    https://transferwise.com/jobs/
  • 63. That’s all folks!
    Thank you for listening!
    Any questions?
    Please be very basic, I’m just an electrician after all.
  • 64. Image credits
    Palpatine, Dr. Evil disclaimer, It could work (Young Frankenstein): source memegenerator
    MySQL image: source WikiCommons
    Hard Disk image: source WikiCommons
    Tron image: source Tron Wikia
    Twitter icon: source Open Icon Library
    The PostgreSQL logo: copyright the PostgreSQL Global Development Group
    Boromir get rid of mysql: source imgflip
    Morpheus: source imgflip
    Keep calm chameleon: source imgflip
    The dolphin picture: copyright artnoose
    Perseus, Framed: copyright Federico Campoli
    Pinkie Pie that’s all folks: copyright dan232323, used with permission
    Doom: source RetroPie
  • 65. License
    This document is distributed under the terms of the Creative Commons Attribution, Not Commercial, Share Alike
  • 66. pg_chameleon: MySQL to PostgreSQL replica made easy