SlideShare a Scribd company logo
1 of 44
Download to read offline
pg_chameleon
Federico Campoli
Loxodata
Federico Campoli
Loxodata
MySQL to PostgreSQL replica made easy
2
Few words about the speaker
• Born in 1972
• Passionate about IT since 1982
• Joined the Oracle DBA secret society in 2004
• In love with PostgreSQL since 2006
• PostgreSQL tattoo on the right shoulder
• Works at Loxodata as PostgreSQL Consultant
3
Loxodata
A company built on 3 pillars
4
Loxodata
A comprehensive service offer
5
Agenda
• History
• The MySQL Replica
• pg_chameleon v2.0
• The replica in action
• Demo
• Wrap up
History
How the project startedHow the project started
7
The beginning
Years 2006/2012 neo_my2pg.py
● I wrote the script because of a struggling phpbb on MySQL
● The script is written in python 2.6
● It's a monolith script
● And it's slow, very slow
● It's a good checklist for things to avoid when coding
● https://github.com/the4thdoctor/neo_my2pg
8
I'm not scared of using the ORMs
Years 2013/2015 first attempt of pg_chameleon
● Developed in Python 2.7
● Used SQLAlchemy for extracting the MySQL's metadata
● Proof of concept only
● It was a just a way to discharge frustration
● Abandoned after a while
● SQLAlchemy's limitations were frustrating as well
● And pgloader did the same job much much better
9
pg_chameleon reborn
Year 2016
● I needed to replicate the data from MySQL to PostgreSQL
● http://tech.transferwise.com/scaling-our-analytics-database
● The amazing library python-mysql-replication allowed me build a proof of concept
● Which evolved later in pg_chameleon 1.x
Kudos to the python-mysql-replication team!
https://github.com/noplay/python-mysql-replication
10
pg_chameleon v1.x
● Developed on the London to Brighton commute
● Released as stable the 7th May 2017
● Followed by 8 bugfix releases
● Compatible with CPython 2.7/3.3+
● No more SQLAlchemy
● The MySQL driver changed from MySQLdb to PyMySQL
● Command line helper
● Supports type override on the fly (Danger!)
● Installs in virtualenv and system wide via pypi
● Can detach the replica for minimal downtime migrations
11
pg_chameleon v1.x limitations
● All the affected tables are locked in read only mode during the init_replica process
● During the init_replica the data is not accessible
● The tables for being replicated require primary keys
● No daemon, the process always stays in foreground
● Single schema replica
● One process per each schema
● Network inefficient
● Read and replay not concurrent with risk of high lag
● The optional threaded mode very inefficient and fragile
● A single error in the replay process and the replica is broken
The MySQL replica
Do I really need to do that?Do I really need to do that?
13
MySQL replica in a nutshell
● The MySQL replica is logical
● When the replica is enabled the data changes are stored in the master's binary log files
● The slave gets from the master's binary log files
● The slave saves the stream of data into local relay logs
● The relay logs are replayed against the slave
14
MySQL replica in a nutshell
15
The log formats
MySQL can store the changes in the binary logs in three different formats
● STATEMENT: It logs the statements which are replayed on the slave
It's the best solution for the bandwidth. However, when replaying statements with not deterministic
functions this format generates different values on the slave (e.g. using an insert with a column
autogenerated by the uuid function).
● ROW: It's deterministic. This format logs the row images.
● MIXED takes the best of both worlds. The master logs the statements unless a not deterministic
function is used. In that case it logs the row image.
● All three formats always log the DDL query events.
● The python-mysql-replication library used by pg_chameleon require the ROW format to work
properly.
pg_chameleon v2.0
Overview of the stable releaseOverview of the stable release
17
A chameleon in the middle
pg_chameleon mimics a mysql slave's behaviour
● It performs the initial load for the replicated tables
● It connects to the MySQL replica protocol
● It stores the row images into a PostgreSQL table
● A plpgSQL function decodes the rows and replay the changes
● It can detach the replica for minimal downtime migrations
PostgreSQL acts as relay log and replication slave
18
A chameleon in the middle
19
pg_chameleon v2.0
● Developed at the pgconf.eu 2017 and on the commute
● Released as stable the 1st of January 2018
● Compatible with python 3.3+
● Installs in virtualenv and system wide via pypi
● Replicates multiple schemas from a single MySQL into a target PostgreSQL database
● Conservative approach to the replica. Tables which generate errors are automatically excluded
from the replica
● Daemonised replica process with two distinct subprocesses, for concurrent read and replay
20
pg_chameleon v2.0
● Soft locking replica initialisation. The tables are locked only during the copy.
● Rollbar integration for a simpler error detection and messaging
● Experimental support for the PostgreSQL source type
● The tables are loaded in a separate schema which is swapped with the existing.
● This approach requires more space but it makes the init a replica virtually painless, leaving the old
data accessible until the init_replica is complete.
● The DDL are translated in the PostgreSQL dialect keeping the schema in sync with MySQL
automatically
● MySQL GTID support for switch across the replica cluster without need for init_replica
21
pg_chameleon v2.0 limitations
● The tables for being replicated require primary or unique keys
● When detaching the replica the foreign keys are created always ON DELETE/UPDATE RESTRICT
● The source type PostgreSQL supports only the init_replica process
● Problems on Amazon RDS with the json data type
● No support for MariaDB’s GTID
The replica in action
Setup pg_chameleon in few stepsSetup pg_chameleon in few steps
23
Replica initialisation
The replica initialisation follows the same workflow as stated on the mysql online manual.
● Flush the tables with read lock
● Get the master's coordinates
● Copy the data
● Release the locks
However...pg_chameleon flushes the tables with read lock one by one. The lock is held only during
the copy.
The log coordinates are stored in the replica catalogue along the table's name and used by the
replica process to determine whether the table's binlog data should be used or not.
24
Fallback on copy failure
During the init_replica, the data is pulled in slices from mysql in order to prevent the memory overload.
The saved file, in CSV format, is then loaded into PostgreSQL using the COPY command.
However...
● COPY is fast but is single transaction
● One error and the entire command is rolled back
● If this happens the procedure loads the same data using the INSERT statements
● Which can be very slow
● In case of failure during the insert pg_chameleon cleans the NUL markers allowed by MySQL but
not by PostgreSQL
● If the row still fails on insert then it's discarded
25
MySQL configuration
binlog_format= ROW
log-bin = mysql-bin
server-id = 1
binlog-row-image = FULL
The mysql configuration file on linux is usually stored in /etc/mysql/my.cnf
To enable the binary logging find the section [mysqld] and check that the
following parameters are set.
26
MySQL database user
CREATE USER usr_replica ;
SET PASSWORD FOR usr_replica=PASSWORD('replica');
GRANT ALL ON sakila.* TO 'usr_replica';
GRANT RELOAD ON *.* to 'usr_replica';
GRANT REPLICATION CLIENT ON *.* to 'usr_replica';
GRANT REPLICATION SLAVE ON *.* to 'usr_replica';
FLUSH PRIVILEGES;
Setup a MySQL database user for the replica
27
PostgreSQL database user
CREATE USER usr_replica WITH PASSWORD 'replica';
CREATE DATABASE db_replica WITH OWNER usr_replica;
Setup a PostgreSQL database user and a database owned by the freshly
created user
28
Install pg_chameleon
pip install pip --upgrade
pip install pg_chameleon
chameleon set_configuration_files
cd ~/.pg_chameleon/configuration
cp config-example.yml default.yml
Install pg_chameleon and create the configuration files.
Edit the file default.yml and set the correct values for connection and
source.
29
Global settings
pg_conn:
host: "localhost"
port: "5432"
user: "usr_replica"
password: "replica"
database: "db_replica"
charset: "utf8"
rollbar_key: '<rollbar_long_key>'
rollbar_env: 'demo'
type_override:
"tinyint(1)":
override_to: boolean
override_tables:
- "*"
30
MySQL source settings
sources:
mysql:
db_conn:
host: "localhost"
port: "3306"
user: "usr_replica"
password: "replica"
charset: 'utf8'
connect_timeout: 10
schema_mappings:
sakila: db_sakila
limit_tables:
skip_tables:
grant_select_to:
- usr_readonly
lock_timeout: "120s"
my_server_id: 100
31
MySQL source settings
replica_batch_size: 10000
replay_max_rows: 10000
batch_retention: '1 day'
copy_max_memory: "300M"
copy_mode: 'file'
out_dir: /tmp
sleep_loop: 1
on_error_replay: continue
on_error_read: continue
auto_maintenance: "1 day"
gtid_enable: No
type: mysql
32
MySQL source settings
skip_events:
insert:
- sakila.author #skips inserts on the table sakila.author
delete:
- sakila2 #skips deletes on schema sakila2
update:
33
Register the source and init the replica
Add the source mysql and initialise the replica for it.
We are using debug in order to get the logging on the console.
chameleon create_replica_schema --debug
chameleon add_source --config default --source mysql --debug
chameleon init_replica --config default --source mysql --debug
34
Start the replica for the source
chameleon start_replica --config default --source mysql
chameleon show_status --config default --source mysql
Demo
Which will fail miserably and you’ll hate this project foreverWhich will fail miserably and you’ll hate this project forever
Wrap up
Lesson learned, future development and thanksLesson learned, future development and thanks
37
NOT NULL and no default make slonik cry
The way the implicit defaults with NOT NULL are managed by MySQL
can break the replica on PostgreSQL.
Therefore any field with NOT NULL added after the init_replica is created
always as NULLable in PostgreSQL.
38
DDL a pain in the back
I initially tried to use sqlparse for tokenising the DDL emitted by MySQL.
Unfortunately didn't worked as I expected.
So I decided to use the regular expressions.
Some people, when confronted with a problem,
think "I know, I'll use regular expressions."
Now they have two problems.
-- Jamie Zawinski
●
MySQL even in ROW format emits the DDL as statements
●
The class sql_token uses the regular expressions to tokenise the DDL
●
The tokenised data is used to build the DDL in the PostgreSQL dialect
39
Future development
The development of the version 2.1 is started
Things that will likely appear in the not so distant future
●
Parallel copy and index creation in order to speed up the init_replica process
●
Logical replica from PostgreSQL
●
Improve the default column handling
●
Locale support
●
Auto sync for the tables removed from the replica
40
Igor, the little green guy
The chameleon logo has been developed by Elena Toma, a talented Italian Lady.
https://www.facebook.com/Tonkipapperoart/
The name Igor is inspired by Martin Feldman's Igor portraited in Young Frankenstein movie.
41
Feedback please!
Please report any issue on github and follow pg_chameleon on twitter for the announcements.
https://github.com/the4thdoctor/pg_chameleon
Twitter: @pg_chameleon
42
We are hiring!
Please send your CV to jobs@loxodata.com
43
Thank you for listening!
pg_chameleon
Federico Campoli
Loxodata
Federico Campoli
Loxodata
MySQL to PostgreSQL replica made easy

More Related Content

What's hot

Oracle_Multitenant_19c_-_All_About_Pluggable_D.pdf
Oracle_Multitenant_19c_-_All_About_Pluggable_D.pdfOracle_Multitenant_19c_-_All_About_Pluggable_D.pdf
Oracle_Multitenant_19c_-_All_About_Pluggable_D.pdfSrirakshaSrinivasan2
 
[pgday.Seoul 2022] PostgreSQL구조 - 윤성재
[pgday.Seoul 2022] PostgreSQL구조 - 윤성재[pgday.Seoul 2022] PostgreSQL구조 - 윤성재
[pgday.Seoul 2022] PostgreSQL구조 - 윤성재PgDay.Seoul
 
Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
Wars of MySQL Cluster ( InnoDB Cluster VS Galera ) Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
Wars of MySQL Cluster ( InnoDB Cluster VS Galera ) Mydbops
 
What’s the Best PostgreSQL High Availability Framework? PAF vs. repmgr vs. Pa...
What’s the Best PostgreSQL High Availability Framework? PAF vs. repmgr vs. Pa...What’s the Best PostgreSQL High Availability Framework? PAF vs. repmgr vs. Pa...
What’s the Best PostgreSQL High Availability Framework? PAF vs. repmgr vs. Pa...ScaleGrid.io
 
Mastering PostgreSQL Administration
Mastering PostgreSQL AdministrationMastering PostgreSQL Administration
Mastering PostgreSQL AdministrationEDB
 
MySQL 상태 메시지 분석 및 활용
MySQL 상태 메시지 분석 및 활용MySQL 상태 메시지 분석 및 활용
MySQL 상태 메시지 분석 및 활용I Goo Lee
 
Introduction VAUUM, Freezing, XID wraparound
Introduction VAUUM, Freezing, XID wraparoundIntroduction VAUUM, Freezing, XID wraparound
Introduction VAUUM, Freezing, XID wraparoundMasahiko Sawada
 
Postgresql Database Administration- Day3
Postgresql Database Administration- Day3Postgresql Database Administration- Day3
Postgresql Database Administration- Day3PoguttuezhiniVP
 
Under the Hood of a Shard-per-Core Database Architecture
Under the Hood of a Shard-per-Core Database ArchitectureUnder the Hood of a Shard-per-Core Database Architecture
Under the Hood of a Shard-per-Core Database ArchitectureScyllaDB
 
Linux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performanceLinux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performancePostgreSQL-Consulting
 
PostgreSQL WAL for DBAs
PostgreSQL WAL for DBAs PostgreSQL WAL for DBAs
PostgreSQL WAL for DBAs PGConf APAC
 
Faster, better, stronger: The new InnoDB
Faster, better, stronger: The new InnoDBFaster, better, stronger: The new InnoDB
Faster, better, stronger: The new InnoDBMariaDB plc
 
Understanding Oracle GoldenGate 12c
Understanding Oracle GoldenGate 12cUnderstanding Oracle GoldenGate 12c
Understanding Oracle GoldenGate 12cIT Help Desk Inc
 
Oracle to Postgres Migration - part 1
Oracle to Postgres Migration - part 1Oracle to Postgres Migration - part 1
Oracle to Postgres Migration - part 1PgTraining
 
Understanding the architecture of MariaDB ColumnStore
Understanding the architecture of MariaDB ColumnStoreUnderstanding the architecture of MariaDB ColumnStore
Understanding the architecture of MariaDB ColumnStoreMariaDB plc
 
しばちょう先生が語る!オラクルデータベースの進化の歴史と最新技術動向#3
しばちょう先生が語る!オラクルデータベースの進化の歴史と最新技術動向#3しばちょう先生が語る!オラクルデータベースの進化の歴史と最新技術動向#3
しばちょう先生が語る!オラクルデータベースの進化の歴史と最新技術動向#3オラクルエンジニア通信
 

What's hot (20)

Oracle_Multitenant_19c_-_All_About_Pluggable_D.pdf
Oracle_Multitenant_19c_-_All_About_Pluggable_D.pdfOracle_Multitenant_19c_-_All_About_Pluggable_D.pdf
Oracle_Multitenant_19c_-_All_About_Pluggable_D.pdf
 
[pgday.Seoul 2022] PostgreSQL구조 - 윤성재
[pgday.Seoul 2022] PostgreSQL구조 - 윤성재[pgday.Seoul 2022] PostgreSQL구조 - 윤성재
[pgday.Seoul 2022] PostgreSQL구조 - 윤성재
 
Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
Wars of MySQL Cluster ( InnoDB Cluster VS Galera ) Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
 
What’s the Best PostgreSQL High Availability Framework? PAF vs. repmgr vs. Pa...
What’s the Best PostgreSQL High Availability Framework? PAF vs. repmgr vs. Pa...What’s the Best PostgreSQL High Availability Framework? PAF vs. repmgr vs. Pa...
What’s the Best PostgreSQL High Availability Framework? PAF vs. repmgr vs. Pa...
 
Mastering PostgreSQL Administration
Mastering PostgreSQL AdministrationMastering PostgreSQL Administration
Mastering PostgreSQL Administration
 
MySQL 상태 메시지 분석 및 활용
MySQL 상태 메시지 분석 및 활용MySQL 상태 메시지 분석 및 활용
MySQL 상태 메시지 분석 및 활용
 
Introduction VAUUM, Freezing, XID wraparound
Introduction VAUUM, Freezing, XID wraparoundIntroduction VAUUM, Freezing, XID wraparound
Introduction VAUUM, Freezing, XID wraparound
 
Postgresql Database Administration- Day3
Postgresql Database Administration- Day3Postgresql Database Administration- Day3
Postgresql Database Administration- Day3
 
Under the Hood of a Shard-per-Core Database Architecture
Under the Hood of a Shard-per-Core Database ArchitectureUnder the Hood of a Shard-per-Core Database Architecture
Under the Hood of a Shard-per-Core Database Architecture
 
Linux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performanceLinux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performance
 
PostgreSQL WAL for DBAs
PostgreSQL WAL for DBAs PostgreSQL WAL for DBAs
PostgreSQL WAL for DBAs
 
Faster, better, stronger: The new InnoDB
Faster, better, stronger: The new InnoDBFaster, better, stronger: The new InnoDB
Faster, better, stronger: The new InnoDB
 
Understanding Oracle GoldenGate 12c
Understanding Oracle GoldenGate 12cUnderstanding Oracle GoldenGate 12c
Understanding Oracle GoldenGate 12c
 
Oracle to Postgres Migration - part 1
Oracle to Postgres Migration - part 1Oracle to Postgres Migration - part 1
Oracle to Postgres Migration - part 1
 
Understanding the architecture of MariaDB ColumnStore
Understanding the architecture of MariaDB ColumnStoreUnderstanding the architecture of MariaDB ColumnStore
Understanding the architecture of MariaDB ColumnStore
 
Introduction to Galera Cluster
Introduction to Galera ClusterIntroduction to Galera Cluster
Introduction to Galera Cluster
 
Galera Cluster Best Practices for DBA's and DevOps Part 1
Galera Cluster Best Practices for DBA's and DevOps Part 1Galera Cluster Best Practices for DBA's and DevOps Part 1
Galera Cluster Best Practices for DBA's and DevOps Part 1
 
しばちょう先生が語る!オラクルデータベースの進化の歴史と最新技術動向#3
しばちょう先生が語る!オラクルデータベースの進化の歴史と最新技術動向#3しばちょう先生が語る!オラクルデータベースの進化の歴史と最新技術動向#3
しばちょう先生が語る!オラクルデータベースの進化の歴史と最新技術動向#3
 
AWR and ASH Deep Dive
AWR and ASH Deep DiveAWR and ASH Deep Dive
AWR and ASH Deep Dive
 
Get to know PostgreSQL!
Get to know PostgreSQL!Get to know PostgreSQL!
Get to know PostgreSQL!
 

Similar to Pg chameleon, mysql to postgresql replica made easy

The Full MySQL and MariaDB Parallel Replication Tutorial
The Full MySQL and MariaDB Parallel Replication TutorialThe Full MySQL and MariaDB Parallel Replication Tutorial
The Full MySQL and MariaDB Parallel Replication TutorialJean-François Gagné
 
Buytaert kris my_sql-pacemaker
Buytaert kris my_sql-pacemakerBuytaert kris my_sql-pacemaker
Buytaert kris my_sql-pacemakerkuchinskaya
 
M|18 Battle of the Online Schema Change Methods
M|18 Battle of the Online Schema Change MethodsM|18 Battle of the Online Schema Change Methods
M|18 Battle of the Online Schema Change MethodsMariaDB plc
 
MySQL Live Migration - Common Scenarios
MySQL Live Migration - Common ScenariosMySQL Live Migration - Common Scenarios
MySQL Live Migration - Common ScenariosMydbops
 
The Proper Care and Feeding of MySQL Databases
The Proper Care and Feeding of MySQL DatabasesThe Proper Care and Feeding of MySQL Databases
The Proper Care and Feeding of MySQL DatabasesDave Stokes
 
MySQL Workbench for DFW Unix Users Group
MySQL Workbench for DFW Unix Users GroupMySQL Workbench for DFW Unix Users Group
MySQL Workbench for DFW Unix Users GroupDave Stokes
 
Upgrade to MySQL 5.6 without downtime
Upgrade to MySQL 5.6 without downtimeUpgrade to MySQL 5.6 without downtime
Upgrade to MySQL 5.6 without downtimeOlivier DASINI
 
Customize and Secure the Runtime and Dependencies of Your Procedural Language...
Customize and Secure the Runtime and Dependencies of Your Procedural Language...Customize and Secure the Runtime and Dependencies of Your Procedural Language...
Customize and Secure the Runtime and Dependencies of Your Procedural Language...VMware Tanzu
 
Pseudo GTID and Easy MySQL Replication Topology Management
Pseudo GTID and Easy MySQL Replication Topology ManagementPseudo GTID and Easy MySQL Replication Topology Management
Pseudo GTID and Easy MySQL Replication Topology ManagementShlomi Noach
 
MySQL Parallel Replication: inventory, use-case and limitations
MySQL Parallel Replication: inventory, use-case and limitationsMySQL Parallel Replication: inventory, use-case and limitations
MySQL Parallel Replication: inventory, use-case and limitationsJean-François Gagné
 
MongoDb scalability and high availability with Replica-Set
MongoDb scalability and high availability with Replica-SetMongoDb scalability and high availability with Replica-Set
MongoDb scalability and high availability with Replica-SetVivek Parihar
 
MySQL High Availability Solutions
MySQL High Availability SolutionsMySQL High Availability Solutions
MySQL High Availability SolutionsLenz Grimmer
 
Mysqlhacodebits20091203 1260184765-phpapp02
Mysqlhacodebits20091203 1260184765-phpapp02Mysqlhacodebits20091203 1260184765-phpapp02
Mysqlhacodebits20091203 1260184765-phpapp02Louis liu
 
MySQL High Availability Solutions
MySQL High Availability SolutionsMySQL High Availability Solutions
MySQL High Availability SolutionsLenz Grimmer
 
MySQL for Oracle DBAs
MySQL for Oracle DBAsMySQL for Oracle DBAs
MySQL for Oracle DBAsFromDual GmbH
 
Webinar Slides: Migrating to Galera Cluster
Webinar Slides: Migrating to Galera ClusterWebinar Slides: Migrating to Galera Cluster
Webinar Slides: Migrating to Galera ClusterSeveralnines
 
Gdb basics for my sql db as (percona live europe 2019)
Gdb basics for my sql db as (percona live europe 2019)Gdb basics for my sql db as (percona live europe 2019)
Gdb basics for my sql db as (percona live europe 2019)Valerii Kravchuk
 
Troubleshooting MySQL from a MySQL Developer Perspective
Troubleshooting MySQL from a MySQL Developer PerspectiveTroubleshooting MySQL from a MySQL Developer Perspective
Troubleshooting MySQL from a MySQL Developer PerspectiveMarcelo Altmann
 

Similar to Pg chameleon, mysql to postgresql replica made easy (20)

The Accidental DBA
The Accidental DBAThe Accidental DBA
The Accidental DBA
 
The Full MySQL and MariaDB Parallel Replication Tutorial
The Full MySQL and MariaDB Parallel Replication TutorialThe Full MySQL and MariaDB Parallel Replication Tutorial
The Full MySQL and MariaDB Parallel Replication Tutorial
 
Buytaert kris my_sql-pacemaker
Buytaert kris my_sql-pacemakerBuytaert kris my_sql-pacemaker
Buytaert kris my_sql-pacemaker
 
M|18 Battle of the Online Schema Change Methods
M|18 Battle of the Online Schema Change MethodsM|18 Battle of the Online Schema Change Methods
M|18 Battle of the Online Schema Change Methods
 
MySQL Live Migration - Common Scenarios
MySQL Live Migration - Common ScenariosMySQL Live Migration - Common Scenarios
MySQL Live Migration - Common Scenarios
 
The Proper Care and Feeding of MySQL Databases
The Proper Care and Feeding of MySQL DatabasesThe Proper Care and Feeding of MySQL Databases
The Proper Care and Feeding of MySQL Databases
 
MySQL Workbench for DFW Unix Users Group
MySQL Workbench for DFW Unix Users GroupMySQL Workbench for DFW Unix Users Group
MySQL Workbench for DFW Unix Users Group
 
Upgrade to MySQL 5.6 without downtime
Upgrade to MySQL 5.6 without downtimeUpgrade to MySQL 5.6 without downtime
Upgrade to MySQL 5.6 without downtime
 
Customize and Secure the Runtime and Dependencies of Your Procedural Language...
Customize and Secure the Runtime and Dependencies of Your Procedural Language...Customize and Secure the Runtime and Dependencies of Your Procedural Language...
Customize and Secure the Runtime and Dependencies of Your Procedural Language...
 
Pseudo GTID and Easy MySQL Replication Topology Management
Pseudo GTID and Easy MySQL Replication Topology ManagementPseudo GTID and Easy MySQL Replication Topology Management
Pseudo GTID and Easy MySQL Replication Topology Management
 
MySQL Parallel Replication: inventory, use-case and limitations
MySQL Parallel Replication: inventory, use-case and limitationsMySQL Parallel Replication: inventory, use-case and limitations
MySQL Parallel Replication: inventory, use-case and limitations
 
MongoDb scalability and high availability with Replica-Set
MongoDb scalability and high availability with Replica-SetMongoDb scalability and high availability with Replica-Set
MongoDb scalability and high availability with Replica-Set
 
MySQL High Availability Solutions
MySQL High Availability SolutionsMySQL High Availability Solutions
MySQL High Availability Solutions
 
Mysqlhacodebits20091203 1260184765-phpapp02
Mysqlhacodebits20091203 1260184765-phpapp02Mysqlhacodebits20091203 1260184765-phpapp02
Mysqlhacodebits20091203 1260184765-phpapp02
 
MySQL High Availability Solutions
MySQL High Availability SolutionsMySQL High Availability Solutions
MySQL High Availability Solutions
 
MySQL for Oracle DBAs
MySQL for Oracle DBAsMySQL for Oracle DBAs
MySQL for Oracle DBAs
 
Webinar Slides: Migrating to Galera Cluster
Webinar Slides: Migrating to Galera ClusterWebinar Slides: Migrating to Galera Cluster
Webinar Slides: Migrating to Galera Cluster
 
Gdb basics for my sql db as (percona live europe 2019)
Gdb basics for my sql db as (percona live europe 2019)Gdb basics for my sql db as (percona live europe 2019)
Gdb basics for my sql db as (percona live europe 2019)
 
HA with Galera
HA with GaleraHA with Galera
HA with Galera
 
Troubleshooting MySQL from a MySQL Developer Perspective
Troubleshooting MySQL from a MySQL Developer PerspectiveTroubleshooting MySQL from a MySQL Developer Perspective
Troubleshooting MySQL from a MySQL Developer Perspective
 

More from Federico Campoli

pg_chameleon a MySQL to PostgreSQL replica
pg_chameleon a MySQL to PostgreSQL replicapg_chameleon a MySQL to PostgreSQL replica
pg_chameleon a MySQL to PostgreSQL replicaFederico Campoli
 
The ninja elephant, scaling the analytics database in Transwerwise
The ninja elephant, scaling the analytics database in TranswerwiseThe ninja elephant, scaling the analytics database in Transwerwise
The ninja elephant, scaling the analytics database in TranswerwiseFederico Campoli
 
The ninja elephant, scaling the analytics database in Transwerwise
The ninja elephant, scaling the analytics database in TranswerwiseThe ninja elephant, scaling the analytics database in Transwerwise
The ninja elephant, scaling the analytics database in TranswerwiseFederico Campoli
 
Pg chameleon MySQL to PostgreSQL replica
Pg chameleon MySQL to PostgreSQL replicaPg chameleon MySQL to PostgreSQL replica
Pg chameleon MySQL to PostgreSQL replicaFederico Campoli
 
PostgreSQL - backup and recovery with large databases
PostgreSQL - backup and recovery with large databasesPostgreSQL - backup and recovery with large databases
PostgreSQL - backup and recovery with large databasesFederico Campoli
 
Backup recovery with PostgreSQL
Backup recovery with PostgreSQLBackup recovery with PostgreSQL
Backup recovery with PostgreSQLFederico Campoli
 
a look at the postgresql engine
a look at the postgresql enginea look at the postgresql engine
a look at the postgresql engineFederico Campoli
 
Don't panic! - Postgres introduction
Don't panic! - Postgres introductionDon't panic! - Postgres introduction
Don't panic! - Postgres introductionFederico Campoli
 
The hitchhiker's guide to PostgreSQL
The hitchhiker's guide to PostgreSQLThe hitchhiker's guide to PostgreSQL
The hitchhiker's guide to PostgreSQLFederico Campoli
 
PostgreSql query planning and tuning
PostgreSql query planning and tuningPostgreSql query planning and tuning
PostgreSql query planning and tuningFederico Campoli
 
PostgreSQL, the big the fast and the (NOSQL on) Acid
PostgreSQL, the big the fast and the (NOSQL on) AcidPostgreSQL, the big the fast and the (NOSQL on) Acid
PostgreSQL, the big the fast and the (NOSQL on) AcidFederico Campoli
 
PostgreSQL, The Big, The Fast and The Ugly
PostgreSQL, The Big, The Fast and The UglyPostgreSQL, The Big, The Fast and The Ugly
PostgreSQL, The Big, The Fast and The UglyFederico Campoli
 
Postgresql database administration volume 1
Postgresql database administration volume 1Postgresql database administration volume 1
Postgresql database administration volume 1Federico Campoli
 
A couple of things about PostgreSQL...
A couple of things  about PostgreSQL...A couple of things  about PostgreSQL...
A couple of things about PostgreSQL...Federico Campoli
 

More from Federico Campoli (18)

Hitchikers guide handout
Hitchikers guide handoutHitchikers guide handout
Hitchikers guide handout
 
pg_chameleon a MySQL to PostgreSQL replica
pg_chameleon a MySQL to PostgreSQL replicapg_chameleon a MySQL to PostgreSQL replica
pg_chameleon a MySQL to PostgreSQL replica
 
The ninja elephant, scaling the analytics database in Transwerwise
The ninja elephant, scaling the analytics database in TranswerwiseThe ninja elephant, scaling the analytics database in Transwerwise
The ninja elephant, scaling the analytics database in Transwerwise
 
The ninja elephant, scaling the analytics database in Transwerwise
The ninja elephant, scaling the analytics database in TranswerwiseThe ninja elephant, scaling the analytics database in Transwerwise
The ninja elephant, scaling the analytics database in Transwerwise
 
Pg chameleon MySQL to PostgreSQL replica
Pg chameleon MySQL to PostgreSQL replicaPg chameleon MySQL to PostgreSQL replica
Pg chameleon MySQL to PostgreSQL replica
 
Life on a_rollercoaster
Life on a_rollercoasterLife on a_rollercoaster
Life on a_rollercoaster
 
PostgreSQL - backup and recovery with large databases
PostgreSQL - backup and recovery with large databasesPostgreSQL - backup and recovery with large databases
PostgreSQL - backup and recovery with large databases
 
Backup recovery with PostgreSQL
Backup recovery with PostgreSQLBackup recovery with PostgreSQL
Backup recovery with PostgreSQL
 
a look at the postgresql engine
a look at the postgresql enginea look at the postgresql engine
a look at the postgresql engine
 
Don't panic! - Postgres introduction
Don't panic! - Postgres introductionDon't panic! - Postgres introduction
Don't panic! - Postgres introduction
 
The hitchhiker's guide to PostgreSQL
The hitchhiker's guide to PostgreSQLThe hitchhiker's guide to PostgreSQL
The hitchhiker's guide to PostgreSQL
 
Pg big fast ugly acid
Pg big fast ugly acidPg big fast ugly acid
Pg big fast ugly acid
 
Streaming replication
Streaming replicationStreaming replication
Streaming replication
 
PostgreSql query planning and tuning
PostgreSql query planning and tuningPostgreSql query planning and tuning
PostgreSql query planning and tuning
 
PostgreSQL, the big the fast and the (NOSQL on) Acid
PostgreSQL, the big the fast and the (NOSQL on) AcidPostgreSQL, the big the fast and the (NOSQL on) Acid
PostgreSQL, the big the fast and the (NOSQL on) Acid
 
PostgreSQL, The Big, The Fast and The Ugly
PostgreSQL, The Big, The Fast and The UglyPostgreSQL, The Big, The Fast and The Ugly
PostgreSQL, The Big, The Fast and The Ugly
 
Postgresql database administration volume 1
Postgresql database administration volume 1Postgresql database administration volume 1
Postgresql database administration volume 1
 
A couple of things about PostgreSQL...
A couple of things  about PostgreSQL...A couple of things  about PostgreSQL...
A couple of things about PostgreSQL...
 

Recently uploaded

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 

Recently uploaded (20)

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 

Pg chameleon, mysql to postgresql replica made easy

  • 2. 2 Few words about the speaker • Born in 1972 • Passionate about IT since 1982 • Joined the Oracle DBA secret society in 2004 • In love with PostgreSQL since 2006 • PostgreSQL tattoo on the right shoulder • Works at Loxodata as PostgreSQL Consultant
  • 5. 5 Agenda • History • The MySQL Replica • pg_chameleon v2.0 • The replica in action • Demo • Wrap up
  • 6. History How the project startedHow the project started
  • 7. 7 The beginning Years 2006/2012 neo_my2pg.py ● I wrote the script because of a struggling phpbb on MySQL ● The script is written in python 2.6 ● It's a monolith script ● And it's slow, very slow ● It's a good checklist for things to avoid when coding ● https://github.com/the4thdoctor/neo_my2pg
  • 8. 8 I'm not scared of using the ORMs Years 2013/2015 first attempt of pg_chameleon ● Developed in Python 2.7 ● Used SQLAlchemy for extracting the MySQL's metadata ● Proof of concept only ● It was a just a way to discharge frustration ● Abandoned after a while ● SQLAlchemy's limitations were frustrating as well ● And pgloader did the same job much much better
  • 9. 9 pg_chameleon reborn Year 2016 ● I needed to replicate the data from MySQL to PostgreSQL ● http://tech.transferwise.com/scaling-our-analytics-database ● The amazing library python-mysql-replication allowed me build a proof of concept ● Which evolved later in pg_chameleon 1.x Kudos to the python-mysql-replication team! https://github.com/noplay/python-mysql-replication
  • 10. 10 pg_chameleon v1.x ● Developed on the London to Brighton commute ● Released as stable the 7th May 2017 ● Followed by 8 bugfix releases ● Compatible with CPython 2.7/3.3+ ● No more SQLAlchemy ● The MySQL driver changed from MySQLdb to PyMySQL ● Command line helper ● Supports type override on the fly (Danger!) ● Installs in virtualenv and system wide via pypi ● Can detach the replica for minimal downtime migrations
  • 11. 11 pg_chameleon v1.x limitations ● All the affected tables are locked in read only mode during the init_replica process ● During the init_replica the data is not accessible ● The tables for being replicated require primary keys ● No daemon, the process always stays in foreground ● Single schema replica ● One process per each schema ● Network inefficient ● Read and replay not concurrent with risk of high lag ● The optional threaded mode very inefficient and fragile ● A single error in the replay process and the replica is broken
  • 12. The MySQL replica Do I really need to do that?Do I really need to do that?
  • 13. 13 MySQL replica in a nutshell ● The MySQL replica is logical ● When the replica is enabled the data changes are stored in the master's binary log files ● The slave gets from the master's binary log files ● The slave saves the stream of data into local relay logs ● The relay logs are replayed against the slave
  • 14. 14 MySQL replica in a nutshell
  • 15. 15 The log formats MySQL can store the changes in the binary logs in three different formats ● STATEMENT: It logs the statements which are replayed on the slave It's the best solution for the bandwidth. However, when replaying statements with not deterministic functions this format generates different values on the slave (e.g. using an insert with a column autogenerated by the uuid function). ● ROW: It's deterministic. This format logs the row images. ● MIXED takes the best of both worlds. The master logs the statements unless a not deterministic function is used. In that case it logs the row image. ● All three formats always log the DDL query events. ● The python-mysql-replication library used by pg_chameleon require the ROW format to work properly.
  • 16. pg_chameleon v2.0 Overview of the stable releaseOverview of the stable release
  • 17. 17 A chameleon in the middle pg_chameleon mimics a mysql slave's behaviour ● It performs the initial load for the replicated tables ● It connects to the MySQL replica protocol ● It stores the row images into a PostgreSQL table ● A plpgSQL function decodes the rows and replay the changes ● It can detach the replica for minimal downtime migrations PostgreSQL acts as relay log and replication slave
  • 18. 18 A chameleon in the middle
  • 19. 19 pg_chameleon v2.0 ● Developed at the pgconf.eu 2017 and on the commute ● Released as stable the 1st of January 2018 ● Compatible with python 3.3+ ● Installs in virtualenv and system wide via pypi ● Replicates multiple schemas from a single MySQL into a target PostgreSQL database ● Conservative approach to the replica. Tables which generate errors are automatically excluded from the replica ● Daemonised replica process with two distinct subprocesses, for concurrent read and replay
  • 20. 20 pg_chameleon v2.0 ● Soft locking replica initialisation. The tables are locked only during the copy. ● Rollbar integration for a simpler error detection and messaging ● Experimental support for the PostgreSQL source type ● The tables are loaded in a separate schema which is swapped with the existing. ● This approach requires more space but it makes the init a replica virtually painless, leaving the old data accessible until the init_replica is complete. ● The DDL are translated in the PostgreSQL dialect keeping the schema in sync with MySQL automatically ● MySQL GTID support for switch across the replica cluster without need for init_replica
  • 21. 21 pg_chameleon v2.0 limitations ● The tables for being replicated require primary or unique keys ● When detaching the replica the foreign keys are created always ON DELETE/UPDATE RESTRICT ● The source type PostgreSQL supports only the init_replica process ● Problems on Amazon RDS with the json data type ● No support for MariaDB’s GTID
  • 22. The replica in action Setup pg_chameleon in few stepsSetup pg_chameleon in few steps
  • 23. 23 Replica initialisation The replica initialisation follows the same workflow as stated on the mysql online manual. ● Flush the tables with read lock ● Get the master's coordinates ● Copy the data ● Release the locks However...pg_chameleon flushes the tables with read lock one by one. The lock is held only during the copy. The log coordinates are stored in the replica catalogue along the table's name and used by the replica process to determine whether the table's binlog data should be used or not.
  • 24. 24 Fallback on copy failure During the init_replica, the data is pulled in slices from mysql in order to prevent the memory overload. The saved file, in CSV format, is then loaded into PostgreSQL using the COPY command. However... ● COPY is fast but is single transaction ● One error and the entire command is rolled back ● If this happens the procedure loads the same data using the INSERT statements ● Which can be very slow ● In case of failure during the insert pg_chameleon cleans the NUL markers allowed by MySQL but not by PostgreSQL ● If the row still fails on insert then it's discarded
  • 25. 25 MySQL configuration binlog_format= ROW log-bin = mysql-bin server-id = 1 binlog-row-image = FULL The mysql configuration file on linux is usually stored in /etc/mysql/my.cnf To enable the binary logging find the section [mysqld] and check that the following parameters are set.
  • 26. 26 MySQL database user CREATE USER usr_replica ; SET PASSWORD FOR usr_replica=PASSWORD('replica'); GRANT ALL ON sakila.* TO 'usr_replica'; GRANT RELOAD ON *.* to 'usr_replica'; GRANT REPLICATION CLIENT ON *.* to 'usr_replica'; GRANT REPLICATION SLAVE ON *.* to 'usr_replica'; FLUSH PRIVILEGES; Setup a MySQL database user for the replica
  • 27. 27 PostgreSQL database user CREATE USER usr_replica WITH PASSWORD 'replica'; CREATE DATABASE db_replica WITH OWNER usr_replica; Setup a PostgreSQL database user and a database owned by the freshly created user
  • 28. 28 Install pg_chameleon pip install pip --upgrade pip install pg_chameleon chameleon set_configuration_files cd ~/.pg_chameleon/configuration cp config-example.yml default.yml Install pg_chameleon and create the configuration files. Edit the file default.yml and set the correct values for connection and source.
  • 29. 29 Global settings pg_conn: host: "localhost" port: "5432" user: "usr_replica" password: "replica" database: "db_replica" charset: "utf8" rollbar_key: '<rollbar_long_key>' rollbar_env: 'demo' type_override: "tinyint(1)": override_to: boolean override_tables: - "*"
  • 30. 30 MySQL source settings sources: mysql: db_conn: host: "localhost" port: "3306" user: "usr_replica" password: "replica" charset: 'utf8' connect_timeout: 10 schema_mappings: sakila: db_sakila limit_tables: skip_tables: grant_select_to: - usr_readonly lock_timeout: "120s" my_server_id: 100
  • 31. 31 MySQL source settings replica_batch_size: 10000 replay_max_rows: 10000 batch_retention: '1 day' copy_max_memory: "300M" copy_mode: 'file' out_dir: /tmp sleep_loop: 1 on_error_replay: continue on_error_read: continue auto_maintenance: "1 day" gtid_enable: No type: mysql
  • 32. 32 MySQL source settings skip_events: insert: - sakila.author #skips inserts on the table sakila.author delete: - sakila2 #skips deletes on schema sakila2 update:
  • 33. 33 Register the source and init the replica Add the source mysql and initialise the replica for it. We are using debug in order to get the logging on the console. chameleon create_replica_schema --debug chameleon add_source --config default --source mysql --debug chameleon init_replica --config default --source mysql --debug
  • 34. 34 Start the replica for the source chameleon start_replica --config default --source mysql chameleon show_status --config default --source mysql
  • 35. Demo Which will fail miserably and you’ll hate this project foreverWhich will fail miserably and you’ll hate this project forever
  • 36. Wrap up Lesson learned, future development and thanksLesson learned, future development and thanks
  • 37. 37 NOT NULL and no default make slonik cry The way the implicit defaults with NOT NULL are managed by MySQL can break the replica on PostgreSQL. Therefore any field with NOT NULL added after the init_replica is created always as NULLable in PostgreSQL.
  • 38. 38 DDL a pain in the back I initially tried to use sqlparse for tokenising the DDL emitted by MySQL. Unfortunately didn't worked as I expected. So I decided to use the regular expressions. Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. -- Jamie Zawinski ● MySQL even in ROW format emits the DDL as statements ● The class sql_token uses the regular expressions to tokenise the DDL ● The tokenised data is used to build the DDL in the PostgreSQL dialect
  • 39. 39 Future development The development of the version 2.1 is started Things that will likely appear in the not so distant future ● Parallel copy and index creation in order to speed up the init_replica process ● Logical replica from PostgreSQL ● Improve the default column handling ● Locale support ● Auto sync for the tables removed from the replica
  • 40. 40 Igor, the little green guy The chameleon logo has been developed by Elena Toma, a talented Italian Lady. https://www.facebook.com/Tonkipapperoart/ The name Igor is inspired by Martin Feldman's Igor portraited in Young Frankenstein movie.
  • 41. 41 Feedback please! Please report any issue on github and follow pg_chameleon on twitter for the announcements. https://github.com/the4thdoctor/pg_chameleon Twitter: @pg_chameleon
  • 42. 42 We are hiring! Please send your CV to jobs@loxodata.com
  • 43. 43 Thank you for listening!