pg_chameleon is a lightweight replication system written in Python. The tool connects to the MySQL replication protocol and replicates the data into PostgreSQL.
The author will talk about the history and the logic behind the available functions, and will give an interactive usage example.
pg_chameleon: MySQL to PostgreSQL replica
1. pg_chameleon
MySQL to PostgreSQL lightweight replica
Federico Campoli
Brighton PostgreSQL Meetup
18 November 2016
Federico Campoli (Brighton PostgreSQL Meetup) pg chameleon 18 November 2016 1 / 44
2. Table of contents
1 Some history
2 MySQL Replica in a nutshell
3 The pg chameleon library
4 Caveats, traps, the usual political stuff...
5 Wrap up
6. Some history
The beginnings
Years 2006/2012
neo_my2pg.py
Developed to help a struggling phpBB
The database was successfully migrated from MySQL to PostgreSQL
The migration failed for other reasons
It’s written in Python 2.6
It’s a monolithic script
And it’s slow, very slow
You can use it as a checklist of things to avoid when coding
https://github.com/the4thdoctor/neo_my2pg
9. Some history
I’m not scared of using the ORMs
Years 2013/2015
First attempt at pg_chameleon
Developed in Python 2.7
SQLAlchemy was used for extracting MySQL’s metadata
Good proof of concept. No real hope of becoming usable
Built during the years of the roller coaster
It was just a way to discharge frustration
Abandoned because pgloader did the same, and better
The ORM limitations didn’t help to keep the project alive
10. Some history
pg_chameleon reborn
Year 2016
The project’s revamp was triggered by a specific need.
What if it were possible to replicate data from MySQL to PostgreSQL?
The library python-mysql-replication can decode the MySQL replica stream when the ROW-based log format is used.
Trying won’t harm, they said.
11. Some history
pg_chameleon reborn
Still on Python 2.7
Removed SQLAlchemy
Switched the MySQL driver to PyMySQL
The library python-mysql-replication reads the MySQL replica
Provides a basic command line
13. MySQL Replica in a nutshell
MySQL Replica
MySQL logs the logical data changes rather than the physical ones
The data changes are stored in a local binary log
The slave saves the replication data pulled from the master in its local relay logs
The slave reads the local relay logs and replays the data
14. MySQL Replica in a nutshell
MySQL Replica
15. MySQL Replica in a nutshell
Log formats
STATEMENT format logs the statements, which are replayed on the slave.
It seems the best solution for performance.
Replaying non-deterministic functions generates inconsistent slaves (e.g. uuid).
ROW is deterministic. It logs the changed rows and the DDL queries.
This format is required for pg_chameleon to work.
MIXED takes the best of both worlds. The master logs the statements unless a non-deterministic function is used. In that case it logs the row image.
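The difference between STATEMENT and ROW can be illustrated with a small simulation. This is only a sketch in plain Python, not MySQL code: the function names are hypothetical, and a Python uuid call stands in for a non-deterministic SQL function.

```python
import uuid


def replay_statement(make_row):
    """STATEMENT-style replay: the replica re-runs the statement itself."""
    return make_row()


def replay_row(row_image):
    """ROW-style replay: the replica applies a copy of the logged row image."""
    return dict(row_image)


def insert_with_uuid():
    # Stand-in for a statement such as INSERT ... VALUES (uuid())
    return {"id": str(uuid.uuid4())}


# The master executes the statement once and stores the resulting row.
master_row = insert_with_uuid()

# Statement-based: the replica evaluates uuid again and gets a different value.
statement_replica_row = replay_statement(insert_with_uuid)

# Row-based: the replica receives the values the master actually stored.
row_replica_row = replay_row(master_row)

print(statement_replica_row == master_row)  # False
print(row_replica_row == master_row)        # True
```

Replaying the statement produces a new value on the replica, while replaying the row image reproduces the master's data exactly.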
18. MySQL Replica in a nutshell
A chameleon in the middle
pg_chameleon mimics a MySQL slave’s behaviour
Reads the replica
Stores the decoded rows into a PostgreSQL table
PostgreSQL acts as relay log and replication slave
A PL/pgSQL function decodes the rows and replays the changes
With an extra cool feature.
Initialise the PostgreSQL replica schema in just one command
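The store-then-replay idea can be sketched in a few lines of Python. This is an illustration under assumptions, not the tool's code: a list stands in for the PostgreSQL table holding the decoded rows, a dictionary keyed by primary key stands in for a replicated table, and the event layout (action, pkey, row) is hypothetical.

```python
import json

# Hypothetical stand-in for the PostgreSQL table that stores decoded row events.
relay_log = []

# Hypothetical stand-in for a replicated table, keyed by primary key.
target_table = {}


def store_event(action, pkey, row=None):
    """Save a decoded row event as JSON, like a relay log entry."""
    relay_log.append(json.dumps({"action": action, "pkey": pkey, "row": row}))


def replay():
    """Apply the stored events in order, then empty the relay log."""
    for raw in relay_log:
        event = json.loads(raw)
        if event["action"] in ("insert", "update"):
            target_table[event["pkey"]] = event["row"]
        elif event["action"] == "delete":
            target_table.pop(event["pkey"], None)
    del relay_log[:]


store_event("insert", 1, {"id": 1, "title": "first"})
store_event("insert", 2, {"id": 2, "title": "second"})
store_event("update", 1, {"id": 1, "title": "first, edited"})
store_event("delete", 2)
replay()
print(target_table)
```

Updates and deletes are matched through the primary key, which is one way to see why a key is needed to replay changes on the right row.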
19. MySQL Replica in a nutshell
MySQL replica + pg chameleon
22. The pg chameleon library
pg_chameleon.py
Command line wrapper
Uses argparse to execute the commands
Can easily be extended with more commands
23. The pg chameleon library
pg_chameleon.py
init_replica: copies the data from MySQL and saves the master coordinates in PostgreSQL. This command locks the MySQL tables in read-only mode during the copy.
start_replica: connects to the MySQL master and replays the changes in PostgreSQL.
create_schema, drop_schema, upgrade_schema: manual actions on the PostgreSQL service schema. Not required in general, because init_replica recreates the service schema from scratch and start_replica runs the schema migrations, if required, before starting the program loop.
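A command line wrapper of this kind can be sketched with argparse as follows. The command names come from the slides; the function bodies and the COMMANDS dictionary are placeholders invented for the illustration and do not reproduce the real script.

```python
import argparse


def init_replica():
    return "init_replica: copy the data and save the master coordinates"


def start_replica():
    return "start_replica: read the replica stream and replay the changes"


# Adding a command only requires a new entry in this dictionary.
COMMANDS = {
    "init_replica": init_replica,
    "start_replica": start_replica,
}


def main(argv=None):
    parser = argparse.ArgumentParser(description="Command line wrapper sketch")
    parser.add_argument("command", choices=sorted(COMMANDS))
    args = parser.parse_args(argv)
    # Dispatch to the function registered for the requested command.
    return COMMANDS[args.command]()


print(main(["init_replica"]))
```

With this layout argparse rejects unknown commands on its own, and extending the wrapper means writing one function and registering it.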
24. The pg chameleon library
global_lib.py
class global_config: loads config.yaml into the class attributes
class replica_engine: wraps the mysql and pgsql class methods and sets up the logging method. A global_config instance is created for getting the configuration settings
25. The pg chameleon library
mysql_lib.py
class mysql_connection: connects to MySQL using the parameters provided by replica_engine
class mysql_engine: does all the magic for the replication setup and execution
26. The pg chameleon library
mysql_lib.py
class mysql_engine
locks and releases the tables for the init_replica command
pulls the data out of MySQL in CSV format or as insert statements
extracts the metadata from MySQL’s information_schema
copies the data into PostgreSQL using the class pg_engine
falls back to inserts if the copy fails for any reason
starts the replica stream using python-mysql-replication
decodes the replica events into a data dictionary which is saved by pg_engine
when a replica binlog is read, executes the PostgreSQL replay via pg_engine
27. The pg chameleon library
pg_lib.py
class pg_encoder: extends the JSON encoder class and adds some special handling for types like decimal and datetime
class pgsql_connection: connects to the PostgreSQL database
class pgsql_engine: does all the magic for rebuilding the data structure, loading data and migrating the schema
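An encoder with special handling for decimal and datetime values could look like the sketch below. It uses the standard json.JSONEncoder; the class name SketchEncoder and the chosen output formats (string for decimals, ISO 8601 for dates) are assumptions made for the example, not the formats used by the tool.

```python
import datetime
import decimal
import json


class SketchEncoder(json.JSONEncoder):
    """Hypothetical encoder for types that json cannot serialise natively."""

    def default(self, obj):
        if isinstance(obj, decimal.Decimal):
            # Keep the exact decimal digits by emitting a string.
            return str(obj)
        if isinstance(obj, (datetime.datetime, datetime.date, datetime.time)):
            return obj.isoformat()
        # Let the base class raise TypeError for anything else.
        return json.JSONEncoder.default(self, obj)


row = {
    "amount": decimal.Decimal("4.99"),
    "payment_date": datetime.datetime(2016, 11, 18, 19, 30, 0),
}
print(json.dumps(row, cls=SketchEncoder, sort_keys=True))
# {"amount": "4.99", "payment_date": "2016-11-18T19:30:00"}
```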
28. The pg chameleon library
pg_lib.py
class pgsql_engine
creates and upgrades the service schema sch_chameleon
builds the create statements for tables and indices using the metadata provided by mysql_engine
executes the create statements and registers the MySQL tables in sch_chameleon
copies the data into the tables and falls back to inserts if the copy fails
builds the primary keys and indices using the metadata provided by mysql_engine
stores the JSON data from the replica and executes the replay
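The copy with fallback to inserts follows a common pattern: try the fast bulk path first, and only when it fails load the rows one by one. The sketch below shows the pattern with in-memory stand-ins; the functions load_rows, bulk_copy and insert_row are hypothetical, and collecting the rows that still fail is an assumption of the example, since the slides do not say what happens to them.

```python
def load_rows(rows, bulk_copy, insert_row):
    """Try one bulk copy; if it fails, insert row by row and collect rejects."""
    try:
        bulk_copy(rows)
        return []
    except Exception:
        rejected = []
        for row in rows:
            try:
                insert_row(row)
            except Exception:
                rejected.append(row)
        return rejected


# Hypothetical in-memory target that refuses rows without an id.
table = []


def insert_row(row):
    if row.get("id") is None:
        raise ValueError("id is required")
    table.append(row)


def bulk_copy(rows):
    # All or nothing, like a single COPY command: one bad row aborts the batch.
    staged = []
    for row in rows:
        if row.get("id") is None:
            raise ValueError("id is required")
        staged.append(row)
    table.extend(staged)


rejected = load_rows([{"id": 1}, {"id": None}, {"id": 3}], bulk_copy, insert_row)
print(len(table), rejected)
```

The slow path is only taken for batches that contain bad data, so clean tables keep the speed of the bulk copy.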
29. The pg chameleon library
sql_util.py
Consists of just one class, sql_token, which tokenises the MySQL queries to be used by pgsql_engine for building the DDL in PostgreSQL’s dialect.
Currently under development
30. The pg chameleon library
config.yaml
my_server_id: the server id for the MySQL replica. Must be unique within the replica cluster
copy_max_memory: the maximum amount of memory to use when copying the table in PostgreSQL. The value can be specified in (k)ilobytes, (M)egabytes or (G)igabytes by adding the suffix (e.g. 300M)
my_database: MySQL database to replicate. A schema with the same name will be initialised in the PostgreSQL database
pg_database: destination database in PostgreSQL.
copy_mode: the allowed values are ‘file’ and ‘direct’. With direct the copy happens on the fly. With file the table is first dumped to a CSV file, then reloaded into PostgreSQL.
hexify: a YAML list of the data types that require conversion to hex (e.g. blob, binary). The conversion happens on the copy and on the replica.
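A value such as 300M for copy_max_memory has to be turned into a number of bytes at some point. The following is a minimal sketch of that conversion; the function name parse_size is hypothetical, and both the use of powers of 1024 and the treatment of a bare number as bytes are assumptions of the example.

```python
# Assumed multipliers: binary units (powers of 1024).
UNITS = {"k": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(value):
    """Convert a string such as '300M' into a number of bytes."""
    text = str(value).strip()
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    # Assumption: a value without suffix is already expressed in bytes.
    return int(text)


print(parse_size("300M"))  # 314572800
```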
31. The pg chameleon library
config.yaml
log_dir: directory where the logs are stored
log_level: logging verbosity. Allowed values are debug, info, warning, error
log_dest: log destination. stdout for debugging purposes, file for normal activity.
my_charset: MySQL charset for the copy (please note the replica is always in utf8)
pg_charset: PostgreSQL connection’s charset.
tables_limit: YAML list of the tables to replicate. If empty, the entire MySQL database is replicated.
sleep_loop: seconds to wait between replica batch attempts
34. The pg chameleon library
MySQL replica configuration
The MySQL configuration file is usually stored in /etc/mysql/my.cnf
To enable binary logging, find the section [mysqld] and check that the following parameters are set.
binlog_format: has to be ROW for capturing the DML events
log-bin: any name is good (e.g. mysql-bin)
server-id: has to be a numerical value, unique within the replication cluster. The value 1 is used for the master
binlog_row_image: has to be FULL, as required by the python-mysql-replication library
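The checks above can be expressed as a small script. This is a sketch written for Python 3 that parses a my.cnf-style fragment with the standard configparser module; the function check_mysqld is hypothetical, it matches the option spelling exactly as listed above, and a real my.cnf may contain directives (for example include lines) that configparser cannot read.

```python
import configparser

# Parameters that need a specific value, and parameters that just need a value.
REQUIRED = {"binlog_format": "ROW", "binlog_row_image": "FULL"}
MUST_BE_SET = ("log-bin", "server-id")


def check_mysqld(cnf_text):
    """Return a list of problems found in the [mysqld] section."""
    parser = configparser.ConfigParser(allow_no_value=True, strict=False)
    parser.read_string(cnf_text)
    if not parser.has_section("mysqld"):
        return ["section [mysqld] not found"]
    section = parser["mysqld"]
    problems = []
    for name, expected in REQUIRED.items():
        if (section.get(name) or "").upper() != expected:
            problems.append("%s should be %s" % (name, expected))
    for name in MUST_BE_SET:
        if not section.get(name):
            problems.append("%s is not set" % name)
    return problems


example = """
[mysqld]
binlog_format = ROW
log-bin = mysql-bin
server-id = 1
binlog_row_image = FULL
"""
print(check_mysqld(example))  # []
```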
35. The pg chameleon library
MySQL setup
CREATE USER usr_replica;
SET PASSWORD FOR usr_replica = PASSWORD('replica');
GRANT ALL ON sakila.* TO 'usr_replica';
GRANT RELOAD ON *.* TO 'usr_replica';
GRANT REPLICATION CLIENT ON *.* TO 'usr_replica';
GRANT REPLICATION SLAVE ON *.* TO 'usr_replica';
FLUSH PRIVILEGES;
36. The pg chameleon library
PostgreSQL setup
CREATE USER usr_replica WITH PASSWORD 'replica';
CREATE DATABASE db_replica WITH OWNER usr_replica;
37. The pg chameleon library
Replica setup
Copy config-yaml.example to config.yaml and set the configuration parameters
./pg_chameleon.py init_replica
Wait for init_replica to complete, then start the replica with
./pg_chameleon.py start_replica
39. Caveats, traps, the usual political stuff...
Limitations
Tables require primary keys in order to be replicated
There is no cleanup for the rubbish accepted by MySQL (e.g. nulls implicitly converted to 0)
No daemonisation yet
Binary data are hexified to avoid issues with PostgreSQL
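Hexifying binary data means representing each byte as two hexadecimal characters, so the value travels as plain text. A minimal Python sketch follows; the function name hexify is borrowed from the configuration parameter and is not taken from the tool's code.

```python
import binascii


def hexify(value):
    """Return the hexadecimal text form of a binary value."""
    return binascii.hexlify(value).decode("ascii")


blob = b"\x00\xffPG"
hex_text = hexify(blob)
print(hex_text)  # 00ff5047

# The original bytes can be recovered from the hex text.
print(binascii.unhexlify(hex_text) == blob)  # True
```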
40. Caveats, traps, the usual political stuff...
What does work
Replicating the MySQL schema into PostgreSQL
Locking the tables in MySQL and getting the master coordinates
Creating primary keys and indices on PostgreSQL
Writing MySQL row events into PostgreSQL
Replaying the replicated data in PostgreSQL
41. Caveats, traps, the usual political stuff...
What seems to work
Enum support
Binary import into bytea (hex conversion)
Initial copy based on copy to file or in memory
Fall back to inserts in case of rubbish data (slow)
Replication of CREATE and DROP TABLE statements
42. Caveats, traps, the usual political stuff...
What doesn’t work
Replication of ALTER TABLE statements
Materialisation of the MySQL views
Foreign keys import in PostgreSQL
Daemonisation, background workers for replay, postgres extension
44. Wrap up
Igor, the green little guy
The chameleon logo was created by Elena Toma, a talented Italian lady.
https://www.facebook.com/Tonkipapperoart/
The name Igor is inspired by Marty Feldman’s Igor, portrayed in the movie Young Frankenstein.
45. Wrap up
Some numbers
Lines of code
global_lib.py                 163
mysql_lib.py                  521
pg_lib.py                     557
sql_util.py                   208
create_schema.sql             354
Total lines in libraries     1449
Total lines including SQL    1803
46. Wrap up
pg chameleon’s license
Plain old 2-clause BSD License
Copyright (c) 2016, Federico Campoli
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
47. Wrap up
Please Test!
That’s all!
Please clone the repository, test and break the tool!
Report issues!
https://github.com/the4thdoctor/pg_chameleon
48. Wrap up
Boring legal stuff
MySQL image, source: Wikimedia Commons
Hard disk image, source: Wikimedia Commons
Slonik logo, copyright PostgreSQL Global Development Group
49. Wrap up
Contacts and license
Twitter: 4thdoctor_scarf
Blog: http://www.pgdba.co.uk
Brighton PostgreSQL Meetup:
http://www.meetup.com/Brighton-PostgreSQL-Meetup/
This document is distributed under the terms of the Creative Commons