pg chameleon mysql to postgresql replica
pg chameleon: MySQL to PostgreSQL lightweight replica
Federico Campoli
Brighton PostgreSQL Meetup
18 November 2016
Federico Campoli (Brighton PostgreSQL Meetup) pg chameleon 18 November 2016 1 / 44
Table of contents
1 Some history
2 MySQL Replica in a nutshell
3 The pg chameleon library
4 Caveats, traps, the usual political stuff...
5 Wrap up
Some history
The beginnings
Years 2006/2012
neo_my2pg.py
Developed to help a struggling phpBB installation
The database was successfully migrated from MySQL to PostgreSQL
The migration failed for other reasons
It’s written in Python 2.6
It’s a monolithic script
And it’s slow, very slow
You can use it as a checklist of things to avoid when coding
https://github.com/the4thdoctor/neo_my2pg
I’m not scared of using the ORMs
Years 2013/2015
First attempt at pg chameleon
Developed in Python 2.7
SQLAlchemy was used for extracting the MySQL metadata
Good proof of concept. No real hope of becoming usable
Built during the years of the roller coaster
It was just a way to discharge frustration
Abandoned because pgloader did the same and better
The ORM limitations didn’t help keep the project alive
pg chameleon reborn
Year 2016
The project’s revamp was triggered by a specific need.
What if it were possible to replicate data from MySQL to PostgreSQL?
The library python-mysql-replication can decode the MySQL replica stream when the ROW based format is used.
Trying won’t harm, they said.
Still on Python 2.7
Removed SQLAlchemy
Switched the MySQL driver to PyMySQL
The library python-mysql-replication reads the MySQL replica
Provides a basic command line
MySQL Replica in a nutshell
MySQL Replica
MySQL replicates the logical data changes rather than the physical ones
The data changes are stored in a local binary log
The slave saves the replication data pulled from the master in its local relay logs
The slave reads the local relay logs and replays the data
Log formats
STATEMENT logs the statements, which are replayed on the slave. It seems the best solution for performance. However, replaying non-deterministic functions (e.g. uuid) generates inconsistent slaves.
ROW is deterministic. It logs the changed rows and the DDL queries. This format is required for pg chameleon to work.
MIXED takes the best of both worlds. The master logs the statements unless a non-deterministic function is used. In that case it logs the row image.
A chameleon in the middle
pg chameleon mimics a MySQL slave’s behaviour
Reads the replica
Stores the decoded rows in a PostgreSQL table
PostgreSQL acts as relay log and replication slave
A PL/pgSQL function decodes the rows and replays the changes
With an extra cool feature.
Initialise the PostgreSQL replica schema with just one command
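The store-then-replay flow described above can be pictured with a toy model. This is only a sketch under loud assumptions: the list `relay_log`, the dictionary `replica` and both function names are hypothetical stand-ins, whereas in pg chameleon the log is a PostgreSQL table and the replay is done by a PL/pgSQL function.

```python
import json

# Hypothetical in-memory stand-ins for illustration only.
relay_log = []   # plays the role of the PostgreSQL table holding decoded rows
replica = {}     # table name -> {primary key -> row}


def store_event(table, action, pkey, row=None):
    """Save one decoded row event as JSON, in arrival order."""
    relay_log.append(json.dumps(
        {"table": table, "action": action, "pkey": pkey, "row": row}))


def replay():
    """Apply the stored events in order, then empty the log."""
    for raw in relay_log:
        event = json.loads(raw)
        rows = replica.setdefault(event["table"], {})
        if event["action"] == "delete":
            rows.pop(event["pkey"], None)
        else:
            # insert and update both carry the full row image
            rows[event["pkey"]] = event["row"]
    del relay_log[:]


store_event("film", "insert", 1, {"film_id": 1, "title": "ALIEN"})
store_event("film", "update", 1, {"film_id": 1, "title": "ALIENS"})
store_event("film", "insert", 2, {"film_id": 2, "title": "BRAZIL"})
store_event("film", "delete", 2)
replay()
print(replica)
```

Keying the replay on the primary key is what makes updates and deletes addressable, which is why the row events need one.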
MySQL replica + pg chameleon
The pg chameleon library
Project structure
project directory
    pg_chameleon.py
    config
        config.yaml
    logs
    pg_chameleon
        lib
            global_lib.py
            mysql_lib.py
            pg_lib.py
            sqlutil_lib.py
    sql
        upgrade
        create_schema.sql
pg_chameleon.py
Command line wrapper
Uses argparse to execute the commands
Can easily be extended with more commands
init_replica: copies the data from MySQL and saves the master coordinates in PostgreSQL. This command locks the MySQL tables in read-only mode during the copy.
start_replica: connects to the MySQL master and replays the changes in PostgreSQL.
create_schema, drop_schema, upgrade_schema: manual actions on the PostgreSQL service schema. Not required in general, because init_replica recreates the service schema from scratch. start_replica runs the schema migrations if required before starting the program loop.
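A minimal sketch of an argparse wrapper for the commands listed above. The command names come from the slide; the functions `build_parser` and `run` and the placeholder return value are assumptions made for illustration, not the actual pg_chameleon.py code.

```python
import argparse

# Command names as listed in the talk; the handling below is a placeholder.
COMMANDS = ["init_replica", "start_replica",
            "create_schema", "drop_schema", "upgrade_schema"]


def build_parser():
    parser = argparse.ArgumentParser(
        description="pg chameleon command line (sketch)")
    parser.add_argument("command", choices=COMMANDS,
                        help="action to execute")
    return parser


def run(argv):
    args = build_parser().parse_args(argv)
    # A real wrapper would call the matching replica engine method here.
    return "would run %s" % args.command


print(run(["init_replica"]))
```

With this layout, adding a command only means adding a name to the list and a handler for it.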
global_lib.py
class global_config: loads config.yaml into the class attributes
class replica_engine: wraps the mysql and pgsql class methods and sets up the logging method. A global_config instance is created for getting the configuration settings
mysql_lib.py
class mysql_connection: connects to MySQL using the parameters provided by replica_engine
class mysql_engine: does all the magic for the replication setup and execution
class mysql_engine
locks and releases the tables for the init_replica command
pulls the data out of MySQL in CSV format or as insert statements
extracts the metadata from MySQL’s information_schema
copies the data into PostgreSQL using the class pg_engine
falls back to inserts if the copy fails for any reason
starts the replica stream using python-mysql-replication
decodes the replica events into a data dictionary which is saved by pg_engine
when a replica binlog has been read, executes the PostgreSQL replay via pg_engine
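The copy-then-fall-back-to-inserts behaviour can be sketched as follows. The callables `bulk_copy` and `insert_row` are hypothetical stand-ins for a PostgreSQL COPY and a single INSERT, and collecting the rejected rows is an assumption: the talk does not say what happens to rows that fail both paths.

```python
def load_table(rows, bulk_copy, insert_row):
    """Try the fast bulk path first; on any failure retry row by row.

    Returns the list of rows rejected by the slow path.
    """
    try:
        bulk_copy(rows)
        return []
    except Exception:
        rejected = []
        for row in rows:
            try:
                insert_row(row)
            except Exception:
                rejected.append(row)
        return rejected


loaded = []


def fake_copy(rows):
    # Like COPY, one bad value makes the whole batch fail.
    if any(r["qty"] is None for r in rows):
        raise ValueError("bad data in batch")
    loaded.extend(rows)


def fake_insert(row):
    if row["qty"] is None:
        raise ValueError("bad row")
    loaded.append(row)


rows = [{"id": 1, "qty": 5}, {"id": 2, "qty": None}, {"id": 3, "qty": 7}]
rejected = load_table(rows, fake_copy, fake_insert)
print(len(loaded), "loaded,", len(rejected), "rejected")
```

The slow path costs one statement per row, so it is only worth taking when the bulk load has already failed.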
pg_lib.py
class pg_encoder: extends the JSON encoder class and adds some special handling for types like decimal and datetime
class pgsql_connection: connects to the PostgreSQL database
class pgsql_engine: does all the magic for rebuilding the data structure, loading data and migrating the schema
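A JSON encoder with special handling for decimal and datetime values could look like the sketch below. The class name `PgEncoder` and the choice of string and ISO 8601 output are assumptions for illustration; the actual pg_encoder may serialise these types differently.

```python
import datetime
import decimal
import json


class PgEncoder(json.JSONEncoder):
    """JSON encoder that also accepts Decimal and date/time values."""

    def default(self, obj):
        if isinstance(obj, decimal.Decimal):
            # Keep the exact decimal digits instead of going through float.
            return str(obj)
        if isinstance(obj, (datetime.datetime, datetime.date, datetime.time)):
            return obj.isoformat()
        return json.JSONEncoder.default(self, obj)


row = {"amount": decimal.Decimal("4.99"),
       "paid_at": datetime.datetime(2016, 11, 18, 19, 30)}
encoded = json.dumps(row, cls=PgEncoder, sort_keys=True)
print(encoded)
```

Without such an encoder, `json.dumps` raises a `TypeError` on the first Decimal or datetime value found in a row.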
class pgsql_engine
creates and upgrades the service schema sch_chameleon
builds the create statements for tables and indices using the metadata provided by mysql_engine
executes the create statements and registers the MySQL tables in sch_chameleon
copies the data into the tables and falls back to inserts if the copy fails
builds the primary keys and indices using the metadata provided by mysql_engine
stores the JSON data from the replica and executes the replay
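Building a create statement from column metadata can be sketched like this. The type map, the metadata keys and the function `build_create_table` are all hypothetical and deliberately tiny; they are not the mapping used by pgsql_engine.

```python
# Hypothetical type map for illustration; a real mapping is much larger.
TYPE_MAP = {
    "int": "integer",
    "varchar": "character varying",
    "datetime": "timestamp without time zone",
    "blob": "bytea",
}


def build_create_table(schema, table, columns):
    """columns: list of dicts with keys name, type, length, nullable."""
    defs = []
    for col in columns:
        pg_type = TYPE_MAP[col["type"]]
        if col["type"] == "varchar" and col.get("length"):
            pg_type += "(%d)" % col["length"]
        null_sql = "" if col["nullable"] else " NOT NULL"
        defs.append('"%s" %s%s' % (col["name"], pg_type, null_sql))
    return 'CREATE TABLE "%s"."%s" (%s);' % (schema, table, ", ".join(defs))


ddl = build_create_table("sakila", "actor", [
    {"name": "actor_id", "type": "int", "length": None, "nullable": False},
    {"name": "first_name", "type": "varchar", "length": 45, "nullable": True},
])
print(ddl)
```

Primary keys and indices are left out on purpose, matching the order above where they are built after the data copy.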
sqlutil_lib.py
Consists of just one class, sql_token, which tokenises the MySQL queries to be used by pgsql_engine for building the DDL in PostgreSQL’s dialect.
Currently under development
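As a rough idea of what tokenising a MySQL DDL statement involves, the sketch below recognises only CREATE TABLE and DROP TABLE and extracts the table name. The regular expressions and the function `parse_ddl` are assumptions for illustration and are not taken from sql_token.

```python
import re

# Hypothetical patterns: only CREATE TABLE and DROP TABLE are recognised.
DROP_TABLE = re.compile(
    r"^\s*DROP\s+TABLE\s+(?:IF\s+EXISTS\s+)?`?(\w+)`?", re.IGNORECASE)
CREATE_TABLE = re.compile(
    r"^\s*CREATE\s+TABLE\s+(?:IF\s+NOT\s+EXISTS\s+)?`?(\w+)`?", re.IGNORECASE)


def parse_ddl(statement):
    """Return a small token dict, or None for unsupported statements."""
    for command, pattern in (("DROP TABLE", DROP_TABLE),
                             ("CREATE TABLE", CREATE_TABLE)):
        match = pattern.match(statement)
        if match:
            return {"command": command, "name": match.group(1)}
    return None


print(parse_ddl("DROP TABLE IF EXISTS `film`;"))
print(parse_ddl("ALTER TABLE film ADD COLUMN rating int"))
```

Anything the patterns do not match returns None, so unsupported statements are skipped rather than guessed at.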
config.yaml
my_server_id: the server id for the MySQL replica. Must be unique within the replica cluster
copy_max_memory: the maximum amount of memory to use when copying a table into PostgreSQL. The value can be specified in (k)ilobytes, (M)egabytes or (G)igabytes by adding the suffix (e.g. 300M)
my_database: MySQL database to replicate. A schema with the same name will be initialised in the PostgreSQL database
pg_database: destination database in PostgreSQL
copy_mode: the allowed values are ‘file’ and ‘direct’. With direct the copy happens on the fly. With file the table is first dumped to a CSV file, then reloaded into PostgreSQL
hexify: a YAML list of the data types that require conversion to hex (e.g. blob, binary). The conversion happens during both the copy and the replica
log_dir: directory where the logs are stored
log_level: logging verbosity. Allowed values are debug, info, warning, error
log_dest: log destination. stdout for debugging purposes, file for normal activity
my_charset: MySQL charset for the copy (please note the replica is always in utf8)
pg_charset: PostgreSQL connection’s charset
tables_limit: YAML list of the tables to replicate. If empty, the entire MySQL database is replicated
sleep_loop: seconds between replica batch attempts
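Two of the settings above need a small conversion step: the memory suffix and the hex encoding. The sketch below shows one possible reading. It assumes the suffixes are powers of 1024 (the talk only gives the example 300M), and the helper names `parse_size` and `hexify_row` are hypothetical.

```python
import binascii

# Assumption: k, M and G are powers of 1024.
UNITS = {"k": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(value):
    """Convert a copy_max_memory style value such as '300M' into bytes."""
    text = str(value).strip()
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def hexify_row(row, column_types, hexify):
    """Hex-encode the values whose column type is listed in hexify."""
    result = {}
    for name, value in row.items():
        if column_types[name] in hexify and value is not None:
            result[name] = binascii.hexlify(value).decode("ascii")
        else:
            result[name] = value
    return result


print(parse_size("300M"))
print(hexify_row({"id": 1, "pic": b"\x00\xff"},
                 {"id": "int", "pic": "blob"},
                 ["blob", "binary"]))
```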
MySQL connection parameters
mysql_conn:
    host: localhost
    port: 3306
    user: replication_username
    passwd: never_commit_passwords
PostgreSQL connection parameters
pg_conn:
    host: localhost
    port: 5432
    user: replication_username
    password: never_commit_passwords
MySQL replica configuration
The MySQL configuration file is usually stored in /etc/mysql/my.cnf
To enable binary logging, find the section [mysqld] and check that the following parameters are set.
binlog_format: has to be ROW for capturing the DML events
log-bin: any name is good (e.g. mysql-bin)
server-id: has to be a numerical value, unique across the replication cluster. The value 1 is used for the master
binlog_row_image: has to be FULL, as required by the python-mysql-replication library
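The four settings above can be checked mechanically. The following Python 3 sketch parses the text of a my.cnf with the standard configparser module; the function `check_mysqld` and its messages are hypothetical, and a real my.cnf containing `!include` directives would need extra handling that is not shown here.

```python
import configparser

REQUIRED = {"binlog_format": "ROW", "binlog_row_image": "FULL"}


def check_mysqld(cnf_text):
    """Return a list of problems found in the [mysqld] section."""
    parser = configparser.ConfigParser(
        allow_no_value=True, strict=False, interpolation=None)
    parser.read_string(cnf_text)
    if not parser.has_section("mysqld"):
        return ["section [mysqld] not found"]
    # MySQL accepts both dashes and underscores in option names.
    options = {key.replace("-", "_"): (value or "")
               for key, value in parser.items("mysqld")}
    problems = []
    for name, wanted in REQUIRED.items():
        if options.get(name, "").upper() != wanted:
            problems.append("%s must be %s" % (name, wanted))
    if "log_bin" not in options:
        problems.append("log-bin is not set")
    if not options.get("server_id", "").isdigit():
        problems.append("server-id must be a number")
    return problems


sample = """[mysqld]
binlog_format = ROW
log-bin = mysql-bin
server-id = 1
binlog_row_image = FULL
"""
print(check_mysqld(sample))
```

An empty list means the section looks ready for the replica; the running server still has to be restarted for file changes to take effect.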
MySQL setup
CREATE USER usr_replica;
SET PASSWORD FOR usr_replica=PASSWORD('replica');
GRANT ALL ON sakila.* TO 'usr_replica';
GRANT RELOAD ON *.* TO 'usr_replica';
GRANT REPLICATION CLIENT ON *.* TO 'usr_replica';
GRANT REPLICATION SLAVE ON *.* TO 'usr_replica';
FLUSH PRIVILEGES;
PostgreSQL setup
CREATE USER usr_replica WITH PASSWORD 'replica';
CREATE DATABASE db_replica WITH OWNER usr_replica;
Replica setup
Copy config-yaml.example to config.yaml and set the configuration parameters, then initialise the replica with
./pg_chameleon.py init_replica
Wait for the init_replica completion, then start the replica with
./pg_chameleon.py start_replica
Caveats, traps, the usual political stuff...
Limitations
Tables require a primary key in order to be replicated
There is no cleanup for the rubbish accepted by MySQL (e.g. nulls implicitly converted to 0)
No daemonisation yet
Binary data is hexified to avoid issues with PostgreSQL
What does work
Replicating the MySQL schema into PostgreSQL
Locking the tables in MySQL and getting the master coordinates
Creating primary keys and indices on PostgreSQL
Writing MySQL row events in PostgreSQL
Replaying the replicated data in PostgreSQL
What does seem to work
Enum support
Binary import into bytea (hex conversion)
Initial copy based on copy to file or in memory
Fall back to inserts in case of rubbish data (slow)
Replication of CREATE and DROP TABLE statements
What doesn’t work
Replication of ALTER TABLE statements
Materialisation of the MySQL views
Foreign keys import in PostgreSQL
Daemonisation, background workers for replay, postgres extension
Wrap up
Igor, the green little guy
The chameleon logo was designed by Elena Toma, a talented Italian lady.
https://www.facebook.com/Tonkipapperoart/
The name Igor is inspired by Marty Feldman’s Igor, as portrayed in the movie Young Frankenstein.
Some numbers
Lines of code
global_lib.py              163
mysql_lib.py               521
pg_lib.py                  557
sql_util.py                208
create_schema.sql          354
Total lines in libraries  1449
Total lines including SQL 1803
pg chameleon’s license
Plain old 2-clause BSD License
Copyright (c) 2016, Federico Campoli
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Please Test!
That’s all!
Please clone the repository, test and break the tool!
Report issues!
https://github.com/the4thdoctor/pg_chameleon
Boring legal stuff
MySQL image source: WikiCommons
Hard disk image source: WikiCommons
Slonik logo, copyright PostgreSQL Global Development Group
Contacts and license
Twitter: 4thdoctor_scarf
Blog: http://www.pgdba.co.uk
Brighton PostgreSQL Meetup: http://www.meetup.com/Brighton-PostgreSQL-Meetup/
This document is distributed under the terms of the Creative Commons