pg_chameleon: MySQL to PostgreSQL replica


Upload: federico-campoli

Post on 09-Feb-2017


pg_chameleon

MySQL to PostgreSQL lightweight replica

Federico Campoli

Brighton PostgreSQL Meetup

18 November 2016

Federico Campoli (Brighton PostgreSQL Meetup) pg chameleon 18 November 2016 1 / 44

Table of contents

1 Some history

2 MySQL Replica in a nutshell

3 The pg_chameleon library

4 Caveats, traps, the usual political stuff...

5 Wrap up


Some history

The beginnings

Years 2006/2012

neo_my2pg.py

Developed to help a struggling phpBB

The database was successfully migrated from MySQL to PostgreSQL

The migration failed for other reasons

It's written in Python 2.6

It's a monolithic script

And it's slow, very slow

You can use it as a checklist of things to avoid when coding

https://github.com/the4thdoctor/neo_my2pg


I’m not scared of using the ORMs

Years 2013/2015

First attempt at pg_chameleon

Developed in Python 2.7

SQLAlchemy was used for extracting MySQL's metadata

Good proof of concept. No real hope of becoming usable

Built during the years of the roller coaster

It was just a way to discharge frustration

Abandoned because pgloader did the same and better

The ORM limitations didn't help to keep the project alive


pg_chameleon reborn

Year 2016

The project's revamp was triggered by a specific need.

What if it were possible to replicate data from MySQL to PostgreSQL?

The library python-mysql-replication can decode the MySQL replica stream when the ROW based format is used.

Trying won't harm, they said.


Still on Python 2.7

Removed SQLAlchemy

Switched the MySQL driver to PyMySQL

The library python-mysql-replication reads the MySQL replica

Provides a basic command line interface


MySQL Replica in a nutshell

MySQL Replica

MySQL logs the logical data changes rather than the physical ones

The data changes are stored in a local binary log

The slave saves the replication data pulled from the master in its local relay logs

The slave reads the local relay logs and replays the data


Log formats

STATEMENT: logs the statements, which are replayed on the slave. It seems the best solution for performance. Replaying non-deterministic functions (e.g. uuid) generates inconsistent slaves.

ROW: is deterministic. It logs the changed rows and the DDL queries. This format is required for pg_chameleon to work.

MIXED: takes the best of both worlds. The master logs the statements unless a non-deterministic function is used. In that case it logs the row image.
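The difference between STATEMENT and ROW can be shown with a toy model. This is only a sketch in plain Python, not MySQL code: the lists stand in for tables, and `insert_with_uuid` is a hypothetical statement that calls a non-deterministic function.

```python
import uuid


def insert_with_uuid():
    """Toy statement using a non-deterministic function, like MySQL's UUID()."""
    return {"id": 1, "token": str(uuid.uuid4())}


def run_statement(table, statement):
    """Execute the statement against a table (a list) and return the new row."""
    row = statement()
    table.append(row)
    return row


# STATEMENT format: the slave executes the same statement again,
# so the non-deterministic function is evaluated a second time.
master_stmt, slave_stmt = [], []
run_statement(master_stmt, insert_with_uuid)
run_statement(slave_stmt, insert_with_uuid)

# ROW format: the master logs the resulting row image and the slave
# applies that image without evaluating anything.
master_row, slave_row = [], []
row_image = run_statement(master_row, insert_with_uuid)
slave_row.append(dict(row_image))

print("statement based in sync:", master_stmt == slave_stmt)
print("row based in sync:", master_row == slave_row)
```

With the statement replay the two tokens differ, while the row image keeps master and slave identical.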


A chameleon in the middle

pg_chameleon mimics a MySQL slave's behaviour

Reads the replica

Stores the decoded rows into a PostgreSQL table

PostgreSQL acts as relay log and replication slave

A PL/pgSQL function decodes the rows and replays the changes

With an extra cool feature:

Initialise the PostgreSQL replica schema in just one command
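The store and replay idea can be sketched in a few lines. In pg_chameleon the rows land in a PostgreSQL table and the replay is done by a PL/pgSQL function; here a Python list and a dictionary stand in for both, and the names `store_event` and `replay` are hypothetical. The sketch also assumes every row carries a primary key column called `id`.

```python
import json


def store_event(relay_log, action, table, row):
    """Append a decoded row event to the relay log as a JSON string."""
    relay_log.append(json.dumps({"action": action, "table": table, "row": row}))


def replay(relay_log, tables, pkey="id"):
    """Apply the stored events in order, matching rows by primary key."""
    for raw in relay_log:
        event = json.loads(raw)
        rows = tables.setdefault(event["table"], {})
        key = event["row"][pkey]
        if event["action"] == "delete":
            rows.pop(key, None)
        else:
            # insert and update both carry the full row image
            rows[key] = event["row"]
    # Python 2 and 3 compatible way to empty the list in place
    del relay_log[:]
    return tables


relay_log = []
store_event(relay_log, "insert", "film", {"id": 1, "title": "Alien"})
store_event(relay_log, "update", "film", {"id": 1, "title": "Aliens"})
store_event(relay_log, "insert", "film", {"id": 2, "title": "Brazil"})
store_event(relay_log, "delete", "film", {"id": 2, "title": "Brazil"})
replica = replay(relay_log, {})
print(replica)
```

Matching on the key is what lets an update or delete find the row it refers to, which is why a table without a primary key cannot be replayed this way.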


The pg_chameleon library

Project structure

project directory
    pg_chameleon.py
    config
        config.yaml
    logs
    pg_chameleon
        lib
            global_lib.py
            mysql_lib.py
            pg_lib.py
            sqlutil_lib.py
        sql
            upgrade
            create_schema.sql


pg_chameleon.py

Command line wrapper

Uses argparse to execute the commands

Can easily be extended with more commands


init_replica: copies the data from MySQL and saves the master coordinates in PostgreSQL. This command locks the MySQL tables in read-only mode during the copy.

start_replica: connects to the MySQL master and replays the changes in PostgreSQL.

create_schema, drop_schema, upgrade_schema: manual actions on the PostgreSQL service schema. Not required in general because init_replica recreates the service schema from scratch. start_replica runs the schema migrations, if required, before starting the program loop.
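A command line wrapper of this kind can be sketched with argparse as follows. Only the command names come from the slides. The parser layout, the `dispatch` helper and the placeholder handlers are assumptions, not the actual pg_chameleon.py code.

```python
import argparse

# Command names as listed on the slide.
COMMANDS = [
    "create_schema",
    "drop_schema",
    "upgrade_schema",
    "init_replica",
    "start_replica",
]


def build_parser():
    """Build a parser accepting exactly one of the known commands."""
    parser = argparse.ArgumentParser(description="Command line wrapper sketch")
    parser.add_argument("command", choices=COMMANDS, help="action to execute")
    return parser


def dispatch(command, handlers):
    """Look up the handler for the command and run it."""
    return handlers[command]()


# Placeholder handlers: the real ones would call the replica engine.
handlers = {name: (lambda name=name: "would run " + name) for name in COMMANDS}

args = build_parser().parse_args(["init_replica"])
result = dispatch(args.command, handlers)
print(result)
```

Adding a command then means adding one name to the list and one handler to the dictionary.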


global_lib.py

class global_config: loads config.yaml into the class attributes

class replica_engine: wraps the mysql and pgsql class methods and sets up the logging method. A global_config instance is created to get the configuration settings


mysql_lib.py

class mysql_connection: connects to MySQL using the parameters provided by replica_engine

class mysql_engine: does all the magic for the replication setup and execution


class mysql_engine

Locks and releases the tables for the init_replica command

Pulls the data out of MySQL in CSV format or as insert statements

Extracts the metadata from MySQL's information_schema

Copies the data into PostgreSQL using the class pg_engine

Falls back to inserts if the copy fails for any reason

Starts the replica stream using python-mysql-replication

Decodes the replica events into a data dictionary which is saved by pg_engine

When a replica binlog is read, executes the PostgreSQL replay via pg_engine


pg_lib.py

class pg_encoder: extends the JSON encoder class and adds some special handling for types like decimal and datetime

class pgsql_connection: connects to the PostgreSQL database

class pgsql_engine: does all the magic for rebuilding the data structure, loading data and migrating the schema
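The encoder idea can be illustrated with the standard library. This is a minimal sketch, not the pg_encoder source: the class name `PgEncoderSketch` and the choice of string and ISO 8601 output are assumptions.

```python
import datetime
import decimal
import json


class PgEncoderSketch(json.JSONEncoder):
    """JSON encoder that also accepts Decimal and date/time values."""

    def default(self, obj):
        if isinstance(obj, decimal.Decimal):
            # a string keeps the exact value, a float could lose digits
            return str(obj)
        if isinstance(obj, (datetime.datetime, datetime.date, datetime.time)):
            return obj.isoformat()
        # anything else falls through to the standard TypeError
        return json.JSONEncoder.default(self, obj)


row = {
    "amount": decimal.Decimal("4.99"),
    "paid_at": datetime.datetime(2016, 11, 18, 19, 30),
}
encoded = json.dumps(row, cls=PgEncoderSketch, sort_keys=True)
print(encoded)
```

Without the custom `default` method, `json.dumps` raises a TypeError on both values.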


class pgsql_engine

Creates and upgrades the service schema sch_chameleon

Builds the create statements for tables and indices using the metadata provided by mysql_engine

Executes the create statements and registers the MySQL tables in sch_chameleon

Copies the data into the tables and falls back to inserts if the copy fails

Builds the primary keys and indices using the metadata provided by mysql_engine

Stores the JSON data from the replica and executes the replay
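The copy with fallback to inserts follows a common pattern, sketched below without a database. The function `load_rows`, the toy `bulk_copy` and `insert_row` callables and the list `target` are all hypothetical. The slides do not say what happens to rows that fail even as single inserts, so the sketch simply collects them.

```python
def load_rows(rows, bulk_copy, insert_row):
    """Try one bulk copy; if it fails, insert row by row and collect rejects."""
    try:
        bulk_copy(rows)
        return {"mode": "copy", "rejected": []}
    except Exception:
        rejected = []
        for row in rows:
            try:
                insert_row(row)
            except Exception:
                rejected.append(row)
        return {"mode": "insert", "rejected": rejected}


target = []  # stands in for the destination table


def validate(row):
    """Reject the kind of value MySQL accepts and PostgreSQL refuses."""
    if row["created"] == "0000-00-00":
        raise ValueError("invalid date")


def bulk_copy(rows):
    """All or nothing: a single bad row aborts the whole batch."""
    for row in rows:
        validate(row)
    target.extend(rows)


def insert_row(row):
    validate(row)
    target.append(row)


rows = [
    {"id": 1, "created": "2016-11-18"},
    {"id": 2, "created": "0000-00-00"},
]
outcome = load_rows(rows, bulk_copy, insert_row)
print(outcome["mode"], len(target), len(outcome["rejected"]))
```

The fast path is tried first because one bulk load is much cheaper than one statement per row. The slow path only runs when the batch contains data the fast path cannot digest.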


sqlutil_lib.py

Consists of just one class, sql_token, which tokenises the MySQL queries to be used by pgsql_engine for building the DDL in PostgreSQL's dialect.

Currently under development


config.yaml

my_server_id: the server id for the MySQL replica. Must be unique within the replica cluster

copy_max_memory: the maximum amount of memory to use when copying the table into PostgreSQL. The value can be specified in (k)ilobytes, (M)egabytes or (G)igabytes by adding the suffix (e.g. 300M)

my_database: MySQL database to replicate. A schema with the same name will be initialised in the PostgreSQL database

pg_database: destination database in PostgreSQL

copy_mode: the allowed values are 'file' and 'direct'. With direct the copy happens on the fly. With file the table is first dumped into a CSV file and then reloaded into PostgreSQL

hexify: a YAML list of the data types that require conversion to hex (e.g. blob, binary). The conversion happens on the copy and on the replica
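A value such as 300M has to be turned into a number of bytes before it can be used. The sketch below shows one way to do it. The function name `parse_memory_size`, the use of 1024 based multiples and the treatment of a bare number as bytes are assumptions, not documented behaviour.

```python
def parse_memory_size(value):
    """Convert a size like '300M' into bytes (k, M and G suffixes)."""
    units = {"k": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    text = str(value).strip()
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    # no suffix: assume the value is already expressed in bytes
    return int(text)


print(parse_memory_size("300M"))
```

The result could then be used, for example, to decide how many rows to pull from MySQL in each copy slice.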


log_dir: directory where the logs are stored

log_level: logging verbosity. Allowed values are debug, info, warning, error

log_dest: log destination. stdout for debugging purposes, file for normal activity

my_charset: MySQL charset for the copy (please note the replica is always in utf8)

pg_charset: PostgreSQL connection's charset

tables_limit: YAML list of the tables to replicate. If empty, the entire MySQL database is replicated

sleep_loop: seconds to wait between replica batch attempts


MySQL connection parameters

mysql_conn:
    host: localhost
    port: 3306
    user: replication_username
    passwd: never_commit_passwords


PostgreSQL connection parameters

pg_conn:
    host: localhost
    port: 5432
    user: replication_username
    password: never_commit_passwords


MySQL replica configuration

The MySQL configuration file is usually stored in /etc/mysql/my.cnf. To enable binary logging, find the section [mysqld] and check that the following parameters are set.

binlog_format: has to be ROW for capturing the DML events

log-bin: any name is good (e.g. mysql-bin)

server-id: has to be a numerical value unique within the replication cluster. The value 1 is used for the master

binlog_row_image: has to be FULL as required by the python-mysql-replication library
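The four settings can be checked with a short script before starting the replica. This is a sketch using Python 3's configparser, and `check_mysqld_settings` is a hypothetical helper that is not part of pg_chameleon. It assumes a simple my.cnf: files starting with `!include` directives before the first section would need extra handling.

```python
import configparser

# Settings that must have a specific value, compared case-insensitively.
REQUIRED = {"binlog_format": "row", "binlog_row_image": "full"}


def check_mysqld_settings(cnf_text):
    """Return a list of problems found in the [mysqld] section."""
    parser = configparser.ConfigParser(
        allow_no_value=True, strict=False, interpolation=None
    )
    parser.read_string(cnf_text)
    if not parser.has_section("mysqld"):
        return ["missing [mysqld] section"]
    # MySQL accepts both dashes and underscores in option names.
    settings = {
        key.replace("-", "_"): (value or "").strip()
        for key, value in parser.items("mysqld")
    }
    problems = []
    for name, expected in REQUIRED.items():
        if settings.get(name, "").lower() != expected:
            problems.append("%s should be %s" % (name, expected.upper()))
    if not settings.get("log_bin"):
        problems.append("log-bin is not set")
    if not settings.get("server_id", "").isdigit():
        problems.append("server-id must be a number")
    return problems


sample = """
[mysqld]
binlog_format = ROW
log-bin = mysql-bin
server-id = 1
binlog_row_image = FULL
"""
problems = check_mysqld_settings(sample)
print(problems)
```

An empty list means the section carries all four settings with the values the slide asks for.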


MySQL setup

CREATE USER usr_replica;
SET PASSWORD FOR usr_replica=PASSWORD('replica');
GRANT ALL ON sakila.* TO 'usr_replica';
GRANT RELOAD ON *.* TO 'usr_replica';
GRANT REPLICATION CLIENT ON *.* TO 'usr_replica';
GRANT REPLICATION SLAVE ON *.* TO 'usr_replica';
FLUSH PRIVILEGES;


PostgreSQL setup

CREATE USER usr_replica WITH PASSWORD 'replica';
CREATE DATABASE db_replica WITH OWNER usr_replica;


Replica setup

Copy config-yaml.example to config.yaml and set the configuration parameters

./pg_chameleon.py init_replica

Wait for init_replica to complete, then start the replica with

./pg_chameleon.py start_replica


Caveats, traps, the usual political stuff...

Limitations

Tables require primary keys to be replicated

There is no cleanup for the rubbish accepted by MySQL (e.g. nulls implicitly converted to 0)

No daemonisation yet

Binary data are hexified to avoid issues with PostgreSQL
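The hex conversion of binary columns can be sketched as follows. The helper `hexify_row` and the `column_types` mapping are hypothetical. The default type list only mirrors the examples given for the hexify parameter (blob, binary).

```python
import binascii


def hexify_row(row, column_types, hexify=("blob", "binary")):
    """Return a copy of the row with binary columns converted to hex strings."""
    converted = {}
    for column, value in row.items():
        if column_types.get(column) in hexify and value is not None:
            converted[column] = binascii.hexlify(value).decode("ascii")
        else:
            converted[column] = value
    return converted


row = {"id": 1, "picture": b"\x00\xff\x10"}
column_types = {"id": "int", "picture": "blob"}
safe_row = hexify_row(row, column_types)
print(safe_row)
```

A hex string is plain ASCII, so it survives the CSV copy and the JSON encoding without escaping problems and can be decoded back to binary on the PostgreSQL side.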


What works

Replication of the MySQL schema into PostgreSQL

Locking the tables in MySQL and getting the master coordinates

Creation of primary keys and indices on PostgreSQL

Writing MySQL row events into PostgreSQL

Replay of the replicated data in PostgreSQL


What seems to work

Enum support

Binary import into bytea (hex conversion)

Initial copy based on copy to file or in memory

Fall back to inserts in case of rubbish data (slow)

Replication of CREATE and DROP TABLE statements


What doesn’t work

Replication of ALTER TABLE statements

Materialisation of the MySQL views

Foreign keys import in PostgreSQL

Daemonisation, background workers for replay, postgres extension


Wrap up

Igor, the green little guy

The chameleon logo was designed by Elena Toma, a talented Italian lady.

https://www.facebook.com/Tonkipapperoart/

The name Igor is inspired by Marty Feldman's Igor, as portrayed in the movie Young Frankenstein.


Some numbers

Lines of code

global_lib.py               163
mysql_lib.py                521
pg_lib.py                   557
sql_util.py                 208
create_schema.sql           354

Total lines in libraries   1449
Total lines including SQL  1803


pg_chameleon's license

Plain old 2-clause BSD License

Copyright (c) 2016, Federico Campoli
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.


Please Test!

That’s all!

Please clone the repository, test and break the tool!

Report issues!

https://github.com/the4thdoctor/pg_chameleon


Boring legal stuff

MySQL Image source WikiCommons

Hard Disk image source WikiCommons

Slonik logo, copyright PostgreSQL Global Development Group


Contacts and license

Twitter: 4thdoctor_scarf

Blog: http://www.pgdba.co.uk

Brighton PostgreSQL Meetup: http://www.meetup.com/Brighton-PostgreSQL-Meetup/

This document is distributed under the terms of the Creative Commons
