Transcript
Page 1: Elephants in the Cloud

Mike Fowler, PGConf.EU 2017-10-26, Warsaw, Poland

Elephants in the Cloud

Page 2: Elephants in the Cloud

● Senior Site Reliability Engineer in the Public Cloud Practice of claranet

● Background in Software Engineering, Systems Engineering, System & Database Administration

● Been using PostgreSQL since 7.4

● Contributed to YAWL, PostgreSQL & Terraform open source projects

– XMLEXISTS/xpath_exists()

– xml_is_well_formed()

● Regular speaker at PGDay UK

About Me

Page 3: Elephants in the Cloud

● Hosted PostgreSQL

● Overview of public cloud hosting options

● Database migration strategies

Overview

Page 4: Elephants in the Cloud

● Your database somewhere else

● A managed service

– Some providers offer full DBA support

– Cloud providers give only the infrastructure

● Typically provisioned through an API or GUI

– i.e. a self-service environment

What is hosted PostgreSQL?

Page 5: Elephants in the Cloud

● Reduces adoption costs

● Installation & configuration is already done

– Generally sane defaults, some tuning often required

● Needn’t worry about physical servers

● Opex instead of Capex

● Most routine DBA tasks are done for you

● Easier to grow

Benefits of Hosted PostgreSQL

Page 6: Elephants in the Cloud

● Less control

● Latency

● Some features are disabled

● Migrating existing databases is hard

● Potential for vendor lock-in

● Resource limits

Drawbacks of Hosted PostgreSQL

Page 7: Elephants in the Cloud

● Automatic scaling

● It will fix your slow queries/reports/application

● You don’t need a DBA

– They need to be planning for scale

– They need to help address slowness

Fallacies of Hosted PostgreSQL

Page 8: Elephants in the Cloud

● We’ll look only at Public Cloud offerings

● Current major offerings

– Amazon Relation Database Service (RDS)

– Amazon Aurora

– Heroku

● Future major offerings

– Google Cloud SQL

– Microsoft Azure

Hosting Options

Page 9: Elephants in the Cloud

● PostgreSQL 9.3.12 – 9.6.3 supported

● Numerous instance types

– Costs range from $0.018 to $7.96 per hour

– Select from 1 vCPU up to 64 vCPUs, all 64-bit

– Memory ranges from 1GB to 488GB

● Flexible storage options

– Choose between SSD or Provisioned IOPS

– Up to 6TB with up to 30,000 IOPS

Amazon RDS

Page 10: Elephants in the Cloud

● High availability multi-availability zone option

– Synchronous replica

– Automatic failover (~2 minutes)

● Up to 5 read-only replicas (asynchronous replication)

● Configurable automatic backups with PITR

● Monthly uptime percentage of 99.95 per instance

– Allows for approximately 22 minutes downtime

Amazon RDS

Page 11: Elephants in the Cloud

● Became generally available yesterday (2017-10-25)

● Currently available in 3 US regions and Ireland

● 6 instance types costing from $0.29 to $9.28 per hour

● Compatible with PostgreSQL 9.6

● Up to 2x throughput of conventional PostgreSQL

● Up to 16 read replicas with sub-10ms replica lag

● Auto-growing filesystem up to 64TB

– Filesystem is shared between 3 availability zones

Amazon Aurora

Page 12: Elephants in the Cloud

Amazon Aurora

https://aws.amazon.com/blogs/aws/amazon-aurora-update-postgresql-compatibility/

Page 13: Elephants in the Cloud

● Supports PostgreSQL 9.3, 9.4, 9.5 & 9.6 (10 in beta)

● Simpler pricing based on choice of tier ($0-8.5k pcm)

● Tier dictates resource limits

– Maximum number of rows (Hobby only)

– Cache size (1GB - 240GB)

– Storage limit (64GB - 1TB)

– Connection limit (120 - 500)

– Rollback (4 days – 1 week)

Heroku

Page 14: Elephants in the Cloud

● Fork & Follow

● Some of your data may end up in the US

– Logs (can be blocked at creation time)

– Snapshots & Dataclips

● Not possible to replicate out

– No permission for Bucardo, Londiste & Slony

– Remote slave is prohibited

– Only way is dump & restore

Heroku

Page 15: Elephants in the Cloud

● Currently in Beta (no SLA)

● Only supports PostgreSQL 9.6

● No replication support

● Posed to be a serious rival to RDS

– Billing per minute

– Automatic scaling of filesystem

– Similar variety of instance types

Google Cloud SQL

Page 16: Elephants in the Cloud

● Currently in preview (no SLA)

● Supports PostgreSQL 9.5 & 9.6

● Replication is seamless

– Automated failover

– PITR

● Selectable compute units

– Roughly equates to maximum connections

Microsoft Azure

Page 17: Elephants in the Cloud

● At Rest (AES-256)

– Enabled by default for Google & Azure

– User enabled at create for Amazon (RDS/Aurora)

– Only available in premium plans for Heroku

● In transit (SSL)

– Enforced on Heroku & Azure

– Can be enforced on Google & Amazon

● pgcrypto is available on all platforms

Encryption

Page 18: Elephants in the Cloud

● Providers dictate the availability of extensions

– Well documented except for Aurora

● Only step required is to run CREATE EXTENSION

● Many extensions are widely available

– pgcrypto

– PostGIS

– pg_stat_statements

– hstore

– plpgsql

Extensions

Page 19: Elephants in the Cloud

● Dump & Restore

● Replication failover

● Amazon’s Database Migration Service

● PITR + Logical decoding

Migration Strategies

Page 20: Elephants in the Cloud

● Simplest strategy

– Perceived as low risk for data loss

– Less “moving parts”● Just a pg_dump & pg_restore

● Downtime is function of database size

Dump & Restore

Page 21: Elephants in the Cloud

● Move historic data ahead of time

– Opportunity to clear out unused data

– Consider introducing partitions

● Consider moving the dump closer to the target

– e.g. Upload to EC2 instance in the same region as the RDS instance and run pg_restore from there

● Over provision resources

– Gives higher throughput during data load

– Downscale once operational

Strategies to Minimise Downtime

Page 22: Elephants in the Cloud

● No one supports external masters!

● Trigger based replication failover

– Slony, Londiste & Bucardo

● Can be used on most any version of PostgreSQL

● Some restrictions apply

– DDL is not supported

– Rows must be uniquely identifiable

Replication Failover

Page 23: Elephants in the Cloud

● Presents some risk to production environment

– Initial overhead of replicating each table● Gradually add tables to the configuration to

spread the load

– Per-transaction overhead● Write latency to remote slave● Heavy write workload could lead to high

replication lag

● This also works to replicate out of RDS but not Heroku

Replication Failover

Page 24: Elephants in the Cloud

● Web service supporting migration to/from AWS

– Can be RDS or an EC2 instance

– External DB can be anywhere

● Perform a one time load or continual load

– Change Data Capture

● “Supports” heterogeneous migrations

– Oracle, MySQL, SQL Server, MongoDB

– Schema Conversion Tool exists to assist

AWS Database Migration Service

Page 25: Elephants in the Cloud

● PostgreSQL as a source database

– User only needs to see all data

● PostgreSQL as a streaming source database

– Tables must have a primary key

– Requires 9.4 with wal_level = logical

– Must have room for a new replication slot

– User must be a SUPERUSER

AWS Database Migration Service

Page 26: Elephants in the Cloud

● PostgreSQL as a target database

– Import your schema using pg_dump

– session_replication_role needs to be set to replica

– Set target table preparation mode to DO_NOTHING

● There are drawbacks of using DMS

– Replication lag

– Process is fiddly

AWS Database Migration Service

Page 27: Elephants in the Cloud

● Most involved approach, least downtime

● Combines point-in-time recovery with the changes captured by logical decoding to create a replica

● Need to be running at least PostgreSQL 9.4 with WAL level logical and have WAL archiving configured

● DDL not supported, still need unique rows

● Recommend barman for managing WALhttp://www.pgbarman.org/

● Recommend decoder_raw as logical decoding plugingithub.com/michaelpq/pg_plugins/tree/master/decoder_raw

PITR & Logical decoding

Page 28: Elephants in the Cloud

1. Create a logical replication slot

SELECT * FROM pg_create_logical_replication_slot ('logical_slot', 'decoder_raw');

2. Note the transaction ID (catalog_xmin)

SELECT catalog_xmin FROM pg_replication_slots WHERE slot_name = ‘logical_slot’;

PITR & Logical decoding

Page 29: Elephants in the Cloud

3. Perform a barman backup

$ barman backup master

4. Perform a barman PITR

$ barman recover –target-xid (catalog_xmin - 1) master latest

5. Start database and verify correct recovery

PITR & Logical decoding

Page 30: Elephants in the Cloud

5. Perform pg_dump on the readonly barman node

6. Restore to public cloud

7. Read output of logical decoding and write to cloud

PITR & Logical decoding

Page 31: Elephants in the Cloud

● Hosted PostgreSQL gives you high performance PostgreSQL without the hassle of hardware, maintenance and configuration

● Opex instead of Capex

● Consider the limitations of your intended platform

● There are multiple options for migration

Summary

Page 32: Elephants in the Cloud

Questions?

Please remember to leave feedback about my talk, other talks and the conference in general at

https://2017.pgconf.eu/f

Mike Fowler

gh-mlfowler

mlfowler

mike dot fowler at claranet dot uk

Page 33: Elephants in the Cloud

Top Related