FOSS4G UK 2016: Aileen Heal

Managing National Load OS MasterMap and Maintaining History Aileen Heal, Astun Technology

Upload: aileen-heal-nee-romeril

Post on 11-Apr-2017



OS MasterMap Topography Layer, PostGIS, Loader, PostgreSQL HSTORE & PostgreSQL Audit trigger 91plus

PostgreSQL

PostgreSQL is a powerful, open source object-relational database system. It evolved from the Ingres project: the Postgres project, led by Michael Stonebraker, began in 1986. In 1995, two Ph.D. students from Stonebraker's lab, Andrew Yu and Jolly Chen, replaced Postgres' POSTQUEL query language with an extended subset of SQL and renamed the system Postgres95. In 1996, Postgres95 left academia and started a new life in the open source world, taking its current name: PostgreSQL. "Postgres" is still used as an easy-to-pronounce nickname.

www.postgresql.org

PostGIS is a spatial database extender for PostgreSQL. It adds support for geographic objects, allowing location queries to be run in SQL. The first release of PostGIS was in 2001.

http://postgis.net/

OS MasterMap Topography Layer

OS MasterMap Topography Layer is the most detailed and accurate view of Great Britain's landscape, from roads to fields, to buildings and trees, fences, paths and more. There are approximately 460 million features in OS MasterMap Topography.

Change Only Update (CoU)

Due to the large number of features in OS MasterMap, updates are available as CoU. Typically a CoU for the national supply contains fewer than 6 million features.

https://www.ordnancesurvey.co.uk/business-and-government/products/topography-layer.html

Loader

A powerful GML & KML loader (and translator) written in Python that makes use of OGR 1.9.

Source data can be in GML (including .gz) or KML format and can be output to any of the formats supported by OGR. The source data can be prepared and enhanced during loading to:
● make it suitable for loading with OGR (useful with complex feature types)
● add value by deriving attributes

Fairly fast (national cover OS MasterMap in 2 days):
● Run 6 instances in parallel
● Use the OGR PGDump driver to output the SQL and use the COPY utility to load the data.

http://github.com/AstunTechnology/Loader

Loading CoU

Loading CoU data is basically the same as loading standard MasterMap.

Two differences:
1. There is an extra feature type: Departed Features.
2. You need to apply the changes after you have loaded the data. It is easier if you load the CoU data into a separate schema, e.g.
● osmm_topo
● osmm_topo_cou

Loading CoU cont... Applying the Changes

Remove all the departed features from the main holding. Then, for all the changed records do an UPSERT.

For speed we do a delete and insert.
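A minimal sketch of that apply step for a single feature table. It assumes the CoU was loaded into the osmm_topo_cou schema, that departed features sit in a table called departedfeature (a hypothetical name), that fid holds the TOID everywhere, and that the CoU table has the same column layout as the main table:

```sql
-- Sketch only: osmm_topo_cou.departedfeature is an assumed table name,
-- and fid is assumed to hold the TOID in every table.
BEGIN;

-- 1. Remove features that have departed the holding.
DELETE FROM osmm_topo.topographicarea AS t
USING osmm_topo_cou.departedfeature AS d
WHERE t.fid = d.fid;

-- 2. Delete-and-insert in place of an UPSERT: drop the old version of
--    every changed feature, then insert the new versions from the CoU.
DELETE FROM osmm_topo.topographicarea AS t
USING osmm_topo_cou.topographicarea AS c
WHERE t.fid = c.fid;

-- Assumes identical column order and no clashing surrogate key values.
INSERT INTO osmm_topo.topographicarea
SELECT * FROM osmm_topo_cou.topographicarea;

COMMIT;
```

Running both steps in one transaction means readers never see a half-applied update. The same pattern would be repeated for each feature table.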

We do a little bit more….

BUT what about keeping the history? For that we use AUDIT :)

Identify changed areas

● Add the geometry to the departed feature table

● Create a view of changed features

● Create a table of 500m grid squares which have changed

● Use this table to update the tile caches where the data has changed
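One possible way to build the table of changed 500m squares is sketched below. The view name changed_features, the output table changed_squares and the geometry column wkb_geometry are assumptions, as is a projected coordinate system in metres. ST_SquareGrid needs PostGIS 3.1 or later, so an older installation would need a different approach:

```sql
-- Sketch only: changed_features, changed_squares and wkb_geometry are
-- assumed names. ST_SquareGrid requires PostGIS 3.1 or later.
CREATE TABLE osmm_topo_cou.changed_squares AS
SELECT DISTINCT ON (grid.i, grid.j)
       grid.i, grid.j, grid.geom
FROM osmm_topo_cou.changed_features AS c
CROSS JOIN LATERAL ST_SquareGrid(500, c.wkb_geometry) AS grid
-- ST_SquareGrid covers the feature's bounding box, so keep only the
-- squares the feature actually touches.
WHERE ST_Intersects(grid.geom, c.wkb_geometry)
ORDER BY grid.i, grid.j;
```

Because the grid is aligned to the origin of the coordinate system, the (i, j) pair identifies a square uniquely across all features, which is what lets DISTINCT ON collapse the duplicates.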

Two-phase validation

● Compare the number of features loaded into the CoU tables for each file with the report generated by a Python script which parses the .gz files

● Load the FVDs and compare TOIDs, version number & version date with the updated data.
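The second phase could be checked with a query along these lines. The FVD table osmm_topo_fvd.fvd and its columns toid, version and versiondate are hypothetical names, and the sketch assumes the main table carries version and versiondate columns of matching types:

```sql
-- Sketch only: osmm_topo_fvd.fvd and its columns are hypothetical names
-- for a loaded FVD; version/versiondate are assumed to exist with the
-- same types in osmm_topo.topographicarea.
SELECT f.toid,
       f.version     AS fvd_version,
       t.version     AS loaded_version,
       f.versiondate AS fvd_versiondate,
       t.versiondate AS loaded_versiondate
FROM osmm_topo_fvd.fvd AS f
JOIN osmm_topo.topographicarea AS t ON t.fid = f.toid
WHERE t.version IS DISTINCT FROM f.version
   OR t.versiondate IS DISTINCT FROM f.versiondate;
```

An empty result means every matched feature agrees with the FVD. The inner join only reports mismatches, so TOIDs missing from the holding would need a separate check across all feature tables.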

PostgreSQL HSTORE

A PostgreSQL extension which implements the hstore data type for storing sets of key/value pairs within a single PostgreSQL value.

This can be useful in various scenarios, such as rows with many attributes that are rarely examined, or semi-structured data.

Keys and values are simply text strings.

Key functions/operators are:

● hstore(record): construct an hstore from a record or row
● populate_record(record, hstore): replace fields in record with matching values from hstore
● hstore - hstore: delete matching pairs from the left operand (so can store changes)

http://www.postgresql.org/docs/current/static/hstore.html
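The three functions can be tried on literal values. The statements and results below follow the examples given in the PostgreSQL hstore documentation; they are shown for illustration and are not taken from the talk:

```sql
CREATE EXTENSION IF NOT EXISTS hstore;

-- hstore(record): fields of an anonymous ROW get the keys f1, f2, ...
SELECT hstore(ROW(1, 2));
-- "f1"=>"1", "f2"=>"2"

-- populate_record(record, hstore): overwrite the matching fields
SELECT populate_record(ROW(1, 2), 'f1=>42'::hstore);
-- (42,2)

-- hstore - hstore: pairs that match exactly are removed from the left operand
SELECT 'a=>1, b=>2, c=>3'::hstore - 'a=>4, b=>2'::hstore;
-- "a"=>"1", "c"=>"3"
```

The subtraction is what makes it possible to store only the changes: subtracting the old row's hstore from the new row's hstore leaves just the fields whose values differ.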

PostgreSQL Audit trigger 91plus

https://wiki.postgresql.org/wiki/Audit_trigger_91plus

● Generic trigger function used for recording changes to tables into an audit log table.
● Row values are recorded as HSTORE fields rather than as flat text.
● Auditing can be done coarsely at a statement level or finely at a row level.
● Control is per-audited-table.

Trigger does not track:
● SELECT
● DDL like ALTER TABLE
● Changes to system catalogs

The trigger does record that a truncate has happened, but not the values of the rows affected by the truncate.

What's great about PostgreSQL Audit trigger 91plus?

● Very simple to turn audit on:
SELECT audit.audit_table('<schema name>.<table name>');

● You can audit any table in the database.

● No extra columns are required on the tables being audited.

● All changes are held in the table audit.logged_actions.

● Changes are only visible to roles which have the appropriate privileges.

Obviously the audit triggers need to be applied before changes are made to the data.

Let's look at some audit data....
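Turning auditing on for one table and then inspecting what was logged might look like the sketch below. It assumes the audit schema from the Audit trigger 91plus wiki page is already installed and reuses the table name from this talk:

```sql
-- Assumes audit.sql from the Audit trigger 91plus wiki page is installed.
SELECT audit.audit_table('osmm_topo.topographicarea');

-- Latest ten audited changes to the table. row_data holds the row as it
-- was before an UPDATE or DELETE (or the new row for an INSERT);
-- changed_fields holds the new values of the fields an UPDATE changed.
SELECT event_id,
       action,                    -- I, U, D or T
       action_tstamp_tx,
       row_data -> 'fid' AS fid,
       changed_fields
FROM audit.logged_actions
WHERE schema_name = 'osmm_topo'
  AND table_name  = 'topographicarea'
ORDER BY event_id DESC
LIMIT 10;
```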

How to Create a “Point in Time” Snapshot

1st check the table has not been truncated after the ‘snapshot date’!
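The truncate check can be a simple query on the audit log, sketched here with the same table and date used elsewhere in this talk. The audit trigger logs a truncate with action 'T':

```sql
-- Any row returned means the table was truncated after the snapshot
-- date, so the rows it removed cannot be rebuilt from the audit log.
SELECT event_id, action_tstamp_tx
FROM audit.logged_actions
WHERE schema_name = 'osmm_topo'
  AND table_name  = 'topographicarea'
  AND action = 'T'
  AND action_tstamp_tx > '2016-06-01 20:00:00'::timestamp;
```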

Using the HSTORE function populate_record, create a view ‘changes_after’ containing:
● the 1st change per primary key from the audit table after the ‘date’
● an extra column indicating the change, i.e. D, U, I

Changes after view...

-- Keep only the first audited change per fid after the snapshot date.
-- DISTINCT ON and ORDER BY sit in the same query, so "first" is
-- guaranteed to mean the lowest event_id for each fid.
CREATE TEMPORARY VIEW changes_after AS
SELECT DISTINCT ON (row_data -> 'fid')
       action,
       (populate_record(null::osmm_topo.topographicarea, row_data)).*
FROM audit.logged_actions
WHERE schema_name = 'osmm_topo'
  AND table_name = 'topographicarea'
  AND action_tstamp_tx > '2016-06-01 20:00:00'::timestamp
ORDER BY row_data -> 'fid', event_id;

Create Snapshot

Create a view/table which includes

● All the records in the current table whose PK is not in the changes_after view

plus

● All the records in the changes_after view where the change is D or U

Let's look at some example data....

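CREATE TABLE snapshot AS
-- Current rows that have not changed since the snapshot date.
-- action is listed first so the columns line up with changes_after,
-- where action is also the first column.
SELECT NULL::text AS action, *
FROM osmm_topo.topographicarea
WHERE fid NOT IN (SELECT fid FROM changes_after)
UNION ALL
-- Pre-change versions of rows deleted or updated after the snapshot date.
-- The two branches share no fid, so UNION ALL avoids a needless de-duplication.
SELECT *
FROM changes_after
WHERE action IN ('D', 'U');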

Before CoU was applied.

© Crown copyright and database rights 2016 Ordnance Survey 100019153

After CoU was applied.


Side by side


So there you have it...

Questions?