gbroccolo foss4 guk2016_brin4postgis

Post on 23-Feb-2017

459 Views

Category:

Data & Analytics

1 Downloads

Preview:

Click to see full reader

TRANSCRIPT

2ndQuadrant Italia Giuseppe Broccolo – giuseppe.broccolo@2ndquadrant.it

June, 14th 2016

BRIN indexes on

geospatial

databases

www.2ndquadrant.com

2ndQuadrant Italia Giuseppe Broccolo – giuseppe.broccolo@2ndquadrant.it

~$ whoami~$ whoami

Giuseppe Broccolo, PhDGiuseppe Broccolo, PhD PostgreSQL & PostGIS consultantPostgreSQL & PostGIS consultant

@giubro gbroccolo7

giuseppe.broccolo@2ndquadrant.it

gbroccolo gemini__81

2ndQuadrant Italia Giuseppe Broccolo – giuseppe.broccolo@2ndquadrant.it

Indexes & PostgreSQLIndexes & PostgreSQL

• Binary structures on disc:

– Indexing organised in nodes into 8kB pages

– Nodes point to table rows

• Speed up data access: ~O(log N)

– High performance until the index can be contained in RAM

• Size: ~O(N)

2ndQuadrant Italia Giuseppe Broccolo – giuseppe.broccolo@2ndquadrant.it

Geospatial indexes in PostgreSQLGeospatial indexes in PostgreSQL

• All datatypes can be indexed in principle...– ...just define the needed Operator Class!

• Operators & support functions

– ...define how the data has to be stored into the index nodes

• Geospatial data:– can be larger than 8kB– Is generally a varlena– a single node cannot be contained into a single page

2ndQuadrant Italia Giuseppe Broccolo – giuseppe.broccolo@2ndquadrant.it

PostGIS datatypes & GiST indexingPostGIS datatypes & GiST indexing

In PostGIS (since v0.6) geometry/geography areconverted into their float-precision bounding boxes

gidx/box2df are then used as storagedatatype to populate the index’s nodes

2ndQuadrant Italia Giuseppe Broccolo – giuseppe.broccolo@2ndquadrant.it

GIS & BigDataGIS & BigData

LiDAR surveys

Maps of entire countries

It’s relatively easy to have to work with terabytes of geospatial data…

...can indexes be contained in RAM?

2ndQuadrant Italia Giuseppe Broccolo – giuseppe.broccolo@2ndquadrant.it

A new index: Block Range Indexing (PG 9.5)A new index: Block Range Indexing (PG 9.5)

CREATE INDEX idx ON table

USING brin(column) WITH (pages_per_range=10);

data must be physically sorted on disk!

index node → row

index node → block

less specific than GiST!

Really small!

8kB8kB8kB8kB8kB8kB8kB8kB

8kB8kB8kB8kB

(S. Riggs, A. Herrera)

2ndQuadrant Italia Giuseppe Broccolo – giuseppe.broccolo@2ndquadrant.it

BRIN support for PostGISBRIN support for PostGIS

https://github.com/dalibo/postgis/tree/for_upstream

Julien RouhaudRonan Dunklau me

2ndQuadrant Italia Giuseppe Broccolo – giuseppe.broccolo@2ndquadrant.it

BRIN support for PostGISBRIN support for PostGIS

NO kNN SUPPORT!!

– Different OpClass for • 2D (default), 3D, 4D geometry• geography

• box2d/box3d

• cross-operators defined in the OpFamily

– Storage datatype: float-precision (such as in GiST)• gidx (3D, 4D geometry)

• box2df (2D geometry, geography)

– Operators: &&, @, ~ (2D) - &&& (3D, 4D)

8kB8kB8kB8kB

8kB8kB8kB8kB

2ndQuadrant Italia Giuseppe Broccolo – giuseppe.broccolo@2ndquadrant.it

11stst example: “sorted” example: “sorted” geometrygeometry points points~10M, 3D, SRID=0, distribution range: 0÷100kunits

=# CREATE INDEX idx_gist ON points USING gist

-# (geom gist_geometry_ops_nd);

=# CREATE INDEX idx_brin_128 ON points USING brin

-# (geom brin_geometry_inclusion_ops_3d);

=# CREATE INDEX idx_brin_10 ON points USING brin

-# (geom brin_geometry_inclusion_ops_3d)

-# WITH (pages_per_range=10);

BRIN(128)/GiST 1/2000

BRIN(10)/GiST 1/1000

BRIN(128)/GiST 1/30

BRIN(10)/GiST 1/20

SizesBuilding

times

2ndQuadrant Italia Giuseppe Broccolo – giuseppe.broccolo@2ndquadrant.it

11stst example: “sorted” example: “sorted” geometrygeometry points points~10M, 3D, SRID=0, distribution range: 0÷100kunits

=# SELECT * FROM points

-# WHERE ‘BOX3D(10. 10. 10., 12., 12., 12.)’::box3d &&& geom;

BRIN(128)/GiST 50/1

BRIN(10)/GiST 6/1

BRIN(128)/SeqScan 1/60

BRIN(10)/SeqScan 1/350

Execution times

2ndQuadrant Italia Giuseppe Broccolo – giuseppe.broccolo@2ndquadrant.it

22ndnd example: “unsorted” example: “unsorted” geometrygeometry points points~10M, 3D, SRID=0, distribution range: 0÷100kunits

=# CREATE TABLE unsorted_points AS

-# SELECT * FROM points ORDER BY random();

=# SELECT * FROM unsorted_points

-# WHERE ‘BOX3D(10. 10. 10., 12., 12., 12.)’::box3d &&& geom;

BRIN(128)/GiST 2800/1

BRIN(10)/GiST 400/1

BRIN(128)/SeqScan 1/1

BRIN(10)/SeqScan 1/12

Execution times

2ndQuadrant Italia Giuseppe Broccolo – giuseppe.broccolo@2ndquadrant.it

22ndnd example: “unsorted” example: “unsorted” geometrygeometry points points

=# CREATE EXTENSION pageinspect;

=# SELECT blknum, value FROM brin_page_items(get_raw_page('idx_brin_128', 2), 'idx_brin_128')

-# LIMIT 5;

blknum | value

--------+------------------------------------------------------------------------

0 | {GIDX( 1.11394846439 1.64980053902 1.11335003376, 99982.2578125 99982.64062599982.1171875 ) .. f .. f}

128 | {GIDX( 0.224781364202 0.184096381068 0.798862099648, 99995.859375 99995.17187599995.0390625 ) .. f .. f}

256 | {GIDX( 6.25802087784 6.50119876862 6.8031873703, 99993.15625 99993.820312599993.2109375 ) .. f .. f}

384 | {GIDX( 1.9203556776 1.11907613277 1.55043530464, 99995.421875 99995.23437599995.421875 ) .. f .. f}

512 | {GIDX( 3.44181084633 3.4120452404 3.41762590408, 99996.3125 99996.7109375 99996.59375) .. f .. f}

(5 rows)

2ndQuadrant Italia Giuseppe Broccolo – giuseppe.broccolo@2ndquadrant.it

...what about ...what about INSERTINSERT’s?’s?

INSERT INTO points

SELECT ST_MakePoint(a, a, a)

FROM generate_series(1, 1000) AS f(a);

GiST 6 times than insertions without index

BRIN (10) 3 times than insertions without index

BRIN (128) 2 times than insertions without index

...blocks could be “not sorted” anymore!

2ndQuadrant Italia Giuseppe Broccolo – giuseppe.broccolo@2ndquadrant.it

Manage LiDAR

data with

PostgreSQL

is it possible?

Is the relational approach valid for point clouds?Is the relational approach valid for point clouds?

2ndQuadrant Italia Giuseppe Broccolo – giuseppe.broccolo@2ndquadrant.it

PostgreSQL & LiDARPostgreSQL & LiDAR• PostgreSQL is a relational DBMS with an extension for LiDAR data:

pg_pointcloud

• Part of the OpenGeo suite, completely compatible with PostGIS– two new datatype: pcpoint, pcpatch (compressed set of points)

• N points (with all attributes from the survey) → 1 patch → 1 record

– compatible with PDAL drivers to import data directly from .las 32b 32b 32b 16b

https://github.com/pgpointcloud/pointcloud

2ndQuadrant Italia Giuseppe Broccolo – giuseppe.broccolo@2ndquadrant.it

Relational approach to LiDAR data with PGRelational approach to LiDAR data with PG

• GiST indexing in PostGIS• GiST performances:

– storage: 1TB RAID1, RAM 16GB, 8 CPU @3.3GHz,PostgreSQL9.3

– index size ~O(table size)– Index was used:

• up to ~300M points in bbox inclusion searches• up to ~10M points in kNN searches

LiDAR size: ~O(109÷1011) → few % can be properly indexed!

http://www.slideshare.net/GiuseppeBroccolo/gbroccolo-foss4-geugeodbindex

2ndQuadrant Italia Giuseppe Broccolo – giuseppe.broccolo@2ndquadrant.it

The LiDAR dataset: the ahn2 projectThe LiDAR dataset: the ahn2 project

• 3D point cloud, coverage: almost the whole Netherlands– EPSG: 28992, ~8 points/m2

• 1.6TB, ~250G points in ~560M patches (compression: ~10x)– PDAL driver – filter.chipper

• available RAM: 16GB

• the point structure: X Y Z scan LAS time RGB chipper

32b 32b 32b 40b 16b 64b 48b 32b

the “indexed” part (can be converted to PostGIS datatype)

(thanks to Tom Van Tilburg)

2ndQuadrant Italia Giuseppe Broccolo – giuseppe.broccolo@2ndquadrant.it

Typical searches on ahn2 - Typical searches on ahn2 - x3d_viewerx3d_viewer

• Intersection with a polygon (PostGIS)– the part that lasts longer – need indexes here!

• Patch “explosion” + NN sorting (pg_PointCloud+PostGIS)

• constrained Delaunay Triangulation (SFCGAL)

2ndQuadrant Italia Giuseppe Broccolo – giuseppe.broccolo@2ndquadrant.it

WITH patches AS (

SELECT patches FROM ahn2

WHERE patches && ST_GeomFromText(‘POLYGON(...)’)

), points AS (

SELECT ST_Explode(patches) AS points

FROM patches

), sorted_points AS (

SELECT points,

ST_DumpPoints(ST_GeomFromText(‘POLYGON(...)’))).geom AS poly_pt

FROM points ORDER BY points <#> poly_pt LIMIT 1;

), sel AS (

SELECT points FROM sorted_points

WHERE points && ST_GeomFromText(‘POLYGON(...)’)

)

SELECT ST_Dump(ST_Triangulate2DZ(ST_Collect(points))) FROM sel;

All in the DB & just with one query!All in the DB & just with one query!

2ndQuadrant Italia Giuseppe Broccolo – giuseppe.broccolo@2ndquadrant.it

patches && polygons - GiST performancepatches && polygons - GiST performance

GiST 2.5 days

GiST 29GB

index buildingindex size

polygonsize

timing

~O(10m) ~20ms

~O(100m) ~60ms

~O(1Km) ~3s

~O(10km) hours

searches based on GiST

index not contained inRAM anymore

(~5G points → ~3%)

2ndQuadrant Italia Giuseppe Broccolo – giuseppe.broccolo@2ndquadrant.it

patches && polygons - BRIN performancepatches && polygons - BRIN performance

BRIN 4 h BRIN 15MB

index building

polygonsize

timing

~O(m) ~150s

searches based on BRIN

index size

2ndQuadrant Italia Giuseppe Broccolo – giuseppe.broccolo@2ndquadrant.it

patches && polygons - BRIN performancepatches && polygons - BRIN performance

BRIN 5 h BRIN 14MB

index building

polygonsize timing

~O(m) ~150s

searches based on BRIN

How data was inserted...

LASLASLASLASLASLAS

chipper

chipper

chipper

PDAL driver(parallel processes)

ahn2

index size

2ndQuadrant Italia Giuseppe Broccolo – giuseppe.broccolo@2ndquadrant.it

Spatial sorting of a LiDAR datasetSpatial sorting of a LiDAR dataset

CREATE INDEX patch_geohash ON ahn2

USING btree (ST_GeoHash(ST_Transform(Geometry(patch), 4326), 20));

CLUSTER ahn2 USING patch_geohash;

Morton order [http://geohash.org/]

Need more than 2X table size free space

Relatively fast!

2ndQuadrant Italia Giuseppe Broccolo – giuseppe.broccolo@2ndquadrant.it

Spatial sorting of a LiDAR datasetSpatial sorting of a LiDAR dataset

CREATE TABLE ahn2_sorted ON ahn2 AS

SELECT * FROM ahn2

ORDER BY ST_GeoHash(ST_Transform(Geometry(patch), 4326), 10)

COLLATE “C”;

Morton order [http://geohash.org/]

Just need size for a second table

Slower than clustering using the index!

2ndQuadrant Italia Giuseppe Broccolo – giuseppe.broccolo@2ndquadrant.it

Spatial sorting of a LiDAR datasetSpatial sorting of a LiDAR dataset

CREATE TABLE ahn2_sorted ON ahn2 AS

SELECT * FROM ahn2

ORDER BY ST_GeoHash(ST_Transform(Geometry(patch), 4326), 10)

COLLATE “C”;

Morton order [http://geohash.org/]

Just need size for a second table

Slower than clustering using the index!

45 hours

2ndQuadrant Italia Giuseppe Broccolo – giuseppe.broccolo@2ndquadrant.it

patches && polygons – BRIN performancepatches && polygons – BRIN performance

polygonsize

BRINtiming

GiSTtiming

~O(10m) ~380ms ~20ms

~O(100m) ~400ms ~60ms

~O(1Km) ~2.7s ~3s

~O(10km) ~4.7s hours

• After sorting: 150s→380ms

• GiST X20 faster than BRIN [r~O(10m)]

– BRIN X200 faster than SeqScan

• BRIN X1000 faster than SeqScan [r~O(100m)]•

• BRIN ~ GiST [r~O(1Km)]

• Just BRIN is used for searches r>1km

2ndQuadrant Italia Giuseppe Broccolo – giuseppe.broccolo@2ndquadrant.it

is the drop in performance acceptable?is the drop in performance acceptable?

BRIN searches = x20 GiST searches

= x10÷x100 GiST searches

= x10÷x100 GiST searches

patch explosion+

NN sorting

constrainedtriangulation

2ndQuadrant Italia Giuseppe Broccolo – giuseppe.broccolo@2ndquadrant.it

the future of the project...add kNN supportthe future of the project...add kNN support

8kB8kB8kB8kB8kB8kB8kB8kB

8kB8kB8kB8kB

8kB8kB8kB8kB8kB8kB8kB8kB

8kB8kB8kB8kB

8kB8kB8kB8kB

8kB8kB8kB8kB

k

get blocks gradually included within the k parameter, as already madein GiST with single tuples

WIP…sponsorships are welcome!

ORDER BY geoms <-> point LIMIT N

2ndQuadrant Italia Giuseppe Broccolo – giuseppe.broccolo@2ndquadrant.it

ConclusionsConclusions• BRINs can be successfully used in geo DB based on PostgreSQL

– totally support PostGIS datatype• A patch almost ready for the next PostGIS release (2.3.0)

– easier indexes maintenance, low indexing time, low impact on concurrent INSERTs

– less specific than GiST...but not to much!

• Really small indexes!– GiST performances drop as well as it cannot be totally contained in RAM

• Can be relational approach to LiDAR data valid with PostgreSQL?– Yes, at least for bbox inclusion searches

– make sure that data has the same sequentiality of .las

2ndQuadrant Italia Giuseppe Broccolo – giuseppe.broccolo@2ndquadrant.it

Creative Commons license

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License

http://creativecommons.org/licenses/by-nc-sa/2.5/it/© 2016 2ndQuadrant Italia – http://www.2ndquadrant.it

top related