gbroccolo foss4 guk2016_brin4postgis

31
2ndQuadrant Italia Giuseppe Broccolo – [email protected] June, 14 th 2016 BRIN indexes on geospatial databases www.2ndquadrant.com

Upload: giuseppe-broccolo

Post on 23-Feb-2017

458 views

Category:

Data & Analytics


1 download

TRANSCRIPT

Page 1: Gbroccolo foss4 guk2016_brin4postgis

2ndQuadrant Italia Giuseppe Broccolo – [email protected]

June, 14th 2016

BRIN indexes on

geospatial

databases

www.2ndquadrant.com

Page 2: Gbroccolo foss4 guk2016_brin4postgis

2ndQuadrant Italia Giuseppe Broccolo – [email protected]

~$ whoami~$ whoami

Giuseppe Broccolo, PhDGiuseppe Broccolo, PhD PostgreSQL & PostGIS consultantPostgreSQL & PostGIS consultant

@giubro gbroccolo7

[email protected]

gbroccolo gemini__81

Page 3: Gbroccolo foss4 guk2016_brin4postgis

2ndQuadrant Italia Giuseppe Broccolo – [email protected]

Indexes & PostgreSQLIndexes & PostgreSQL

• Binary structures on disc:

– Indexing organised in nodes into 8kB pages

– Nodes point to table rows

• Speed up data access: ~O(log N)

– High performance until the index can be contained in RAM

• Size: ~O(N)

Page 4: Gbroccolo foss4 guk2016_brin4postgis

2ndQuadrant Italia Giuseppe Broccolo – [email protected]

Geospatial indexes in PostgreSQLGeospatial indexes in PostgreSQL

• All datatypes can be indexed in principle...– ...just define the needed Operator Class!

• Operators & support functions

– ...define how the data has to be stored into the index nodes

• Geospatial data:– can be larger than 8kB– Is generally a varlena– a single node cannot be contained into a single page

Page 5: Gbroccolo foss4 guk2016_brin4postgis

2ndQuadrant Italia Giuseppe Broccolo – [email protected]

PostGIS datatypes & GiST indexingPostGIS datatypes & GiST indexing

In PostGIS (since v0.6) geometry/geography areconverted into their float-precision bounding boxes

gidx/box2df are then used as storagedatatype to populate the index’s nodes

Page 6: Gbroccolo foss4 guk2016_brin4postgis

2ndQuadrant Italia Giuseppe Broccolo – [email protected]

GIS & BigDataGIS & BigData

LiDAR surveys

Maps of entire countries

It’s relatively easy to have to work with terabytes of geospatial data…

...can indexes be contained in RAM?

Page 7: Gbroccolo foss4 guk2016_brin4postgis

2ndQuadrant Italia Giuseppe Broccolo – [email protected]

A new index: Block Range Indexing (PG 9.5)A new index: Block Range Indexing (PG 9.5)

CREATE INDEX idx ON table

USING brin(column) WITH (pages_per_range=10);

data must be physically sorted on disk!

index node → row

index node → block

less specific than GiST!

Really small!

8kB8kB8kB8kB8kB8kB8kB8kB

8kB8kB8kB8kB

(S. Riggs, A. Herrera)

Page 8: Gbroccolo foss4 guk2016_brin4postgis

2ndQuadrant Italia Giuseppe Broccolo – [email protected]

BRIN support for PostGISBRIN support for PostGIS

https://github.com/dalibo/postgis/tree/for_upstream

Julien RouhaudRonan Dunklau me

Page 9: Gbroccolo foss4 guk2016_brin4postgis

2ndQuadrant Italia Giuseppe Broccolo – [email protected]

BRIN support for PostGISBRIN support for PostGIS

NO kNN SUPPORT!!

– Different OpClass for • 2D (default), 3D, 4D geometry• geography

• box2d/box3d

• cross-operators defined in the OpFamily

– Storage datatype: float-precision (such as in GiST)• gidx (3D, 4D geometry)

• box2df (2D geometry, geography)

– Operators: &&, @, ~ (2D) - &&& (3D, 4D)

8kB8kB8kB8kB

8kB8kB8kB8kB

Page 10: Gbroccolo foss4 guk2016_brin4postgis

2ndQuadrant Italia Giuseppe Broccolo – [email protected]

11stst example: “sorted” example: “sorted” geometrygeometry points points~10M, 3D, SRID=0, distribution range: 0÷100kunits

=# CREATE INDEX idx_gist ON points USING gist

-# (geom gist_geometry_ops_nd);

=# CREATE INDEX idx_brin_128 ON points USING brin

-# (geom brin_geometry_inclusion_ops_3d);

=# CREATE INDEX idx_brin_10 ON points USING brin

-# (geom brin_geometry_inclusion_ops_3d)

-# WITH (pages_per_range=10);

BRIN(128)/GiST 1/2000

BRIN(10)/GiST 1/1000

BRIN(128)/GiST 1/30

BRIN(10)/GiST 1/20

SizesBuilding

times

Page 11: Gbroccolo foss4 guk2016_brin4postgis

2ndQuadrant Italia Giuseppe Broccolo – [email protected]

11stst example: “sorted” example: “sorted” geometrygeometry points points~10M, 3D, SRID=0, distribution range: 0÷100kunits

=# SELECT * FROM points

-# WHERE ‘BOX3D(10. 10. 10., 12., 12., 12.)’::box3d &&& geom;

BRIN(128)/GiST 50/1

BRIN(10)/GiST 6/1

BRIN(128)/SeqScan 1/60

BRIN(10)/SeqScan 1/350

Execution times

Page 12: Gbroccolo foss4 guk2016_brin4postgis

2ndQuadrant Italia Giuseppe Broccolo – [email protected]

22ndnd example: “unsorted” example: “unsorted” geometrygeometry points points~10M, 3D, SRID=0, distribution range: 0÷100kunits

=# CREATE TABLE unsorted_points AS

-# SELECT * FROM points ORDER BY random();

=# SELECT * FROM unsorted_points

-# WHERE ‘BOX3D(10. 10. 10., 12., 12., 12.)’::box3d &&& geom;

BRIN(128)/GiST 2800/1

BRIN(10)/GiST 400/1

BRIN(128)/SeqScan 1/1

BRIN(10)/SeqScan 1/12

Execution times

Page 13: Gbroccolo foss4 guk2016_brin4postgis

2ndQuadrant Italia Giuseppe Broccolo – [email protected]

22ndnd example: “unsorted” example: “unsorted” geometrygeometry points points

=# CREATE EXTENSION pageinspect;

=# SELECT blknum, value FROM brin_page_items(get_raw_page('idx_brin_128', 2), 'idx_brin_128')

-# LIMIT 5;

blknum | value

--------+------------------------------------------------------------------------

0 | {GIDX( 1.11394846439 1.64980053902 1.11335003376, 99982.2578125 99982.64062599982.1171875 ) .. f .. f}

128 | {GIDX( 0.224781364202 0.184096381068 0.798862099648, 99995.859375 99995.17187599995.0390625 ) .. f .. f}

256 | {GIDX( 6.25802087784 6.50119876862 6.8031873703, 99993.15625 99993.820312599993.2109375 ) .. f .. f}

384 | {GIDX( 1.9203556776 1.11907613277 1.55043530464, 99995.421875 99995.23437599995.421875 ) .. f .. f}

512 | {GIDX( 3.44181084633 3.4120452404 3.41762590408, 99996.3125 99996.7109375 99996.59375) .. f .. f}

(5 rows)

Page 14: Gbroccolo foss4 guk2016_brin4postgis

2ndQuadrant Italia Giuseppe Broccolo – [email protected]

...what about ...what about INSERTINSERT’s?’s?

INSERT INTO points

SELECT ST_MakePoint(a, a, a)

FROM generate_series(1, 1000) AS f(a);

GiST 6 times than insertions without index

BRIN (10) 3 times than insertions without index

BRIN (128) 2 times than insertions without index

...blocks could be “not sorted” anymore!

Page 15: Gbroccolo foss4 guk2016_brin4postgis

2ndQuadrant Italia Giuseppe Broccolo – [email protected]

Manage LiDAR

data with

PostgreSQL

is it possible?

Is the relational approach valid for point clouds?Is the relational approach valid for point clouds?

Page 16: Gbroccolo foss4 guk2016_brin4postgis

2ndQuadrant Italia Giuseppe Broccolo – [email protected]

PostgreSQL & LiDARPostgreSQL & LiDAR• PostgreSQL is a relational DBMS with an extension for LiDAR data:

pg_pointcloud

• Part of the OpenGeo suite, completely compatible with PostGIS– two new datatype: pcpoint, pcpatch (compressed set of points)

• N points (with all attributes from the survey) → 1 patch → 1 record

– compatible with PDAL drivers to import data directly from .las 32b 32b 32b 16b

https://github.com/pgpointcloud/pointcloud

Page 17: Gbroccolo foss4 guk2016_brin4postgis

2ndQuadrant Italia Giuseppe Broccolo – [email protected]

Relational approach to LiDAR data with PGRelational approach to LiDAR data with PG

• GiST indexing in PostGIS• GiST performances:

– storage: 1TB RAID1, RAM 16GB, 8 CPU @3.3GHz,PostgreSQL9.3

– index size ~O(table size)– Index was used:

• up to ~300M points in bbox inclusion searches• up to ~10M points in kNN searches

LiDAR size: ~O(109÷1011) → few % can be properly indexed!

http://www.slideshare.net/GiuseppeBroccolo/gbroccolo-foss4-geugeodbindex

Page 18: Gbroccolo foss4 guk2016_brin4postgis

2ndQuadrant Italia Giuseppe Broccolo – [email protected]

The LiDAR dataset: the ahn2 projectThe LiDAR dataset: the ahn2 project

• 3D point cloud, coverage: almost the whole Netherlands– EPSG: 28992, ~8 points/m2

• 1.6TB, ~250G points in ~560M patches (compression: ~10x)– PDAL driver – filter.chipper

• available RAM: 16GB

• the point structure: X Y Z scan LAS time RGB chipper

32b 32b 32b 40b 16b 64b 48b 32b

the “indexed” part (can be converted to PostGIS datatype)

(thanks to Tom Van Tilburg)

Page 19: Gbroccolo foss4 guk2016_brin4postgis

2ndQuadrant Italia Giuseppe Broccolo – [email protected]

Typical searches on ahn2 - Typical searches on ahn2 - x3d_viewerx3d_viewer

• Intersection with a polygon (PostGIS)– the part that lasts longer – need indexes here!

• Patch “explosion” + NN sorting (pg_PointCloud+PostGIS)

• constrained Delaunay Triangulation (SFCGAL)

Page 20: Gbroccolo foss4 guk2016_brin4postgis

2ndQuadrant Italia Giuseppe Broccolo – [email protected]

WITH patches AS (

SELECT patches FROM ahn2

WHERE patches && ST_GeomFromText(‘POLYGON(...)’)

), points AS (

SELECT ST_Explode(patches) AS points

FROM patches

), sorted_points AS (

SELECT points,

ST_DumpPoints(ST_GeomFromText(‘POLYGON(...)’))).geom AS poly_pt

FROM points ORDER BY points <#> poly_pt LIMIT 1;

), sel AS (

SELECT points FROM sorted_points

WHERE points && ST_GeomFromText(‘POLYGON(...)’)

)

SELECT ST_Dump(ST_Triangulate2DZ(ST_Collect(points))) FROM sel;

All in the DB & just with one query!All in the DB & just with one query!

Page 21: Gbroccolo foss4 guk2016_brin4postgis

2ndQuadrant Italia Giuseppe Broccolo – [email protected]

patches && polygons - GiST performancepatches && polygons - GiST performance

GiST 2.5 days

GiST 29GB

index buildingindex size

polygonsize

timing

~O(10m) ~20ms

~O(100m) ~60ms

~O(1Km) ~3s

~O(10km) hours

searches based on GiST

index not contained inRAM anymore

(~5G points → ~3%)

Page 22: Gbroccolo foss4 guk2016_brin4postgis

2ndQuadrant Italia Giuseppe Broccolo – [email protected]

patches && polygons - BRIN performancepatches && polygons - BRIN performance

BRIN 4 h BRIN 15MB

index building

polygonsize

timing

~O(m) ~150s

searches based on BRIN

index size

Page 23: Gbroccolo foss4 guk2016_brin4postgis

2ndQuadrant Italia Giuseppe Broccolo – [email protected]

patches && polygons - BRIN performancepatches && polygons - BRIN performance

BRIN 5 h BRIN 14MB

index building

polygonsize timing

~O(m) ~150s

searches based on BRIN

How data was inserted...

LASLASLASLASLASLAS

chipper

chipper

chipper

PDAL driver(parallel processes)

ahn2

index size

Page 24: Gbroccolo foss4 guk2016_brin4postgis

2ndQuadrant Italia Giuseppe Broccolo – [email protected]

Spatial sorting of a LiDAR datasetSpatial sorting of a LiDAR dataset

CREATE INDEX patch_geohash ON ahn2

USING btree (ST_GeoHash(ST_Transform(Geometry(patch), 4326), 20));

CLUSTER ahn2 USING patch_geohash;

Morton order [http://geohash.org/]

Need more than 2X table size free space

Relatively fast!

Page 25: Gbroccolo foss4 guk2016_brin4postgis

2ndQuadrant Italia Giuseppe Broccolo – [email protected]

Spatial sorting of a LiDAR datasetSpatial sorting of a LiDAR dataset

CREATE TABLE ahn2_sorted ON ahn2 AS

SELECT * FROM ahn2

ORDER BY ST_GeoHash(ST_Transform(Geometry(patch), 4326), 10)

COLLATE “C”;

Morton order [http://geohash.org/]

Just need size for a second table

Slower than clustering using the index!

Page 26: Gbroccolo foss4 guk2016_brin4postgis

2ndQuadrant Italia Giuseppe Broccolo – [email protected]

Spatial sorting of a LiDAR datasetSpatial sorting of a LiDAR dataset

CREATE TABLE ahn2_sorted ON ahn2 AS

SELECT * FROM ahn2

ORDER BY ST_GeoHash(ST_Transform(Geometry(patch), 4326), 10)

COLLATE “C”;

Morton order [http://geohash.org/]

Just need size for a second table

Slower than clustering using the index!

45 hours

Page 27: Gbroccolo foss4 guk2016_brin4postgis

2ndQuadrant Italia Giuseppe Broccolo – [email protected]

patches && polygons – BRIN performancepatches && polygons – BRIN performance

polygonsize

BRINtiming

GiSTtiming

~O(10m) ~380ms ~20ms

~O(100m) ~400ms ~60ms

~O(1Km) ~2.7s ~3s

~O(10km) ~4.7s hours

• After sorting: 150s→380ms

• GiST X20 faster than BRIN [r~O(10m)]

– BRIN X200 faster than SeqScan

• BRIN X1000 faster than SeqScan [r~O(100m)]•

• BRIN ~ GiST [r~O(1Km)]

• Just BRIN is used for searches r>1km

Page 28: Gbroccolo foss4 guk2016_brin4postgis

2ndQuadrant Italia Giuseppe Broccolo – [email protected]

is the drop in performance acceptable?is the drop in performance acceptable?

BRIN searches = x20 GiST searches

= x10÷x100 GiST searches

= x10÷x100 GiST searches

patch explosion+

NN sorting

constrainedtriangulation

Page 29: Gbroccolo foss4 guk2016_brin4postgis

2ndQuadrant Italia Giuseppe Broccolo – [email protected]

the future of the project...add kNN supportthe future of the project...add kNN support

8kB8kB8kB8kB8kB8kB8kB8kB

8kB8kB8kB8kB

8kB8kB8kB8kB8kB8kB8kB8kB

8kB8kB8kB8kB

8kB8kB8kB8kB

8kB8kB8kB8kB

k

get blocks gradually included within the k parameter, as already madein GiST with single tuples

WIP…sponsorships are welcome!

ORDER BY geoms <-> point LIMIT N

Page 30: Gbroccolo foss4 guk2016_brin4postgis

2ndQuadrant Italia Giuseppe Broccolo – [email protected]

ConclusionsConclusions• BRINs can be successfully used in geo DB based on PostgreSQL

– totally support PostGIS datatype• A patch almost ready for the next PostGIS release (2.3.0)

– easier indexes maintenance, low indexing time, low impact on concurrent INSERTs

– less specific than GiST...but not to much!

• Really small indexes!– GiST performances drop as well as it cannot be totally contained in RAM

• Can be relational approach to LiDAR data valid with PostgreSQL?– Yes, at least for bbox inclusion searches

– make sure that data has the same sequentiality of .las

Page 31: Gbroccolo foss4 guk2016_brin4postgis

2ndQuadrant Italia Giuseppe Broccolo – [email protected]

Creative Commons license

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License

http://creativecommons.org/licenses/by-nc-sa/2.5/it/© 2016 2ndQuadrant Italia – http://www.2ndquadrant.it