gbroccolo foss4 guk2016_brin4postgis
TRANSCRIPT
2ndQuadrant Italia Giuseppe Broccolo – [email protected]
June, 14th 2016
BRIN indexes on
geospatial
databases
www.2ndquadrant.com
2ndQuadrant Italia Giuseppe Broccolo – [email protected]
~$ whoami~$ whoami
Giuseppe Broccolo, PhDGiuseppe Broccolo, PhD PostgreSQL & PostGIS consultantPostgreSQL & PostGIS consultant
@giubro gbroccolo7
gbroccolo gemini__81
2ndQuadrant Italia Giuseppe Broccolo – [email protected]
Indexes & PostgreSQLIndexes & PostgreSQL
• Binary structures on disc:
– Indexing organised in nodes into 8kB pages
– Nodes point to table rows
• Speed up data access: ~O(log N)
– High performance until the index can be contained in RAM
• Size: ~O(N)
2ndQuadrant Italia Giuseppe Broccolo – [email protected]
Geospatial indexes in PostgreSQLGeospatial indexes in PostgreSQL
• All datatypes can be indexed in principle...– ...just define the needed Operator Class!
• Operators & support functions
– ...define how the data has to be stored into the index nodes
• Geospatial data:– can be larger than 8kB– Is generally a varlena– a single node cannot be contained into a single page
2ndQuadrant Italia Giuseppe Broccolo – [email protected]
PostGIS datatypes & GiST indexingPostGIS datatypes & GiST indexing
In PostGIS (since v0.6) geometry/geography areconverted into their float-precision bounding boxes
gidx/box2df are then used as storagedatatype to populate the index’s nodes
2ndQuadrant Italia Giuseppe Broccolo – [email protected]
GIS & BigDataGIS & BigData
LiDAR surveys
Maps of entire countries
It’s relatively easy to have to work with terabytes of geospatial data…
...can indexes be contained in RAM?
2ndQuadrant Italia Giuseppe Broccolo – [email protected]
A new index: Block Range Indexing (PG 9.5)A new index: Block Range Indexing (PG 9.5)
CREATE INDEX idx ON table
USING brin(column) WITH (pages_per_range=10);
data must be physically sorted on disk!
index node → row
index node → block
less specific than GiST!
Really small!
8kB8kB8kB8kB8kB8kB8kB8kB
8kB8kB8kB8kB
(S. Riggs, A. Herrera)
2ndQuadrant Italia Giuseppe Broccolo – [email protected]
BRIN support for PostGISBRIN support for PostGIS
https://github.com/dalibo/postgis/tree/for_upstream
Julien RouhaudRonan Dunklau me
2ndQuadrant Italia Giuseppe Broccolo – [email protected]
BRIN support for PostGISBRIN support for PostGIS
NO kNN SUPPORT!!
– Different OpClass for • 2D (default), 3D, 4D geometry• geography
• box2d/box3d
• cross-operators defined in the OpFamily
– Storage datatype: float-precision (such as in GiST)• gidx (3D, 4D geometry)
• box2df (2D geometry, geography)
– Operators: &&, @, ~ (2D) - &&& (3D, 4D)
8kB8kB8kB8kB
8kB8kB8kB8kB
2ndQuadrant Italia Giuseppe Broccolo – [email protected]
11stst example: “sorted” example: “sorted” geometrygeometry points points~10M, 3D, SRID=0, distribution range: 0÷100kunits
=# CREATE INDEX idx_gist ON points USING gist
-# (geom gist_geometry_ops_nd);
=# CREATE INDEX idx_brin_128 ON points USING brin
-# (geom brin_geometry_inclusion_ops_3d);
=# CREATE INDEX idx_brin_10 ON points USING brin
-# (geom brin_geometry_inclusion_ops_3d)
-# WITH (pages_per_range=10);
BRIN(128)/GiST 1/2000
BRIN(10)/GiST 1/1000
BRIN(128)/GiST 1/30
BRIN(10)/GiST 1/20
SizesBuilding
times
2ndQuadrant Italia Giuseppe Broccolo – [email protected]
11stst example: “sorted” example: “sorted” geometrygeometry points points~10M, 3D, SRID=0, distribution range: 0÷100kunits
=# SELECT * FROM points
-# WHERE ‘BOX3D(10. 10. 10., 12., 12., 12.)’::box3d &&& geom;
BRIN(128)/GiST 50/1
BRIN(10)/GiST 6/1
BRIN(128)/SeqScan 1/60
BRIN(10)/SeqScan 1/350
Execution times
2ndQuadrant Italia Giuseppe Broccolo – [email protected]
22ndnd example: “unsorted” example: “unsorted” geometrygeometry points points~10M, 3D, SRID=0, distribution range: 0÷100kunits
=# CREATE TABLE unsorted_points AS
-# SELECT * FROM points ORDER BY random();
=# SELECT * FROM unsorted_points
-# WHERE ‘BOX3D(10. 10. 10., 12., 12., 12.)’::box3d &&& geom;
BRIN(128)/GiST 2800/1
BRIN(10)/GiST 400/1
BRIN(128)/SeqScan 1/1
BRIN(10)/SeqScan 1/12
Execution times
2ndQuadrant Italia Giuseppe Broccolo – [email protected]
22ndnd example: “unsorted” example: “unsorted” geometrygeometry points points
=# CREATE EXTENSION pageinspect;
=# SELECT blknum, value FROM brin_page_items(get_raw_page('idx_brin_128', 2), 'idx_brin_128')
-# LIMIT 5;
blknum | value
--------+------------------------------------------------------------------------
0 | {GIDX( 1.11394846439 1.64980053902 1.11335003376, 99982.2578125 99982.64062599982.1171875 ) .. f .. f}
128 | {GIDX( 0.224781364202 0.184096381068 0.798862099648, 99995.859375 99995.17187599995.0390625 ) .. f .. f}
256 | {GIDX( 6.25802087784 6.50119876862 6.8031873703, 99993.15625 99993.820312599993.2109375 ) .. f .. f}
384 | {GIDX( 1.9203556776 1.11907613277 1.55043530464, 99995.421875 99995.23437599995.421875 ) .. f .. f}
512 | {GIDX( 3.44181084633 3.4120452404 3.41762590408, 99996.3125 99996.7109375 99996.59375) .. f .. f}
(5 rows)
2ndQuadrant Italia Giuseppe Broccolo – [email protected]
...what about ...what about INSERTINSERT’s?’s?
INSERT INTO points
SELECT ST_MakePoint(a, a, a)
FROM generate_series(1, 1000) AS f(a);
GiST 6 times than insertions without index
BRIN (10) 3 times than insertions without index
BRIN (128) 2 times than insertions without index
...blocks could be “not sorted” anymore!
2ndQuadrant Italia Giuseppe Broccolo – [email protected]
Manage LiDAR
data with
PostgreSQL
is it possible?
Is the relational approach valid for point clouds?Is the relational approach valid for point clouds?
2ndQuadrant Italia Giuseppe Broccolo – [email protected]
PostgreSQL & LiDARPostgreSQL & LiDAR• PostgreSQL is a relational DBMS with an extension for LiDAR data:
pg_pointcloud
• Part of the OpenGeo suite, completely compatible with PostGIS– two new datatype: pcpoint, pcpatch (compressed set of points)
• N points (with all attributes from the survey) → 1 patch → 1 record
– compatible with PDAL drivers to import data directly from .las 32b 32b 32b 16b
https://github.com/pgpointcloud/pointcloud
2ndQuadrant Italia Giuseppe Broccolo – [email protected]
Relational approach to LiDAR data with PGRelational approach to LiDAR data with PG
• GiST indexing in PostGIS• GiST performances:
– storage: 1TB RAID1, RAM 16GB, 8 CPU @3.3GHz,PostgreSQL9.3
– index size ~O(table size)– Index was used:
• up to ~300M points in bbox inclusion searches• up to ~10M points in kNN searches
LiDAR size: ~O(109÷1011) → few % can be properly indexed!
http://www.slideshare.net/GiuseppeBroccolo/gbroccolo-foss4-geugeodbindex
2ndQuadrant Italia Giuseppe Broccolo – [email protected]
The LiDAR dataset: the ahn2 projectThe LiDAR dataset: the ahn2 project
• 3D point cloud, coverage: almost the whole Netherlands– EPSG: 28992, ~8 points/m2
• 1.6TB, ~250G points in ~560M patches (compression: ~10x)– PDAL driver – filter.chipper
• available RAM: 16GB
• the point structure: X Y Z scan LAS time RGB chipper
32b 32b 32b 40b 16b 64b 48b 32b
the “indexed” part (can be converted to PostGIS datatype)
(thanks to Tom Van Tilburg)
2ndQuadrant Italia Giuseppe Broccolo – [email protected]
Typical searches on ahn2 - Typical searches on ahn2 - x3d_viewerx3d_viewer
• Intersection with a polygon (PostGIS)– the part that lasts longer – need indexes here!
• Patch “explosion” + NN sorting (pg_PointCloud+PostGIS)
• constrained Delaunay Triangulation (SFCGAL)
2ndQuadrant Italia Giuseppe Broccolo – [email protected]
WITH patches AS (
SELECT patches FROM ahn2
WHERE patches && ST_GeomFromText(‘POLYGON(...)’)
), points AS (
SELECT ST_Explode(patches) AS points
FROM patches
), sorted_points AS (
SELECT points,
ST_DumpPoints(ST_GeomFromText(‘POLYGON(...)’))).geom AS poly_pt
FROM points ORDER BY points <#> poly_pt LIMIT 1;
), sel AS (
SELECT points FROM sorted_points
WHERE points && ST_GeomFromText(‘POLYGON(...)’)
)
SELECT ST_Dump(ST_Triangulate2DZ(ST_Collect(points))) FROM sel;
All in the DB & just with one query!All in the DB & just with one query!
2ndQuadrant Italia Giuseppe Broccolo – [email protected]
patches && polygons - GiST performancepatches && polygons - GiST performance
GiST 2.5 days
GiST 29GB
index buildingindex size
polygonsize
timing
~O(10m) ~20ms
~O(100m) ~60ms
~O(1Km) ~3s
~O(10km) hours
searches based on GiST
index not contained inRAM anymore
(~5G points → ~3%)
2ndQuadrant Italia Giuseppe Broccolo – [email protected]
patches && polygons - BRIN performancepatches && polygons - BRIN performance
BRIN 4 h BRIN 15MB
index building
polygonsize
timing
~O(m) ~150s
searches based on BRIN
index size
2ndQuadrant Italia Giuseppe Broccolo – [email protected]
patches && polygons - BRIN performancepatches && polygons - BRIN performance
BRIN 5 h BRIN 14MB
index building
polygonsize timing
~O(m) ~150s
searches based on BRIN
How data was inserted...
LASLASLASLASLASLAS
chipper
chipper
chipper
PDAL driver(parallel processes)
ahn2
index size
2ndQuadrant Italia Giuseppe Broccolo – [email protected]
Spatial sorting of a LiDAR datasetSpatial sorting of a LiDAR dataset
CREATE INDEX patch_geohash ON ahn2
USING btree (ST_GeoHash(ST_Transform(Geometry(patch), 4326), 20));
CLUSTER ahn2 USING patch_geohash;
Morton order [http://geohash.org/]
Need more than 2X table size free space
Relatively fast!
2ndQuadrant Italia Giuseppe Broccolo – [email protected]
Spatial sorting of a LiDAR datasetSpatial sorting of a LiDAR dataset
CREATE TABLE ahn2_sorted ON ahn2 AS
SELECT * FROM ahn2
ORDER BY ST_GeoHash(ST_Transform(Geometry(patch), 4326), 10)
COLLATE “C”;
Morton order [http://geohash.org/]
Just need size for a second table
Slower than clustering using the index!
2ndQuadrant Italia Giuseppe Broccolo – [email protected]
Spatial sorting of a LiDAR datasetSpatial sorting of a LiDAR dataset
CREATE TABLE ahn2_sorted ON ahn2 AS
SELECT * FROM ahn2
ORDER BY ST_GeoHash(ST_Transform(Geometry(patch), 4326), 10)
COLLATE “C”;
Morton order [http://geohash.org/]
Just need size for a second table
Slower than clustering using the index!
45 hours
2ndQuadrant Italia Giuseppe Broccolo – [email protected]
patches && polygons – BRIN performancepatches && polygons – BRIN performance
polygonsize
BRINtiming
GiSTtiming
~O(10m) ~380ms ~20ms
~O(100m) ~400ms ~60ms
~O(1Km) ~2.7s ~3s
~O(10km) ~4.7s hours
• After sorting: 150s→380ms
• GiST X20 faster than BRIN [r~O(10m)]
– BRIN X200 faster than SeqScan
• BRIN X1000 faster than SeqScan [r~O(100m)]•
• BRIN ~ GiST [r~O(1Km)]
• Just BRIN is used for searches r>1km
2ndQuadrant Italia Giuseppe Broccolo – [email protected]
is the drop in performance acceptable?is the drop in performance acceptable?
BRIN searches = x20 GiST searches
= x10÷x100 GiST searches
= x10÷x100 GiST searches
patch explosion+
NN sorting
constrainedtriangulation
2ndQuadrant Italia Giuseppe Broccolo – [email protected]
the future of the project...add kNN supportthe future of the project...add kNN support
8kB8kB8kB8kB8kB8kB8kB8kB
8kB8kB8kB8kB
8kB8kB8kB8kB8kB8kB8kB8kB
8kB8kB8kB8kB
8kB8kB8kB8kB
8kB8kB8kB8kB
k
get blocks gradually included within the k parameter, as already madein GiST with single tuples
WIP…sponsorships are welcome!
ORDER BY geoms <-> point LIMIT N
2ndQuadrant Italia Giuseppe Broccolo – [email protected]
ConclusionsConclusions• BRINs can be successfully used in geo DB based on PostgreSQL
– totally support PostGIS datatype• A patch almost ready for the next PostGIS release (2.3.0)
– easier indexes maintenance, low indexing time, low impact on concurrent INSERTs
– less specific than GiST...but not to much!
• Really small indexes!– GiST performances drop as well as it cannot be totally contained in RAM
• Can be relational approach to LiDAR data valid with PostgreSQL?– Yes, at least for bbox inclusion searches
– make sure that data has the same sequentiality of .las
2ndQuadrant Italia Giuseppe Broccolo – [email protected]
Creative Commons license
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License
http://creativecommons.org/licenses/by-nc-sa/2.5/it/© 2016 2ndQuadrant Italia – http://www.2ndquadrant.it