gbroccolo - use of indexes on geospatial databases with postgresql - foss4g.eu 2015

Post on 15-Aug-2015

2ndQuadrant Italia Giuseppe Broccolo – giuseppe.broccolo@2ndquadrant.it

FOSS4G.EU 2015, Como, Politecnico di Milano

July 14th-17th 2015

Use of indexes on geospatial databases with the PostgreSQL DBMS

Giuseppe Broccolo

www.2ndquadrant.it

$~# whoami

• PostgreSQL and PostGIS consultant

– Development, Replication, Disaster Recovery, pre-production Benchmark, Remote DBA, 24/7 Support, Training

Outline

• Indexes on geospatial DBs

• What does PostgreSQL offer?

• Examples of usage:

– Points in PostgreSQL

– Points in PostGIS extension

– (LiDAR) points in PointCloud extension

Indexes on geospatial databases

• Binary structure used to speed up access to data

– In case of trees: balanced/unbalanced structure of nodes

– Theoretical performance: R/W ~O(log N), size ~O(N)

– Algorithms are defined by placement operators rather than by ordering/comparison

– Index nodes are defined starting from the MBR (minimum bounding rectangle) containing the whole dataset
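
As a rough illustration of the MBR that index nodes start from, the sketch below computes the bounding box of a tiny set of in-core points. The three sample points and the CTE name pts are made up for the example.

```sql
-- Hypothetical sample data: three points in a CTE named pts.
WITH pts(p) AS (
    VALUES (point(1, 5)), (point(4, 2)), (point(3, 7))
)
-- p[0] and p[1] are the x and y coordinates of a point.
SELECT box(point(max(p[0]), max(p[1])),
           point(min(p[0]), min(p[1]))) AS mbr
FROM pts;
```

The corners of the resulting box should be (4,7) and (1,2). For PostGIS geometries the ST_Extent aggregate plays a similar role.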

MBR

• Balanced: R-tree, etc.

• Unbalanced: Kd-tree, Quad-tree, etc.

What PostgreSQL offers

• “in core” 2D geometric (not geographic) datatypes

– Fixed resolution: double precision

– point, circle, box

– @-@, @@, <->, &&, <<, >>, <<|, |>>, ...
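
A few of these operators can be tried directly in psql, with no table needed. This is only a minimal sketch: the literal values are arbitrary and the column aliases are chosen for readability.

```sql
-- Distance between two points (<->): a 3-4-5 triangle gives 5.
SELECT point(0, 0) <-> point(3, 4) AS distance;

-- Do two boxes overlap (&&)?
SELECT box(point(0, 0), point(2, 2)) && box(point(1, 1), point(3, 3)) AS overlaps;

-- Is the first box strictly to the left of (<<) the second one?
SELECT box(point(0, 0), point(1, 1)) << box(point(2, 0), point(3, 1)) AS is_left;

-- Centre of a circle (@@) and length of an open path (@-@).
SELECT @@ circle(point(1, 1), 2) AS center,
       @-@ path '[(0,0),(3,4)]' AS path_length;
```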

• PostGIS extension:

– geometry, geography

– <@, @>, &&, <<, >>, <<|, |>>, ...

– ST_Length(), ST_Distance(), ...
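
The difference between the two PostGIS types can be sketched as follows, assuming the postgis extension is installed. The coordinates are arbitrary illustrative values, not data from the talk.

```sql
-- Planar (geometry) distance and length, in the units of the coordinates.
SELECT ST_Distance('POINT(0 0)'::geometry, 'POINT(3 4)'::geometry) AS dist,
       ST_Length('LINESTRING(0 0, 3 4)'::geometry) AS len;

-- Bounding-box overlap (&&) between a point and an envelope.
SELECT 'POINT(1 1)'::geometry && ST_MakeEnvelope(0, 0, 2, 2) AS in_bbox;

-- Geodetic (geography) distance between two lon/lat positions, in metres.
SELECT ST_Distance('POINT(9.08 45.81)'::geography,
                   'POINT(9.19 45.46)'::geography) AS metres;
```

The first query should return 5 for both columns, since the geometry functions work on a flat plane. The geography result depends on the spheroid and is expressed in metres.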

Tree indexes in PostgreSQL

• Balanced indexes

– B-Tree

– GIN (Generalized Inverted Index): fast accesses to data

– GiST (Generalized Search Tree): good concurrency, “lossy”

  • kNN searches

• Unbalanced index

– SP-GiST (Space Partitioned GiST): low I/O

  • Introduced in PostgreSQL 9.2

  • Usable in PostGIS >2.1
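
To see these access methods on a running server, and to pick between the two unbalanced structures that SP-GiST offers for the in-core point type, something like the following can be used. The table and index names are hypothetical and exist only for this sketch.

```sql
-- List the index access methods known to the server.
SELECT amname FROM pg_am ORDER BY amname;

-- Hypothetical table, used only for this sketch.
CREATE TABLE sample_points (p point);

-- Default SP-GiST operator class for point: a quad-tree.
CREATE INDEX sample_points_quad_idx ON sample_points USING spgist (p);

-- Explicitly request the k-d tree operator class instead.
CREATE INDEX sample_points_kd_idx ON sample_points USING spgist (p kd_point_ops);
```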

Work with 2D point sets

• The test environment: Vagrant VM (Ubuntu 14.04)

– Single virtual core 2.26GHz, 512MB RAM, 7.2k rpm disk

• PostgreSQL 9.4 + PostGIS 2.1

– postgresql.conf: default

• ~10M points

– Nearest Neighbours search

– Bounding Box inclusion
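
The slides do not show how the sample was generated. One plausible way to build comparable tables is sketched below, assuming points drawn uniformly from the unit square (an assumption suggested by the query coordinates used later) and the table and column names that appear in the index statements.

```sql
-- Assumption: coordinates drawn uniformly from the unit square.
CREATE TABLE many_point AS
SELECT point(random(), random()) AS point
FROM generate_series(1, 10000000);

-- Same kind of sample stored as PostGIS points with SRID 0.
CREATE TABLE many_geom AS
SELECT ST_MakePoint(random(), random())::geometry(Point, 0) AS point
FROM generate_series(1, 10000000);

-- Refresh planner statistics after the bulk load.
ANALYZE many_point;
ANALYZE many_geom;
```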

Index creation on the 2D sample

– point datatype supports both GiST and SP-GiST indexing

=# CREATE INDEX idx_gist_point ON many_point USING gist(point);

=# CREATE INDEX idx_spgist_point ON many_point USING spgist(point);

– geometry(point,0) datatype supports only GiST indexing

=# CREATE INDEX idx_gist_geom ON many_geom USING gist(point);

=# CREATE INDEX idx_spgist ON many_geom USING spgist(point);

ERROR: data type geometry has no default operator class for access method "spgist"

HINT: You must specify an operator class for the index or define a default operator class for the data type.
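
The HINT refers to operator classes. A catalog query along these lines lists the classes registered for the geometry type on a given installation. On the versions used in this talk no spgist row is expected; later PostGIS releases (2.5 and newer, as far as I know) do ship SP-GiST classes for geometry.

```sql
-- Operator classes registered for the geometry type, per access method.
SELECT am.amname, opc.opcname, opc.opcdefault
FROM pg_opclass AS opc
JOIN pg_am AS am ON am.oid = opc.opcmethod
WHERE opc.opcintype = 'geometry'::regtype
ORDER BY am.amname, opc.opcname;
```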

Index creation on the 2D sample

index             index size  table size  time
idx_gist_point    715MB       653MB       214s
idx_spgist_point  437MB       653MB       137s
idx_gist_geom     523MB       501MB       290s
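
Figures of this kind can be collected from the system catalogs. The query below is one possible way to do it, not necessarily the one used for the talk. Build times can be read from psql with \timing switched on before running CREATE INDEX.

```sql
-- Size of each index next to the size of the table it belongs to.
SELECT c.relname AS index_name,
       pg_size_pretty(pg_relation_size(i.indexrelid)) AS index_size,
       pg_size_pretty(pg_relation_size(i.indrelid)) AS table_size
FROM pg_index AS i
JOIN pg_class AS c ON c.oid = i.indexrelid
WHERE c.relname IN ('idx_gist_point', 'idx_spgist_point', 'idx_gist_geom');
```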

Nearest Neighbours search (2D)

– point

SELECT *
FROM many_point
ORDER BY point(0.5, 0.5) <-> point LIMIT 10;

– geometry(point,0)

SELECT *
FROM many_geom
ORDER BY ST_MakePoint(0.5, 0.5) <-> point LIMIT 10;

Nearest Neighbours search (2D)

• Query timing (without & with indexes):

datatype           planner strategy             exec. time
point              Seq. Scan + Sort             7.3s
point              Index Scan (idx_gist_point)  52ms
geometry(point,0)  Seq. Scan + Sort             17.2s
geometry(point,0)  Index Scan (idx_gist_geom)   18ms
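
The planner strategy and execution time can be inspected with EXPLAIN. The sketch below assumes the many_point table and its GiST index exist. Disabling index scans for the session is just one way to reproduce the "without index" baseline, not necessarily how the talk's numbers were obtained.

```sql
-- Plan and timing with whatever indexes are available.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM many_point
ORDER BY point(0.5, 0.5) <-> point LIMIT 10;

-- Discourage index scans for this session to reproduce the baseline.
SET enable_indexscan = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM many_point
ORDER BY point(0.5, 0.5) <-> point LIMIT 10;
RESET enable_indexscan;
```

On the PostgreSQL version used here, only the GiST index can serve this ordered (kNN) scan. To my knowledge SP-GiST gained kNN support only in much later releases.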

Bounding Box inclusion (2D)

– point

SELECT *
FROM many_point
WHERE point <@ box(point(0.4, 0.4), point(0.6, 0.6));

– geometry(point,0)

SELECT *
FROM many_geom
WHERE point && ST_MakeBox2D(ST_MakePoint(0.4, 0.4), ST_MakePoint(0.6, 0.6));
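
For point data, the bounding-box operator && is equivalent to exact inclusion in the box. For extended geometries it only compares bounding boxes, which is worth keeping in mind when reusing the pattern. A small illustration with arbitrary literal geometries:

```sql
-- The bounding box of the line covers the point, so && is true...
SELECT 'LINESTRING(0 0, 1 1)'::geometry && 'POINT(0 1)'::geometry AS bbox_overlap;

-- ...but the line itself does not pass through the point.
SELECT ST_Intersects('LINESTRING(0 0, 1 1)'::geometry,
                     'POINT(0 1)'::geometry) AS really_intersects;
```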

Bounding Box inclusion (2D)

• Query timing (without & with indexes):

datatype           planner strategy               exec. time
point              Seq. Scan + <@                 5.7s
point              Index Scan (idx_spgist_point)  0.4s
geometry(point,0)  Seq. Scan + &&                 2.0s
geometry(point,0)  Index Scan (idx_gist_geom)     0.7s

Unbalanced indexes intrinsically provide a boxed sample in their nodes, and this is used in BB inclusion!

Work with (many) 3D points in PostgreSQL

• The OpenGeo suite (Boundless – P. Ramsey)

– Includes the postgis and pointcloud extensions

  • Casting between the two point datatypes is allowed

  • pointcloud allows the use of patches to reduce the whole data size

– No packages available to work with PostgreSQL 9.4

– Can import LiDAR data from .LAS files

http://suite.opengeo.org/4.1/whatsnew.html

http://suite.opengeo.org/opengeo-docs/dataadmin/pointcloud/loadingdata.html#loading-with-pdal

An example of usage: 1G points cloud

• The test environment:

– 16GB RAM, 1TB RAID1 storage, 8 CPU @3.3GHz, PostgreSQL 9.3

• Use the pointcloud extension

– one point → one record

• Search points inside a BB and NN

[Figure: point record layout, fields of 4B + 4B + 4B + 2B]

http://suite.opengeo.org/opengeo-docs/dataadmin/pointcloud/schemas.html
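
A "one point → one record" table could be declared roughly as follows. This is only a sketch: the id column and the pcid value 1 are assumptions, and the matching schema must already be registered in the pointcloud_formats table.

```sql
CREATE EXTENSION IF NOT EXISTS postgis;
CREATE EXTENSION IF NOT EXISTS pointcloud;
-- Provides the casts between pcpoint/pcpatch and PostGIS geometry.
CREATE EXTENSION IF NOT EXISTS pointcloud_postgis;

-- Assumption: a schema with pcid = 1 is already registered in pointcloud_formats.
CREATE TABLE pcpoints (
    id SERIAL PRIMARY KEY,
    pt PCPOINT(1)
);
```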

Build the index

table size  GiST index size  building time
56GB        59GB             6h

CREATE INDEX pc_gist_idx ON pcpoints USING gist(Geometry(pt));

You have to cast to the PostGIS point datatype to use a GiST index

BB inclusion with 1G points cloud

included points  execution time (no index)  execution time (with index)
1M               798s                       208ms
10M              -                          9.27s
100M             -                          99.7s
300M             -                          682s

SELECT * FROM pcpoints
WHERE Geometry(pt) &&
      ST_SetSRID(ST_3DMakeBox(ST_MakePoint(0, 0, 100),
                              ST_MakePoint(100, 100, 500)), 4326);

Index is always used!

BB inclusion with 1G points cloud using patches

WITH sel AS (
    SELECT PC_Explode(pa) AS pc FROM pcpatch
    WHERE ST_SetSRID(ST_GeomFromEWKB(PC_Envelope(pa)), 4326) &&
          ST_SetSRID(ST_3DMakeBox(ST_MakePoint(0, 0, 100),
                                  ST_MakePoint(100, 100, 500)), 4326)
)
SELECT pc FROM sel
WHERE ST_Within(Geometry(pc),
                ST_SetSRID(ST_3DMakeBox(ST_MakePoint(0, 0, 100),
                                        ST_MakePoint(100, 100, 500)), 4326));

100k patches, 10k points/patch (2h, 9.4GB)

http://suite.opengeo.org/4.1/dataadmin/pointcloud/objects.html#pcpatch
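
The slides do not show how the patches were built. One possible approach uses the PC_Patch aggregate and is sketched below. It assumes that pcpoints has a serial id column and that grouping consecutive ids is acceptable; the pcid value 1 is an assumption too.

```sql
-- Assumption: a schema with pcid = 1 is registered in pointcloud_formats.
CREATE TABLE pcpatch (
    id SERIAL PRIMARY KEY,
    pa PCPATCH(1)
);

-- Assumption: pcpoints has an integer id; 10000 consecutive ids form one patch.
INSERT INTO pcpatch (pa)
SELECT PC_Patch(pt)
FROM pcpoints
GROUP BY id / 10000;
```

Grouping by id keeps patches spatially compact only if the points were loaded in a spatially coherent order. Otherwise a spatial grouping key would be a better choice.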

BB inclusion with 1G points cloud using patches

included points  execution time (search of patches)  execution time (patch explosion)
1M               520ms                               3s
10M              3.8s                                16.5s
100M             33.8s                               150s

So... indexed searches are faster!

Nearest neighbours search with 1G points cloud

searched points  execution time (no index)  execution time (with index)
1M               2000s                      1.41s
10M              -                          13.8s

SELECT *
FROM pcpoints
ORDER BY ST_SetSRID(ST_MakePoint(0, 0, 0), 4326) <-> Geometry(pt)
LIMIT <searched points>;

Index blocks in memory are used, then SeqScans:

searched points  execution time
100M             2100s

Conclusions

• PostgreSQL includes many features to work with geospatial entities

– 2D in-core geometries, PostGIS, PointCloud, ...

• Indexes can be successfully used

– Improved performance for the geospatial entities introduced with PostGIS

• Waiting for SP-GiST indexes (PostGIS >2.1)

• The performance achieved with large numbers of entries shows that the geospatial features of the PostgreSQL DBMS can be suitable for datasets in the 100M-1G range

Questions?

• giuseppe.broccolo@2ndquadrant.it

• @giubro

• gemini__81

• gbroccolo7

Creative Commons License

Copyright 2012-2015,

2ndQuadrant Italia - http://www.2ndquadrant.it

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License
