voronoi -based geospatial query processing with mapreduce afsin akdogan , ugur demiryurek ,

25
Voronoi-based Geospatial Query Processing with MapReduce Afsin Akdogan, Ugur Demiryurek, Farnoush Banaei-Kashani, Cyrus Shahabi Integrated Media System Center, University of Southern Califonia Los Angeles, CA 90089 Cloud Computing Technology and Science (CloudCom), 2010 IEEE Second International Conference on 2014.06.16(MON) Regular seminar TaeHoon Kim

Upload: kylar

Post on 24-Feb-2016

68 views

Category:

Documents


0 download

DESCRIPTION

Voronoi -based Geospatial Query Processing with MapReduce Afsin Akdogan , Ugur Demiryurek , Farnoush Banaei-Kashani , Cyrus Shahabi Integrated Media System Center, University of Southern Califonia Los Angeles, CA 90089 Cloud Computing Technology and Science ( CloudCom), - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Voronoi -based Geospatial Query Processing  with  MapReduce Afsin Akdogan ,  Ugur  Demiryurek ,

Voronoi-based Geospatial Query Processing with MapReduce

Afsin Akdogan, Ugur Demiryurek, Farnoush Banaei-Kashani, Cyrus Shahabi

Integrated Media System Center, University of Southern Califonia Los Angeles, CA 90089

Cloud Computing Technology and Science (CloudCom), 2010 IEEE Second International Conference on

2014.06.16(MON)Regular seminar

TaeHoon Kim

Page 2: Voronoi -based Geospatial Query Processing  with  MapReduce Afsin Akdogan ,  Ugur  Demiryurek ,

/25

Contents Introduction Background Constructing Voronoi Diagram with MapReduce Query processing Performance Evaluation Conclusions

Page 3: Voronoi -based Geospatial Query Processing  with  MapReduce Afsin Akdogan ,  Ugur  Demiryurek ,

/25

1. Introduction With the recent advances in location-based services, the

amount of geospatial data is rapidly growing Geospatial queries are computationally complex problems

which are time consuming to solve, especially with large datasets

On the other hand, we observe that a large variety of geospa-tial queries are intrinsically parallelizable.

Cloud computing enables a considerable reduction in opera-tional expenses

by providing flexible resources that can instantaneously scale up and down

The big IT vendors including Google, IBM, Microsoft and Ama-zon are ramping up parallel programming infrastructures

3ramp something up ~ 을 늘리다 [ 증가시키다 ]

Page 4: Voronoi -based Geospatial Query Processing  with  MapReduce Afsin Akdogan ,  Ugur  Demiryurek ,

/25

1. Introduction Google’s MapReduce programing

model Parallelization for processing large

datasets Can do that at a very large scale

Google processes 20 petabytes of data per day with MapReduce

A variety of geospatial queries can be efficiently modeled using MapReduce

Reverse Nearest Neighbor query(RNN)

RNN Given a query point q and a set of

data points P, RNN query retrieves all the data points that have q as their nearest neighbors 4

Example of RNN Query

p1

p2

p4

p3p5

p5: {p2, p3, p4}

Page 5: Voronoi -based Geospatial Query Processing  with  MapReduce Afsin Akdogan ,  Ugur  Demiryurek ,

/25

1. Introduction Parallel spatial query processing has been studied in the

contexts of parallel databases, cluster systems, and peer to peer systems as well as cloud platforms

However, the points in each partition are not locally indexed

Meanwhile, several approaches that use distributed hierar-chical index structures

R-tree based P2PR-Tree, SD-Rtree, kd-tree based k-RP, quad-tree based hQT

Problems of distributed hierarchical index structures Do not scale due to the traditional top-down search that over-

loads the nodes near the tree root, and fail to provide full de-centralization

5

Page 6: Voronoi -based Geospatial Query Processing  with  MapReduce Afsin Akdogan ,  Ugur  Demiryurek ,

/25

1. Introduction R-Tree index with MapReduce

Without supporting types of query Motivate the problem of geospatial query processing without

specifically explaining how to process the queries Hierarchical indices like R-tree are structurally not suitable for

MapReduce

In this paper, we propose a MapReduce-based approach Both constructs a flat spatial index, Voronoi diagram (VD), and

enables efficient parallel processing of a wide range of spatial queries

Spatial queries reverse nearest neighbor (RNN), maximum reverse nearest neighbor (MaxRNN) k nearest neighbor (kNN) queries 6

Page 7: Voronoi -based Geospatial Query Processing  with  MapReduce Afsin Akdogan ,  Ugur  Demiryurek ,

/25

2. Background Voronoi diagrams

A voronoi diagram decom-poses a space into disjoint polygons based on the set of generators(i.e., data points)

Given a set of generators S in the Euclidean space, Voronoi diagram associates all locations in the plane to their closest generator

Each generator s has a Voronoi polygon consisting of all points closer to s than to any other site

7

S1S2

S3

S6S5

S4

: Data point: A set of generators

{S1, S2, S3, S4, S5 } >

Page 8: Voronoi -based Geospatial Query Processing  with  MapReduce Afsin Akdogan ,  Ugur  Demiryurek ,

/25

2. Background

8

MapReduce Hadoop first each mapper is assigned to one or more splits de-

pending on the number of machines in the cluster Secondly, each mapper reads inputs provided as a <key,

value> pair at a time, applies the map function to it and gen-erates intermediate <key, value>pairs as the output

Finally, reducers fetch the output of the mappers and merge all of the intermediate values that share the same intermediate key, process them together and generate the final output

Page 9: Voronoi -based Geospatial Query Processing  with  MapReduce Afsin Akdogan ,  Ugur  Demiryurek ,

/25

3. Constructing Voronoi Diagram with MapRe-duce Voronoi Diagram construction is inherently suitable for

MapReduce modeling Because a VD can be obtained by merging multiple partial

Voronoi diagrams (PVD) VD is built with MapReduce using divide and conquer method

The main idea behind the divide and conquer method 1. Given a set of data points P as an input in Euclidean space,

first the points are sorted in increasing order by X coordinate 2. P is separated into several subsets of equal size 3. A PVD is generated for the points of each subset, and then

all of the PVDs are merged to obtain the final VD for P

9

Page 10: Voronoi -based Geospatial Query Processing  with  MapReduce Afsin Akdogan ,  Ugur  Demiryurek ,

/25

3. Constructing Voronoi Diagram with MapRe-duce Map phase

Given a set of data points P = { p1,p2 … p13,p14 } as input, first the points are sorted, then they are divided into two splits equal in size

The first mapper reads the data points in Split 1, generates a PVD and emits <1, PVD1>

Likewise, the second mapper emits, <1, PVD2>

Reduce phase The reducer aggregates these PVDs and merges them into a

single VD The final output of the reducer is in the following <key, value>

format : <pi, VNs(Pi) > where VNs(Pi) represents the set of Voronoi neighbors of pi

10

Page 11: Voronoi -based Geospatial Query Processing  with  MapReduce Afsin Akdogan ,  Ugur  Demiryurek ,

/19

3. Voronoi Generation: Map phase

Split 1 Split 2

Generate Partial Voronoi Dia-grams (PVD)

p3

p4p6p2

p1p5

p8

p7

<key, value>: <1, PVD(Split 1)><1>, <p2,p3,p4>

emit emit

right

right

left

left

<key, value>: <1, PVD(Split 2)<1>, <p6,p7,p8>

• mapper reads the data points in Split 1, gen-erates a PVD and emits

• Output : <1, PVD1>

Page 12: Voronoi -based Geospatial Query Processing  with  MapReduce Afsin Akdogan ,  Ugur  Demiryurek ,

/25

Split 1 Split 2Remove superfluous edges and generate new edges

p3

p4p6p2

p1p5

p8

p7right

right

left

leftemit

<key, value>: <point, Voronoi Neighbors> <p1, {p2, p3}> <p2, {p1, p3, p4}> <p3, {p1, p2, p4, p5}> …..

3. Voronoi Generation: Reduce phase

• The reducer ag-gregates these PVDs and merges them into a sin-gle VD

• Final Output<pi, VNs(Pi) >

Page 13: Voronoi -based Geospatial Query Processing  with  MapReduce Afsin Akdogan ,  Ugur  Demiryurek ,

/25

4. Query Processing : RNN Suppose the input the map function is

<point, Voronoi Neighbors(VNs)>,

Map phase Each point p finds its Nearest Neighbor Emit : <NN(pn), pn>

Reduce phase The reducer aggregates all pairs with the same key pK Emit : <point, Reverse Nearest Neighbors>

13

Page 14: Voronoi -based Geospatial Query Processing  with  MapReduce Afsin Akdogan ,  Ugur  Demiryurek ,

/25

4. RNN query processing

14

emit

<p5, p2><p5, p3><p5, p4> Map

Re-duce

Reducer<p5, {p2, p3, p4}>emit

Mapper

p1

p2

p4

p3p5

Page 15: Voronoi -based Geospatial Query Processing  with  MapReduce Afsin Akdogan ,  Ugur  Demiryurek ,

/25

4. Query Processing : MaxRNN Nearest Neighbor Circle(NNC)

Given a point p and its nearest neighbor p’, NNC of p is the cir-cle centered at p’ with radius |p, p’|

MapReduce Finds the NN of every point and computes the radiuses of the

circles Finds the overlapping circles first, Then, it finds the intersection

points that represent the optimal region

15

Page 16: Voronoi -based Geospatial Query Processing  with  MapReduce Afsin Akdogan ,  Ugur  Demiryurek ,

/25

4. Max RNN query processing

16

emit Map

i1,i2,i3

Mapper 1st stepp1 finds p3 as the nearest neighbor among its VNs {p2,p3} and sets |p1,p3| as the ra-

dius of its NNC<(p1, |p1,p3|)>, <(p2,r2),(p3 r3)>

Mapper 2nd stepp1 checks its VNs<p2,p3>, finds that is over-

laps <p2,p3> and computes the intersection points

{i2,i3,i5,i6}Each intersection point i, then computes its weight by counting the NNCs covering I <1,

i,w(i)>

p1

p3

p2 r2r

3

p1

p3

p2

i5

i1i2

i3i4

i6

ReducerFinds the intersection

points with the heightest weights who are emitted as

the final ouputemit

<1, (i2,3)>, <1, (i3,3)> <1, (i5,2)> <1,(i6,2)>

Page 17: Voronoi -based Geospatial Query Processing  with  MapReduce Afsin Akdogan ,  Ugur  Demiryurek ,

/25

4. Query Processing : kNN K Nearest Neighbor Query(kNN)

Given a query point q and a set of data points P, k Nearest Neighbor(kNN) query finds the k closest data point to q where d(q, pi) d(q, p)

Suppose we answer a 3NN query with two mappers and one reducer for query point q1

Map phase Each mapper reads its split Simultaneously process the query point q2 and other queries as

well

Reduce phase Receives the query point as key and its neighbors as value,

and emits them as they are17

Page 18: Voronoi -based Geospatial Query Processing  with  MapReduce Afsin Akdogan ,  Ugur  Demiryurek ,

/25

4. kNN query processing

Map NN is among the VNs of p1, { p2, p3, p4, p5, p6, p7, p8 }

Map Output <q1>, <{p1, p2, p5}>Reduce Receives the query point as key and its neighbors as value,

and emits them as they are <q1,> <p1, p2 … pi> <q2,> <p1, p2 … pi> 18

• we answer a 3NN query with two mappers and one re-ducer for query point q1

Page 19: Voronoi -based Geospatial Query Processing  with  MapReduce Afsin Akdogan ,  Ugur  Demiryurek ,

/25

Performance Evaluation Competitors

Voronoi Diagram R-tree

Factor Index generation time Query response time

Parameter Node number kNN : K

Datasets BSN: all businesses in the entire U.S., containing approximately

1,300,000 data points. RES: all restaurants in the entire U.S., containing approxi-

mately 450,000 data points.

19

Environment Hadoop on Amazon EC2

0.20.1 64bit Fedora 8 Linux

Operating System with 15GB memory

4 virtual cores, and disks with 1,690 GB storage

Page 20: Voronoi -based Geospatial Query Processing  with  MapReduce Afsin Akdogan ,  Ugur  Demiryurek ,

/25

Performance Evaluation

20

VD & R-Tree construction time Construction of the R-tree index takes less time than that of VD Because, the mapper generating PVDs execute expensive

computations and there is only one reducer merging the PVDs in the cluster

Page 21: Voronoi -based Geospatial Query Processing  with  MapReduce Afsin Akdogan ,  Ugur  Demiryurek ,

/25

Performance Evaluation

21

NN query on VD & R-Tree Our experiments show that VD outperforms R-tree in query re-

sponse time This is because with VD, points can immediately find there VNs

in the map phase

Page 22: Voronoi -based Geospatial Query Processing  with  MapReduce Afsin Akdogan ,  Ugur  Demiryurek ,

/25

Performance Evaluation RNN

The average response time slightly decreases with the increas-ing number of data points; hence the overall throughput in-creases

22

Single Node without parallelism

Multi Node with paral-lelism

Page 23: Voronoi -based Geospatial Query Processing  with  MapReduce Afsin Akdogan ,  Ugur  Demiryurek ,

/25

Performance Evaluation MaxRNN

As shown, parallelization reduces the response time at linear rate for both datasets

We observe a performance increase from %30 up to 50%

23Effect of node num-

ber on MaxRNN

Page 24: Voronoi -based Geospatial Query Processing  with  MapReduce Afsin Akdogan ,  Ugur  Demiryurek ,

/25

Performance Evaluation kNN

This is because, for given number of data points P and given number of query point Q, (MapReduce based kNN search)MRK always outputs P X Q <key, value> pairs in the map phase in-dependent of k

Due to the massive amount of data generated by the mapper, the response time is high

24

Page 25: Voronoi -based Geospatial Query Processing  with  MapReduce Afsin Akdogan ,  Ugur  Demiryurek ,

/25

Conclusions In this paper, we investigated the problem of handling and

efficient querying of massive spatial data in a cloud archi-tecture

We have presented a distributed Voronoi index and tech-niques to answer three types of geospatial queries

RNN, MaxRNN, kNN

Future Work : Supporting other types of queries Bichromatic reverse nearest neighbor Skyline reverse k nearest neighbor

25