Voronoi-based Geospatial Query Processing with MapReduce
Afsin Akdogan, Ugur Demiryurek, Farnoush Banaei-Kashani, Cyrus Shahabi
Integrated Media System Center, University of Southern California, Los Angeles, CA 90089
Cloud Computing Technology and Science (CloudCom), 2010 IEEE Second International Conference on
2014.06.16(MON)Regular seminar
TaeHoon Kim
Contents
• Introduction
• Background
• Constructing Voronoi Diagram with MapReduce
• Query processing
• Performance Evaluation
• Conclusions
1. Introduction
• With the recent advances in location-based services, the amount of geospatial data is growing rapidly
• Geospatial queries are computationally complex problems that are time-consuming to solve, especially with large datasets
• On the other hand, we observe that a large variety of geospatial queries are intrinsically parallelizable
• Cloud computing enables a considerable reduction in operational expenses by providing flexible resources that can instantaneously scale up and down
• The big IT vendors, including Google, IBM, Microsoft and Amazon, are ramping up parallel programming infrastructures
Note: to "ramp something up" means to increase it
1. Introduction
Google’s MapReduce programming model
• Parallelization for processing large datasets
• Works at a very large scale: Google processes 20 petabytes of data per day with MapReduce
• A variety of geospatial queries can be efficiently modeled using MapReduce
Reverse Nearest Neighbor (RNN) query
• Given a query point q and a set of data points P, an RNN query retrieves all the data points that have q as their nearest neighbor
Figure: Example of an RNN query over points p1 to p5. The reverse nearest neighbors of p5 are {p2, p3, p4}.
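The RNN definition can be illustrated with a brute-force sketch in Python. The coordinates are hypothetical (they are not given on the slide) and were chosen so that the result agrees with the example p5: {p2, p3, p4}; the function names are illustrative only.

```python
from math import dist

def nearest_neighbor(p, points):
    """Return the name of the point closest to `p`, excluding `p` itself."""
    others = [name for name in points if name != p]
    return min(others, key=lambda name: dist(points[p], points[name]))

def reverse_nearest_neighbors(q, points):
    """Return all points whose nearest neighbor is `q`."""
    return sorted(p for p in points
                  if p != q and nearest_neighbor(p, points) == q)

# Hypothetical coordinates (not from the slides).
points = {
    "p1": (0.0, 4.0),
    "p2": (1.0, 2.0),
    "p3": (3.0, 0.0),
    "p4": (4.0, 3.0),
    "p5": (2.5, 2.0),
}
rnn_p5 = reverse_nearest_neighbors("p5", points)
print(rnn_p5)
```

This version compares every pair of points, which is exactly the cost that the Voronoi-based approach in the later slides tries to avoid.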
1. Introduction
• Parallel spatial query processing has been studied in the contexts of parallel databases, cluster systems, and peer-to-peer systems, as well as cloud platforms
• However, in these approaches the points in each partition are not locally indexed
• Meanwhile, several approaches use distributed hierarchical index structures: R-tree based P2PR-Tree and SD-Rtree, kd-tree based k-RP, quad-tree based hQT
• Problems of distributed hierarchical index structures: they do not scale, because the traditional top-down search overloads the nodes near the tree root, and they fail to provide full decentralization
1. Introduction
Existing work on R-tree indexing with MapReduce
• Either builds the index without supporting any type of query
• Or motivates the problem of geospatial query processing without specifically explaining how to process the queries
• Hierarchical indices like the R-tree are structurally not suitable for MapReduce
In this paper, we propose a MapReduce-based approach
• It both constructs a flat spatial index, the Voronoi diagram (VD), and enables efficient parallel processing of a wide range of spatial queries
• Spatial queries: reverse nearest neighbor (RNN), maximum reverse nearest neighbor (MaxRNN), and k nearest neighbor (kNN) queries
2. Background
Voronoi diagrams
• A Voronoi diagram decomposes a space into disjoint polygons based on a set of generators (i.e., data points)
• Given a set of generators S in Euclidean space, the Voronoi diagram associates every location in the plane with its closest generator
• Each generator s has a Voronoi polygon consisting of all points closer to s than to any other generator
Figure: Voronoi diagram over data points labeled S1 to S6. Legend: each marker is a data point; the set of generators is listed as {S1, S2, S3, S4, S5}.
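The defining property (every location belongs to the polygon of its closest generator) can be sketched in a few lines of Python. Only the labels S1 to S6 come from the figure; the coordinates and the function name are assumptions made for illustration.

```python
from math import dist

# Hypothetical generator coordinates; only the labels come from the figure.
generators = {
    "S1": (1.0, 5.0), "S2": (4.0, 5.0), "S3": (0.0, 2.0),
    "S4": (3.0, 0.0), "S5": (5.0, 2.0), "S6": (2.5, 3.0),
}

def owning_generator(location, generators):
    """Return the generator whose Voronoi polygon contains `location`."""
    return min(generators, key=lambda s: dist(location, generators[s]))

cell_a = owning_generator((2.4, 2.9), generators)
cell_b = owning_generator((0.2, 1.0), generators)
print(cell_a, cell_b)
```

The sketch never builds the polygons explicitly; it only answers the membership question that the polygons encode.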
2. Background
MapReduce (Hadoop)
• First, each mapper is assigned one or more splits, depending on the number of machines in the cluster
• Second, each mapper reads its input one <key, value> pair at a time, applies the map function to it, and generates intermediate <key, value> pairs as output
• Finally, reducers fetch the output of the mappers, merge all of the intermediate values that share the same intermediate key, process them together, and generate the final output
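The three steps can be imitated in a single Python process. This is only a sketch of the data flow, not the Hadoop API: `run_mapreduce` and the toy counting job are hypothetical names introduced here.

```python
from collections import defaultdict

def run_mapreduce(splits, map_fn, reduce_fn):
    """Single-process imitation of the map, shuffle and reduce steps."""
    intermediate = defaultdict(list)
    for split in splits:                       # one "mapper" per split
        for key, value in split:               # one <key, value> pair at a time
            for out_key, out_value in map_fn(key, value):
                intermediate[out_key].append(out_value)  # group by intermediate key
    # One reduce call per intermediate key, over all values sharing that key.
    return {key: reduce_fn(key, intermediate[key]) for key in sorted(intermediate)}

# Toy job (hypothetical): count data points per category.
def map_category(point_id, category):
    yield category, 1

def reduce_count(category, ones):
    return sum(ones)

splits = [
    [("p1", "restaurant"), ("p2", "bank")],
    [("p3", "restaurant"), ("p4", "restaurant")],
]
counts = run_mapreduce(splits, map_category, reduce_count)
print(counts)
```

In a real cluster the mappers and reducers run on different machines and the grouping step moves data over the network; here everything is a dictionary in memory.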
3. Constructing Voronoi Diagram with MapReduce
• Voronoi diagram construction is inherently suitable for MapReduce modeling, because a VD can be obtained by merging multiple partial Voronoi diagrams (PVDs)
• The VD is built with MapReduce using the divide-and-conquer method
The main idea behind the divide-and-conquer method
1. Given a set of data points P in Euclidean space as input, the points are first sorted in increasing order of X coordinate
2. P is separated into several subsets of equal size
3. A PVD is generated for the points of each subset, and then all of the PVDs are merged to obtain the final VD for P
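Steps 1 and 2 of the divide-and-conquer method can be sketched as follows. The function name and the sample points are hypothetical; step 3 (generating and merging the PVDs) is not shown.

```python
def split_points(points, num_splits):
    """Sort points by x coordinate and cut them into contiguous, near-equal subsets."""
    ordered = sorted(points, key=lambda p: p[0])
    size, extra = divmod(len(ordered), num_splits)
    splits, start = [], 0
    for i in range(num_splits):
        # The first `extra` subsets take one additional point each.
        end = start + size + (1 if i < extra else 0)
        splits.append(ordered[start:end])
        start = end
    return splits

points = [(5, 1), (1, 4), (3, 2), (8, 0), (2, 7), (6, 3)]
splits = split_points(points, 2)
print(splits)
```

Because the points are sorted by x before cutting, each subset covers a vertical strip of the plane, so neighboring PVDs only need to be reconciled along their shared boundary.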
3. Constructing Voronoi Diagram with MapReduce
Map phase
• Given a set of data points P = {p1, p2, …, p13, p14} as input, the points are first sorted and then divided into two splits of equal size
• The first mapper reads the data points in Split 1, generates a PVD, and emits <1, PVD1>
• Likewise, the second mapper emits <1, PVD2>
Reduce phase
• The reducer aggregates these PVDs and merges them into a single VD
• The final output of the reducer is in the following <key, value> format: <pi, VNs(pi)>, where VNs(pi) represents the set of Voronoi neighbors of pi
3. Voronoi Generation: Map phase
Figure: points p1 to p8 divided into Split 1 and Split 2 (each marked with "left" and "right" labels); a partial Voronoi diagram (PVD) is generated for each split.
Split 1 emits <key, value>: <1, PVD(Split 1)>, shown as <1>, <p2, p3, p4>
Split 2 emits <key, value>: <1, PVD(Split 2)>, shown as <1>, <p6, p7, p8>
• The mapper reads the data points in Split 1, generates a PVD and emits it
• Output: <1, PVD1>
3. Voronoi Generation: Reduce phase
Figure: the PVDs of Split 1 and Split 2 (points p1 to p8) are merged by removing superfluous edges and generating new edges.
Emitted <key, value>: <point, Voronoi Neighbors>, for example <p1, {p2, p3}>, <p2, {p1, p3, p4}>, <p3, {p1, p2, p4, p5}>, …
• The reducer aggregates these PVDs and merges them into a single VD
• Final output: <pi, VNs(pi)>
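To make the output format <pi, VNs(pi)> concrete, the sketch below computes Voronoi neighbors for a tiny point set. It does not reproduce the PVD merge described on the slides; it swaps in a brute-force Delaunay test (two points are Voronoi neighbors when they share a triangle whose circumcircle contains no other point). The coordinates are hypothetical, the method takes O(n^4) time, and it assumes points in general position.

```python
from itertools import combinations

def circumcircle(a, b, c):
    """Return (center_x, center_y, radius_squared), or None if a, b, c are collinear."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy, (ax - ux) ** 2 + (ay - uy) ** 2

def voronoi_neighbors(points):
    """Map each point name to its sorted Voronoi neighbors (brute-force Delaunay)."""
    neighbors = {name: set() for name in points}
    for tri in combinations(points, 3):
        circle = circumcircle(*(points[n] for n in tri))
        if circle is None:
            continue
        ux, uy, r2 = circle
        # The triangle is Delaunay if no other point lies strictly inside its circumcircle.
        empty = all((x - ux) ** 2 + (y - uy) ** 2 >= r2 - 1e-9
                    for name, (x, y) in points.items() if name not in tri)
        if empty:
            for u, v in combinations(tri, 2):
                neighbors[u].add(v)
                neighbors[v].add(u)
    return {name: sorted(vs) for name, vs in neighbors.items()}

# Hypothetical coordinates: a rhombus whose short diagonal is p2-p3.
points = {"p1": (0.0, 0.0), "p2": (2.0, 1.0), "p3": (2.0, -1.0), "p4": (4.0, 0.0)}
vns = voronoi_neighbors(points)
for p, ns in vns.items():
    print(f"<{p}, {{{', '.join(ns)}}}>")
```

In this example p1 and p4 are not neighbors because the cells of p2 and p3 meet between them, which is the kind of adjacency information the reducer's output records.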
4. Query Processing: RNN
• Suppose the input to the map function is <point, Voronoi Neighbors (VNs)>
Map phase
• Each point pn finds its nearest neighbor NN(pn)
• Emit: <NN(pn), pn>
Reduce phase
• The reducer aggregates all pairs with the same key pk
• Emit: <point, Reverse Nearest Neighbors>
4. RNN query processing
Figure: points p1 to p5. The mapper emits <p5, p2>, <p5, p3>, <p5, p4>; the reducer emits <p5, {p2, p3, p4}>.
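The map and reduce steps for RNN can be sketched in plain Python, relying on the property that a point's nearest neighbor is always one of its Voronoi neighbors. The coordinates and neighbor lists below are hypothetical (the lists are the Voronoi neighbors of these particular coordinates), and the grouping loop stands in for the shuffle that Hadoop would perform.

```python
from collections import defaultdict
from math import dist

# Hypothetical coordinates and their Voronoi neighbor lists (the mapper input).
coords = {"p1": (0.0, 4.0), "p2": (1.0, 2.0), "p3": (3.0, 0.0),
          "p4": (4.0, 3.0), "p5": (2.5, 2.0)}
vns = {"p1": ["p2", "p4", "p5"], "p2": ["p1", "p3", "p5"],
       "p3": ["p2", "p4", "p5"], "p4": ["p1", "p3", "p5"],
       "p5": ["p1", "p2", "p3", "p4"]}

def rnn_map(point, neighbors):
    """Emit <NN(point), point>, searching for the NN only among the Voronoi neighbors."""
    nn = min(neighbors, key=lambda n: dist(coords[point], coords[n]))
    yield nn, point

def rnn_reduce(point, reverse_neighbors):
    """Collect every point that named `point` as its nearest neighbor."""
    return point, sorted(reverse_neighbors)

grouped = defaultdict(list)          # stand-in for the shuffle step
for point, neighbors in vns.items():
    for key, value in rnn_map(point, neighbors):
        grouped[key].append(value)
rnn = dict(rnn_reduce(key, values) for key, values in grouped.items())
print(rnn)
```

Points that are nobody's nearest neighbor (p1, p3 and p4 here) never appear as a key, so the reducer emits nothing for them.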
4. Query Processing: MaxRNN
Nearest Neighbor Circle (NNC)
• Given a point p and its nearest neighbor p’, the NNC of p is the circle centered at p with radius |p, p’|
MapReduce
• Finds the NN of every point and computes the radii of the circles
• Finds the overlapping circles first; then it finds the intersection points that represent the optimal region
4. MaxRNN query processing
Figure labels: points p1, p2, p3; radii r2, r3; intersection points i1 to i6.
Mapper, 1st step: p1 finds p3 as the nearest neighbor among its VNs {p2, p3} and sets |p1, p3| as the radius of its NNC. Emit: <(p1, |p1, p3|)>, <(p2, r2), (p3, r3)>
Mapper, 2nd step: p1 checks its VNs <p2, p3>, finds that its NNC overlaps theirs, and computes the intersection points {i2, i3, i5, i6}. Each intersection point i then computes its weight w(i) by counting the NNCs covering i. Emit: <1, i, w(i)>
Reducer: finds the intersection points with the highest weights, which are emitted as the final output.
Emitted pairs shown in the figure: <1, (i2, 3)>, <1, (i3, 3)>, <1, (i5, 2)>, <1, (i6, 2)>
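The NNC and weighting steps can be sketched as below. This is a simplification under stated assumptions: nearest neighbors are found by brute force rather than among the VNs, every pair of circles is tested rather than only Voronoi-neighbor pairs, and the coordinates and function names are hypothetical.

```python
from itertools import combinations
from math import dist, sqrt

EPS = 1e-9  # tolerance so that points on a circle's boundary count as covered

def nn_circles(coords):
    """NNC of each point: (center, radius), radius = distance to its nearest neighbor."""
    return {p: (c, min(dist(c, o) for q, o in coords.items() if q != p))
            for p, c in coords.items()}

def circle_intersections(c1, r1, c2, r2):
    """Intersection points of two circles (empty if disjoint, nested or concentric)."""
    d = dist(c1, c2)
    if d < EPS or d > r1 + r2 or d < abs(r1 - r2):
        return []
    a = (r1 * r1 - r2 * r2 + d * d) / (2 * d)   # distance from c1 to the common chord
    h = sqrt(max(r1 * r1 - a * a, 0.0))          # half the chord length
    mx = c1[0] + a * (c2[0] - c1[0]) / d
    my = c1[1] + a * (c2[1] - c1[1]) / d
    ox, oy = h * (c2[1] - c1[1]) / d, h * (c2[0] - c1[0]) / d
    return [(mx + ox, my - oy), (mx - ox, my + oy)]

def max_rnn_candidates(coords):
    """Return the intersection points covered by the largest number of NNCs."""
    circles = list(nn_circles(coords).values())
    weighted = []
    for (c1, r1), (c2, r2) in combinations(circles, 2):
        for point in circle_intersections(c1, r1, c2, r2):
            weight = sum(dist(point, c) <= r + EPS for c, r in circles)
            weighted.append((point, weight))
    best = max((w for _, w in weighted), default=0)
    return [(p, w) for p, w in weighted if w == best]

coords = {"p1": (0.0, 0.0), "p2": (3.0, 0.0), "p3": (1.0, 2.0)}
best_points = max_rnn_candidates(coords)
for (x, y), w in best_points:
    print(f"({x:.3f}, {y:.3f}) weight={w}")
```

With these coordinates three intersection points are covered by all three circles; together they mark the corners of the region where the circles overlap.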
4. Query Processing: kNN
k Nearest Neighbor query (kNN)
• Given a query point q and a set of data points P, a kNN query finds the k data points closest to q, where d(q, pi) ≤ d(q, p) for every returned point pi and every other data point p
• Suppose we answer a 3NN query with two mappers and one reducer for query point q1
Map phase
• Each mapper reads its split
• The mappers simultaneously process the query point q2 and other queries as well
Reduce phase
• Receives the query point as key and its neighbors as value, and emits them as they are
4. kNN query processing
• We answer a 3NN query with two mappers and one reducer for query point q1
Map: the NN is among the VNs of p1, {p2, p3, p4, p5, p6, p7, p8}
Map output: <q1>, <{p1, p2, p5}>
Reduce: receives the query point as key and its neighbors as value, and emits them as they are: <q1, {p1, p2 … pi}>, <q2, {p1, p2 … pi}>
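One common way to exploit Voronoi neighbors for kNN is a best-first expansion: find the first nearest neighbor, then repeatedly take the closest unvisited point among the Voronoi neighbors of the points found so far. The sketch below is an assumption about how such a search could look, not the paper's mapper code; the coordinates, neighbor lists and function name are hypothetical, and the first nearest neighbor is located by brute force.

```python
import heapq
from math import dist

# Hypothetical coordinates and their Voronoi neighbor lists.
coords = {"p1": (0.0, 4.0), "p2": (1.0, 2.0), "p3": (3.0, 0.0),
          "p4": (4.0, 3.0), "p5": (2.5, 2.0)}
vns = {"p1": ["p2", "p4", "p5"], "p2": ["p1", "p3", "p5"],
       "p3": ["p2", "p4", "p5"], "p4": ["p1", "p3", "p5"],
       "p5": ["p1", "p2", "p3", "p4"]}

def knn_with_voronoi(q, k, coords, vns):
    """Best-first kNN: start at q's nearest generator, expand through Voronoi neighbors."""
    # Brute-force stand-in for locating the Voronoi cell that contains q.
    first = min(coords, key=lambda p: dist(q, coords[p]))
    heap = [(dist(q, coords[first]), first)]
    seen = {first}
    result = []
    while heap and len(result) < k:
        _, p = heapq.heappop(heap)       # closest candidate not yet reported
        result.append(p)
        for n in vns[p]:                 # its neighbors become new candidates
            if n not in seen:
                seen.add(n)
                heapq.heappush(heap, (dist(q, coords[n]), n))
    return result

q1 = (1.5, 2.2)
three_nn = knn_with_voronoi(q1, 3, coords, vns)
print(three_nn)
```

Only the neighbors of already reported points are ever examined, so the search touches a small part of the dataset when k is small.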
Performance Evaluation
Competitors: Voronoi Diagram, R-tree
Factors: index generation time, query response time
Parameters: number of nodes; k (for kNN)
Datasets
• BSN: all businesses in the entire U.S., containing approximately 1,300,000 data points
• RES: all restaurants in the entire U.S., containing approximately 450,000 data points
Environment
• Hadoop 0.20.1 on Amazon EC2
• 64-bit Fedora 8 Linux operating system with 15 GB memory, 4 virtual cores, and disks with 1,690 GB storage
Performance Evaluation
VD & R-Tree construction time
• Construction of the R-tree index takes less time than that of the VD, because the mappers generating PVDs execute expensive computations and there is only one reducer merging the PVDs in the cluster
Performance Evaluation
NN query on VD & R-Tree
• Our experiments show that VD outperforms R-tree in query response time
• This is because, with VD, points can immediately find their VNs in the map phase
Performance Evaluation: RNN
• The average response time slightly decreases with the increasing number of data points; hence the overall throughput increases
Figure panels: single node without parallelism; multiple nodes with parallelism
Performance Evaluation: MaxRNN
• As shown, parallelization reduces the response time at a linear rate for both datasets
• We observe a performance increase from 30% up to 50%
Figure: Effect of node number on MaxRNN
Performance Evaluation: kNN
• For a given number of data points P and a given number of query points Q, MRK (MapReduce-based kNN search) always outputs P × Q <key, value> pairs in the map phase, independent of k
• Due to the massive amount of data generated by the mappers, the response time is high
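A quick calculation shows how fast P × Q grows for the two datasets. The dataset sizes are the approximate figures from the slides; the number of queries is a hypothetical value chosen only for illustration.

```python
# Approximate dataset sizes from the slides.
DATASET_SIZES = {"BSN": 1_300_000, "RES": 450_000}

def map_output_pairs(num_points, num_queries):
    """Number of <key, value> pairs emitted in the map phase: P x Q, independent of k."""
    return num_points * num_queries

NUM_QUERIES = 1_000  # hypothetical query count
pairs = {name: map_output_pairs(size, NUM_QUERIES)
         for name, size in DATASET_SIZES.items()}
for name, count in pairs.items():
    print(f"{name}: {count:,} intermediate pairs")
```

Note that k does not appear anywhere in the calculation, which is why changing k does not reduce the volume of intermediate data.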
Conclusions
• In this paper, we investigated the problem of handling and efficiently querying massive spatial data in a cloud architecture
• We have presented a distributed Voronoi index and techniques to answer three types of geospatial queries: RNN, MaxRNN, and kNN
Future work: supporting other types of queries
• Bichromatic reverse nearest neighbor
• Skyline reverse k nearest neighbor