On Top-n Reverse Top-k Queries: Variants, Algorithms, and Applications

陳良弼 Arbee L.P. Chen

National Chengchi University, 9/21/2012 at NCHU

IEEE International Conference on Data Engineering (ICDE)

• A premier international conference on databases

• Inaugural conference held in Los Angeles in 1984

• Held in Taiwan in 1995

ICDE2012 Research Papers Distribution

• System Aspects
– Privacy and Security 8%
– Storage Management and Performance 7%
– Entity resolution/Versioning 7%
– Query Processing 31%
  • Top-k query 9%
  • Distributed/parallel/map-reduce 8%
  • Location-aware 5%
  • Execution Plan 5%
  • Graph indexing 4%

• Text/Web/Keyword Search 19%
• Stream/Trajectory/Sequence/Spatio-Temporal 10%
• Social Media 7%
• Uncertain Database 6%
• Data Mining 5%

Efficient Dual-Resolution Layer Indexing for Top-k Queries, ICDE2012

[Figure: hotels plotted as points (price, distance to the airport): (0.6, 0.2), (0.55, 0.4), (0.55, 0.3), (0.45, 0.6), (0.3, 0.7), (0.3, 0.6), (0.2, 0.7), (0.7, 0.4), (0.5, 0.5). A second panel keeps (0.6, 0.2), (0.55, 0.4), (0.55, 0.3), (0.3, 0.6), (0.2, 0.7) next to the hotel list H7, H6, H4, H5, H1.]

Answering Why-not Questions on Top-k Queries, ICDE2012

• Top-k query (Cleanliness, delicious, Parking spaces)

(95,80,40)

(70,20,30)

(50,90,60)

(75,70,50)

(85,60,60)

(58,20,30)

Top-2(0.4,0.5,0.1)

• Why-not question (Cleanliness, delicious, Parking spaces)

Why is p5 not in my top-2 result list?

Does p5 not exist?

Should I change my weights?

Should I revise my query to look for top-5 hotels?

(95,80,40)

(70,20,30)

(50,90,60)

(75,70,50)

(85,60,60)

(58,20,30)

Top-2(0.5,0.4,0.1)
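
The effect of changing the weights can be checked with a small sketch. It assumes that the six points are labeled p1 to p6 in the order listed (the slide text does not give the labels) and that the triple after "Top-2" is a weight vector for a weighted-sum score. The function name `top_k` is hypothetical.

```python
def top_k(points, weights, k):
    """Return the labels of the k points with the highest weighted sum."""
    def score(values):
        return sum(w * v for w, v in zip(weights, values))
    ranked = sorted(points, key=lambda label: score(points[label]), reverse=True)
    return ranked[:k]

# Assumed labels p1..p6; values are (cleanliness, delicious, parking spaces).
points = {
    "p1": (95, 80, 40), "p2": (70, 20, 30), "p3": (50, 90, 60),
    "p4": (75, 70, 50), "p5": (85, 60, 60), "p6": (58, 20, 30),
}

first = top_k(points, (0.4, 0.5, 0.1), 2)   # original weights
second = top_k(points, (0.5, 0.4, 0.1), 2)  # revised weights
print(first, second)
```

Under these assumptions p5 is missing from the top-2 for the weights (0.4, 0.5, 0.1) and enters it for (0.5, 0.4, 0.1), which is one of the three possible answers to the why-not question.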

The Min-dist Location Selection Query, ICDE2012

[Figure: client c1 and its nearest facility distance. Objective: minimize the nearest facility distance]

Introduction

• kNN (k-Nearest Neighbors) Queries

Assume k = 3

kNN(q) = {a, b, c}

Introduction

• RkNN (Reverse k-Nearest Neighbors) Queries

Assume k = 3

RkNN(q) = {a, …} d

Introduction

• BRkNN (Bi-chromatic Reverse k-Nearest Neighbors) Queries

Assume k = 3

BRkNN(q) = {a, …} d

Two types of data

Application I

[Figure legend: shop, customer]

Which location is the best?

Top-n Reverse kNN Queries

Given two types of data, G (goal) and C (condition)

Retrieve the n data points from G that have the largest BRkNN values

Example: n=2, k=2

BR2NN value of g1 = 4

BR2Top-2 = {g2, g3}
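
A brute-force reading of this query can be sketched as follows. It assumes that the BRkNN value of a goal point is the number of condition points that have it among their k nearest goal points. The function names and the coordinates are hypothetical and are not the data of the slide's figure.

```python
import math

def brknn_counts(goals, conditions, k):
    """For each goal point, count the condition points that have it among
    their k nearest goal points (its BRkNN value)."""
    counts = {name: 0 for name in goals}
    for c in conditions:
        nearest = sorted(goals, key=lambda name: math.dist(goals[name], c))[:k]
        for name in nearest:
            counts[name] += 1
    return counts

def top_n_brknn(goals, conditions, k, n):
    """Return the n goal points with the largest BRkNN values."""
    counts = brknn_counts(goals, conditions, k)
    return sorted(counts, key=lambda name: counts[name], reverse=True)[:n]

# Hypothetical toy data on a line.
goals = {"g1": (0, 0), "g2": (4, 0), "g3": (10, 0)}
conditions = [(1, 0), (3, 0), (6, 0), (9, 0)]
counts = brknn_counts(goals, conditions, k=2)
best = top_n_brknn(goals, conditions, k=2, n=1)
```

This scans every goal point for every condition point, so it only serves as a reference against which a Voronoi-based filter-refinement method can be compared.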

Voronoi Diagram of G

[Legend: goal point (VD-node), condition point]

A Filter-Refinement Framework for Solving BRkNN Queries

Assume k = 2

Lower-bound region of VDi (layer 0)

Upper-bound region of VDi (layer 0 ~ layer (k-1))

[Figure labels: Layer 0, Layer 1]

Filter phase

Assume k = 2

Construct bisectors layer by layer to reduce the region

Refinement Phase

Assume k = 2

For a data point p, we want to check the VDs at layer 1 ~ layer 2 to determine whether VDi is one of the 2NN of p

VDi: (VD13, 1.2) (VD26, 1.4) (VD27, 1.7) (VD3, 1.7) (VD4, 1.8) (VD30, 2.1) (VD5, 2.5) (VD7, 4.8)

dist(p, VD30) > 1.2

dist(VDi, VDj) > 2 dist(VDi, p)
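
One possible reading of the rule dist(VDi, VDj) > 2 dist(VDi, p) is an early stop based on the triangle inequality: such a VDj satisfies dist(p, VDj) >= dist(VDi, VDj) - dist(VDi, p) > dist(VDi, p), so it cannot be closer to p than VDi. The sketch below is an illustration under that reading, not the slides' algorithm. The function name and coordinates are hypothetical, and the neighbor list is assumed to be sorted by distance from VDi, as in the list above.

```python
import math

def is_among_knn(p, vd_i, neighbors, k):
    """Decide whether goal point vd_i is one of the k nearest goal points of p.

    neighbors: the other goal points, sorted by ascending distance from vd_i.
    """
    d_ip = math.dist(vd_i, p)
    closer = 0  # goal points strictly closer to p than vd_i
    for vd_j in neighbors:
        if math.dist(vd_i, vd_j) > 2 * d_ip:
            # Triangle inequality: vd_j and all later neighbors are
            # farther from p than vd_i, so the scan can stop here.
            break
        if math.dist(p, vd_j) < d_ip:
            closer += 1
            if closer >= k:
                return False
    return True

# Hypothetical coordinates.
vd_i, p = (0.0, 0.0), (1.0, 0.0)
neighbors = [(1.5, 0.0), (0.0, 1.8), (3.0, 0.0)]  # sorted by distance from vd_i
in_1nn = is_among_knn(p, vd_i, neighbors, k=1)
in_2nn = is_among_knn(p, vd_i, neighbors, k=2)
```

Here the third neighbor is never compared with p, because its distance from vd_i (3.0) already exceeds 2 dist(vd_i, p) = 2.0.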

Application II

Maximum Coverage BRkNN Queries

Retrieve 2 points from dataset G

Assume k = 2

BRkNN value = 9

BRkNN value = 8

total = 12

total = 14

Maximum Coverage BRkNN Queries

• Given:
– A set of goal points (G)
– A set of condition points (C)
– k: the k value of BRkNN

• Goal:
– Find n points from G, g1, g2, …, gn, which maximize |∪i=1~n BRkNN(gi, G, C)|
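
The difference between picking the n largest BRkNN values and maximizing the size of the union can be shown on a tiny instance. The sketch below enumerates all n-subsets, which is only feasible for very small inputs. The function name and the BRkNN sets are hypothetical.

```python
from itertools import combinations

def max_coverage(brknn_sets, n):
    """Try every n-subset of goal points and keep the one whose BRkNN sets
    have the largest union. Exponential in n; for tiny inputs only."""
    best_subset, best_covered = (), set()
    for subset in combinations(sorted(brknn_sets), n):
        covered = set().union(*(brknn_sets[g] for g in subset))
        if len(covered) > len(best_covered):
            best_subset, best_covered = subset, covered
    return best_subset, best_covered

# Hypothetical BRkNN sets (condition points are numbered 1..6).
brknn_sets = {"g1": {1, 2, 3, 4}, "g2": {1, 2, 3}, "g3": {5, 6}}
subset, covered = max_coverage(brknn_sets, n=2)
```

In this instance g1 and g2 have the two largest BRkNN values but together cover only 4 condition points, while g1 and g3 cover all 6.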

Application III

• Find n Most Favorite Products based on Reverse Top-k Queries

Airline Fare Food

a1 0.8 0.2

a2 0.6 0.4

a3 0.4 1

a4 0.4 0.8

a5 0.4 0.6

Hotel Location Comfort Cleanness

h1 0.4 0.6 0.4

h2 0.4 0.6 0.6

h3 0.4 0.8 0.2

h4 0.6 0.6 0.2

h5 0.6 0.8 0.4

h6 1 0.2 0.6

Airlines Hotels

Package Fare Food Location Comfort Cleanness

(a1, h1) 0.8 0.2 0.4 0.6 0.4

(a1, h2) 0.8 0.2 0.4 0.6 0.6

(a1, h3) 0.8 0.2 0.4 0.8 0.2…

(a5, h5) 0.4 0.6 0.6 0.8 0.4

(a5, h6) 0.4 0.6 1 0.2 0.6

All candidate packages

Which are the most favorite packages?

(a1, h1) 0.8 0.2 0.4 0.6 0.4

(a1, h2) 0.8 0.2 0.4 0.6 0.6

(a1, h3) 0.8 0.2 0.4 0.8 0.2

(a5, h5) 0.4 0.6 0.6 0.8 0.4

(a5, h6) 0.4 0.6 1 0.2 0.6

Customer Fare Food Location Comfort Cleanness

c1 0 0.2 0.5 0.1 0.2

c2 0.1 0.3 0.1 0.3 0.2

c3 0.3 0 0.1 0.3 0.3

c4 0.3 0.1 0.2 0.3 0.1

c5 0 0.1 0.3 0 0.6

Customer preferences

c1 - (a1, h1): 0.8×0 + 0.2×0.2 + 0.4×0.5 + 0.6×0.1 + 0.4×0.2 = 0.38
c1 - (a1, h2): 0.8×0 + 0.2×0.2 + 0.4×0.5 + 0.6×0.1 + 0.6×0.2 = 0.42
…

c2 - (a1, h1): 0.8×0.1 + 0.2×0.3 + 0.4×0.1 + 0.6×0.3 + 0.4×0.2 = 0.44
c2 - (a1, h2): 0.8×0.1 + 0.2×0.3 + 0.4×0.1 + 0.6×0.3 + 0.6×0.2 = 0.48
…

Customer Fare Food Location Comfort Cleanness Top-2 favorites

c1 0 0.2 0.5 0.1 0.2 {(a3, h6), (a4, h6)}

c2 0.1 0.3 0.1 0.3 0.2 {(a3, h2), (a3, h5)}

c3 0.3 0 0.1 0.3 0.3 {(a1, h2), (a1, h5)}

c4 0.3 0.1 0.2 0.3 0.1 {(a1, h5), (a2, h5), (a3, h5)}

c5 0 0.1 0.3 0 0.6 {(a3, h6), (a4, h6)}
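
The score computations for c1 and c2 are weighted sums of the five package attributes. A minimal sketch that reproduces the four values given on the slide (the function name `score` is hypothetical):

```python
def score(package, weights):
    """Weighted sum of the package attributes under a customer's weights."""
    return sum(v * w for v, w in zip(package, weights))

# Attribute order: Fare, Food, Location, Comfort, Cleanness.
a1_h1 = (0.8, 0.2, 0.4, 0.6, 0.4)
a1_h2 = (0.8, 0.2, 0.4, 0.6, 0.6)
c1 = (0, 0.2, 0.5, 0.1, 0.2)
c2 = (0.1, 0.3, 0.1, 0.3, 0.2)

for name, weights in (("c1", c1), ("c2", c2)):
    print(name, round(score(a1_h1, weights), 2), round(score(a1_h2, weights), 2))
# c1 0.38 0.42
# c2 0.44 0.48
```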

Top-k Queries (Customer’s View)

(a1, h1) 0.8 0.2 0.4 0.6 0.4

(a1, h2) 0.8 0.2 0.4 0.6 0.6

(a1, h3) 0.8 0.2 0.4 0.8 0.2

(a5, h5) 0.4 0.6 0.6 0.8 0.4

(a5, h6) 0.4 0.6 1 0.2 0.6

Customer preferences

Customer Fare Food Location Comfort Cleanness Top-2 favorites

c1 0 0.2 0.5 0.1 0.2 {(a3, h6), (a4, h6)}

c2 0.1 0.3 0.1 0.3 0.2 {(a3, h2), (a3, h5)}

c3 0.3 0 0.1 0.3 0.3 {(a1, h2), (a1, h5)}

c4 0.3 0.1 0.2 0.3 0.1 {(a1, h5), (a2, h5), (a3, h5)}

c5 0 0.1 0.3 0 0.6 {(a3, h6), (a4, h6)}

Retrieve the customers whose top-2 favorites contain (a1, h2)

The number of customers in the reverse top-k result of a product is a good estimate of how much the market favors the product

Reverse Top-k Queries (Travel Agency’s View)

(a1, h1) 0.8 0.2 0.4 0.6 0.4

(a1, h2) 0.8 0.2 0.4 0.6 0.6

(a1, h5) 0.8 0.2 0.6 0.8 0.4

(a3, h6) 0.4 1 1 0.2 0.6

(a5, h6) 0.4 0.6 1 0.2 0.6

Customer preferences

Customer Fare Food Location Comfort Cleanness Top-2 favorites

c1 0 0.2 0.5 0.1 0.2 {(a3, h6), (a4, h6)}

c2 0.1 0.3 0.1 0.3 0.2 {(a3, h2), (a3, h5)}

c3 0.3 0 0.1 0.3 0.3 {(a1, h2), (a1, h5)}

c4 0.3 0.1 0.2 0.3 0.1 {(a1, h5), (a2, h5), (a3, h5)}

c5 0 0.1 0.3 0 0.6 {(a3, h6), (a4, h6)}

(a1, h2): {c3}
(a1, h5): {c3, c4}
(a2, h5): {c4}
(a3, h2): {c2}
(a3, h5): {c2, c4}
(a3, h6): {c1, c5}
(a4, h6): {c1, c5}

k (#packages considered by customers) = 2

n (#packages to be offered by the travel agency) = 2
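
The reverse top-k sets can be derived from the airline, hotel, and customer tables by computing every customer's top-k and inverting the result. The sketch below is a brute-force illustration with hypothetical function names. It assumes that ties at the k-th score are kept, which is why c4 has three "top-2" favorites. Values are stored in tenths so that tied scores compare exactly.

```python
from itertools import product

# Values and weights in tenths (8 means 0.8); scores are in hundredths.
AIRLINES = {"a1": (8, 2), "a2": (6, 4), "a3": (4, 10), "a4": (4, 8), "a5": (4, 6)}
HOTELS = {"h1": (4, 6, 4), "h2": (4, 6, 6), "h3": (4, 8, 2),
          "h4": (6, 6, 2), "h5": (6, 8, 4), "h6": (10, 2, 6)}
CUSTOMERS = {"c1": (0, 2, 5, 1, 2), "c2": (1, 3, 1, 3, 2), "c3": (3, 0, 1, 3, 3),
             "c4": (3, 1, 2, 3, 1), "c5": (0, 1, 3, 0, 6)}

def top_k(weights, packages, k):
    """Packages scoring at least as high as the k-th best (ties are kept)."""
    scores = {name: sum(w * v for w, v in zip(weights, values))
              for name, values in packages.items()}
    threshold = sorted(scores.values(), reverse=True)[k - 1]
    return {name for name, s in scores.items() if s >= threshold}

def reverse_top_k(packages, customers, k):
    """Map each package to the customers whose top-k favorites contain it."""
    result = {}
    for customer, weights in customers.items():
        for name in top_k(weights, packages, k):
            result.setdefault(name, set()).add(customer)
    return result

# All 30 candidate packages: airline attributes followed by hotel attributes.
PACKAGES = {(a, h): AIRLINES[a] + HOTELS[h] for a, h in product(AIRLINES, HOTELS)}
rtop = reverse_top_k(PACKAGES, CUSTOMERS, k=2)
```

Computed this way from the attribute tables, the top-2 favorites of c1 come out as {(a3, h6), (a4, h6)}, which agrees with the verification step later in the slides.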

Problem Definition of n-k MFP

• Given a set of component tables T1, T2, …, and Tx, which form the set of candidate products P, a set of customers C with different preferences on the products, and two positive integers k and n

• RTOPk(cp, P, C): the set of customers whose top-k favorites contain the candidate product cp

• Retrieve the minimum subset P’ of P such that |P’| ≤ n and |∪cp∈P’ RTOPk(cp, P, C)| is maximized

• Maximum coverage problem: NP-hard
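
Because maximum coverage is NP-hard, a common fallback is the greedy heuristic, which repeatedly picks the product that adds the most uncovered customers and is known to achieve a (1 − 1/e) approximation. The sketch below illustrates that standard heuristic and is not claimed to be the algorithm of the talk. The function name is hypothetical, and the RTOP sets are hard-coded as obtained when the reverse top-2 sets of the running example are computed from the attribute tables.

```python
def greedy_nk_mfp(rtop, n):
    """Pick up to n products, each time taking the one that adds the most
    not-yet-covered customers. Stops early when nothing new can be covered."""
    chosen, covered = [], set()
    for _ in range(n):
        best = max(rtop, key=lambda cp: len(rtop[cp] - covered), default=None)
        if best is None or not rtop[best] - covered:
            break
        chosen.append(best)
        covered |= rtop[best]
    return chosen, covered

# Reverse top-2 sets of the running example (ties broken by listing order).
rtop = {
    ("a1", "h2"): {"c3"}, ("a1", "h5"): {"c3", "c4"}, ("a2", "h5"): {"c4"},
    ("a3", "h2"): {"c2"}, ("a3", "h5"): {"c2", "c4"},
    ("a3", "h6"): {"c1", "c5"}, ("a4", "h6"): {"c1", "c5"},
}
chosen, covered = greedy_nk_mfp(rtop, n=2)
```

With n = 2 the greedy choice covers four of the five customers, which is also the best that any pair of packages can reach in this example.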

Skyline

• An object p is said to dominate another object q if and only if p is larger than or equal to q on all dimensions and larger than q on at least one dimension

• Given a set of multi-dimensional objects, the skyline consists of the objects that are not dominated by any other object
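
The two definitions translate directly into code. The sketch below is a quadratic-time illustration with hypothetical function names, applied to the airline table of the running example (larger values are better).

```python
def dominates(p, q):
    """p dominates q: p >= q on every dimension and p > q on at least one."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))

def skyline(objects):
    """Names of the objects that are not dominated by any other object."""
    return {name for name, p in objects.items()
            if not any(dominates(q, p) for q in objects.values())}

# Airline table: (Fare, Food).
airlines = {"a1": (0.8, 0.2), "a2": (0.6, 0.4), "a3": (0.4, 1),
            "a4": (0.4, 0.8), "a5": (0.4, 0.6)}
sky = skyline(airlines)
```

For the airline table the skyline is {a1, a2, a3}; a4 and a5 are both dominated by a3.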

• Only the component tuples dominated by at most (k-1) other tuples in the same component table have the possibility of being a part of a top-k product for a customer c

Airline Fare Food

a3 0.4 1

a4 0.4 0.8

a5 0.4 0.6

Airlines

Hotel Location Comfort Cleanness

h1 0.4 0.6 0.4

Hotels

(a3, h1) 0.4 1 0.4 0.6 0.4

(a4, h1) 0.4 0.8 0.4 0.6 0.4

(a5, h1) 0.4 0.6 0.4 0.6 0.4

(The number in parentheses is the number of tuples in the same table that dominate the tuple.)

Airline Fare Food

a1(0) 0.8 0.2

a2(0) 0.6 0.4

a3(0) 0.4 1

a4(1) 0.4 0.8

a5(2) 0.4 0.6

Hotel Location Comfort Cleanness

h1(2) 0.4 0.6 0.4

h2(0) 0.4 0.6 0.6

h3(1) 0.4 0.8 0.2

h4(1) 0.6 0.6 0.2

h5(0) 0.6 0.8 0.4

h6(0) 1 0.2 0.6

Airlines Hotels

Airline Fare Food

a1(0) 0.8 0.2

a2(0) 0.6 0.4

a3(0) 0.4 1

a4(1) 0.4 0.8

h2(0) 0.4 0.6 0.6

h3(1) 0.4 0.8 0.2

h4(1) 0.6 0.6 0.2

h5(0) 0.6 0.8 0.4

h6(0) 1 0.2 0.6
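
The pruning step can be sketched by counting, for every tuple, how many tuples of the same component table dominate it, and keeping only those with a count of at most k-1. The function names are hypothetical. The hotel table of the running example is used as input.

```python
def dominates(p, q):
    """p dominates q: p >= q on every dimension and p > q on at least one."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))

def dominance_counts(table):
    """For each tuple, the number of tuples in the same table dominating it."""
    return {name: sum(dominates(q, p) for q in table.values())
            for name, p in table.items()}

def prune(table, k):
    """Keep the tuples dominated by at most k-1 other tuples."""
    counts = dominance_counts(table)
    return {name: row for name, row in table.items() if counts[name] <= k - 1}

# Hotel table: (Location, Comfort, Cleanness).
hotels = {"h1": (0.4, 0.6, 0.4), "h2": (0.4, 0.6, 0.6), "h3": (0.4, 0.8, 0.2),
          "h4": (0.6, 0.6, 0.2), "h5": (0.6, 0.8, 0.4), "h6": (1, 0.2, 0.6)}
counts = dominance_counts(hotels)
kept = prune(hotels, k=2)
```

With k = 2 only h1 is removed, since it is dominated by both h2 and h5; the counts match the numbers shown in parentheses in the tables.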

• For any two candidate products cp1 and cp2 in P, if cp1 dominates cp2, RTOPk(cp2, P, C) ⊆ RTOPk(cp1, P, C)

• For any candidate product cp in P, if cp ∉ Skyline(P), cp ∉ n-k MFP

The candidate products in the n-k MFP must be in Skyline(P)

• Consider the set of candidate products generated from Skyline(T1), Skyline(T2), …, and Skyline(Tx)

• A candidate product cp ∈ Skyline(P) if and only if cp belongs to this set [VLDB’09]

• Only the skyline tuples of each component table have the possibility of being a part of a candidate product in the n-k MFP

Airlines Hotels

Airline Fare Food

a1(0) 0.8 0.2

a2(0) 0.6 0.4

a3(0) 0.4 1

a4(1) 0.4 0.8

h2(0) 0.4 0.6 0.6

h3(1) 0.4 0.8 0.2

h4(1) 0.6 0.6 0.2

h5(0) 0.6 0.8 0.4

h6(0) 1 0.2 0.6

• Only the customers in RTOPk(cp, Skyline(P), C) possibly become the members in RTOPk(cp, P, C)

Package Upper bound

(a1, h2) {c3}

(a1, h5) {c3, c4}

(a1, h6) {}

(a2, h2) {}

(a2, h5) {c4}

(a2, h6) {c1, c5}

(a3, h2) {c2}

(a3, h5) {c2, c4}

(a3, h6) {c1, c5}

The upper bounds of the remaining candidate packages

RTOPk(cp, Skyline(P), C) is an upper bound of RTOPk(cp, P, C)
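
The upper-bound property can be checked on the running example by computing the reverse top-2 sets twice, once over the nine skyline packages and once over all thirty packages. The sketch below uses hypothetical function names, takes the skyline tuples (a1, a2, a3 and h2, h5, h6) from the tables above, keeps ties at the k-th score, and stores values in tenths so that equal scores compare exactly.

```python
from itertools import product

AIRLINES = {"a1": (8, 2), "a2": (6, 4), "a3": (4, 10), "a4": (4, 8), "a5": (4, 6)}
HOTELS = {"h1": (4, 6, 4), "h2": (4, 6, 6), "h3": (4, 8, 2),
          "h4": (6, 6, 2), "h5": (6, 8, 4), "h6": (10, 2, 6)}
CUSTOMERS = {"c1": (0, 2, 5, 1, 2), "c2": (1, 3, 1, 3, 2), "c3": (3, 0, 1, 3, 3),
             "c4": (3, 1, 2, 3, 1), "c5": (0, 1, 3, 0, 6)}

def packages_from(airlines, hotels):
    """Combine the named airline and hotel tuples into packages."""
    return {(a, h): AIRLINES[a] + HOTELS[h] for a, h in product(airlines, hotels)}

def rtop(packages, k):
    """Package -> customers whose top-k (ties kept) contains the package."""
    result = {name: set() for name in packages}
    for customer, weights in CUSTOMERS.items():
        scores = {name: sum(w * v for w, v in zip(weights, values))
                  for name, values in packages.items()}
        threshold = sorted(scores.values(), reverse=True)[k - 1]
        for name, s in scores.items():
            if s >= threshold:
                result[name].add(customer)
    return result

sky_packages = packages_from(("a1", "a2", "a3"), ("h2", "h5", "h6"))
upper = rtop(sky_packages, k=2)                    # RTOPk(cp, Skyline(P), C)
exact = rtop(packages_from(AIRLINES, HOTELS), k=2)  # RTOPk(cp, P, C)
holds = all(exact[cp] <= upper[cp] for cp in sky_packages)
```

For (a2, h6) the upper bound is {c1, c5} while the exact set is empty, so the bound is not always tight and candidates still need to be verified against the full set P.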

Package Upper bound

(a1, h2) {c3}

(a1, h5) {c3, c4}

(a2, h5) {c4}

(a2, h6) {c1, c5}

(a3, h2) {c2}

(a3, h5) {c2, c4}

(a3, h6) {c1, c5}

The top-2 favorites of c3: {(a1, h5), (a1, h2)}

The top-2 favorites of c4: {(a1, h5), (a2, h5), (a3, h5)}

P’ : {(a1, h5)}

Package Upper bound

(a2, h6) {c1, c5}

(a3, h2) {c2}

(a3, h5) {c2}

(a3, h6) {c1, c5}

The top-2 favorites of c1: {(a3, h6), (a4, h6)}

The top-2 favorites of c5: {(a3, h6), (a4, h6)}

P’ : {(a1, h5), (a3, h6)}

Application IV

• Find Most Favorite Products by Top-k Reverse Skyline Queries

[Figure: user preferences and products plotted as points; one axis is Mileage]

Thank you for your attention!
