

Page 1: Top-k Queries on Uncertain Data

Advisor: Prof. 陳良弼
Presenter: 鄧雅文 (97753034)

Page 2: Outline

Introduction
Related Work
Problem Formulation
Future Work

Page 3: Introduction

Top-k query on certain data
◦ Rank results according to a user-defined score
◦ Important for exploring large databases
◦ E.g., top-2 = {T1, T2}

TID  PID  Score
T1   A    100
T2   B    90
T3   C    80
T4   D    70
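As a minimal sketch of the certain-data case, the following Python snippet ranks the four tuples of the table above by score and keeps the best k. The function name `top_k` is illustrative and does not come from the slides.

```python
import heapq

# Tuples from the table above: (TID, PID, score).
TUPLES = [("T1", "A", 100), ("T2", "B", 90), ("T3", "C", 80), ("T4", "D", 70)]

def top_k(tuples, k):
    """Return the TIDs of the k highest-scoring tuples, best first."""
    best = heapq.nlargest(k, tuples, key=lambda t: t[2])
    return [tid for tid, _pid, _score in best]

print(top_k(TUPLES, 2))  # ['T1', 'T2']
```

With certain data the answer is unique up to score ties, which is exactly what stops being true once tuples carry probabilities.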

Page 4: Introduction (cont.)

Uncertain database
◦ How to define top-k on uncertain data?
◦ Mutually exclusive rules: tuples joined by a rule cannot both be true
  E.g., T1 ⊕ T4

TID  PID  Score  Pr.
T1   A    100    0.2
T2   B    90     0.9
T3   C    80     0.6
T4   A    70     0.8
…    …    …      …

Page 5: Related Work

C. C. Aggarwal and P. S. Yu. A Survey of Uncertain Data Algorithms and Applications. In TKDE, 2009.
◦ Causes: sensor networks, privacy, trajectory prediction…
◦ The main areas of research on uncertain data:
  Modeling of uncertain data
  Uncertain data management: top-k query, range query, NN query…
  Uncertain data mining: clustering, classification, frequent patterns, outliers…

Page 6: Related Work (cont.)

M. Soliman, I. Ilyas, and K. Chang. Top-k Query Processing in Uncertain Databases. In ICDE, 2007.
◦ Possible Worlds

Page 7: Related Work (cont.)

◦ U-Topk query
  Return the k tuples that co-exist as the top-k of a possible world with the highest probability
  E.g., {T1, T2} as U-Top2
◦ U-kRanks query
  Return k tuples, each of which is the clear winner of its rank over all possible worlds
  E.g., {T2, T6} as U-2Ranks
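The two query semantics can be contrasted with a brute-force sketch over possible worlds. The three tuples below are hypothetical illustration data, not taken from the slides or the cited paper, and they are assumed to be independent (no exclusion rules). The function names are likewise illustrative, and the enumeration is exponential in the number of tuples, so this is only a way to see the definitions at work.

```python
from itertools import product

# Hypothetical data, NOT from the slides: (TID, probability), listed from
# highest score to lowest. Tuples are assumed independent (no exclusion rules).
TUPLES = [("t1", 0.5), ("t2", 0.6), ("t3", 0.9)]

def possible_worlds(tuples):
    """Yield (world, probability); a world lists its tuples in score order."""
    for mask in product([True, False], repeat=len(tuples)):
        prob = 1.0
        world = []
        for (tid, p), present in zip(tuples, mask):
            prob *= p if present else 1.0 - p
            if present:
                world.append(tid)
        yield world, prob

def u_topk(tuples, k):
    """Most probable top-k vector over all worlds with at least k tuples."""
    scores = {}
    for world, prob in possible_worlds(tuples):
        if len(world) >= k:
            vec = tuple(world[:k])
            scores[vec] = scores.get(vec, 0.0) + prob
    return max(scores, key=scores.get)

def u_kranks(tuples, k):
    """For each rank 1..k, the tuple most likely to appear at that rank."""
    rank_prob = [dict() for _ in range(k)]
    for world, prob in possible_worlds(tuples):
        for i, tid in enumerate(world[:k]):
            rank_prob[i][tid] = rank_prob[i].get(tid, 0.0) + prob
    return [max(rp, key=rp.get) for rp in rank_prob]

print(u_topk(TUPLES, 2))    # ('t1', 't2')
print(u_kranks(TUPLES, 2))  # ['t1', 't3']
```

On this toy data the two answers differ: (t1, t2) is the single most probable top-2 vector, while t3 is the most likely occupant of rank 2 when ranks are judged one at a time.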

Page 8: Related Work (cont.)

M. Hua, J. Pei, W. Zhang, and X. Lin. Ranking Queries on Uncertain Data: A Probabilistic Threshold Approach. In SIGMOD, 2008.
◦ PT-k query
  Return the set of all tuples whose top-k probability is at least p
  E.g., {T1, T2, T5} as PT-2 (with p = 0.4)
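A PT-k query can be sketched by brute force as follows. The data is the player table that appears on Page 11 of this deck. Two assumptions are made here: tuples are ranked by TID with T1 best (no score column is given for that table), and exactly one tuple per player appears in each possible world, as in the Page 11 list of possible worlds. The helper names are illustrative, and this is not the algorithm of the cited paper.

```python
from itertools import product

# Player table from Page 11: TID -> (player, probability).
# Assumption: tuples are ranked by TID, T1 best (no score column is given).
TUPLES = {
    "T1": ("A", 0.41), "T2": ("D", 0.62), "T3": ("B", 0.14), "T4": ("C", 0.34),
    "T5": ("C", 0.66), "T6": ("B", 0.86), "T7": ("D", 0.38), "T8": ("A", 0.59),
}

def rank(tid):
    """Numeric rank of a tuple under the ranked-by-TID assumption."""
    return int(tid[1:])

def possible_worlds(tuples):
    """Yield (world, probability), picking exactly one tuple per player."""
    by_player = {}
    for tid, (player, p) in tuples.items():
        by_player.setdefault(player, []).append((tid, p))
    for choice in product(*by_player.values()):
        prob = 1.0
        for _tid, p in choice:
            prob *= p
        yield sorted((tid for tid, _p in choice), key=rank), prob

def pt_k(tuples, k, p):
    """Return the tuples whose probability of being in the top-k is >= p."""
    topk_prob = dict.fromkeys(tuples, 0.0)
    for world, prob in possible_worlds(tuples):
        for tid in world[:k]:
            topk_prob[tid] += prob
    return sorted((t for t, q in topk_prob.items() if q >= p), key=rank)

print(pt_k(TUPLES, k=2, p=0.4))  # ['T1', 'T2', 'T5']
```

Unlike U-Topk, the size of a PT-k answer is not fixed at k: it depends on how many tuples clear the threshold p.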

Page 9: Related Work (cont.)

T. Ge, S. Zdonik, and S. Madden. Top-k Queries on Uncertain Data: On Score Distribution and Typical Answers. In SIGMOD, 2009.
◦ The tradeoff between reporting high-scoring tuples and tuples with a high probability of being in the top-k
◦ Return a number of typical vectors that efficiently sample the distribution of all potential top-k tuple vectors

Page 10: Problem Formulation

Example:
◦ In an International Tenpin Bowling Championship, the events include singles, doubles, and trios. Because of a limited budget, the coach can choose only 3 players to attend. We therefore want these 3 players to have a relatively high probability of performing well across all 3 types of events.

Page 11: Problem Formulation (cont.)

◦ U-Top3 = {T2, T5, T6}
◦ But U-Top2 = {T1, T2} and U-Top1 = {T1}
◦ How about also considering {T1, T2, T5} as top-3?

TID  Player  Pr.
T1   A       0.4100
T2   D       0.6200
T3   B       0.1400
T4   C       0.3400
T5   C       0.6600
T6   B       0.8600
T7   D       0.3800
T8   A       0.5900

Possible World         Pr.      Possible World         Pr.
PW1  T1, T2, T3, T4    0.0121   PW9   T2, T3, T4, T8   0.0174
PW2  T1, T2, T3, T5    0.0235   PW10  T2, T3, T5, T8   0.0338
PW3  T1, T2, T4, T6    0.0743   PW11  T2, T4, T6, T8   0.1070
PW4  T1, T2, T5, T6    0.1443   PW12  T2, T5, T6, T8   0.2076
PW5  T1, T3, T4, T7    0.0074   PW13  T3, T4, T7, T8   0.0107
PW6  T1, T3, T5, T7    0.0144   PW14  T3, T5, T7, T8   0.0207
PW7  T1, T4, T6, T7    0.0456   PW15  T4, T6, T7, T8   0.0656
PW8  T1, T5, T6, T7    0.0884   PW16  T5, T6, T7, T8   0.1273
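The 16 possible worlds and the U-Topk answers on this slide can be reproduced with a short brute-force sketch. It assumes that tuples are ranked by TID with T1 best (no score column is shown) and that each player contributes exactly one of their two tuples to every world, which is what the list of possible worlds suggests. Function names are illustrative.

```python
from itertools import product

# Player table from this slide: TID -> (player, probability).
# Assumption: tuples are ranked by TID, T1 best (no score column is shown).
TUPLES = {
    "T1": ("A", 0.41), "T2": ("D", 0.62), "T3": ("B", 0.14), "T4": ("C", 0.34),
    "T5": ("C", 0.66), "T6": ("B", 0.86), "T7": ("D", 0.38), "T8": ("A", 0.59),
}

def possible_worlds(tuples):
    """Yield (world, probability), picking exactly one tuple per player."""
    by_player = {}
    for tid, (player, p) in tuples.items():
        by_player.setdefault(player, []).append((tid, p))
    for choice in product(*by_player.values()):
        prob = 1.0
        for _tid, p in choice:
            prob *= p
        world = sorted((tid for tid, _p in choice), key=lambda t: int(t[1:]))
        yield world, prob

def topk_distribution(tuples, k):
    """Map each possible top-k vector to its total probability."""
    dist = {}
    for world, prob in possible_worlds(tuples):
        vec = tuple(world[:k])
        dist[vec] = dist.get(vec, 0.0) + prob
    return dist

def u_topk(tuples, k):
    """Return the most probable top-k vector and its probability."""
    dist = topk_distribution(tuples, k)
    best = max(dist, key=dist.get)
    return best, dist[best]

for k in (1, 2, 3):
    vec, prob = u_topk(TUPLES, k)
    print(k, vec, round(prob, 4))
# 1 ('T1',) 0.41
# 2 ('T1', 'T2') 0.2542
# 3 ('T2', 'T5', 'T6') 0.2076
```

Under these assumptions the U-Top3 answer shares only T2 with the U-Top2 answer, while {T1, T2, T5} has top-3 probability 0.1443 and extends the U-Top2 answer, which is the tension the slide points at.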

Page 12: Problem Formulation (cont.)

We choose the answers of a top-k query based not only on the probability (P) but also on the confidence (C).
◦ Confidence: expresses the top-(k-1) probabilities of the sets formed by k-1 tuples of this possible top-k answer
  E.g., k = 3: {T1, T2, T3} is a possible top-k answer with P = 0.0356
  C is composed in some way of
    Pr({T1, T2} is top-2) = 0.2542 and its confidence,
    Pr({T1, T3} is top-2) = 0.0218 and its confidence,
    Pr({T2, T3} is top-2) = 0.0512 and its confidence
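The ingredients of the confidence in this example can be computed directly from the Page 11 player table. The sketch below only evaluates P and the top-(k-1) probabilities of the sub-answers; how they are combined into C is left open by the slides, so no confidence value is computed. It assumes tuples are ranked by TID (T1 best) and one tuple per player in each world; `prob_is_top` is an illustrative name.

```python
from itertools import combinations, product

# Player table from Page 11: TID -> (player, probability).
# Assumption: tuples are ranked by TID, T1 best (no score column is given).
TUPLES = {
    "T1": ("A", 0.41), "T2": ("D", 0.62), "T3": ("B", 0.14), "T4": ("C", 0.34),
    "T5": ("C", 0.66), "T6": ("B", 0.86), "T7": ("D", 0.38), "T8": ("A", 0.59),
}

def possible_worlds(tuples):
    """Yield (world, probability), picking exactly one tuple per player."""
    by_player = {}
    for tid, (player, p) in tuples.items():
        by_player.setdefault(player, []).append((tid, p))
    for choice in product(*by_player.values()):
        prob = 1.0
        for _tid, p in choice:
            prob *= p
        world = sorted((tid for tid, _p in choice), key=lambda t: int(t[1:]))
        yield world, prob

def prob_is_top(tuples, subset):
    """Probability that the top-|subset| tuples of a world are exactly subset."""
    target = set(subset)
    return sum(prob for world, prob in possible_worlds(tuples)
               if set(world[:len(target)]) == target)

answer = ("T1", "T2", "T3")
print(answer, round(prob_is_top(TUPLES, answer), 4))
for sub in combinations(answer, len(answer) - 1):
    print(sub, round(prob_is_top(TUPLES, sub), 4))
# ('T1', 'T2', 'T3') 0.0356
# ('T1', 'T2') 0.2542
# ('T1', 'T3') 0.0218
# ('T2', 'T3') 0.0512
```

Because each sub-answer has its own confidence, a full definition would presumably recurse down to k = 1; that recursion is not specified here and is not implemented.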

Page 13: Problem Formulation (cont.)

Since every possible top-k answer has two features, probability (P) and confidence (C), we return only the non-dominated answers as the result set.
◦ E.g.,
  {T1, T3, T5}: P=0.8, C=0.4
  {T1, T4, T7}: P=0.5, C=0.7
  {T2, T6, T7}: P=0.3, C=0.2 (dominated by both other answers, so it will not be returned)
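The non-dominated filter in this example can be sketched as a simple pairwise check. The (P, C) values are the ones listed on the slide; the function names and the quadratic comparison are illustrative choices, not the algorithm the slides propose (that is listed as future work).

```python
# Candidate answers from the slide: top-k set -> (P, C).
CANDIDATES = {
    ("T1", "T3", "T5"): (0.8, 0.4),
    ("T1", "T4", "T7"): (0.5, 0.7),
    ("T2", "T6", "T7"): (0.3, 0.2),
}

def dominates(a, b):
    """True if a is at least as good as b in both P and C, and better in one."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])

def non_dominated(candidates):
    """Keep the answers whose (P, C) pair is not dominated by any other."""
    return {
        answer: pc
        for answer, pc in candidates.items()
        if not any(dominates(other, pc) for other in candidates.values())
    }

for answer, (p, c) in non_dominated(CANDIDATES).items():
    print(answer, p, c)
# ('T1', 'T3', 'T5') 0.8 0.4
# ('T1', 'T4', 'T7') 0.5 0.7
```

The first two answers survive because each beats the other in one feature, while {T2, T6, T7} is worse than both in P and in C.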

Page 14: Future Work

Formulate the confidence function
Find an algorithm to generate the result set
Try to calculate the confidence in an efficient way
Carry out an empirical study on datasets

Page 15: Thank you!