Range Queries using Bloom Filters
Basim Baig, Hau Chan, Samuel McCauley, Alice Wong
Computer Science Department, Stony Brook University


Post on 05-Sep-2014




Bloom Filter (Review)
• An efficient data structure that represents a set S (a subset of U) and answers membership queries:
– Given: an element x in U
– Output: No => x is not in S; Yes => x is in S with probability >= (1-ε) (false positives are possible)
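The review above can be made concrete with a minimal Bloom filter sketch. It is only an illustration, not the authors' implementation: the class name `BloomFilter`, the SHA-256 based double hashing, and the sizing formulas (the standard textbook ones) are all assumptions introduced here.

```python
import hashlib
import math


class BloomFilter:
    """Illustrative Bloom filter over strings (names and hashing are assumptions)."""

    def __init__(self, capacity, error_rate):
        # Standard sizing: m = -n ln(eps) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.num_bits = max(1, math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive all bit positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # False means "definitely not in S"; True means "probably in S".
        return all((self.bits[pos // 8] >> (pos % 8)) & 1 for pos in self._positions(item))
```

An inserted element is always reported as present; an element that was never inserted is reported as present only with small probability.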


Goal
• Maintain a set S, a subset of a universe U of strings, under the following operations:
– Insert(x): S <- S ∪ {x}
– Query(x, y): Is there a string in S between x and y?
  • No => no string in S lies between x and y
  • Yes => some string in S lies between x and y, up to a small false positive probability
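To pin down the semantics of the two operations, here is an exact (non-probabilistic) baseline built on a sorted list. It is a sketch for comparison only: the name `ExactRangeSet` is hypothetical, and treating the range as inclusive at both ends is an assumption. The Bloom-filter structure aims to answer the same question approximately.

```python
import bisect


class ExactRangeSet:
    """Exact baseline for Insert(x) and Query(x, y); never returns a false positive."""

    def __init__(self):
        self._keys = []  # kept sorted in lexicographic order

    def insert(self, x):
        i = bisect.bisect_left(self._keys, x)
        if i == len(self._keys) or self._keys[i] != x:
            self._keys.insert(i, x)

    def query(self, x, y):
        # Find the smallest stored key >= x and check that it does not exceed y.
        i = bisect.bisect_left(self._keys, x)
        return i < len(self._keys) and self._keys[i] <= y
```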


Our Result
• C*nk space
• O(k) time for range queries/inserts
• An optimized version that reduces the space to C*nk/log(k) while retaining the same query time


Idea
• Let B be the Bloom filter structure that represents S.
• For each insert(K), K in U, we insert every prefix K[0, pi], i = 1, …, |K|/p, into B. (We assume that |K| is divisible by p.)
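The insertion rule can be sketched as follows. The helper names `prefixes` and `insert` are hypothetical, K[0, pi] is read as the first p·i characters of K (an assumption about the slide's indexing), and a plain Python `set` stands in for the Bloom filter B so that the example stays exact.

```python
def prefixes(key, p):
    """Return the prefixes of `key` whose lengths are p, 2p, ..., len(key)."""
    if len(key) % p != 0:
        raise ValueError("key length must be divisible by p")
    return [key[: i * p] for i in range(1, len(key) // p + 1)]


def insert(store, key, p):
    """Insert every length-(p*i) prefix of `key` into `store` (a stand-in for B)."""
    for prefix in prefixes(key, p):
        store.add(prefix)
```

For example, with p = 2 the key "cbcaab" contributes the three entries "cb", "cbca" and "cbcaab", which is |K|/p = 3 inserts.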


Algorithm for Range Queries

• We assume that every string in U has length at most K.

• For a query between X and Y, query(X, Y), if X and Y have different lengths, we pad the shorter one with “wildcard” characters.

• Procedure:
1. If pi > K, return yes.
2. For each substring x between X[0,pi] and Y[0,pi] (inclusive):
   1. If Bloom filter query(x) returns true for more than one child, return yes.
   2. If Bloom filter query(x) returns true for only the leftmost child, increment i and repeat.
   3. If every query(x) returns no, return no.
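The descent through prefix levels can be sketched as below. This is a simplified variant, not the slide's exact procedure: it omits the early-exit rule for multiple matching children and simply searches every candidate prefix that is both present in the filter and compatible with the range. It assumes p = 1, a known small alphabet, keys that all share the same length, and a plain `set` standing in for the Bloom filter; the function names are hypothetical.

```python
def insert(store, key):
    """Insert every prefix of `key` (p = 1) into `store`."""
    for i in range(1, len(key) + 1):
        store.add(key[:i])


def range_query(store, lo, hi, alphabet, length):
    """Return True if some key of `length` characters in [lo, hi] appears to be stored."""

    def descend(prefix):
        if len(prefix) == length:
            return True  # a full-length prefix inside the range was found
        for ch in alphabet:
            cand = prefix + ch
            n = len(cand)
            # Prune prefixes that cannot lead to any string inside [lo, hi].
            if cand < lo[:n] or cand > hi[:n]:
                continue
            if cand in store and descend(cand):
                return True
        return False

    return descend("")
```

With an exact set the answer is exact. Replacing the set with a Bloom filter keeps "no" answers reliable, since a Bloom filter has no false negatives, while "yes" answers may occasionally be false positives.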


Space Analysis
• The size of the structure is the number of inserted strings, say N, times the number of inserts required for the longest string. If the longest inserted string has length K, then it takes K/p inserts. We therefore need at most O(NK/p) inserts into the Bloom filter.

Space:
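The expression that followed this label is not preserved here. As a hedged reconstruction, assuming the standard optimal sizing of a Bloom filter with false positive rate ε, a filter holding n' elements needs the number of bits m shown below; substituting n' = NK/p gives a bound consistent with the insert count above. The symbols m and n' are introduced here for illustration.

```latex
m = \frac{n' \ln(1/\varepsilon)}{(\ln 2)^2} \approx 1.44\, n' \log_2(1/\varepsilon),
\qquad n' = \frac{NK}{p}
\quad\Longrightarrow\quad
m = O\!\left(\frac{NK}{p}\,\log\frac{1}{\varepsilon}\right) \text{ bits}
```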


Query Analysis
• Since a Bloom filter lookup takes O(1) time, a query needs to look up at most all the brute-force candidates at each level.

• Hence, the range query time of our structure is:


Error analysis:
• You can set the error to whatever value you desire.
• But less error means you need more space.
• However, this does not impact the space as much as you would think.
• The dominant factor in the space is still the k/p factor outside the log.
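The trade-off between error and space can be checked numerically. The sketch below assumes the standard optimal Bloom filter sizing, m = n ln(1/ε) / (ln 2)^2 bits for n elements; the function names are hypothetical. It shows that squaring the error rate only doubles the space, whereas the space grows linearly with the number of inserted prefixes NK/p.

```python
import math


def bloom_bits(num_items, error_rate):
    """Bits needed by an optimally sized Bloom filter (standard formula)."""
    return math.ceil(num_items * math.log(1 / error_rate) / math.log(2) ** 2)


def structure_bits(num_strings, max_len, p, error_rate):
    """Bits for the prefix structure: at most N*K/p inserts into one filter."""
    return bloom_bits(num_strings * max_len // p, error_rate)


if __name__ == "__main__":
    for eps in (1e-2, 1e-4, 1e-8):
        print(eps, structure_bits(1000, 32, 4, eps))
```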


Optimization

• Instead of brute-forcing down every path that matches the input range strings, we brute-force only selected nodes.


Modified Bloom filter


Modified Query Algorithm

Ex 2: [cbcaa-cbcc], LCA = <cbc>
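The LCA in this example can be computed directly. The sketch below reads LCA as the lowest common ancestor of the two endpoints in the prefix tree, which is their longest common prefix; that reading is an assumption, though it matches Ex 2. The function name `lca` is hypothetical.

```python
def lca(lo, hi):
    """Return the longest common prefix of the two range endpoints."""
    n = 0
    for a, b in zip(lo, hi):
        if a != b:
            break
        n += 1
    return lo[:n]


print(lca("cbcaa", "cbcc"))  # prints "cbc"
```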


Modified costs

Query cost:

Space cost:


How to get the labels
• Have four Bloom filters labeled left, right, middle and both.
• Preprocessing allows you to put all the nodes in the appropriate Bloom filters.
• But a downside is that this makes the structure very static.
• In the worst case, you need to revert to the unmodified algorithm.


Thank you