page 1 online aggregation for large mapreduce jobs niketan pansare, vinayak borkar, chris jermaine,...

Online Aggregation for Large MapReduce Jobs

Niketan Pansare, Vinayak Borkar, Chris Jermaine, Tyson Condie

VLDB 2011

IDS Fall Seminar, 2011. 11. 11.

Presented by Yang Byoung Ju

Online Aggregation (OLA)

▶ select avg(stock_price) from nasdaq_db where company = 'xyz';

▶ Conventional DB: (no answer yet)

▶ With OLA Extension: [0, 2000] with 95% probability

After 1 second

▶ With OLA Extension: [900, 1100] with 95% probability

After 2 minutes

▶ With OLA Extension: [995, 1005] with 95% probability

After 10 minutes

▶ Conventional DB: 1000

▶ With OLA Extension: 1000

After 2 hours


▶ The user gets running estimates of an aggregate query's result

▶ At all times during query processing, the database system gives the user a statistically valid estimate of the final answer

(e.g., output range estimate: [990, 1010] with 95% probability)

▶ Advantages
- A reasonable answer can be obtained very quickly (how quickly depends on the application)
- Can save time and computing resources

▶ Disadvantages
- Implementation requires changes to the database kernel
- In a self-managed system, the decreased resource cost may not benefit the user directly
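The shrinking intervals on the first slide are what a classic OLA estimator reports. As a minimal sketch, assuming tuples are read in random order, a running AVG with a CLT-based 95% confidence interval could be computed as follows; the function name `running_avg_ci` and the synthetic price data are hypothetical, not taken from the paper.

```python
import math
import random
from statistics import NormalDist

def running_avg_ci(values, confidence=0.95):
    """Yield (n, estimate, low, high) after each processed value (from n = 2 on).

    Welford's online update keeps the running mean and variance; the interval
    is a CLT-based normal approximation, valid only if values arrive in
    random order.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    n, avg, m2 = 0, 0.0, 0.0
    for v in values:
        n += 1
        delta = v - avg
        avg += delta / n
        m2 += delta * (v - avg)
        if n >= 2:
            half_width = z * math.sqrt(m2 / (n - 1) / n)  # z * stderr of the mean
            yield n, avg, avg - half_width, avg + half_width

rng = random.Random(0)
prices = [rng.gauss(1000, 300) for _ in range(100_000)]  # synthetic stock prices
rng.shuffle(prices)  # OLA assumes tuples are read in random order
for n, est, low, high in running_avg_ci(prices):
    if n in (10, 1_000, 100_000):
        print(f"after {n:>7} tuples: {est:7.1f}  [{low:7.1f}, {high:7.1f}] (95%)")
```

This sketch ignores the finite-population correction, so unlike the slide's final answer its interval does not collapse to a single point once every tuple has been read.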

Why ‘Online Aggregation’?

▶ OLA was proposed in 1997, but its commercial impact has been limited or even non-existent, for two reasons:
- OLA requires extensive changes to the database kernel
- Saving resources has never been compelling

▶ Why OLA now?
- People are implementing all sorts of new databases these days
- Given the current move into the cloud, dollars flow from the end user's pocket to the cloud as a query runs

OLA in a distributed environment

▶ Classic OLA
- The set of data (tuples) processed at any point in the computation is a random subset of the data in the system
- So it is easy to estimate the final answer using statistical methods

▶ OLA for large-scale data
- The basic unit of data that is processed is a block (e.g., 64 MB)
- There is a lot of variation in the time taken to process each block
- This variation in processing time is tremendously important if it is correlated with the aggregate value of the block
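To illustrate why classic OLA needs the processed tuples to be a random subset, the sketch below scales the partial SUM of the first 500 tuples up to the full table, once for a randomly shuffled order and once for an order sorted by value. The data and the helper `estimate_sum` are hypothetical; this is an assumed toy setup, not the paper's experiment.

```python
import random

def estimate_sum(prefix, total_count):
    """Scale the SUM of the tuples processed so far up to the whole table."""
    return sum(prefix) / len(prefix) * total_count

rng = random.Random(42)
table = [rng.randint(1, 100) for _ in range(10_000)]   # synthetic tuple values
true_sum = sum(table)

random_order = table[:]
rng.shuffle(random_order)        # classic OLA: tuples read in random order
sorted_order = sorted(table)     # e.g. data physically clustered by value

k = 500                          # tuples processed so far
print("true sum:          ", true_sum)
print("random-order guess:", round(estimate_sum(random_order[:k], len(table))))
print("sorted-order guess:", round(estimate_sum(sorted_order[:k], len(table))))
```

The random-order estimate lands close to the true sum, while the sorted-order prefix sees only the smallest values and badly underestimates it.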

OLA in a distributed environment

▶ OLA for large-scale data (cont.)
- Blocks with a lot of data may have a greater aggregate value and take longer to process
- So the set of blocks completed at any particular point in time is more likely to have small values, leading to biased estimates -> “Inspection Paradox”

This paper solves the ‘inspection paradox’ problem, making OLA possible in a distributed environment

Inspection Paradox

▶ In a renewal process, if we wait some predetermined time t and then observe how long the renewal interval containing t is, we should expect it to be longer, typically, than a renewal interval of average length.
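A quick Monte Carlo check of this statement, assuming exponentially distributed intervals with mean 10 (an arbitrary choice for illustration; `covering_interval_lengths` is a hypothetical helper):

```python
import bisect
import itertools
import random

def covering_interval_lengths(intervals, n_probes, rng):
    """For n_probes uniformly random inspection times t, return the length of
    the renewal interval that contains each t."""
    ends = list(itertools.accumulate(intervals))  # renewal epochs
    lengths = []
    for _ in range(n_probes):
        t = rng.uniform(0, ends[-1])
        i = min(bisect.bisect_right(ends, t), len(intervals) - 1)  # interval holding t
        lengths.append(intervals[i])
    return lengths

rng = random.Random(7)
intervals = [rng.expovariate(1 / 10) for _ in range(100_000)]  # mean 10
inspected = covering_interval_lengths(intervals, 20_000, rng)
print(f"average interval:              {sum(intervals) / len(intervals):.2f}")
print(f"average interval containing t: {sum(inspected) / len(inspected):.2f}")
```

For exponential intervals the interval containing t averages about E[L²]/E[L] = 200/10 = 20, roughly twice the mean interval.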

Inspection Paradox

▶ Explanation #1
- If we shoot arrows at random at the targets below, more arrows will land on the larger targets

Inspection Paradox

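▶ Explanation #2
- Buses arrive at an average interval of 10 minutes. If you get to the bus stop at a random time, how long do you wait? 5 minutes?
- Yes, if a bus arrives exactly every 10 minutes.
- What if the arrival intervals are not uniform? E.g., 5 min, 15 min, 5 min, 15 min (average 10 min)
- A random arrival falls into a 5-minute gap with probability 5/20 = 1/4 (average wait 2.5 min) and into a 15-minute gap with probability 15/20 = 3/4 (average wait 7.5 min)
- Waiting time: 1/4 × 2.5 min + 3/4 × 7.5 min = 6.25 min

[Figure: bus arrival timeline, ticks at 10, 20, 30, 40 min]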


Inspection Paradox

▶ Explanation #2 (cont.)
- Waiting time: the area of the triangles in the plot, divided by the elapsed time, is the average waiting time
- The waiting times differ even though the average interval is the same
- In the latter case, if an inspector sits at the bus stop all day and averages the intervals of all buses, he gets 10 minutes
- But if the inspector arrives at the bus stop at a particular moment and estimates the average interval from his waiting time (2 × 6.25 min), he gets 12.5 minutes

-> “Inspection Paradox”
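The bus numbers can be checked exactly. A passenger arriving at a uniformly random time lands in a gap of length L with probability L / (cycle length) and then waits L/2 on average; the gap he lands in has mean length E[L²]/E[L], exactly twice the expected wait. The sketch uses Python's `fractions` module; the helper names are hypothetical.

```python
from fractions import Fraction

def expected_wait(gaps):
    """Mean wait for a uniformly random arrival, given a repeating cycle of gaps."""
    total = sum(gaps)
    # gap g is hit with probability g / total; the mean wait inside it is g / 2
    return sum(Fraction(g, total) * Fraction(g, 2) for g in gaps)

def inspected_interval(gaps):
    """Mean length of the gap that contains the random arrival: E[L^2] / E[L]."""
    total = sum(gaps)
    return sum(Fraction(g, total) * g for g in gaps)

for gaps in ([10, 10], [5, 15]):
    avg = Fraction(sum(gaps), len(gaps))
    print(gaps, "average gap:", float(avg),
          "expected wait:", float(expected_wait(gaps)),
          "inspected gap:", float(inspected_interval(gaps)))
# [10, 10] average gap: 10.0 expected wait: 5.0 inspected gap: 10.0
# [5, 15] average gap: 10.0 expected wait: 6.25 inspected gap: 12.5
```

Both schedules share the 10-minute average, but only the irregular one makes the inspector's interval (12.5 min) exceed it.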


Inspection Paradox

▶ If someone tries to get information from irregularly spaced data at a particular point in time, he is likely to land in a longer interval, and consequently he gets a biased (wrong) estimate

▶ Explanation #3
- On a machine in the distributed system, block processing time differs depending on the block's data, even if every block has the same size
- If we take a snapshot at a particular point in time to get an estimate, it is likely to be taken while a larger (slower) block is being processed
- This means we only get information from the smaller blocks, which contain less information, while the information of the larger block cannot be included in the estimate

[Figure: Blocks 1 to 4 on one machine's timeline, marked completed, processing, and waiting, with a snapshot taken while a block is processing]
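The snapshot effect can be reproduced with a small simulation. Assumptions (all hypothetical, for illustration only): 1000 machines pull blocks from a randomly shuffled shared queue, 80% of blocks have aggregate value 1 and 20% have value 20, and a block's processing time equals its value. Even with random block order, the blocks completed by the snapshot at t = 30 under-represent the large blocks.

```python
import heapq
import random

def completed_at_snapshot(block_values, n_machines, t_snap, rng):
    """Values of the blocks finished by time t_snap.

    Machines pull blocks from a randomly shuffled shared queue; each block's
    processing time is assumed to equal its aggregate value, i.e. big blocks
    are slow (the correlation behind the inspection paradox).
    """
    queue = block_values[:]
    rng.shuffle(queue)                                  # random block order
    free_at = [(0.0, m) for m in range(n_machines)]     # (time machine frees up, id)
    heapq.heapify(free_at)
    completed = []
    for value in queue:
        start, m = heapq.heappop(free_at)               # earliest free machine
        if start >= t_snap:
            break                                       # nothing else starts in time
        end = start + value                             # assumed: time == value
        if end <= t_snap:
            completed.append(value)
        heapq.heappush(free_at, (end, m))
    return completed

rng = random.Random(3)
blocks = [1] * 80_000 + [20] * 20_000                   # hypothetical block values
done = completed_at_snapshot(blocks, n_machines=1000, t_snap=30, rng=rng)
print(f"true mean block value:    {sum(blocks) / len(blocks):.2f}")   # 4.80
print(f"mean of completed blocks: {sum(done) / len(done):.2f}")
```

The mean over completed blocks comes out well below the true mean of 4.8, because the slow, high-value blocks are the ones still in progress at the snapshot.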

Inspection Paradox

▶ Let’s make the ‘inspection paradox’ go away: use 3 parameters of each block for estimation
- x : aggregate value of the block
- tsch : time the block waited before being scheduled
- tproc : processing time of the block

tsch and tproc allow us to make predictions about the x values that we have not yet seen.

For example, if a particular block has been processing for 125 seconds (not completed yet) and took 5 seconds to be scheduled, we can correctly view its x as a random sample from the distribution

f(x | tsch = 5, tproc ≥ 125)
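One crude way to picture f(x | tsch = 5, tproc ≥ 125) is to filter a collection of past (x, tsch, tproc) observations down to those consistent with the in-progress block. The sketch does this on synthetic data where processing time is assumed to grow with x; the data-generating model, the tolerance `sch_tol`, and the helper name are hypothetical, and this empirical filtering is a stand-in, not the paper's estimation method.

```python
import random
from statistics import mean

def conditional_samples(history, t_sch, elapsed, sch_tol=1.0):
    """x values of past blocks consistent with an in-progress block: scheduled
    after about t_sch seconds (within sch_tol) and still running at `elapsed`.
    A crude empirical stand-in for f(x | tsch = t_sch, tproc >= elapsed)."""
    return [x for x, ts, tp in history
            if abs(ts - t_sch) <= sch_tol and tp >= elapsed]

rng = random.Random(11)
history = []
for _ in range(50_000):
    x = rng.lognormvariate(4, 0.5)        # hypothetical aggregate value
    ts = rng.uniform(0, 10)               # hypothetical scheduling delay (s)
    tp = 1.5 * x + rng.gauss(0, 10)       # assumed: processing time grows with x
    history.append((x, ts, tp))

all_x = [x for x, _, _ in history]
cond_x = conditional_samples(history, t_sch=5, elapsed=125)
print(f"E[x]                          ~ {mean(all_x):.1f}")
print(f"E[x | tsch ~ 5, tproc >= 125] ~ {mean(cond_x):.1f} ({len(cond_x)} samples)")
```

Because slow blocks tend to have large x in this toy model, conditioning on tproc ≥ 125 shifts the estimate of the unseen x well above the unconditional mean.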

Implementation

▶ The authors implemented an OLA mode in Hyracks

▶ Hyracks
- Open-source project that supports Map and Reduce operations
- Also supports relational operations such as selection, projection, and join
- Architecture is similar to Hadoop

▶ Modifications to Hyracks
- Logical block queue that makes the block processing order statistically random
- Estimator in the reduce task during the shuffle phase
  - Completed map tasks are gathered in the shuffle phase
  - The estimator receives the aggregate value (x) and metadata (tsch and tproc)
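The two modifications can be pictured as follows. This is a hypothetical Python sketch, not the Hyracks code or API: `random_block_order` stands in for the randomized logical block queue, and `OnlineEstimatorState` is an assumed reduce-side state that stores (x, tsch, tproc) for completed blocks and, as an added assumption, also tracks start times of in-progress blocks so each one yields a censored observation tproc ≥ elapsed.

```python
import random
from dataclasses import dataclass, field

@dataclass
class BlockReport:               # hypothetical message sent when a map task finishes
    block_id: int
    x: float                     # aggregate value of the block
    t_sch: float                 # seconds the block waited before being scheduled
    t_proc: float                # seconds the block took to process

@dataclass
class OnlineEstimatorState:      # hypothetical reduce-side estimator state
    n_blocks: int
    done: dict = field(default_factory=dict)     # block_id -> BlockReport
    running: dict = field(default_factory=dict)  # block_id -> (t_sch, start time)

    def on_start(self, block_id, t_sch, now):
        self.running[block_id] = (t_sch, now)

    def on_complete(self, report):
        self.running.pop(report.block_id, None)
        self.done[report.block_id] = report

    def censored(self, now):
        """(t_sch, elapsed) per in-progress block: each says t_proc >= elapsed,
        so its x should be drawn from f(x | t_sch, t_proc >= elapsed)."""
        return [(t_sch, now - start) for t_sch, start in self.running.values()]

    def unseen(self):
        return self.n_blocks - len(self.done) - len(self.running)

def random_block_order(n_blocks, seed=None):
    """Logical block queue: a uniformly random permutation of block ids."""
    order = list(range(n_blocks))
    random.Random(seed).shuffle(order)
    return order

order = random_block_order(8, seed=1)
state = OnlineEstimatorState(n_blocks=8)
state.on_start(order[0], t_sch=5.0, now=5.0)
state.on_start(order[1], t_sch=5.0, now=5.0)
state.on_complete(BlockReport(order[0], x=42.0, t_sch=5.0, t_proc=30.0))
print(len(state.done), state.censored(now=130.0))  # prints: 1 [(5.0, 125.0)]
```

The censored pair (5.0, 125.0) matches the earlier example: a block scheduled after 5 seconds that has been processing for 125 seconds.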

Estimation

▶ A Bayesian approach is applied for estimation
- Z is randomly sampled from the blocks
- Z produces observed data X and hidden data Y
- Θ includes any data that is unobserved
- This process is repeated to get an estimate
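The repeat-until-converged loop can be sketched as a simple Gibbs sampler with data augmentation. This is a deliberately simplified toy model, NOT the paper's: block values are assumed i.i.d. Normal(mu, sigma²) with sigma known and a flat prior on mu, so Θ = mu, X = the values of completed blocks, and Y = the values of unprocessed blocks. It ignores tsch and tproc, which the slides indicate the real estimator uses to correct for the inspection paradox. The function name and parameters are hypothetical.

```python
import math
import random
from statistics import mean, quantiles

def gibbs_total(observed, n_total, sigma, iters=2000, burn_in=500, seed=0):
    """Posterior draws of SUM over all n_total blocks under a toy normal model.

    Assumed model (not the paper's): each block value ~ Normal(mu, sigma^2),
    sigma known, flat prior on mu.  X = observed, Y = hidden, Theta = mu.
    """
    rng = random.Random(seed)
    n_hidden = n_total - len(observed)
    obs_sum = sum(observed)
    mu = mean(observed)                         # starting value for Theta
    totals = []
    for it in range(iters):
        # 1) impute the hidden data Y given the current Theta
        hidden = [rng.gauss(mu, sigma) for _ in range(n_hidden)]
        total = obs_sum + sum(hidden)
        # 2) redraw Theta given X and Y (conjugate normal update, flat prior)
        mu = rng.gauss(total / n_total, sigma / math.sqrt(n_total))
        if it >= burn_in:
            totals.append(total)                # one posterior draw of the query result
    return totals

rng = random.Random(2)
all_blocks = [rng.gauss(500, 100) for _ in range(400)]  # synthetic per-block sums
observed = all_blocks[:40]                              # 10% done, random order assumed
draws = gibbs_total(observed, n_total=400, sigma=100)
cuts = quantiles(draws, n=40)                           # cut points 2.5%, 5%, ..., 97.5%
print(f"true total:     {sum(all_blocks):.0f}")
print(f"posterior mean: {mean(draws):.0f}, 95% interval [{cuts[0]:.0f}, {cuts[-1]:.0f}]")
```

Each iteration imputes Y given Θ and then redraws Θ given X and Y; after burn-in, the totals X + Y are draws from the posterior distribution of the query result, from which an interval like those on the first slide can be read off.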

Experiments

▶ 6 months of Wikipedia page traffic data
- Counting the number of pages per language
- 220 GB, 3960 blocks
- 11 nodes (1 master, 10 slaves)
- 80 mappers and 10 reducers
- Took 46 minutes to run to completion

▶ Experimented on 3 different versions
- w/ random block order, w/ correlation (inspection paradox)
- w/o random block order, w/ correlation (inspection paradox)
- w/ random block order, w/o correlation (inspection paradox)

Experiments

(a) Posterior query result distribution for the number of English-language pages at various times, using both randomized and arbitrary block ordering (actual result: black vertical line)

(b) Posterior query result distribution for the number of English-language pages at various times, taking into account and ignoring the correlation between aggregate value and processing time

Conclusion

▶ The authors proposed a system model that is appropriate for OLA over MapReduce in a large-scale, distributed environment

▶ The model accounts for biases that can arise when estimating aggregates in a cluster environment (deals with the ‘inspection paradox’)

▶ This model allows us to export “early returns” of query aggregates that are statistically robust

Q & A

Thank you
